mirror of
https://github.com/apache/zeppelin
synced 2026-05-24 09:38:26 +00:00
[DOC] Remove redundant Zeppelin build information from yarn_install.md
[DOC] Guide users to set SPARK_HOME to use spark in yarn mode
[DOC] Change spark version to the latest in yarn config example
[DOC] Add note that spark for cdh4 doesn't support yarn
[DOC] Remove spark properties `spark.home` and `spark.yarn.jar` from doc which doesn't work on zeppelin anymore
[DOC] Fix typos
[DOC] Add info that embedded spark doesn't work on yarn mode anymore when Spark version is 1.5.0 or higher in README.md
parent
6465ba8902
commit
c544decd16
3 changed files with 34 additions and 114 deletions

@@ -104,7 +104,7 @@ minor version can be adjusted by `-Dhadoop.version=x.x.x`
##### -Pyarn (optional)

enable YARN support for local mode

> YARN for local mode is not supported for Spark v1.5.0 or higher. Set SPARK_HOME instead.

##### -Ppyspark (optional)

@@ -22,9 +22,9 @@ limitations under the License.


## Zeppelin Installation
Welcome to your first trial to explore Zeppelin !
Welcome to your first trial to explore Zeppelin!

In this documentation, we will explain how you can install Zeppelin from **Binary Package** or build from **Source** by yourself. Plus, you can see all of Zeppelin's configurations in the **Zeppelin Configuration** section below.
In this documentation, we will explain how you can install Zeppelin from **Binary Package** or build from **Source** by yourself. Plus, you can see all of Zeppelin's configurations in the [Zeppelin Configuration](install.html#zeppelin-configuration) section below.

### Install with Binary Package

@@ -32,9 +32,17 @@ If you want to install Zeppelin with latest binary package, please visit [this p

### Build from Zeppelin Source

You can also build Zeppelin from the source. Please check instructions in `README.md` in [Zeppelin github](https://github.com/apache/incubator-zeppelin/blob/master/README.md).
You can also build Zeppelin from the source.

#### Prerequisites for build
* Java 1.7
* Git
* Maven (3.1.x or higher)
* Node.js Package Manager

If you don't have these prerequisites installed, please check the instructions in [README.md](https://github.com/apache/incubator-zeppelin/blob/master/README.md) for details.

<a name="zeppelin-configuration"> </a>
## Zeppelin Configuration

You can configure Zeppelin with both **environment variables** in `conf/zeppelin-env.sh` and **java properties** in `conf/zeppelin-site.xml`. If both are defined, the **environment variables** take priority.

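As a rough illustration of that precedence rule (not Zeppelin's actual implementation), the hypothetical helper below returns the environment value when it is non-empty and otherwise falls back to the value from `conf/zeppelin-site.xml`. The name `resolve_setting` and its two arguments are assumptions made for this sketch; `ZEPPELIN_PORT` and `zeppelin.server.port` are used only as an example pair of settings.

```shell
# Hypothetical helper: pick the effective value for one setting.
#   $1 = value from the environment (conf/zeppelin-env.sh), may be empty
#   $2 = value of the matching property in conf/zeppelin-site.xml
resolve_setting() {
  if [ -n "$1" ]; then
    printf '%s\n' "$1"   # the environment variable wins when it is defined
  else
    printf '%s\n' "$2"   # otherwise fall back to the Java property
  fi
}

# Example: ZEPPELIN_PORT, if set, overrides a zeppelin.server.port of 8080
resolve_setting "${ZEPPELIN_PORT:-}" "8080"
```
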
@@ -20,7 +20,7 @@ limitations under the License.
{% include JB/setup %}

## Introduction
This page describes how to pre-configure a bare metal node, build & configure Zeppelin on it, configure Zeppelin and connect it to existing YARN cluster running Hortonworks flavour of Hadoop. It also describes steps to configure Spark & Hive interpreter of Zeppelin.
This page describes how to pre-configure a bare metal node, configure Zeppelin and connect it to existing YARN cluster running Hortonworks flavour of Hadoop. It also describes steps to configure Spark & Hive interpreter of Zeppelin.

## Prepare Node

@@ -44,84 +44,16 @@ Its assumed in the rest of the document that zeppelin user is indeed created and

### List of Prerequisites

* CentOS 6.x
* Git
* Java 1.7
* Apache Maven
* Hadoop client.
* Spark.
* CentOS 6.x, Mac OSX, Ubuntu 14.X
* Java 1.7
* Hadoop client
* Spark
* Internet connection is required.

Its assumed that the node has CentOS 6.x installed on it. Although any version of Linux distribution should work fine. The working directory of all prerequisite pacakges is /home/zeppelin/prerequisites, although any location could be used.

#### Git
Install the latest stable version of Git. This document describes installation of version 2.4.8.

```bash
# Install build dependencies, then remove any distro-provided git
yum install curl-devel expat-devel gettext-devel openssl-devel zlib-devel
yum install gcc perl-ExtUtils-MakeMaker
yum remove git
cd /home/zeppelin/prerequisites
wget https://github.com/git/git/archive/v2.4.8.tar.gz
# The archive unpacks into git-2.4.8
tar xzf v2.4.8.tar.gz
cd git-2.4.8
make prefix=/home/zeppelin/prerequisites/git all
make prefix=/home/zeppelin/prerequisites/git install
# Binaries are installed under the prefix's bin directory
echo "export PATH=$PATH:/home/zeppelin/prerequisites/git/bin" >> /home/zeppelin/.bashrc
source /home/zeppelin/.bashrc
git --version
```

Assuming all the packages are successfully installed, running the version option with git command should display

```bash
git version 2.4.8
```

#### Java
Zeppelin works well with 1.7.x version of Java runtime. Download JDK version 7 and a stable update and follow below instructions to install it.

```bash
cd /home/zeppelin/prerequisites/
#Download JDK 1.7, Assume JDK 7 update 79 is downloaded.
tar -xf jdk-7u79-linux-x64.tar.gz
echo "export JAVA_HOME=/home/zeppelin/prerequisites/jdk1.7.0_79" >> /home/zeppelin/.bashrc
source /home/zeppelin/.bashrc
echo $JAVA_HOME
```
Assuming all the packages are successfully installed, echoing JAVA_HOME environment variable should display

```bash
/home/zeppelin/prerequisites/jdk1.7.0_79
```

#### Apache Maven
Download and install a stable version of Maven.

```bash
cd /home/zeppelin/prerequisites/
wget ftp://mirror.reverse.net/pub/apache/maven/maven-3/3.3.3/binaries/apache-maven-3.3.3-bin.tar.gz
tar -xf apache-maven-3.3.3-bin.tar.gz
cd apache-maven-3.3.3
export MAVEN_HOME=/home/zeppelin/prerequisites/apache-maven-3.3.3
echo "export PATH=$PATH:/home/zeppelin/prerequisites/apache-maven-3.3.3/bin" >> /home/zeppelin/.bashrc
source /home/zeppelin/.bashrc
mvn -version
```

Assuming all the packages are successfully installed, running the version option with mvn command should display

```bash
Apache Maven 3.3.3 (7994120775791599e205a5524ec3e0dfe41d4a06; 2015-04-22T04:57:37-07:00)
Maven home: /home/zeppelin/prerequisites/apache-maven-3.3.3
Java version: 1.7.0_79, vendor: Oracle Corporation
Java home: /home/zeppelin/prerequisites/jdk1.7.0_79/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "2.6.32-358.el6.x86_64", arch: "amd64", family: "unix"
```
It's assumed that the node has CentOS 6.x installed on it, although any Linux distribution should work fine. The working directory of all prerequisite packages is /home/zeppelin/prerequisites, although any location could be used.

#### Hadoop client
Zeppelin can work with multiple versions & distributions of Hadoop. A complete list [is available here.](https://github.com/apache/incubator-zeppelin#build) This document assumes Hadoop 2.7.x client libraries including configuration files are installed on Zeppelin node. It also assumes /etc/hadoop/conf contains various Hadoop configuration files. The location of Hadoop configuration files may vary, hence use appropriate location.
Zeppelin can work with multiple versions & distributions of Hadoop. A complete list is available [here](https://github.com/apache/incubator-zeppelin#build). This document assumes Hadoop 2.7.x client libraries including configuration files are installed on Zeppelin node. It also assumes /etc/hadoop/conf contains various Hadoop configuration files. The location of Hadoop configuration files may vary, hence use appropriate location.

```bash
hadoop version
@@ -134,32 +66,21 @@ This command was run using /usr/hdp/2.3.1.0-2574/hadoop/lib/hadoop-common-2.7.1.
```

#### Spark
Zeppelin can work with multiple versions Spark. A complete list [is available here.](https://github.com/apache/incubator-zeppelin#build) This document assumes Spark 1.3.1 is installed on Zeppelin node at /home/zeppelin/prerequisites/spark.
Spark is supported out of the box. To take advantage of this, download the appropriate version of the Spark binary package from the [Spark Download page](http://spark.apache.org/downloads.html) and unzip it.
Zeppelin can work with multiple versions of Spark. A complete list is available [here](https://github.com/apache/incubator-zeppelin#build).
This document assumes Spark 1.6.0 is installed at /usr/lib/spark.
> Note: Spark should be installed on the same node as Zeppelin.

## Build
> Note: Spark's pre-built package for CDH 4 doesn't support yarn.

Checkout source code from [git://git.apache.org/incubator-zeppelin.git](git://git.apache.org/incubator-zeppelin.git).
#### Zeppelin

```bash
cd /home/zeppelin/
git clone git://git.apache.org/incubator-zeppelin.git
```
Zeppelin package is available at `/home/zeppelin/incubator-zeppelin` after the checkout completes.

### Cluster mode

As its assumed Hadoop 2.7.x is installed on the YARN cluster & Spark 1.3.1 is installed on Zeppelin node. Hence appropriate options are chosen to build Zeppelin. This is very important as Zeppelin will bundle corresponding Hadoop & Spark libraries and they must match the ones present on YARN cluster & Zeppelin Spark installation.

Zeppelin is a maven project and hence must be built with Apache Maven.

```bash
cd /home/zeppelin/incubator-zeppelin
mvn clean package -Pspark-1.3 -Dspark.version=1.3.1 -Dhadoop.version=2.7.0 -Phadoop-2.6 -Pyarn -DskipTests
```
Building Zeppelin for first time downloads various dependencies and hence takes few minutes to complete.
Check out the source code from [git://git.apache.org/incubator-zeppelin.git](https://github.com/apache/incubator-zeppelin.git) or download the binary package from the [Download page](https://zeppelin.incubator.apache.org/download.html).
You can refer to the [Install](install.html) page for details.
This document assumes that Zeppelin is located under `/home/zeppelin/incubator-zeppelin`.

## Zeppelin Configuration
Zeppelin configurations needs to be modified to connect to YARN cluster. Create a copy of zeppelin environment XML
Zeppelin configuration needs to be modified to connect to YARN cluster. Create a copy of zeppelin environment shell script.

```bash
cp /home/zeppelin/incubator-zeppelin/conf/zeppelin-env.sh.template /home/zeppelin/incubator-zeppelin/conf/zeppelin-env.sh
@@ -168,9 +89,10 @@ cp /home/zeppelin/incubator-zeppelin/conf/zeppelin-env.sh.template /home/zeppeli
Set the following properties

```bash
export JAVA_HOME=/home/zeppelin/prerequisites/jdk1.7.0_79
export HADOOP_CONF_DIR=/etc/hadoop/conf
export JAVA_HOME="/usr/java/jdk1.7.0_79"
export HADOOP_CONF_DIR="/etc/hadoop/conf"
export ZEPPELIN_JAVA_OPTS="-Dhdp.version=2.3.1.0-2574"
export SPARK_HOME="/usr/lib/spark"
```
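
The settings above only help if each path actually exists on the Zeppelin node. As a minimal sketch (not part of Zeppelin), the hypothetical `check_zeppelin_env` function below reports any of `JAVA_HOME`, `HADOOP_CONF_DIR` or `SPARK_HOME` that is unset or does not point at a directory. The function name and the message format are assumptions made for this illustration.

```shell
# Hypothetical helper: verify the variables set in conf/zeppelin-env.sh.
# Prints one line per problem and returns non-zero if any was found.
# Usage (assumed layout): . conf/zeppelin-env.sh && check_zeppelin_env
check_zeppelin_env() {
  status=0
  for name in JAVA_HOME HADOOP_CONF_DIR SPARK_HOME; do
    # Look up the variable whose name is stored in $name
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "unset: $name"
      status=1
    elif [ ! -d "$value" ]; then
      echo "missing directory: $name=$value"
      status=1
    fi
  done
  return "$status"
}
```

Running such a check before starting the daemon may catch a wrong `SPARK_HOME` earlier than a failed YARN submission would.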

As /etc/hadoop/conf contains various configurations of the YARN cluster, Zeppelin can now submit Spark/Hive jobs to the YARN cluster from its web interface. The value of hdp.version is set to 2.3.1.0-2574. This can be obtained by running the following command

@@ -196,7 +118,7 @@ bin/zeppelin-daemon.sh stop
```

## Interpreter
Zeppelin provides to various distributed processing frameworks to process data that ranges from Spark, Hive, Tajo, Ignite and Lens to name a few. This document describes to configure Hive & Spark interpreters.
Zeppelin supports various distributed processing frameworks, including Spark, Hive, Tajo, Ignite and Lens, to name a few. This document describes how to configure the Hive & Spark interpreters.

### Hive
Zeppelin supports the Hive interpreter, so copy hive-site.xml, which should be present at /etc/hive/conf, to Zeppelin's configuration folder. Once Zeppelin is built, it will have a conf folder under /home/zeppelin/incubator-zeppelin.

@@ -209,7 +131,7 @@ Once Zeppelin server has started successfully, visit http://[zeppelin-server-hos
Click on Save button. Once these configurations are updated, Zeppelin will prompt you to restart the interpreter. Accept the prompt and the interpreter will reload the configurations.

### Spark
Zeppelin was built with Spark 1.3.1 and it was assumed that 1.3.1 version of Spark is installed at /home/zeppelin/prerequisites/spark. Look for Spark configrations and click edit button to add the following properties
It is assumed that Spark 1.6.0 is installed at /usr/lib/spark. Look for the Spark configurations and click the edit button to add the following properties

<table class="table-configuration">
  <tr>
@@ -222,11 +144,6 @@ Zeppelin was built with Spark 1.3.1 and it was assumed that 1.3.1 version of Spa
    <td>yarn-client</td>
    <td>In yarn-client mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN.</td>
  </tr>
  <tr>
    <td>spark.home</td>
    <td>/home/zeppelin/prerequisites/spark</td>
    <td></td>
  </tr>
  <tr>
    <td>spark.driver.extraJavaOptions</td>
    <td>-Dhdp.version=2.3.1.0-2574</td>
@@ -237,11 +154,6 @@ Zeppelin was built with Spark 1.3.1 and it was assumed that 1.3.1 version of Spa
    <td>-Dhdp.version=2.3.1.0-2574</td>
    <td></td>
  </tr>
  <tr>
    <td>spark.yarn.jar</td>
    <td>/home/zeppelin/incubator-zeppelin/interpreter/spark/zeppelin-spark-0.6.0-incubating-SNAPSHOT.jar</td>
    <td></td>
  </tr>
</table>

Click on Save button. Once these configurations are updated, Zeppelin will prompt you to restart the interpreter. Accept the prompt and the interpreter will reload the configurations.