zeppelin/docs/setup/deployment/yarn_install.md
1ambda 4b6d3e5574 [ZEPPELIN-2596] Improving documentation page
### What is this PR for?

Improving documentation page. Please check *TODO* and *Screenshots* sections for detail.
The motivation is described in [the JIRA ticket](https://issues.apache.org/jira/browse/ZEPPELIN-2583) and discussion is ongoing on the mailing list.

### What type of PR is it?
[Improvement | Documentation]

### Todos
* [x] - improved the navbar style
* [x] - improved the main page
* [x] - re-organized content structure
* [x] - added tutorial pages: `spark_with_zeppelin.md`, `python_with_zeppelin.md`, `sql_with_zeppelin.md` for overview
* [x] - added `multi_user_support.md` page to provide overview
* [x] - added the empty `interpreter_binding_mode` page. This will be handed in the different issue: [ZEPPELIN-2582](https://issues.apache.org/jira/browse/ZEPPELIN-2582)
* [x] - added the empty `trouble_shooting` page. This can be filled in the following PRs.
* [x] - added the empty `useful_developer_tools` page. This can be filled in the following PRs.

### What is the Jira issue?

[ZEPPELIN-2596](https://issues.apache.org/jira/browse/ZEPPELIN-2596)

### How should this be tested?

1. checkout
2. `cd docs`
3. `bundle install` (make sure that you have ruby 2.1.0+ and bundle)
4. `bundle exec jekyll serve --watch`
5. open `localhost:4000`

### Screenshots (if appropriate)

#### better navbar: before
![2596_before_nav](https://cloud.githubusercontent.com/assets/4968473/26542353/89004e7a-4494-11e7-89c0-28d608f5f375.gif)

#### better navbar: after

![2596_after_nav](https://cloud.githubusercontent.com/assets/4968473/26542356/8bfb7b90-4494-11e7-9979-0bcaef8ba97b.gif)

#### improved main page: before

![2596_before_main](https://cloud.githubusercontent.com/assets/4968473/26542358/8f35b0be-4494-11e7-8a6c-e74ec52fc384.gif)

#### improved main page: after

![2596_after_main](https://cloud.githubusercontent.com/assets/4968473/26542366/93b333c8-4494-11e7-981f-3f7b4545868f.gif)

#### organized content structure: before

![2596_before_content](https://cloud.githubusercontent.com/assets/4968473/26542398/ad81ac26-4494-11e7-9a17-70dff41396fb.gif)

#### organized content structure: after

![2596_after_content](https://cloud.githubusercontent.com/assets/4968473/26542403/b0a42ad2-4494-11e7-8bd3-8a5bd194c6af.gif)

### Questions:
* Does the licenses files need update? - NO
* Is there breaking changes for older versions? - NO
* Does this needs documentation? -  related with docs

Author: 1ambda <1amb4a@gmail.com>

Closes #2371 from 1ambda/updating-version-doc and squashes the following commits:

eb02fa967 [1ambda] fix: navbar focus color applies after folding
026379ed6 [1ambda] fix: Remove docs/.listen_test
a7dd4737b [1ambda] fix: sora's comment 1.2
18c5058f7 [1ambda] fix: resolve description in python_with_zeppelin.md
d3ad67c73 [1ambda] fix: sora's comment 4
d133dbbcc [1ambda] fix: resolve sora's comment 3
513c6ff2c [1ambda] fix: resolve sora's comment 1.1
4c2946928 [1ambda] fix: resovle sora's comment 2
1c3946ac6 [1ambda] fix: sora's comment 1
4d6e4267f [1ambda] fix: Resolve sola's comment 3
d0524cafe [1ambda] fix: Set less shadow for nav
5f1f998ba [1ambda] docs: Add useful_develop_tools.md
9dfd62c74 [1ambda] fix: Typo in installation.md
30f7d7e06 [1ambda] fix: Typo in helium ctrl
d6877e792 [1ambda] docs: Add python_with_zeppelin.md
7027e96c0 [1ambda] docs: Improve python conda, docker doc style
e55b50a9d [1ambda] fix: Invalid URLs
75ddeeaff [1ambda] docs: replace URIs in interpreter
5b43993a4 [1ambda] docs: Add sql_with_zeppelin
053794e84 [1ambda] docs: Add spark_with_zeppelin.md
d4d88b9c7 [1ambda] docs: Improve proxy doc
b46cdd126 [1ambda] docs: Add empty interpreter_binding_mode.md
06fcb239e [1ambda] docs: Add empty personalized_mode.md
4991cf0a7 [1ambda] docs: Update upgrading.md
53142b7a0 [1ambda] fix: Simplify install.md
8a5c1e721 [1ambda] docs: Add multi_user_support.md
34095775e [1ambda] fix: Increase font size to 15px
a03b04b33 [1ambda] fix: Remove sample text from trouble_shooting.md
199842590 [1ambda] fix: Remove docker doc link
66a2a7d26 [1ambda] docs: Improve impersonation page
0a6e3fc1d [1ambda] docs: Improve install doc
ccd999ed5 [1ambda] docs: Improve helium doc
f8d742d08 [1ambda] fix: an invalid link in navbar
b7aa5f884 [1ambda] fix: URLs in development
61a175d94 [1ambda] docs: Update install.md
4c56de5c4 [1ambda] fix: URLs in setup
0b1d63513 [1ambda] fix: URLs in quickstart
28970a4fe [1ambda] feat: Add docs/usage
735946bca [1ambda] feat: rename /quickstart
b351cf237 [1ambda] fix: Add missing links
b70770b4f [1ambda] feat: Change URLs in nav, index
94e80aef6 [1ambda] fix: doens't display navbar version in small
6e0cab110 [1ambda] feat: Update doc section names
b9ce256ff [1ambda] feat: Hide version in navbar when md
f8bab52be [1ambda] fix: Better image display in index.md
eeb37d5b5 [1ambda] fix: Add RL padding for mobile browser
ceb60b5ee [1ambda] feat: Style collapsed nav for mobile browser
4ebafb4b6 [1ambda] commit
2017-06-23 17:44:13 +09:00

7.5 KiB

layout title description group
page Install Zeppelin to connect with existing YARN cluster This page describes how to pre-configure a bare metal node, configure Apache Zeppelin and connect it to existing YARN cluster running Hortonworks flavour of Hadoop. setup/deployment

{% include JB/setup %}

Introduction

This page describes how to pre-configure a bare metal node, configure Zeppelin and connect it to existing YARN cluster running Hortonworks flavour of Hadoop. It also describes steps to configure Spark interpreter of Zeppelin.

Prepare Node

Zeppelin user (Optional)

This step is optional, however its nice to run Zeppelin under its own user. In case you do not like to use Zeppelin (hope not) the user could be deleted along with all the packages that were installed for Zeppelin, Zeppelin binary itself and associated directories.

Create a zeppelin user and switch to zeppelin user or if zeppelin user is already created then login as zeppelin.

useradd zeppelin
su - zeppelin
whoami

Assuming a zeppelin user is created then running whoami command must return

zeppelin

Its assumed in the rest of the document that zeppelin user is indeed created and below installation instructions are performed as zeppelin user.

List of Prerequisites

  • CentOS 6.x, Mac OSX, Ubuntu 14.X
  • Java 1.7
  • Hadoop client
  • Spark
  • Internet connection is required.

It's assumed that the node has CentOS 6.x installed on it. Although any version of Linux distribution should work fine.

Hadoop client

Zeppelin can work with multiple versions & distributions of Hadoop. A complete list is available here. This document assumes Hadoop 2.7.x client libraries including configuration files are installed on Zeppelin node. It also assumes /etc/hadoop/conf contains various Hadoop configuration files. The location of Hadoop configuration files may vary, hence use appropriate location.

hadoop version
Hadoop 2.7.1.2.3.1.0-2574
Subversion git@github.com:hortonworks/hadoop.git -r f66cf95e2e9367a74b0ec88b2df33458b6cff2d0
Compiled by jenkins on 2015-07-25T22:36Z
Compiled with protoc 2.5.0
From source with checksum 54f9bbb4492f92975e84e390599b881d
This command was run using /usr/hdp/2.3.1.0-2574/hadoop/lib/hadoop-common-2.7.1.2.3.1.0-2574.jar

Spark

Spark is supported out of the box and to take advantage of this, you need to Download appropriate version of Spark binary packages from Spark Download page and unzip it. Zeppelin can work with multiple versions of Spark. A complete list is available here. This document assumes Spark 1.6.0 is installed at /usr/lib/spark.

Note: Spark should be installed on the same node as Zeppelin.

Note: Spark's pre-built package for CDH 4 doesn't support yarn.

Zeppelin

Checkout source code from git://git.apache.org/zeppelin.git or download binary package from Download page. You can refer Install page for the details. This document assumes that Zeppelin is located under /home/zeppelin/zeppelin.

Zeppelin Configuration

Zeppelin configuration needs to be modified to connect to YARN cluster. Create a copy of zeppelin environment shell script.

cp /home/zeppelin/zeppelin/conf/zeppelin-env.sh.template /home/zeppelin/zeppelin/conf/zeppelin-env.sh

Set the following properties

export JAVA_HOME="/usr/java/jdk1.7.0_79"
export HADOOP_CONF_DIR="/etc/hadoop/conf"
export ZEPPELIN_JAVA_OPTS="-Dhdp.version=2.3.1.0-2574"
export SPARK_HOME="/usr/lib/spark"

As /etc/hadoop/conf contains various configurations of YARN cluster, Zeppelin can now submit Spark/Hive jobs on YARN cluster form its web interface. The value of hdp.version is set to 2.3.1.0-2574. This can be obtained by running the following command

hdp-select status hadoop-client | sed 's/hadoop-client - \(.*\)/\1/'
# It returned  2.3.1.0-2574

Start/Stop

Start Zeppelin

cd /home/zeppelin/zeppelin
bin/zeppelin-daemon.sh start

After successful start, visit http://[zeppelin-server-host-name]:8080 with your web browser.

Stop Zeppelin

bin/zeppelin-daemon.sh stop

Interpreter

Zeppelin provides various distributed processing frameworks to process data that ranges from Spark, JDBC, Ignite and Lens to name a few. This document describes to configure JDBC & Spark interpreters.

Hive

Zeppelin supports Hive through JDBC interpreter. You might need the information to use Hive and can find in your hive-site.xml

Once Zeppelin server has started successfully, visit http://[zeppelin-server-host-name]:8080 with your web browser. Click on Interpreter tab next to Notebook dropdown. Look for Hive configurations and set them appropriately. Set them as per Hive installation on YARN cluster. Click on Save button. Once these configurations are updated, Zeppelin will prompt you to restart the interpreter. Accept the prompt and the interpreter will reload the configurations.

Spark

It was assumed that 1.6.0 version of Spark is installed at /usr/lib/spark. Look for Spark configurations and click edit button to add the following properties

Property Name Property Value Remarks
master yarn-client In yarn-client mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN.
spark.driver.extraJavaOptions -Dhdp.version=2.3.1.0-2574
spark.yarn.am.extraJavaOptions -Dhdp.version=2.3.1.0-2574

Click on Save button. Once these configurations are updated, Zeppelin will prompt you to restart the interpreter. Accept the prompt and the interpreter will reload the configurations.

Spark & Hive notebooks can be written with Zeppelin now. The resulting Spark & Hive jobs will run on configured YARN cluster.

Debug

Zeppelin does not emit any kind of error messages on web interface when notebook/paragraph is run. If a paragraph fails it only displays ERROR. The reason for failure needs to be looked into log files which is present in logs directory under zeppelin installation base directory. Zeppelin creates a log file for each kind of interpreter.

[zeppelin@zeppelin-3529 logs]$ pwd
/home/zeppelin/zeppelin/logs
[zeppelin@zeppelin-3529 logs]$ ls -l
total 844
-rw-rw-r-- 1 zeppelin zeppelin  14648 Aug  3 14:45 zeppelin-interpreter-hive-zeppelin-zeppelin-3529.log
-rw-rw-r-- 1 zeppelin zeppelin 625050 Aug  3 16:05 zeppelin-interpreter-spark-zeppelin-zeppelin-3529.log
-rw-rw-r-- 1 zeppelin zeppelin 200394 Aug  3 21:15 zeppelin-zeppelin-zeppelin-3529.log
-rw-rw-r-- 1 zeppelin zeppelin  16162 Aug  3 14:03 zeppelin-zeppelin-zeppelin-3529.out
[zeppelin@zeppelin-3529 logs]$