doc for spark standalone

This commit is contained in:
astroshim 2016-07-26 00:43:59 +09:00
parent 483a89705b
commit 83fdef6e2b
7 changed files with 138 additions and 2 deletions

View file

@ -32,7 +32,6 @@
<li><a href="{{BASE_PATH}}/manual/notebookashomepage.html">Customize Zeppelin Homepage</a></li>
<li role="separator" class="divider"></li>
<li class="title"><span><b>More</b><span></li>
<li><a href="{{BASE_PATH}}/install/virtual_machine.html">Zeppelin on Vagrant VM</a></li>
<li><a href="{{BASE_PATH}}/install/upgrade.html">Upgrade Zeppelin Version</a></li>
</ul>
</li>
@ -102,6 +101,10 @@
<li><a href="{{BASE_PATH}}/security/notebook_authorization.html">Notebook Authorization</a></li>
<li><a href="{{BASE_PATH}}/security/datasource_authorization.html">Data Source Authorization</a></li>
<li role="separator" class="divider"></li>
<li class="title"><span><b>Advanced</b><span></li>
<li><a href="{{BASE_PATH}}/install/virtual_machine.html">Zeppelin on Vagrant VM</a></li>
<li><a href="{{BASE_PATH}}/install/spark_cluster_mode.html#spark-standalone-mode">Zeppelin on Spark Cluster Mode (Standalone)</a></li>
<li role="separator" class="divider"></li>
<li class="title"><span><b>Contibute</b><span></li>
<li><a href="{{BASE_PATH}}/development/writingzeppelininterpreter.html">Writing Zeppelin Interpreter</a></li>
<li><a href="{{BASE_PATH}}/development/writingzeppelinapplication.html">Writing Zeppelin Application (Experimental)</a></li>

Binary file not shown.

After

Width:  |  Height:  |  Size: 201 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 180 KiB

View file

@ -133,7 +133,6 @@ Join to our [Mailing list](https://zeppelin.apache.org/community.html) and repor
* [Publish your Paragraph](./manual/publish.html) results into your external website
* [Customize Zeppelin Homepage](./manual/notebookashomepage.html) with one of your notebooks
* More
* [Apache Zeppelin on Vagrant VM](./install/virtual_machine.html): a guide for installing Apache Zeppelin on Vagrant virtual machine
* [Upgrade Apache Zeppelin Version](./install/upgrade.html): a manual procedure of upgrading Apache Zeppelin version
####Interpreter
@ -168,6 +167,9 @@ Join to our [Mailing list](https://zeppelin.apache.org/community.html) and repor
* [Shiro Authentication](./security/shiroauthentication.html)
* [Notebook Authorization](./security/notebook_authorization.html)
* [Data Source Authorization](./security/datasource_authorization.html)
* Advanced
* [Apache Zeppelin on Vagrant VM](./install/virtual_machine.html)
* [Zeppelin on Spark Cluster Mode (Standalone)](./install/spark_cluster_mode.html#spark-standalone-mode)
* Contribute
* [Writing Zeppelin Interpreter](./development/writingzeppelininterpreter.html)
* [Writing Zeppelin Application (Experimental)](./development/writingzeppelinapplication.html)

View file

@ -0,0 +1,74 @@
---
layout: page
title: "Apache Zeppelin on Spark cluster mode"
description: ""
group: install
---
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
{% include JB/setup %}
# Apache Zeppelin on Spark Cluster Mode
<div id="toc"></div>
## Overview
[Apache Spark](http://spark.apache.org/) has supported three cluster manager types([Standalone](http://spark.apache.org/docs/latest/spark-standalone.html), [Apache Mesos](http://spark.apache.org/docs/latest/running-on-mesos.html) and [Hadoop YARN](http://spark.apache.org/docs/latest/running-on-yarn.html)) so far.
This document will guide you how you can build and configure the environment on 3 types of Spark cluster manager with Apache Zeppelin using [Docker](https://www.docker.com/) scripts.
So [install docker](https://docs.docker.com/engine/installation/) on the machine first.
## Spark standalone mode
[Spark standalone](http://spark.apache.org/docs/latest/spark-standalone.html) is a simple cluster manager included with Spark that makes it easy to set up a cluster.
You can simply set up Spark standalone environment with below steps.
> **Note :** Since Apache Zeppelin and Spark use same `8080` port for their web UI, you might need to change `zeppelin.server.port` in `conf/zeppelin-site.xml`.
### 1. Build Docker file
You can find docker script files under `scripts/docker/spark-cluster-managers`.
```
cd $ZEPPELIN_HOME/scripts/docker/spark-cluster-managers/spark_standalone
docker build -t "spark_standalone" .
```
### 2. Run docker
```
docker run -it \
-p 8080:8080 \
-p 7077:7077 \
-p 8888:8888 \
-p 8081:8081 \
-h sparkmaster \
--name spark_standalone \
spark_standalone bash;
```
### 3. Configure Spark interpreter in Zeppelin
Set Spark master as `spark://localhost:7077` in Zeppelin **Interpreters** setting page.
<img src="../assets/themes/zeppelin/img/docs-img/standalone_conf.png" />
### 4. Run Zeppelin with Spark interpreter
After running single paragraph with Spark interpreter in Zeppelin, browse `https://localhost:8080` and check whether Spark cluster is running well or not.
<img src="../assets/themes/zeppelin/img/docs-img/spark_ui.png" />
You can also simply verify that Spark is running well in Docker with below command.
```
ps -ef | grep spark
```

View file

@ -0,0 +1,40 @@
FROM centos:centos6
MAINTAINER hsshim@nflabs.com
ENV SPARK_PROFILE 1.6
ENV SPARK_VERSION 1.6.2
ENV HADOOP_PROFILE 2.3
ENV SPARK_HOME /usr/local/spark
# Update the image with the latest packages
RUN yum update -y; yum clean all
# Get utils
RUN yum install -y \
wget \
tar \
curl \
&& \
yum clean all
# Remove old jdk
RUN yum remove java; yum remove jdk
# install jdk7
RUN yum install -y java-1.7.0-openjdk-devel
ENV JAVA_HOME /usr/lib/jvm/java
ENV PATH $PATH:$JAVA_HOME/bin
# install spark
RUN curl -s http://apache.mirror.cdnetworks.com/spark/spark-$SPARK_VERSION/spark-$SPARK_VERSION-bin-hadoop$HADOOP_PROFILE.tgz | tar -xz -C /usr/local/
RUN cd /usr/local && ln -s spark-$SPARK_VERSION-bin-hadoop$HADOOP_PROFILE spark
# update boot script
COPY entrypoint.sh /etc/entrypoint.sh
RUN chown root.root /etc/entrypoint.sh
RUN chmod 700 /etc/entrypoint.sh
#spark
EXPOSE 8080 7077 8888 8081
ENTRYPOINT ["/etc/entrypoint.sh"]

View file

@ -0,0 +1,17 @@
#!/bin/bash
export SPARK_MASTER_PORT=7077
# run spark
cd /usr/local/spark/sbin
./start-master.sh
./start-slave.sh spark://`hostname`:$SPARK_MASTER_PORT
CMD=${1:-"exit 0"}
if [[ "$CMD" == "-d" ]];
then
service sshd stop
/usr/sbin/sshd -D -d
else
/bin/bash -c "$*"
fi