Update install and configuring spark

Lee moon soo 2015-11-19 14:04:43 +09:00
parent 98758d1cc5
commit de74e6b7fb
2 changed files with 56 additions and 46 deletions

@@ -21,46 +21,14 @@ limitations under the License.
## Build
## From binary package
#### Prerequisites
Download latest binary package from [Download](../download.html).
* Java 1.7
* Non-root account
* Apache Maven
The build has been tested on OS X and CentOS 6.
Check out the source code from [https://github.com/apache/incubator-zeppelin](https://github.com/apache/incubator-zeppelin)
#### Local mode
```
mvn install -DskipTests
```
#### Cluster mode
```
mvn install -DskipTests -Dspark.version=1.1.0 -Dhadoop.version=2.2.0
```
Change spark.version and hadoop.version to match your cluster.
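Only the two `-D` values change from one cluster to another. As a sketch, the hypothetical helper below (not part of Zeppelin) prints the build command for a given Spark and Hadoop version so it can be reviewed before running; it assumes bash and executes nothing itself.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print (do not run) the Maven command that builds
# Zeppelin against a given Spark version and Hadoop version.
zeppelin_build_cmd() {
  local spark_version="$1" hadoop_version="$2"
  if [ -z "$spark_version" ] || [ -z "$hadoop_version" ]; then
    echo "usage: zeppelin_build_cmd SPARK_VERSION HADOOP_VERSION" >&2
    return 1
  fi
  echo "mvn install -DskipTests -Dspark.version=${spark_version} -Dhadoop.version=${hadoop_version}"
}

# Example with the versions used above; substitute your cluster's versions.
zeppelin_build_cmd 1.1.0 2.2.0
```

Once the printed command looks right, it can be run from the checkout root with `eval "$(zeppelin_build_cmd 1.1.0 2.2.0)"`.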
#### Custom built Spark
Note that if you use a custom-built Spark, you need to build Zeppelin against your custom-built Spark artifact. To do that, deploy the Spark artifact to your local Maven repository using
```
sbt/sbt publish-local
```
and then build Zeppelin against your custom-built Spark:
```
mvn install -DskipTests -Dspark.version=1.1.0-Custom -Dhadoop.version=2.2.0
```
## Build from source
Check the instructions in [README](https://github.com/apache/incubator-zeppelin/blob/master/README.md) to build from source.
@@ -80,7 +48,7 @@ Configuration can be done by both environment variable(conf/zeppelin-env.sh) and
<td>ZEPPELIN_PORT</td>
<td>zeppelin.server.port</td>
<td>8080</td>
<td>Zeppelin server port. Note that port+1 is used for the WebSocket connection.</td>
<td>Zeppelin server port.</td>
</tr>
<tr>
<td>ZEPPELIN_NOTEBOOK_DIR</td>
@@ -101,12 +69,6 @@ Configuration can be done by both environment variable(conf/zeppelin-env.sh) and
<td>interpreter</td>
<td>Zeppelin interpreter directory</td>
</tr>
<tr>
<td>MASTER</td>
<td></td>
<td>N/A</td>
<td>Spark master URL, e.g. spark://master_addr:7077. Leave empty to use local mode.</td>
</tr>
<tr>
<td>ZEPPELIN_JAVA_OPTS</td>
<td></td>
@@ -114,6 +76,12 @@ Configuration can be done by both environment variable(conf/zeppelin-env.sh) and
<td>JVM Options</td>
</table>
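On the environment-variable side, each entry in the left column of the table becomes an `export` in conf/zeppelin-env.sh. The sketch below writes a few overrides into a scratch copy of that file and sources it to confirm it parses. The port, notebook path and JVM option are placeholder values, and the temporary directory only stands in for Zeppelin's conf/ directory.

```bash
#!/usr/bin/env bash
# Minimal sketch: write environment-variable overrides into a zeppelin-env.sh.
# A temporary directory stands in for conf/; all values are placeholders.
conf_dir="$(mktemp -d)"

cat > "${conf_dir}/zeppelin-env.sh" <<'EOF'
export ZEPPELIN_PORT=8090                             # instead of the default 8080
export ZEPPELIN_NOTEBOOK_DIR=/data/zeppelin/notebook  # hypothetical path
export ZEPPELIN_JAVA_OPTS="-Dfile.encoding=UTF-8"     # extra JVM options
EOF

# Source the file to check that it parses, then show the resulting values.
. "${conf_dir}/zeppelin-env.sh"
echo "port=${ZEPPELIN_PORT} notebooks=${ZEPPELIN_NOTEBOOK_DIR}"
```

With such a file in the real conf/ directory, the server should listen on port 8090 after a restart.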
<br />
You'll also need to configure individual interpreters. Information can be found in the 'Interpreter' section of this documentation,
for example [Spark](../interpreter/spark.html).
<br />
## Start/Stop
#### Start Zeppelin
@@ -121,7 +89,6 @@ Configuration can be done by both environment variable(conf/zeppelin-env.sh) and
bin/zeppelin-daemon.sh start
```
After a successful start, visit http://localhost:8080 with your web browser.
Note that port **8081** also needs to be accessible for the WebSocket connection.
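If ZEPPELIN_PORT was changed in conf/zeppelin-env.sh, the address changes with it. A small hypothetical helper (not shipped with Zeppelin) that derives the web UI address from the same variable, falling back to the default 8080:

```bash
#!/usr/bin/env bash
# Hypothetical helper: address of the Zeppelin web UI. Uses ZEPPELIN_PORT
# when it is set, otherwise the default port 8080. Host defaults to localhost.
zeppelin_url() {
  local host="${1:-localhost}"
  echo "http://${host}:${ZEPPELIN_PORT:-8080}"
}

zeppelin_url   # http://localhost:8080 unless ZEPPELIN_PORT is set
```

With the server running, `curl -s -o /dev/null -w '%{http_code}' "$(zeppelin_url)"` is one way to check that it responds.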
#### Stop Zeppelin

@@ -7,7 +7,7 @@ group: manual
{% include JB/setup %}
## Spark
## Spark Interpreter
[Apache Spark](http://spark.apache.org) is supported in Zeppelin with
Spark Interpreter group, which consists of 4 interpreters.
@@ -41,10 +41,52 @@ Spark Interpreter group, which consists of 4 interpreters.
</table>
<br /><br />
### Configuration
<hr />
Without any configuration, the Spark interpreter works out of the box in local mode. But if you want to connect to your Spark cluster, you'll need to follow these two simple steps.
#### 1. Export SPARK_HOME
In **conf/zeppelin-env.sh**, export the SPARK_HOME environment variable with your Spark installation path.
For example:
```bash
export SPARK_HOME=/usr/lib/spark
```
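A wrong SPARK_HOME is an easy mistake, so a quick check before restarting Zeppelin can save a round trip. The function below is a hypothetical helper, not part of Zeppelin; it assumes the standard Spark layout, where the spark-submit launcher lives in `$SPARK_HOME/bin`.

```bash
#!/usr/bin/env bash
# Hypothetical sanity check: does SPARK_HOME (or the given path) look like a
# Spark installation? Looks for an executable bin/spark-submit under it.
check_spark_home() {
  local home="${1:-$SPARK_HOME}"
  if [ -z "$home" ]; then
    echo "SPARK_HOME is not set" >&2
    return 1
  fi
  if [ ! -x "${home}/bin/spark-submit" ]; then
    echo "no executable bin/spark-submit under ${home}" >&2
    return 1
  fi
  echo "SPARK_HOME looks valid: ${home}"
}

# Example: check_spark_home /usr/lib/spark
```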
You can optionally export HADOOP\_CONF\_DIR and SPARK\_SUBMIT\_OPTIONS:
```bash
export HADOOP_CONF_DIR=/usr/lib/hadoop
export SPARK_SUBMIT_OPTIONS="--packages com.databricks:spark-csv_2.10:1.2.0"
```
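When more than one package is needed, spark-submit's `--packages` flag still takes a single comma-separated list of Maven coordinates. A hypothetical helper that assembles the value; the second coordinate is only illustrative.

```bash
#!/usr/bin/env bash
# Hypothetical helper: join Maven coordinates into one --packages option,
# since spark-submit expects a single comma-separated list.
spark_packages_opt() {
  [ "$#" -gt 0 ] || return 1
  local IFS=','            # "$*" joins arguments with the first IFS character
  printf '%s\n' "--packages $*"
}

# Example: the spark-csv coordinate from above plus an illustrative second one.
export SPARK_SUBMIT_OPTIONS="$(spark_packages_opt \
  com.databricks:spark-csv_2.10:1.2.0 \
  org.apache.commons:commons-csv:1.1)"
echo "$SPARK_SUBMIT_OPTIONS"
```

The resulting `export` line can replace the single-package one in conf/zeppelin-env.sh.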
<br />
#### 2. Set master in Interpreter menu
After starting Zeppelin, go to the **Interpreter** menu and edit the **master** property in your Spark interpreter setting. The value may vary depending on your Spark cluster deployment type.
For example:
* *local[\*]* in local mode
* *spark://master:7077* in a standalone cluster
* *yarn-client* in YARN client mode
* *mesos://host:5050* in a Mesos cluster
<br />
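The mapping from deployment type to master value can be summarised in a small shell function. This is a hypothetical helper, not part of Zeppelin; it assumes the default ports from the list above (7077 for a standalone master, 5050 for Mesos) and only prints the string to enter in the Interpreter menu.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a value for the Spark interpreter's "master"
# property. Usage: spark_master MODE [HOST]
spark_master() {
  local mode="$1" host="$2"
  case "$mode" in
    local)       echo 'local[*]' ;;
    yarn-client) echo 'yarn-client' ;;
    standalone|mesos)
      if [ -z "$host" ]; then
        echo "${mode} needs a master host" >&2
        return 1
      fi
      if [ "$mode" = standalone ]; then
        echo "spark://${host}:7077"   # default standalone master port
      else
        echo "mesos://${host}:5050"   # default Mesos master port
      fi ;;
    *) echo "unknown deployment type: ${mode}" >&2; return 1 ;;
  esac
}

spark_master standalone master   # prints spark://master:7077
```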
That's it. This way, Zeppelin works with any version of Spark and any deployment type without rebuilding Zeppelin. (The Zeppelin 0.5.5-incubating release works with Spark versions up to 1.5.1.)
<br /> <br />
### SparkContext, SQLContext, ZeppelinContext
<hr />
SparkContext, SQLContext and ZeppelinContext are automatically created and exposed as the variables 'sc', 'sqlContext' and 'z', respectively, in both the Scala and Python environments.
@@ -55,6 +97,7 @@ Note that scala / python environment shares the same SparkContext, SQLContext, Z
<br />
<br />
### Dependency Management
<hr />
There are two ways to load external libraries in the Spark interpreter. The first is using Zeppelin's %dep interpreter and the second is loading Spark properties.
#### 1. Dynamic Dependency Loading via %dep interpreter
@@ -163,7 +206,7 @@ Here are few examples:
<br />
<br />
### ZeppelinContext
<hr />
Zeppelin automatically injects ZeppelinContext as the variable 'z' in your Scala/Python environment. ZeppelinContext provides some additional functions and utilities.