Update install and configuring spark

2026-05-24 09:38:26 +00:00 · 2015-11-19 14:04:43 +09:00 · 2015-11-19 14:04:43 +09:00 · de74e6b7fb
commit de74e6b7fb
parent 98758d1cc5
2 changed files with 56 additions and 46 deletions
--- a/docs/install/install.md
+++ b/docs/install/install.md
@@ -21,46 +21,14 @@ limitations under the License.



-## Build
+## From binary package

-#### Prerequisites
+   Download the latest binary package from the [Download](../download.html) page.

- * Java 1.7
- * None root account
- * Apache Maven

-Build tested on OSX, CentOS 6.
-
-Checkout source code from [https://github.com/apache/incubator-zeppelin](https://github.com/apache/incubator-zeppelin)
-
-#### Local mode
-
-```
-mvn install -DskipTests
-```
-
-#### Cluster mode
-
-```
-mvn install -DskipTests -Dspark.version=1.1.0 -Dhadoop.version=2.2.0
-```
-
-Change spark.version and hadoop.version to your cluster's one.
-
-#### Custom built Spark
-
-Note that is you uses custom build spark, you need build Zeppelin with custome built spark artifact. To do that, deploy spark artifact to local maven repository using
-
-```
-sbt/sbt publish-local
-```
-
-and then build Zeppelin with your custom built Spark
-
-```
-mvn install -DskipTests -Dspark.version=1.1.0-Custom -Dhadoop.version=2.2.0
-```
+## Build from source

+   See the instructions in the [README](https://github.com/apache/incubator-zeppelin/blob/master/README.md) to build from source.



@@ -80,7 +48,7 @@ Configuration can be done by both environment variable(conf/zeppelin-env.sh) and
    <td>ZEPPELIN_PORT</td>
    <td>zeppelin.server.port</td>
    <td>8080</td>
-    <td>Zeppelin server port. Note that port+1 is used for web socket</td>
+    <td>Zeppelin server port.</td>
  </tr>
  <tr>
    <td>ZEPPELIN_NOTEBOOK_DIR</td>
@@ -101,12 +69,6 @@ Configuration can be done by both environment variable(conf/zeppelin-env.sh) and
    <td>interpreter</td>
    <td>Zeppelin interpreter directory</td>
  </tr>
-  <tr>
-    <td>MASTER</td>
-    <td></td>
-    <td>N/A</td>
-    <td>Spark master url. eg. spark://master_addr:7077. Leave empty if you want to use local mode</td>
-  </tr>
  <tr>
    <td>ZEPPELIN_JAVA_OPTS</td>
    <td></td>
@@ -114,6 +76,12 @@ Configuration can be done by both environment variable(conf/zeppelin-env.sh) and
    <td>JVM Options</td>
 </table>

+<br />
+You'll also need to configure each interpreter individually. Information can be found in the 'Interpreter' section of this documentation.
+
+For example, see [Spark](../interpreter/spark.html).
+
+<br />
 ## Start/Stop
 #### Start Zeppelin

@@ -121,7 +89,6 @@ Configuration can be done by both environment variable(conf/zeppelin-env.sh) and
 bin/zeppelin-daemon.sh start
 ```
 After successful start, visit http://localhost:8080 with your web browser.
-Note that port **8081** also need to be accessible for websocket connection.

 #### Stop Zeppelin

--- a/docs/interpreter/spark.md
+++ b/docs/interpreter/spark.md
@@ -7,7 +7,7 @@ group: manual
 {% include JB/setup %}


-## Spark
+## Spark Interpreter

 [Apache Spark](http://spark.apache.org) is supported in Zeppelin with 
 Spark Interpreter group, which consisted of 4 interpreters.
@@ -41,10 +41,52 @@ Spark Interpreter group, which consisted of 4 interpreters.
 </table>


+<br /><br />
+
+### Configuration
+<hr />
+
+Without any configuration, the Spark interpreter works out of the box in local mode. To connect to your Spark cluster, follow these two steps.
+
+#### 1. Export SPARK_HOME
+
+In **conf/zeppelin-env.sh**, export the SPARK_HOME environment variable, set to your Spark installation path.
+
+For example:
+
+```bash
+export SPARK_HOME=/usr/lib/spark
+```
+
+You can optionally export HADOOP\_CONF\_DIR and SPARK\_SUBMIT\_OPTIONS as well:
+
+```bash
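+# Directory that holds the Hadoop configuration files (example path)
+export HADOOP_CONF_DIR=/usr/lib/hadoop
+# Extra options passed to spark-submit; here, pulling in the spark-csv package
+export SPARK_SUBMIT_OPTIONS="--packages com.databricks:spark-csv_2.10:1.2.0"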
+```
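As a rough illustration of step 1, the sketch below checks whether a candidate path looks like a Spark installation before it is exported as SPARK_HOME. The function name `check_spark_home` is hypothetical and not part of Zeppelin; the only assumption about Spark is that its distribution ships an executable `bin/spark-submit`.

```bash
# Hypothetical helper (not part of Zeppelin): report whether a directory
# looks like a Spark installation by testing for an executable
# bin/spark-submit inside it.
check_spark_home() {
  if [ -x "$1/bin/spark-submit" ]; then
    echo "ok"
  else
    echo "missing"
  fi
}

# Possible use before editing conf/zeppelin-env.sh:
#   check_spark_home /usr/lib/spark
```

A check like this only confirms that the path is plausible; it does not verify that the Spark version is one the Zeppelin release supports.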
+
+
 <br />
+#### 2. Set master in the Interpreter menu
+
+After starting Zeppelin, go to the **Interpreter** menu and edit the **master** property in your Spark interpreter setting. The value depends on your Spark cluster deployment type.
+
+For example:
+
+ * *local[\*]* in local mode
+ * *spark://master:7077* in a standalone cluster
+ * *yarn-client* in YARN client mode
+ * *mesos://host:5050* in a Mesos cluster
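The mapping from **master** values to deployment types listed above can be sketched as a small shell function. This is only an illustration: `describe_master` is a hypothetical name, it covers just the four forms in the list, and it is not how Zeppelin or Spark parse the value.

```bash
# Hypothetical helper (not part of Zeppelin or Spark): map a master value
# to the deployment type named in the list above.
describe_master() {
  case "$1" in
    local|local\[*\]) echo "local mode" ;;          # e.g. local[*], local[4]
    spark://*)        echo "standalone cluster" ;;  # e.g. spark://master:7077
    yarn-client)      echo "YARN client mode" ;;
    mesos://*)        echo "Mesos cluster" ;;       # e.g. mesos://host:5050
    *)                echo "unknown" ;;
  esac
}
```

For instance, `describe_master 'spark://master:7077'` falls into the standalone branch, while a value outside the four listed forms falls through to `unknown`.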


+
+<br />
+That's it. Set up this way, Zeppelin works with any version of Spark and any deployment type without rebuilding Zeppelin. (The Zeppelin 0.5.5-incubating release works with Spark versions up to 1.5.1.)
+
+
+<br /> <br />
 ### SparkContext, SQLContext, ZeppelinContext
+<hr />

 SparkContext, SQLContext, ZeppelinContext are automatically created and exposed as variable names 'sc', 'sqlContext' and 'z', respectively, both in scala and python environments.

@@ -55,6 +97,7 @@ Note that scala / python environment shares the same SparkContext, SQLContext, Z
 <br />
 <br />
 ### Dependency Management
+<hr />
 There are two ways to load external library in spark interpreter. First is using Zeppelin's %dep interpreter and second is loading Spark properties.

 #### 1. Dynamic Dependency Loading via %dep interpreter
@@ -163,7 +206,7 @@ Here are few examples:
 <br />
 <br />
 ### ZeppelinContext
-
+<hr />

 Zeppelin automatically injects ZeppelinContext as variable 'z' in your scala/python environment. ZeppelinContext provides some additional functions and utility.