Add documentation

Mina Lee 2016-01-27 15:45:58 -08:00
parent 6b90c3d10c
commit 320f4003bc
8 changed files with 234 additions and 48 deletions

@@ -37,7 +37,6 @@
<a href="#" data-toggle="dropdown" class="dropdown-toggle">Interpreter <b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="{{BASE_PATH}}/manual/interpreters.html">Overview</a></li>
<li role="separator" class="divider"></li>
<li><a href="{{BASE_PATH}}/interpreter/cassandra.html">Cassandra</a></li>
<li><a href="{{BASE_PATH}}/interpreter/elasticsearch.html">Elasticsearch</a></li>
@@ -52,6 +51,9 @@
<li><a href="{{BASE_PATH}}/pleasecontribute.html">Shell</a></li>
<li><a href="{{BASE_PATH}}/interpreter/spark.html">Spark</a></li>
<li><a href="{{BASE_PATH}}/pleasecontribute.html">Tajo</a></li>
<li role="separator" class="divider"></li>
<li><a href="{{BASE_PATH}}/manual/dynamicinterpreterload.html">Dynamic Interpreter Loading</a></li>
<li><a href="{{BASE_PATH}}/manual/dependencymanagement.html">Interpreter Dependency Management</a></li>
</ul>
</li>
<li>

@@ -11,9 +11,11 @@
<!-- Le HTML5 shim, for IE6-8 support of HTML elements -->
<!--[if lt IE 9]>
<script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->
<link href="//maxcdn.bootstrapcdn.com/font-awesome/4.2.0/css/font-awesome.min.css" rel="stylesheet">
<!-- Le styles -->
<link href="{{ ASSET_PATH }}/bootstrap/css/bootstrap.css" rel="stylesheet">
<link href="{{ ASSET_PATH }}/css/style.css?body=1" rel="stylesheet" type="text/css">

Binary files not shown: three images added (298 KiB, 328 KiB and 369 KiB).

@@ -80,9 +80,58 @@ SparkContext, SQLContext, ZeppelinContext are automatically created and exposed
<a name="dependencyloading"> </a>
## Dependency Management
There are two recommended ways to load external libraries into the Spark interpreter: the Interpreter setting menu and Spark properties. The deprecated `%dep` interpreter is also still available.
### 1. Setting Dependencies via Interpreter Setting
See [Dependency Management](../manual/dependencymanagement.html) for details.
### 2. Loading Spark Properties
Once `SPARK_HOME` is set in `conf/zeppelin-env.sh`, Zeppelin uses `spark-submit` as the Spark interpreter runner. `spark-submit` supports two ways to load configurations. The first is command-line options such as `--master`; Zeppelin passes these options to `spark-submit` when you export `SPARK_SUBMIT_OPTIONS` in `conf/zeppelin-env.sh`. The second is reading configuration options from `SPARK_HOME/conf/spark-defaults.conf`. The Spark properties that users can set to distribute libraries are:
<table class="table-configuration">
<tr>
<th>spark-defaults.conf</th>
<th>SPARK_SUBMIT_OPTIONS</th>
<th>Applicable Interpreter</th>
<th>Description</th>
</tr>
<tr>
<td>spark.jars</td>
<td>--jars</td>
<td>%spark</td>
<td>Comma-separated list of local jars to include on the driver and executor classpaths.</td>
</tr>
<tr>
<td>spark.jars.packages</td>
<td>--packages</td>
<td>%spark</td>
<td>Comma-separated list of Maven coordinates of jars to include on the driver and executor classpaths. Spark searches the local Maven repository, then Maven Central and any additional remote repositories given by --repositories. The format for the coordinates is groupId:artifactId:version.</td>
</tr>
<tr>
<td>spark.files</td>
<td>--files</td>
<td>%pyspark</td>
<td>Comma-separated list of files to be placed in the working directory of each executor.</td>
</tr>
</table>
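The table pairs each `spark-defaults.conf` property with the `spark-submit` option that sets the same thing. As a minimal sketch of that correspondence, the shell function below reads `spark-defaults.conf`-style lines and prints the equivalent options string. `defaults_to_submit_options` is a hypothetical name, not a Zeppelin or Spark command. It assumes whitespace-separated `key value` lines and handles only the three properties listed above. Other keys are skipped with a message on stderr.

```shell
# Sketch (not part of Zeppelin or Spark): rewrite the library-distribution
# properties from the table into the equivalent spark-submit options.
# Assumes one whitespace-separated "key value" pair per line.
defaults_to_submit_options() {
  opts=""
  while read -r key value || [ -n "$key" ]; do
    # Skip blank lines and comment lines.
    case "$key" in
      ''|'#'*) continue ;;
    esac
    # Map each property name to its spark-submit option, per the table.
    case "$key" in
      spark.jars)          flag="--jars" ;;
      spark.jars.packages) flag="--packages" ;;
      spark.files)         flag="--files" ;;
      *) echo "skipping unsupported key: $key" >&2; continue ;;
    esac
    # Append "flag value", adding a separating space only when needed.
    opts="${opts:+$opts }$flag $value"
  done
  printf '%s\n' "$opts"
}
```

For example, `defaults_to_submit_options < "$SPARK_HOME/conf/spark-defaults.conf"` prints a string in the form `SPARK_SUBMIT_OPTIONS` expects. The function only translates the three properties in the table.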
> Note that adding jars to `%pyspark` is currently only available via the `%dep` interpreter.
Here are a few examples:

* `SPARK_SUBMIT_OPTIONS` in `conf/zeppelin-env.sh`

```shell
export SPARK_SUBMIT_OPTIONS="--packages com.databricks:spark-csv_2.10:1.2.0 --jars /path/mylib1.jar,/path/mylib2.jar --files /path/mylib1.py,/path/mylib2.zip,/path/mylib3.egg"
```

* `SPARK_HOME/conf/spark-defaults.conf`

```
spark.jars /path/mylib1.jar,/path/mylib2.jar
spark.jars.packages com.databricks:spark-csv_2.10:1.2.0
spark.files /path/mylib1.py,/path/mylib2.egg,/path/mylib3.zip
```
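The `SPARK_SUBMIT_OPTIONS` value in the example is just each option followed by a comma-separated list. This is a sketch of assembling it from individual paths in a POSIX shell. `join_by_comma` and `append_option` are hypothetical helpers, and the paths are the placeholders from the example.

```shell
# Hypothetical helper: join the arguments with commas, the list format
# that --jars, --packages and --files expect. The subshell keeps the
# IFS change local.
join_by_comma() {
  (IFS=,; printf '%s\n' "$*")
}

# Hypothetical helper: print the options in $1 extended with "$2 $3",
# or unchanged when the list $3 is empty.
append_option() {
  if [ -n "$3" ]; then
    printf '%s\n' "${1:+$1 }$2 $3"
  else
    printf '%s\n' "$1"
  fi
}

opts=""
opts=$(append_option "$opts" --packages "$(join_by_comma com.databricks:spark-csv_2.10:1.2.0)")
opts=$(append_option "$opts" --jars "$(join_by_comma /path/mylib1.jar /path/mylib2.jar)")
opts=$(append_option "$opts" --files "$(join_by_comma /path/mylib1.py /path/mylib2.zip /path/mylib3.egg)")

# Print the line to put in conf/zeppelin-env.sh.
printf 'export SPARK_SUBMIT_OPTIONS="%s"\n' "$opts"
```

Empty lists are skipped, so the same code works when you only need one option, such as `--jars`. Paths that contain commas or spaces are not handled.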
### 3. Dynamic Dependency Loading via %dep interpreter
> Note: the `%dep` interpreter is deprecated as of v0.6.0-incubating.
The `%dep` interpreter loads libraries into `%spark` and `%pyspark` but not into the `%spark.sql` interpreter, so we recommend using the first option instead.
When your code requires an external library, you can do the following with the `%dep` interpreter instead of downloading or copying it and restarting Zeppelin:
* Load libraries recursively from a Maven repository
@@ -129,49 +178,6 @@ z.load("groupId:artifactId:version").exclude("groupId:*")
z.load("groupId:artifactId:version").local()
```
## ZeppelinContext
Zeppelin automatically injects ZeppelinContext as the variable `z` in your Scala/Python environment. ZeppelinContext provides additional functions and utilities.

@@ -0,0 +1,74 @@
---
layout: page
title: "Dependency Management"
description: ""
group: manual
---
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
{% include JB/setup %}
## Dependency Management for Interpreter
You can include external libraries in an interpreter by setting its dependencies in the Interpreter menu.
When your code requires an external library, you can do the following in this menu instead of downloading or copying it and restarting Zeppelin:
* Load libraries recursively from a Maven repository
* Load libraries from the local filesystem
* Add additional Maven repositories
* Automatically add libraries to the Spark cluster
<hr>
<div class="row">
<div class="col-md-6">
<a data-lightbox="compiler" href="{{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/interpreter-dependency-loading.png">
<img class="img-responsive" src="{{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/interpreter-dependency-loading.png" />
</a>
</div>
<div class="col-md-6" style="padding-top:30px">
<b> Load Dependencies to Interpreter </b>
<br /><br />
<ol>
<li> Click the 'Interpreter' menu in the navigation bar. </li>
<li> Click the 'edit' button of the interpreter you want to load dependencies into. </li>
<li> Fill in the artifact and exclude fields as needed.
The artifact field accepts either a Maven coordinate (groupId:artifactId:version) or a local file path. </li>
<li> Press 'Save' to restart the interpreter with the loaded libraries. </li>
</ol>
</div>
</div>
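
Step 3 says the artifact field takes either a Maven coordinate (`groupId:artifactId:version`) or a local file. The shell heuristic below is an illustration only, for telling the two apart before you enter an entry. `artifact_kind` is a hypothetical helper, not Zeppelin's parser. Zeppelin may accept forms this sketch labels `unknown`, such as coordinates with extra parts.

```shell
# Hypothetical helper (a heuristic, not Zeppelin's parser): classify an
# artifact-field entry as a local file path or a Maven coordinate.
artifact_kind() {
  entry=$1
  # Absolute or relative filesystem paths are treated as local files.
  case "$entry" in
    /*|./*|../*|'~/'*)
      echo "local-file"
      return 0
      ;;
  esac
  # Split on ':' with globbing disabled, then require exactly three
  # non-empty parts: groupId, artifactId and version.
  old_ifs=$IFS
  IFS=:
  set -f
  set -- $entry
  set +f
  IFS=$old_ifs
  if [ "$#" -eq 3 ] && [ -n "$1" ] && [ -n "$2" ] && [ -n "$3" ]; then
    echo "maven-coordinate"
  else
    echo "unknown"
  fi
}
```

For example, `artifact_kind com.databricks:spark-csv_2.10:1.3.0` prints `maven-coordinate`, and `artifact_kind /path/mylib1.jar` prints `local-file`.
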
<hr>
<div class="row">
<div class="col-md-6">
<a data-lightbox="compiler" href="{{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/interpreter-add-repo1.png">
<img class="img-responsive" src="{{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/interpreter-add-repo1.png" />
</a>
<a data-lightbox="compiler" href="{{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/interpreter-add-repo2.png">
<img class="img-responsive" src="{{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/interpreter-add-repo2.png" />
</a>
</div>
<div class="col-md-6" style="padding-top:30px">
<b> Add repository for dependency resolving </b>
<br /><br />
<ol>
<li> Press the <i class="fa fa-cog"></i> icon at the top right of the 'Interpreter' menu.
It shows the list of available repositories.</li>
<li> If you need to resolve dependencies from a repository other than Maven Central or the
local ~/.m2 repository, click the <i class="fa fa-plus"></i> icon next to the repository list. </li>
<li> Fill out the form and click the 'Add' button; the new repository then appears in the list. </li>
</ol>
</div>
</div>

@@ -151,7 +151,8 @@ limitations under the License.
"class": "org.apache.zeppelin.markdown.Markdown",
"name": "md"
}
],
"dependencies": []
},
{
"id": "2AY6GV7Q3",
@@ -170,6 +171,11 @@ limitations under the License.
"class": "org.apache.zeppelin.spark.SparkSqlInterpreter",
"name": "sql"
}
],
"dependencies": [
{
"groupArtifactVersion": "com.databricks:spark-csv_2.10:1.3.0"
}
]
}
]
@@ -219,6 +225,12 @@ limitations under the License.
"class": "org.apache.zeppelin.markdown.Markdown",
"name": "md"
}
],
"dependencies": [
{
"groupArtifactVersion": "groupId:artifactId:version",
"exclusions": "groupId:artifactId"
}
]
}
</pre>
@@ -243,6 +255,12 @@ limitations under the License.
"class": "org.apache.zeppelin.markdown.Markdown",
"name": "md"
}
],
"dependencies": [
{
"groupArtifactVersion": "groupId:artifactId:version",
"exclusions": "groupId:artifactId"
}
]
}
}
@@ -292,6 +310,12 @@ limitations under the License.
"class": "org.apache.zeppelin.markdown.Markdown",
"name": "md"
}
],
"dependencies": [
{
"groupArtifactVersion": "groupId:artifactId:version",
"exclusions": "groupId:artifactId"
}
]
}
</pre>
@@ -316,6 +340,12 @@ limitations under the License.
"class": "org.apache.zeppelin.markdown.Markdown",
"name": "md"
}
],
"dependencies": [
{
"groupArtifactVersion": "groupId:artifactId:version",
"exclusions": "groupId:artifactId"
}
]
}
}
@@ -391,3 +421,75 @@ limitations under the License.
</td>
</tr>
</table>
<br/>
### 6. Add repository for dependency resolving
<table class="table-configuration">
<col width="200">
<tr>
<th>Add a new repository for the dependency loader</th>
<th></th>
</tr>
<tr>
<td>Description</td>
<td>This ```POST``` method adds a new repository.</td>
</tr>
<tr>
<td>URL</td>
<td>```http://[zeppelin-server]:[zeppelin-port]/api/interpreter/repository```</td>
</tr>
<tr>
<td>Success code</td>
<td>201</td>
</tr>
<tr>
<td>Fail code</td>
<td> 500 </td>
</tr>
<tr>
<td>Sample JSON input</td>
<td>
<pre>
{
"id": "securecentral",
"url": "https://repo1.maven.org/maven2",
"snapshot": false
}
</pre>
</td>
</tr>
<tr>
<td>Sample JSON response</td>
<td>
<code>{"status":"OK"}</code>
</td>
</tr>
</table>
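This is a minimal sketch of calling this endpoint from the shell with curl. It assumes `ZEPPELIN_URL` holds the server's base URL, for example `http://localhost:8080`. `repository_json` and `add_repository` are hypothetical helper names. The JSON body is built with `printf` and is not escaped, so the sketch assumes the id and URL contain no quotes or backslashes.

```shell
# Hypothetical helper: print the request body with the three fields from
# the sample JSON input (no JSON escaping is performed).
repository_json() {
  # $1: repository id, $2: repository URL, $3: snapshot flag (true or false)
  printf '{"id": "%s", "url": "%s", "snapshot": %s}\n' "$1" "$2" "$3"
}

# Hypothetical helper: POST the body to the repository endpoint.
# ZEPPELIN_URL is an assumed variable holding the server's base URL.
add_repository() {
  repository_json "$1" "$2" "$3" |
    curl -sS -X POST -H 'Content-Type: application/json' \
      --data-binary @- "$ZEPPELIN_URL/api/interpreter/repository"
}
```

For example, after `ZEPPELIN_URL=http://localhost:8080`, running `add_repository securecentral https://repo1.maven.org/maven2 false` sends the sample input shown in the table.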
<br/>
### 7. Delete repository for dependency resolving
<table class="table-configuration">
<col width="200">
<tr>
<th>Delete a repository from the dependency loader</th>
<th></th>
</tr>
<tr>
<td>Description</td>
<td>This ```DELETE``` method deletes the repository with the given ID.</td>
</tr>
<tr>
<td>URL</td>
<td>```http://[zeppelin-server]:[zeppelin-port]/api/interpreter/repository/[repository ID]```</td>
</tr>
<tr>
<td>Success code</td>
<td>200</td>
</tr>
<tr>
<td>Fail code</td>
<td> 500 </td>
</tr>
</table>
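A matching sketch for the delete call, under the same assumption that `ZEPPELIN_URL` holds the server's base URL. `repository_url` and `delete_repository` are hypothetical helper names. The repository ID is inserted into the URL as-is, without URL encoding.

```shell
# Hypothetical helper: build the URL of a single repository resource,
# dropping one trailing slash from the base URL. The id is not URL-encoded.
repository_url() {
  # $1: Zeppelin base URL, $2: repository id
  printf '%s/api/interpreter/repository/%s\n' "${1%/}" "$2"
}

# Hypothetical helper: DELETE the repository with the given id.
# ZEPPELIN_URL is an assumed variable holding the server's base URL.
delete_repository() {
  curl -sS -X DELETE "$(repository_url "$ZEPPELIN_URL" "$1")"
}
```

For example, `delete_repository securecentral` would remove the repository added with the sample input above. According to the table, the success code is 200.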