Merge remote-tracking branch 'origin/master'

# Conflicts:
#	docs/interpreter/cassandra.md
Jesang Yoon 2016-01-18 03:45:48 +09:00
commit af55811b54
14 changed files with 707 additions and 711 deletions

@ -32,10 +32,8 @@ All Interpreters in the same interpreter group are launched in a single, separat
### Make your own Interpreter
Creating a new interpreter is quite simple. Just extend the [org.apache.zeppelin.interpreter.Interpreter](https://github.com/apache/incubator-zeppelin/blob/master/zeppelin-interpreter/src/main/java/org/apache/zeppelin/interpreter/Interpreter.java) abstract class and implement its abstract methods.
You can include org.apache.zeppelin:zeppelin-interpreter:[VERSION] artifact in your build system.
Your interpreter name is derived from the static register method
You can include `org.apache.zeppelin:zeppelin-interpreter:[VERSION]` artifact in your build system.
Your interpreter name is derived from the static register method.
```
static {
@ -44,16 +42,15 @@ static {
```
The name will appear later in the interpreter name option box during the interpreter configuration process.
The name of the interpreter is what you later write to identify a paragraph which should be interpreted using this interpreter.
```
%MyInterpreterName
some interpreter spesific code...
some interpreter specific code...
```
### Install your interpreter binary
Once you have build your interpreter, you can place your interpreter under directory with all the dependencies.
Once you have built your interpreter, you can place it under the interpreter directory with all its dependencies.
```
[ZEPPELIN_HOME]/interpreter/[INTERPRETER_NAME]/
@ -63,33 +60,34 @@ Once you have build your interpreter, you can place your interpreter under direc
To configure your interpreter, follow these steps:
1. create conf/zeppelin-site.xml by copying conf/zeppelin-site.xml.template to conf/zeppelin-site.xml
1. Add your interpreter class name to the `zeppelin.interpreters` property in `conf/zeppelin-site.xml`.
2. Add your interpreter class name to the zeppelin.interpreters property in conf/zeppelin-site.xml
Property value is comma separated [INTERPRETER_CLASS_NAME]
for example,
The property value is a comma-separated list of [INTERPRETER\_CLASS\_NAME] entries.
For example,
```
```
<property>
<name>zeppelin.interpreters</name>
<value>org.apache.zeppelin.spark.SparkInterpreter,org.apache.zeppelin.spark.PySparkInterpreter,org.apache.zeppelin.spark.SparkSqlInterpreter,org.apache.zeppelin.spark.DepInterpreter,org.apache.zeppelin.markdown.Markdown,org.apache.zeppelin.shell.ShellInterpreter,org.apache.zeppelin.hive.HiveInterpreter,com.me.MyNewInterpreter</value>
</property>
```
3. start zeppelin by running ```./bin/zeppelin-deamon start```
4. in the interpreter page, click the +Create button and configure your interpreter properties.
2. Add your interpreter to the [default configuration](https://github.com/apache/incubator-zeppelin/blob/master/zeppelin-zengine/src/main/java/org/apache/zeppelin/conf/ZeppelinConfiguration.java#L397), which is used when there is no `conf/zeppelin-site.xml`.
3. Start Zeppelin by running `./bin/zeppelin-daemon.sh start`.
4. In the interpreter page, click the `+Create` button and configure your interpreter properties.
Now you are done and ready to use your interpreter.
Note that the interpreters shipped with zeppelin have a [default configuration](https://github.com/apache/incubator-zeppelin/blob/master/zeppelin-zengine/src/main/java/org/apache/zeppelin/conf/ZeppelinConfiguration.java#L397) which is used when there is no zeppelin-site.xml.
Note that the interpreters shipped with Zeppelin have a [default configuration](https://github.com/apache/incubator-zeppelin/blob/master/zeppelin-zengine/src/main/java/org/apache/zeppelin/conf/ZeppelinConfiguration.java#L397), which is used when there is no `conf/zeppelin-site.xml`.
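The `zeppelin.interpreters` value from step 1 is one comma-separated string of class names, and the first entry becomes the default interpreter. The Python sketch below only illustrates how such a value splits. `parse_interpreters` and `default_interpreter` are hypothetical helpers, not part of Zeppelin, and trimming whitespace around each name is an assumption.

```python
def parse_interpreters(value: str) -> list[str]:
    """Split a zeppelin.interpreters value into class names.

    Whitespace around names is trimmed and empty entries are skipped
    (an assumption of this sketch, not documented Zeppelin behaviour).
    """
    return [name.strip() for name in value.split(",") if name.strip()]


def default_interpreter(value: str) -> str:
    """The first configured class is treated as the default interpreter."""
    names = parse_interpreters(value)
    if not names:
        raise ValueError("zeppelin.interpreters is empty")
    return names[0]


value = ("org.apache.zeppelin.spark.SparkInterpreter,"
         "org.apache.zeppelin.markdown.Markdown,"
         "com.me.MyNewInterpreter")
print(default_interpreter(value))     # org.apache.zeppelin.spark.SparkInterpreter
print(parse_interpreters(value)[-1])  # com.me.MyNewInterpreter
```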
### Use your interpreter
#### 0.5.0
Inside of a notebook, %[INTERPRETER_NAME] directive will call your interpreter.
Inside a notebook, the `%[INTERPRETER_NAME]` directive calls your interpreter.
Note that the first interpreter listed in `zeppelin.interpreters` is the default one.
for example
For example,
```
%myintp
@ -100,16 +98,14 @@ println(a)
<br />
#### 0.6.0 and later
Inside of a notebook, %[INTERPRETER\_GROUP].[INTERPRETER\_NAME] directive will call your interpreter.
Inside a notebook, the `%[INTERPRETER_GROUP].[INTERPRETER_NAME]` directive calls your interpreter.
Note that the first interpreter listed in `zeppelin.interpreters` is the default one.
You can omit either [INTERPRETER\_GROUP] or [INTERPRETER\_NAME]. Omit [INTERPRETER\_NAME] selects first available interpreter in the [INTERPRETER\_GROUP].
Omit '[INTERPRETER\_GROUP]' will selects [INTERPRETER\_NAME] from default interpreter group.
You can omit either [INTERPRETER\_GROUP] or [INTERPRETER\_NAME]. If you omit [INTERPRETER\_NAME], the first available interpreter in [INTERPRETER\_GROUP] is selected.
Likewise, if you omit [INTERPRETER\_GROUP], [INTERPRETER\_NAME] is chosen from the default interpreter group.
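The selection rules above can be sketched in Python as a small resolver. This is only an illustration. `resolve` is a hypothetical helper, and the interpreter bindings are passed in as an ordered mapping from group names to interpreter names. The sketch treats the first group as the default group. For a bare `%token`, it tries an interpreter name in the default group before trying a group name. That precedence is an assumption, not documented behaviour.

```python
def resolve(directive: str, bindings: dict[str, list[str]]) -> tuple[str, str]:
    """Map a paragraph directive such as '%mygrp.myintp1' to (group, name)."""
    if not directive.startswith("%"):
        raise ValueError(f"not a directive: {directive!r}")
    token = directive[1:]
    default_group = next(iter(bindings))  # assumption: first bound group is the default
    if "." in token:
        # %[INTERPRETER_GROUP].[INTERPRETER_NAME]: both parts given.
        group, name = token.split(".", 1)
        if name in bindings.get(group, []):
            return group, name
        raise LookupError(f"no interpreter {name!r} in group {group!r}")
    # Group omitted: look the name up in the default group (checked first here).
    if token in bindings[default_group]:
        return default_group, token
    # Name omitted: take the first interpreter of the named group.
    if bindings.get(token):
        return token, bindings[token][0]
    raise LookupError(f"cannot resolve {directive!r}")


bindings = {
    "spark": ["spark", "pyspark", "sql"],  # default group (bound first)
    "mygrp": ["myintp1", "myintp2"],
}
print(resolve("%mygrp.myintp2", bindings))  # ('mygrp', 'myintp2')
print(resolve("%mygrp", bindings))          # ('mygrp', 'myintp1')
print(resolve("%sql", bindings))            # ('spark', 'sql')
```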
For example, if you have two interpreter myintp1 and myintp2 in group mygrp,
you can call myintp1 like
For example, if you have two interpreters, myintp1 and myintp2, in group mygrp, you can call myintp1 like this:
```
%mygrp.myintp1
@ -125,7 +121,7 @@ and you can call myintp2 like
codes for myintp2
```
If you omit your interpreter name, it'll selects first available interpreter in the group (myintp1)
If you omit your interpreter name, it selects the first available interpreter in the group ( myintp1 ).
```
%mygrp

File diff suppressed because it is too large.

@ -8,8 +8,9 @@ group: manual
## Elasticsearch Interpreter for Apache Zeppelin
[Elasticsearch](https://www.elastic.co/products/elasticsearch) is a highly scalable open-source full-text search and analytics engine. It allows you to store, search, and analyze big volumes of data quickly and in near real time. It is generally used as the underlying engine/technology that powers applications that have complex search features and requirements.
### Configuration
## Configuration
<table class="table-configuration">
<tr>
@ -30,7 +31,7 @@ group: manual
<tr>
<td>elasticsearch.port</td>
<td>9300</td>
<td>Connection port <b>(important: this is not the HTTP port, but the transport port)</b></td>
<td>Connection port <b>( Important: this is not the HTTP port, but the transport port )</b></td>
</tr>
<tr>
<td>elasticsearch.result.size</td>
@ -44,16 +45,14 @@ group: manual
</center>
> Note #1: you can add more properties to configure the Elasticsearch client.
> **Note #1 :** You can add more properties to configure the Elasticsearch client.
> Note #2: if you use Shield, you can add a property named `shield.user` with a value containing the name and the password (format: `username:password`). For more details about Shield configuration, consult the [Shield reference guide](https://www.elastic.co/guide/en/shield/current/_using_elasticsearch_java_clients_with_shield.html). Do not forget, to copy the shield client jar in the interpreter directory (`ZEPPELIN_HOME/interpreters/elasticsearch`).
### Enabling the Elasticsearch Interpreter
> **Note #2 :** If you use Shield, you can add a property named `shield.user` with a value containing the name and the password ( format: `username:password` ). For more details about Shield configuration, consult the [Shield reference guide](https://www.elastic.co/guide/en/shield/current/_using_elasticsearch_java_clients_with_shield.html). Do not forget to copy the Shield client jar into the interpreter directory (`ZEPPELIN_HOME/interpreter/elasticsearch`).
## Enabling the Elasticsearch Interpreter
In a notebook, to enable the **Elasticsearch** interpreter, click the **Gear** icon and select **Elasticsearch**.
### Using the Elasticsearch Interpreter
## Using the Elasticsearch Interpreter
In a paragraph, use `%elasticsearch` to select the Elasticsearch interpreter and then input all commands. To get the list of available commands, use `help`.
```bash
@ -74,13 +73,13 @@ Commands:
. same comments as for the search
- get /index/type/id
- delete /index/type/id
- index /ndex/type/id <json-formatted document>
- index /index/type/id <json-formatted document>
. the id can be omitted, elasticsearch will generate one
```
> Tip: use (CTRL + .) for completion
> **Tip :** Use ( Ctrl + . ) for autocompletion.
#### get
### Get
With the `get` command, you can find a document by id. The result is a JSON document.
```bash
@ -91,12 +90,12 @@ With the `get` command, you can find a document by id. The result is a JSON docu
Example:
![Elasticsearch - Get](../assets/themes/zeppelin/img/docs-img/elasticsearch-get.png)
#### search
### Search
With the `search` command, you can send a search query to Elasticsearch. There are two formats of query:
* You can provide a JSON-formatted query, which is exactly what you send when you use the REST API of Elasticsearch.
* See [Elasticsearch search API reference document](https://www.elastic.co/guide/en/elasticsearch/reference/current/search.html) for more details about the content of the search queries.
* You can also provide the content of a `query_string`
* You can also provide the content of a `query_string`.
* This is a shortcut for a query like this: `{ "query": { "query_string": { "query": "__HERE YOUR QUERY__", "analyze_wildcard": true } } }`
* See [Elasticsearch query string syntax](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#query-string-syntax) for more details about the content of such a query.
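The `query_string` shortcut above can be made concrete with a short Python sketch. `to_search_body` is a hypothetical helper. The sketch assumes that input starting with `{` is a full JSON query. This is only an assumption about how the interpreter tells the two formats apart.

```python
import json


def to_search_body(text: str) -> str:
    """Return the JSON body for a `search` argument.

    Assumption: input starting with "{" is already a full query; anything
    else is treated as query_string content and wrapped in the shortcut.
    """
    if text.lstrip().startswith("{"):
        return text
    body = {"query": {"query_string": {"query": text, "analyze_wildcard": True}}}
    return json.dumps(body)


print(to_search_body("(404 AND (POST OR DELETE))"))
# {"query": {"query_string": {"query": "(404 AND (POST OR DELETE))", "analyze_wildcard": true}}}
```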
@ -121,7 +120,7 @@ Examples:
```bash
| %elasticsearch
| search / { "query": { "match_all": {} } }
| search / { "query": { "match_all": { } } }
|
| %elasticsearch
| search /logs { "query": { "query_string": { "query": "request.method:GET AND status:200" } } }
@ -133,7 +132,7 @@ Examples:
| "field": "content_length"
| }
| }
| } }
| } }
```
* With query_string elements:
@ -146,7 +145,7 @@ Examples:
| search /logs (404 AND (POST OR DELETE))
```
> **Important**: a document in Elasticsearch is a JSON document, so it is hierarchical, not flat as a row in a SQL table.
> **Important** : a document in Elasticsearch is a JSON document, so it is hierarchical, not flat as a row in a SQL table.
For the Elasticsearch interpreter, the result of a search query is flattened.
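As a rough illustration of what flattening means, the Python sketch below turns a nested document into a single level of column names. The naming scheme is an assumption made for this sketch: nested keys are joined with `.` and array elements get an `[i]` suffix. The interpreter's actual column names may differ. `flatten` is a hypothetical helper, and the document is made up.

```python
def flatten(doc, prefix=""):
    """Flatten nested dicts and lists into a single-level dict of columns."""
    flat = {}
    if isinstance(doc, dict):
        for key, value in doc.items():
            flat.update(flatten(value, f"{prefix}.{key}" if prefix else key))
    elif isinstance(doc, list):
        for i, value in enumerate(doc):
            flat.update(flatten(value, f"{prefix}[{i}]"))
    else:
        flat[prefix] = doc  # leaf value: one column
    return flat


doc = {"status": 403,
       "request": {"method": "GET", "headers": ["Accept: *.*", "Host: apache.org"]}}
for column, value in flatten(doc).items():
    print(column, value)
```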
Suppose we have a JSON document:
@ -190,7 +189,7 @@ Examples:
* With a query containing a multi-bucket aggregation:
![Elasticsearch - Search with aggregation (multi-bucket)](../assets/themes/zeppelin/img/docs-img/elasticsearch-agg-multi-bucket-pie.png)
#### count
### Count
With the `count` command, you can count documents available in some indices and types. You can also provide a query.
```bash
@ -206,7 +205,7 @@ Examples:
* With a query:
![Elasticsearch - Count with query](../assets/themes/zeppelin/img/docs-img/elasticsearch-count-with-query.png)
#### index
### Index
With the `index` command, you can insert/update a document in Elasticsearch.
```bash
@ -217,7 +216,7 @@ With the `index` command, you can insert/update a document in Elasticsearch.
| index /index/type <JSON document>
```
#### delete
### Delete
With the `delete` command, you can delete a document.
```bash
@ -225,13 +224,12 @@ With the `delete` command, you can delete a document.
| delete /index/type/id
```
#### Apply Zeppelin Dynamic Forms
You can leverage [Zeppelin Dynamic Form]({{BASE_PATH}}/manual/dynamicform.html) inside your queries. You can use both the `text input` and `select form` parameterization features
### Apply Zeppelin Dynamic Forms
You can leverage [Zeppelin Dynamic Form]({{BASE_PATH}}/manual/dynamicform.html) inside your queries. You can use both the `text input` and `select form` parameterization features.
```bash
| %elasticsearch
| size ${limit=10}
| search /index/type { "query": { "match_all": {} } }
| search /index/type { "query": { "match_all": { } } }
```
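The `${limit=10}` placeholder above is a text-input form with a default value. The simplified Python sketch below substitutes such placeholders. It assumes the `${name}`, `${name=default}` and `${name=default,opt1|opt2}` forms from the Dynamic Form documentation. `render` is a hypothetical helper, and Zeppelin's real template parser handles more cases than this one.

```python
import re

# ${name}, ${name=default} or ${name=default,opt1|opt2} (select form)
FORM = re.compile(r"\$\{([^=}]+)(?:=([^,}]*)(?:,([^}]*))?)?\}")


def render(template: str, values: dict[str, str]) -> str:
    """Replace each form placeholder with the user's value or its default."""
    def substitute(match):
        name, default, options = match.groups()
        value = values.get(name, default or "")
        if options is not None:
            # Drop optional "(Display Name)" labels before validating.
            allowed = [opt.split("(")[0] for opt in options.split("|")]
            if value not in allowed:
                raise ValueError(f"{value!r} is not an option of form {name!r}")
        return value
    return FORM.sub(substitute, template)


query = 'size ${limit=10}\nsearch /index/type { "query": { "match_all": { } } }'
print(render(query, {}).splitlines()[0])               # size 10
print(render(query, {"limit": "50"}).splitlines()[0])  # size 50
```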

@ -8,15 +8,12 @@ group: manual
## Flink interpreter for Apache Zeppelin
[Apache Flink](https://flink.apache.org) is an open source platform for distributed stream and batch data processing. Flinks core is a streaming dataflow engine that provides data distribution, communication, and fault tolerance for distributed computations over data streams. Flink also builds batch processing on top of the streaming engine, overlaying native iteration support, managed memory, and program optimization.
[Apache Flink](https://flink.apache.org) is an open source platform for distributed stream and batch data processing.
### How to start local Flink cluster, to test the interpreter
## How to start a local Flink cluster to test the interpreter
Zeppelin comes with a pre-configured flink-local interpreter, which starts Flink in local mode on your machine, so you do not need to install anything.
### How to configure interpreter to point to Flink cluster
## How to configure the interpreter to point to a Flink cluster
In the "Interpreters" menu, create a new Flink interpreter and provide the following properties:
<table class="table-configuration">
@ -35,23 +32,19 @@ At the "Interpreters" menu, you have to create a new Flink interpreter and provi
<td>6123</td>
<td>Port of the running JobManager</td>
</tr>
<tr>
<td>xxx</td>
<td>yyy</td>
<td>anything else from [Flink Configuration](https://ci.apache.org/projects/flink/flink-docs-release-0.9/setup/config.html)</td>
</tr>
</table>
### How to test it's working
For more information about Flink configuration, see the [Flink configuration documentation](https://ci.apache.org/projects/flink/flink-docs-release-0.10/setup/config.html).
In example, by using the [Zeppelin notebook](https://www.zeppelinhub.com/viewer/notebooks/aHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL05GTGFicy96ZXBwZWxpbi1ub3RlYm9va3MvbWFzdGVyL25vdGVib29rcy8yQVFFREs1UEMvbm90ZS5qc29u) is from [Till Rohrmann's presentation](http://www.slideshare.net/tillrohrmann/data-analysis-49806564) "Interactive data analysis with Apache Flink" for Apache Flink Meetup.
## How to test that it's working
For example, you can use this [Zeppelin notebook](https://www.zeppelinhub.com/viewer/notebooks/aHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL05GTGFicy96ZXBwZWxpbi1ub3RlYm9va3MvbWFzdGVyL25vdGVib29rcy8yQVFFREs1UEMvbm90ZS5qc29u), which comes from Till Rohrmann's presentation [Interactive data analysis with Apache Flink](http://www.slideshare.net/tillrohrmann/data-analysis-49806564) for the Apache Flink Meetup.
```
%sh
rm -f 10.txt.utf-8
wget http://www.gutenberg.org/ebooks/10.txt.utf-8
```
```
{% highlight scala %}
%flink
case class WordCount(word: String, frequency: Int)
val bible:DataSet[String] = env.readTextFile("10.txt.utf-8")
@ -64,4 +57,4 @@ val wordCounts = partialCounts.groupBy("word").reduce{
(left, right) => WordCount(left.word, left.frequency + right.frequency)
}
val result10 = wordCounts.first(10).collect()
```
{% endhighlight %}
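To sanity-check the Flink output on a small sample, you can write the same word count in plain Python. The Flink lines that split the text into words are elided in this diff. This sketch therefore assumes lowercasing and treating runs of ASCII letters as words. `word_counts` is a hypothetical helper. As far as we know, `first(10)` on an unsorted `DataSet` is not guaranteed to return the ten most frequent words. `most_common` below is the sorted variant.

```python
import re
from collections import Counter


def word_counts(lines):
    """Emit (word, 1) for every token and sum the counts per word,
    mirroring the groupBy("word").reduce(...) step of the Flink job."""
    counts = Counter()
    for line in lines:
        for word in re.findall(r"[a-z]+", line.lower()):  # assumed tokenisation
            counts[word] += 1
    return counts


sample = ["In the beginning God created the heaven and the earth.",
          "And the earth was without form, and void;"]
counts = word_counts(sample)
print(counts["the"])          # 4
print(counts.most_common(2))  # [('the', 4), ('and', 3)]
```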

@ -8,6 +8,7 @@ group: manual
## Hive Interpreter for Apache Zeppelin
The [Apache Hive](https://hive.apache.org/) ™ data warehouse software facilitates querying and managing large datasets residing in distributed storage. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL. At the same time this language also allows traditional map/reduce programmers to plug in their custom mappers and reducers when it is inconvenient or inefficient to express this logic in HiveQL.
### Configuration
@ -30,48 +31,48 @@ group: manual
<tr>
<td>default.user</td>
<td></td>
<td><b>(Optional)</b>Username of the connection</td>
<td><b>( Optional ) </b>Username of the connection</td>
</tr>
<tr>
<td>default.password</td>
<td></td>
<td><b>(Optional)</b>Password of the connection</td>
<td><b>( Optional ) </b>Password of the connection</td>
</tr>
<tr>
<td>default.xxx</td>
<td></td>
<td><b>(Optional)</b>Other properties used by the driver</td>
<td><b>( Optional ) </b>Other properties used by the driver</td>
</tr>
<tr>
<td>${prefix}.driver</td>
<td></td>
<td>Driver class path of `%hive(${prefix})`</td>
<td>Driver class path of <code>%hive(${prefix})</code> </td>
</tr>
<tr>
<td>${prefix}.url</td>
<td></td>
<td>Url of `%hive(${prefix})`</td>
<td>URL of <code>%hive(${prefix})</code> </td>
</tr>
<tr>
<td>${prefix}.user</td>
<td></td>
<td><b>(Optional)</b>Username of the connection of `%hive(${prefix})`</td>
<td><b>( Optional ) </b>Username of the connection of <code>%hive(${prefix})</code> </td>
</tr>
<tr>
<td>${prefix}.password</td>
<td></td>
<td><b>(Optional)</b>Password of the connection of `%hive(${prefix})`</td>
<td><b>( Optional ) </b>Password of the connection of <code>%hive(${prefix})</code> </td>
</tr>
<tr>
<td>${prefix}.xxx</td>
<td></td>
<td><b>(Optional)</b>Other properties used by the driver of `%hive(${prefix})`</td>
<td><b>( Optional ) </b>Other properties used by the driver of <code>%hive(${prefix})</code> </td>
</tr>
</table>
This interpreter provides multiple configuration with ${prefix}. User can set a multiple connection properties by this prefix. It can be used like `%hive(${prefix})`.
This interpreter supports multiple configurations through `${prefix}`: you can define several sets of connection properties, each under its own prefix, and select one with `%hive(${prefix})`.
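The Python sketch below illustrates the prefix mechanism. It groups `${prefix}.xxx` properties into one connection setting per prefix and reads the prefix from a `%hive(${prefix})` directive. `connection_settings` and `prefix_of` are hypothetical helpers, and the `mysql` prefix and its values are made up. Treating a bare `%hive` as the `default` prefix is an assumption based on the `default.*` properties in the table above.

```python
import re


def prefix_of(directive: str) -> str:
    """Extract the prefix from '%hive(prefix)'; a bare '%hive' maps to 'default'."""
    match = re.fullmatch(r"%hive(?:\(([^)]+)\))?", directive.strip())
    if match is None:
        raise ValueError(f"not a hive directive: {directive!r}")
    return match.group(1) or "default"


def connection_settings(props: dict[str, str], prefix: str) -> dict[str, str]:
    """Collect '<prefix>.<key>' properties into {key: value}."""
    head = prefix + "."
    return {k[len(head):]: v for k, v in props.items() if k.startswith(head)}


props = {
    "default.driver": "org.apache.hive.jdbc.HiveDriver",
    "default.url": "jdbc:hive2://localhost:10000",
    "mysql.driver": "com.mysql.jdbc.Driver",      # hypothetical second connection
    "mysql.url": "jdbc:mysql://localhost:3306/",
    "mysql.user": "zeppelin",
}
print(connection_settings(props, prefix_of("%hive(mysql)")))
```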
### How to use
## How to use
Basically, you can use
@ -90,9 +91,9 @@ select * from my_table;
You can also run up to 10 queries concurrently by default. Changing this limit is not implemented yet.
#### Apply Zeppelin Dynamic Forms
### Apply Zeppelin Dynamic Forms
You can leverage [Zeppelin Dynamic Form]({{BASE_PATH}}/manual/dynamicform.html) inside your queries. You can use both the `text input` and `select form` parameterization features
You can leverage [Zeppelin Dynamic Form]({{BASE_PATH}}/manual/dynamicform.html) inside your queries. You can use both the `text input` and `select form` parameterization features.
```sql
%hive

@ -6,17 +6,14 @@ group: manual
---
{% include JB/setup %}
## Lens Interpreter for Apache Zeppelin
### Overview
[Apache Lens](https://lens.apache.org/) provides a Unified Analytics interface. Lens aims to cut the Data Analytics silos by providing a single view of data across multiple tiered data stores and an optimal execution environment for analytical queries. It seamlessly integrates Hadoop with traditional data warehouses so that they appear as one.
![Apache Lens](../assets/themes/zeppelin/img/docs-img/lens-logo.png)
### Installing and Running Lens
To use the Lens interpreter, install Apache Lens with these simple steps:
1. Download the latest version of Lens from [the ASF](http://www.apache.org/dyn/closer.lua/lens/2.3-beta). Older releases can be found [in the Archives](http://archive.apache.org/dist/lens/).
@ -28,8 +25,7 @@ In order to use Lens interpreters, you may install Apache Lens in some simple st
```
### Configuring Lens Interpreter
At the "Interpreters" menu, you can to edit Lens interpreter or create new one. Zeppelin provides these properties for Lens.
In the "Interpreters" menu, you can edit the Lens interpreter or create a new one. Zeppelin provides these properties for Lens.
<table class="table-configuration">
<tr>
@ -82,14 +78,12 @@ At the "Interpreters" menu, you can to edit Lens interpreter or create new one.
![Apache Lens Interpreter Setting](../assets/themes/zeppelin/img/docs-img/lens-interpreter-setting.png)
### Interpreter Binding for Zeppelin Notebook
After configuring the Lens interpreter, create your own notebook, then bind the interpreters as shown in the image below.
![Zeppelin Notebook Interpreter Binding](../assets/themes/zeppelin/img/docs-img/lens-interpreter-binding.png)
For more information on interpreter binding, see [here](http://zeppelin.incubator.apache.org/docs/manual/interpreters.html).
### How to use
You can analyze your data using [OLAP Cube](http://lens.apache.org/user/olap-cube.html) [QL](http://lens.apache.org/user/cli.html), a high-level SQL-like language to query and describe data sets organized in data cubes.
You can see OLAP Cube in action in this [video tutorial](https://cwiki.apache.org/confluence/display/LENS/2015/07/13/20+Minute+video+demo+of+Apache+Lens+through+examples).
As you can see in the video, they use the Lens Client Shell (`./bin/lens-cli.sh`). All of these functions can also be used in Zeppelin through the Lens interpreter.
@ -169,12 +163,7 @@ As you can see in this video, they are using Lens Client Shell(./bin/lens-cli.sh
These are just examples provided in advance by Lens. To explore all of the Lens tutorials, see the [tutorial video](https://cwiki.apache.org/confluence/display/LENS/2015/07/13/20+Minute+video+demo+of+Apache+Lens+through+examples).
### Lens UI Service
Lens also provides a web UI service. Once the server starts up, you can open it at http://serverhost:19999/index.html and browse. There you can also check the structure you created and run queries easily.
![Lens UI Service](../assets/themes/zeppelin/img/docs-img/lens-ui-service.png)

@ -6,19 +6,18 @@ group: manual
---
{% include JB/setup %}
## Markdown Interpreter for Apache Zeppelin
### Overview
[Markdown](http://daringfireball.net/projects/markdown/) is a plain text formatting syntax designed so that it can be converted to HTML.
Zeppelin uses markdown4j, for more examples and extension support checkout [markdown4j](https://code.google.com/p/markdown4j/)
In Zeppelin notebook you can use ``` %md ``` in the beginning of a paragraph to invoke the Markdown interpreter to generate static html from Markdown plain text.
Zeppelin uses markdown4j. For more examples and extension support, please see the [markdown4j project page](https://code.google.com/p/markdown4j/).
In a Zeppelin notebook, you can use `%md` at the beginning of a paragraph to invoke the Markdown interpreter and generate static HTML from Markdown plain text.
In Zeppelin, Markdown interpreter is enabled by default.
<img src="{{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/markdown-interpreter-setting.png" width="600px" />
<img src="{{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/markdown-interpreter-setting.png" width="60%" />
### Example
The following example demonstrates the basic usage of Markdown in a Zeppelin notebook.
<img src="{{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/markdown-example.png" width="800px" />
<img src="{{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/markdown-example.png" width="70%" />

@ -7,8 +7,7 @@ group: manual
{% include JB/setup %}
## Spark Interpreter
## Spark Interpreter for Apache Zeppelin
[Apache Spark](http://spark.apache.org) is supported in Zeppelin with the
Spark interpreter group, which consists of 4 interpreters.
@ -40,14 +39,11 @@ Spark Interpreter group, which consisted of 4 interpreters.
</tr>
</table>
## Configuration
Without any configuration, the Spark interpreter works out of the box in local mode. To connect to your own Spark cluster, follow the two simple steps below.
### Configuration
Without any configuration, Spark interpreter works out of box in local mode. But if you want to connect to your Spark cluster, you'll need following two simple steps.
#### 1. Export SPARK_HOME
In **conf/zeppelin-env.sh**, export SPARK_HOME environment variable with your Spark installation path.
### 1. Export SPARK_HOME
In **conf/zeppelin-env.sh**, export `SPARK_HOME` environment variable with your Spark installation path.
For example:
@ -62,8 +58,7 @@ export HADOOP_CONF_DIR=/usr/lib/hadoop
export SPARK_SUBMIT_OPTIONS="--packages com.databricks:spark-csv_2.10:1.2.0"
```
#### 2. Set master in Interpreter menu.
### 2. Set master in Interpreter menu
After starting Zeppelin, go to the **Interpreter** menu and edit the **master** property in your Spark interpreter setting. The value may vary depending on your Spark cluster deployment type.
For example,
@ -73,27 +68,22 @@ for example,
* **yarn-client** in Yarn client mode
* **mesos://host:5050** in Mesos cluster
That's it. Zeppelin will work with any version of Spark and any deployment type without rebuild Zeppelin in this way. (Zeppelin 0.5.5-incubating release works up to Spark 1.5.1)
That's it. This way, Zeppelin works with any version of Spark and any deployment type without being rebuilt. ( Zeppelin 0.5.5-incubating release works up to Spark 1.5.2 )
Note that without exporting SPARK_HOME, it's running in local mode with included version of Spark. The included version may vary depending on the build profile.
### SparkContext, SQLContext, ZeppelinContext
> Note that without exporting `SPARK_HOME`, Zeppelin runs in local mode with the included version of Spark. The included version may vary depending on the build profile.
## SparkContext, SQLContext, ZeppelinContext
SparkContext, SQLContext and ZeppelinContext are automatically created and exposed as the variables `sc`, `sqlContext` and `z`, respectively, in both the Scala and Python environments.
Note that scala / python environment shares the same SparkContext, SQLContext, ZeppelinContext instance.
> Note that the Scala and Python environments share the same SparkContext, SQLContext and ZeppelinContext instances.
<a name="dependencyloading"> </a>
## Dependency Management
There are two ways to load an external library in the Spark interpreter: the first is Zeppelin's `%dep` interpreter, and the second is loading Spark properties.
### Dependency Management
There are two ways to load external library in spark interpreter. First is using Zeppelin's %dep interpreter and second is loading Spark properties.
#### 1. Dynamic Dependency Loading via %dep interpreter
When your code requires external library, instead of doing download/copy/restart Zeppelin, you can easily do following jobs using %dep interpreter.
### 1. Dynamic Dependency Loading via %dep interpreter
When your code requires an external library, instead of downloading it, copying it and restarting Zeppelin, you can easily do the following with the `%dep` interpreter.
* Load libraries recursively from Maven repository
* Load libraries from local filesystem
@ -101,7 +91,7 @@ When your code requires external library, instead of doing download/copy/restart
* Automatically add libraries to the Spark cluster (you can turn this off)
The Dep interpreter leverages the Scala environment, so you can write any Scala code here.
Note that %dep interpreter should be used before %spark, %pyspark, %sql.
Note that the `%dep` interpreter should be used before `%spark`, `%pyspark` and `%sql`.
Here's how to use it:
@ -139,8 +129,7 @@ z.load("groupId:artifactId:version").exclude("groupId:*")
z.load("groupId:artifactId:version").local()
```
#### 2. Loading Spark Properties
### 2. Loading Spark Properties
Once `SPARK_HOME` is set in `conf/zeppelin-env.sh`, Zeppelin uses `spark-submit` as the Spark interpreter runner. `spark-submit` supports two ways to load configurations. The first is command-line options such as `--master`, which Zeppelin can pass to `spark-submit` when you export `SPARK_SUBMIT_OPTIONS` in `conf/zeppelin-env.sh`. The second is reading configuration options from `SPARK_HOME/conf/spark-defaults.conf`. The Spark properties that users can set to distribute libraries are:
<table class="table-configuration">
@ -169,8 +158,7 @@ Once `SPARK_HOME` is set in `conf/zeppelin-env.sh`, Zeppelin uses `spark-submit`
<td>Comma-separated list of files to be placed in the working directory of each executor.</td>
</tr>
</table>
Note that adding jar to pyspark is only availabe via %dep interpreter at the moment
> Note that adding jars to PySpark is only available via the `%dep` interpreter at the moment.
Here are a few examples:
@ -184,37 +172,43 @@ Here are few examples:
spark.jars.packages com.databricks:spark-csv_2.10:1.2.0
spark.files /path/mylib1.py,/path/mylib2.egg,/path/mylib3.zip
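The `spark-defaults.conf` lines above are whitespace-separated key/value pairs, and the library properties hold comma-separated lists. The Python sketch below reads such lines. It is a simplification, since Spark's own loader is more permissive, for example about separators. `parse_spark_defaults` and `as_list` are hypothetical helpers.

```python
def parse_spark_defaults(text: str) -> dict[str, str]:
    """Parse 'key value' lines, skipping blank lines and '#' comments."""
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # split on the first run of whitespace
        conf[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return conf


def as_list(conf: dict[str, str], key: str) -> list[str]:
    """Split a comma-separated property such as spark.files into entries."""
    return [item.strip() for item in conf.get(key, "").split(",") if item.strip()]


text = """
# distribute libraries
spark.jars.packages com.databricks:spark-csv_2.10:1.2.0
spark.files /path/mylib1.py,/path/mylib2.egg,/path/mylib3.zip
"""
conf = parse_spark_defaults(text)
print(as_list(conf, "spark.files"))
```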
### ZeppelinContext
## ZeppelinContext
Zeppelin automatically injects ZeppelinContext as the variable `z` in your Scala/Python environment. ZeppelinContext provides some additional functions and utilities.
#### Object exchange
### Object Exchange
ZeppelinContext extends a map and is shared between the Scala and Python environments.
So you can put an object from Scala and read it from Python, and vice versa.
Put object from scala
<div class="codetabs">
<div data-lang="scala" markdown="1">
```scala
{% highlight scala %}
// Put object from scala
%spark
val myObject = ...
z.put("objName", myObject)
```
{% endhighlight %}
Get object from python
</div>
<div data-lang="python" markdown="1">
```python
%python
{% highlight python %}
# Get object from python
%pyspark
myObject = z.get("objName")
```
{% endhighlight %}
</div>
</div>
#### Form creation
### Form Creation
ZeppelinContext provides functions for creating forms.
In scala and python environments, you can create forms programmatically.
<div class="codetabs">
<div data-lang="scala" markdown="1">
```scala
{% highlight scala %}
%spark
/* Create text input form */
z.input("formName")
@ -229,7 +223,30 @@ z.select("formName", Seq(("option1", "option1DisplayName"),
/* Create select form with default value*/
z.select("formName", "option1", Seq(("option1", "option1DisplayName"),
("option2", "option2DisplayName")))
```
{% endhighlight %}
</div>
<div data-lang="python" markdown="1">
{% highlight python %}
%pyspark
# Create text input form
z.input("formName")
# Create text input form with default value
z.input("formName", "defaultValue")
# Create select form
z.select("formName", [("option1", "option1DisplayName"),
("option2", "option2DisplayName")])
# Create select form with default value
z.select("formName", [("option1", "option1DisplayName"),
("option2", "option2DisplayName")], "option1")
{% endhighlight %}
</div>
</div>
In the SQL environment, you can create forms with a simple template.

@ -20,45 +20,45 @@ limitations under the License.
{% include JB/setup %}
## Interpreters in zeppelin
## Interpreters in Zeppelin
In this section, we explain the role of interpreters, interpreter groups and interpreter settings in Zeppelin.
The concept of Zeppelin interpreters allows any language or data-processing backend to be plugged into Zeppelin.
Currently, Zeppelin supports many interpreters such as Scala ( with Apache Spark ), Python ( with Apache Spark ), SparkSQL, Hive, Markdown, Shell and so on.
This section explain the role of Interpreters, interpreters group and interpreters settings in Zeppelin.
Zeppelin interpreter concept allows any language/data-processing-backend to be plugged into Zeppelin.
Currently Zeppelin supports many interpreters such as Scala(with Apache Spark), Python(with Apache Spark), SparkSQL, Hive, Markdown and Shell.
## What is Zeppelin interpreter?
### What is zeppelin interpreter?
A Zeppelin interpreter is a plug-in that enables Zeppelin users to use a specific language or data-processing backend. For example, to run Scala code in Zeppelin, you need the `%spark` interpreter.
Zeppelin Interpreter is the plug-in which enable zeppelin user to use a specific language/data-processing-backend. For example to use scala code in Zeppelin, you need ```spark``` interpreter.
When you click on the ```+Create``` button in the interpreter page the interpreter drop-down list box will present all the available interpreters on your server.
When you click the `+Create` button on the interpreter page, the interpreter drop-down list box shows all the interpreters available on your server.
<img src="/assets/themes/zeppelin/img/screenshots/interpreter_create.png">
### What is zeppelin interpreter setting?
## What is Zeppelin Interpreter Setting?
Zeppelin interpreter setting is the configuration of a given interpreter on zeppelin server. For example, the properties requried for hive JDBC interpreter to connect to the Hive server.
A Zeppelin interpreter setting is the configuration of a given interpreter on the Zeppelin server. For example, it holds the properties the Hive JDBC interpreter needs to connect to the Hive server.
<img src="/assets/themes/zeppelin/img/screenshots/interpreter_setting.png">
### What is zeppelin interpreter group?
## What is Zeppelin Interpreter Group?
Every Interpreter belongs to an InterpreterGroup. InterpreterGroup is a unit of start/stop interpreter.
By default, every interpreter belong to a single group but the group might contain more interpreters. For example, spark interpreter group include spark support, pySpark,
Every interpreter belongs to an **Interpreter Group**. An Interpreter Group is the unit in which interpreters are started and stopped.
By default, every interpreter belongs to a single group, but the group might contain more interpreters. For example, the Spark interpreter group includes Spark support, pySpark,
SparkSQL and the dependency loader.
Technically, Zeppelin interpreters from the same group are running in the same JVM.
Technically, Zeppelin interpreters from the same group run in the same JVM. For more information, please see [here](../development/writingzeppelininterpreter.html).
Interpreters belong to a single group a registered together and all of their properties are listed in the interpreter setting.
Interpreters that belong to the same group are registered together, and all of their properties are listed in the interpreter setting, as shown in the image below.
<img src="/assets/themes/zeppelin/img/screenshots/interpreter_setting_spark.png">
### Programming langages for interpreter
## Programming Languages for Interpreter
If the interpreter uses a specific programming language (like Scala, Python, SQL), it is generally a good idea to add syntax highlighting support for that to the notebook paragraph editor.
If the interpreter uses a specific programming language ( like Scala, Python, SQL ), it is generally recommended to add syntax highlighting support for that language to the notebook paragraph editor.
To see the list of supported languages, check the `mode-*.js` files under `zeppelin-web/bower_components/ace-builds/src-noconflict` or at [github.com/ajaxorg/ace-builds](https://github.com/ajaxorg/ace-builds/tree/master/src-noconflict).
If you want to add a new set of syntax highlighting,
To check out the list of languages supported, see the mode-*.js files under zeppelin-web/bower_components/ace-builds/src-noconflict or from github https://github.com/ajaxorg/ace-builds/tree/master/src-noconflict
To add a new set of syntax highlighting:
1. add the mode-*.js file to zeppelin-web/bower.json (when built, zeppelin-web/src/index.html will be changed automatically)
2. add to the list of `editorMode` in zeppelin-web/src/app/notebook/paragraph/paragraph.controller.js - it follows the pattern 'ace/mode/x' where x is the name
3. add to the code that checks for `%` prefix and calls `session.setMode(editorMode.x)` in `setParagraphMode` in zeppelin-web/src/app/notebook/paragraph/paragraph.controller.js
1. Add the `mode-*.js` file to `zeppelin-web/bower.json` ( when built, `zeppelin-web/src/index.html` will be updated automatically ).
2. Add the mode to the `editorMode` list in `zeppelin-web/src/app/notebook/paragraph/paragraph.controller.js`; entries follow the pattern 'ace/mode/x', where x is the mode name.
3. In `setParagraphMode`, located in `zeppelin-web/src/app/notebook/paragraph/paragraph.controller.js`, add a check for your `%` prefix that calls `session.setMode(editorMode.x)`.
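The three steps above can be illustrated with a minimal, self-contained sketch of the lookup that `setParagraphMode` performs. It is not the actual controller code: the `editorMode` entries, the `prefixToMode` table and the `modeForParagraph` helper are hypothetical, and the sketch returns the mode string instead of calling `session.setMode` directly, so it can run outside the browser.

```javascript
// Hypothetical sketch; the real editorMode list lives in
// zeppelin-web/src/app/notebook/paragraph/paragraph.controller.js.
// Each value follows the 'ace/mode/x' pattern, where x matches a mode-x.js file.
const editorMode = {
  scala: 'ace/mode/scala',
  sql: 'ace/mode/sql',
  markdown: 'ace/mode/markdown',
  sh: 'ace/mode/sh',
  python: 'ace/mode/python' // example of a newly added mode (step 2)
};

// Hypothetical table mapping paragraph prefixes to editorMode keys (step 3).
const prefixToMode = [
  ['%sql', 'sql'],
  ['%md', 'markdown'],
  ['%sh', 'sh'],
  ['%pyspark', 'python']
];

// Returns the Ace mode for a paragraph, falling back to Scala
// when no known % prefix is found.
function modeForParagraph(paragraphText) {
  const text = String(paragraphText).trimStart();
  for (const [prefix, key] of prefixToMode) {
    if (text.startsWith(prefix)) {
      return editorMode[key];
    }
  }
  return editorMode.scala;
}
```

In the controller, the result would be passed to `session.setMode(...)`. Keeping the prefixes in one table means a new language needs one `editorMode` entry and one prefix entry rather than another `if` branch.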

@ -22,28 +22,29 @@ limitations under the License.
## Zeppelin REST API
Zeppelin provides several REST APIs for interacting with Zeppelin and activating its functionality remotely.
All REST API are available starting with the following endpoint ```http://[zeppelin-server]:[zeppelin-port]/api```
All REST APIs are available under the following base endpoint: `http://[zeppelin-server]:[zeppelin-port]/api`.
Note that the Zeppelin REST APIs receive and return JSON objects, so it is recommended that you install a JSON viewer such as
[JSONView](https://chrome.google.com/webstore/detail/jsonview/chklaanhfefbnpoihckbnefhakgolnmc)
[JSON View](https://chrome.google.com/webstore/detail/jsonview/chklaanhfefbnpoihckbnefhakgolnmc).
If you work with zeppelin and find a need for an additional REST API please [file an issue or send us mail](../../community.html)
If you work with Zeppelin and need an additional REST API, please [file an issue or send us an email](http://zeppelin.incubator.apache.org/community.html).
<br />
### Interpreter REST API list
## Interpreter REST API List
The role of registered interpreters, settings and interpreters group is described [here](../manual/interpreters.html)
The roles of registered interpreters, interpreter settings and interpreter groups are described [here](../manual/interpreters.html).
### 1. List of Registered Interpreters & Interpreter Settings
<table class="table-configuration">
<col width="200">
<tr>
<th>List registered interpreters</th>
<th>List of registered interpreters</th>
<th></th>
</tr>
<tr>
<td>Description</td>
<td>This ```GET``` method return all the registered interpreters available on the server.</td>
<td>This ```GET``` method returns all the registered interpreters available on the server.</td>
</tr>
<tr>
<td>URL</td>
@ -54,12 +55,11 @@ limitations under the License.
<td>200</td>
</tr>
<tr>
<td> Fail code</td>
<td>Fail code</td>
<td> 500 </td>
</tr>
<tr>
<td> sample JSON response
</td>
<td>Sample JSON response</td>
<td>
<pre>
{
@ -113,12 +113,12 @@ limitations under the License.
<table class="table-configuration">
<col width="200">
<tr>
<th>List interpreters settings</th>
<th>List of interpreter settings</th>
<th></th>
</tr>
<tr>
<td>Description</td>
<td>This ```GET``` method return all the interpreters settings registered on the server.</td>
<td>This ```GET``` method returns all the interpreter settings registered on the server.</td>
</tr>
<tr>
<td>URL</td>
@ -129,12 +129,11 @@ limitations under the License.
<td>200</td>
</tr>
<tr>
<td> Fail code</td>
<td>Fail code</td>
<td> 500 </td>
</tr>
<tr>
<td> sample JSON response
</td>
<td>Sample JSON response</td>
<td>
<pre>
{
@ -182,7 +181,8 @@ limitations under the License.
</table>
<br/>
### 2. Create an Interpreter Setting
<table class="table-configuration">
<col width="200">
<tr>
@ -202,12 +202,11 @@ limitations under the License.
<td>201</td>
</tr>
<tr>
<td> Fail code</td>
<td>Fail code</td>
<td> 500 </td>
</tr>
<tr>
<td> sample JSON input
</td>
<td>Sample JSON input</td>
<td>
<pre>
{
@ -227,8 +226,7 @@ limitations under the License.
</td>
</tr>
<tr>
<td> sample JSON response
</td>
<td>Sample JSON response</td>
<td>
<pre>
{
@ -256,7 +254,8 @@ limitations under the License.
<br/>
### 3. Update an Interpreter Setting
<table class="table-configuration">
<col width="200">
<tr>
@ -276,12 +275,11 @@ limitations under the License.
<td>200</td>
</tr>
<tr>
<td> Fail code</td>
<td>Fail code</td>
<td> 500 </td>
</tr>
<tr>
<td> sample JSON input
</td>
<td>Sample JSON input</td>
<td>
<pre>
{
@ -301,8 +299,7 @@ limitations under the License.
</td>
</tr>
<tr>
<td> sample JSON response
</td>
<td>Sample JSON response</td>
<td>
<pre>
{
@ -330,7 +327,8 @@ limitations under the License.
<br/>
### 4. Delete an Interpreter Setting
<table class="table-configuration">
<col width="200">
<tr>
@ -354,17 +352,17 @@ limitations under the License.
<td> 500 </td>
</tr>
<tr>
<td> sample JSON response
</td>
<td>Sample JSON response</td>
<td>
<pre>{"status":"OK"}</pre>
<code>{"status":"OK"}</code>
</td>
</tr>
</table>
<br/>
### 5. Restart an Interpreter
<table class="table-configuration">
<col width="200">
<tr>
@ -373,7 +371,7 @@ limitations under the License.
</tr>
<tr>
<td>Description</td>
<td>This ```PUT``` method restart the given interpreter id.</td>
<td>This ```PUT``` method restarts the interpreter with the given id.</td>
</tr>
<tr>
<td>URL</td>
@ -384,14 +382,13 @@ limitations under the License.
<td>200</td>
</tr>
<tr>
<td> Fail code</td>
<td>Fail code</td>
<td> 500 </td>
</tr>
<tr>
<td> sample JSON response
</td>
<td>Sample JSON response</td>
<td>
<pre>{"status":"OK"}</pre>
<code>{"status":"OK"}</code>
</td>
</tr>
</table>
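Since every endpoint above shares the `http://[zeppelin-server]:[zeppelin-port]/api` base, a small client can wrap the calls. The following is a minimal sketch, assuming Node.js 18 or later for the global `fetch`; the helper names `apiUrl`, `isOk` and `callApi` are hypothetical, and the path argument should be taken from the URL row of the relevant table rather than from this example.

```javascript
// Minimal, hypothetical client sketch for the Zeppelin REST API.
// Assumes Node.js 18+ (global fetch). Server, port and path are
// placeholders supplied by the caller.

// Builds a URL under the base endpoint
// http://[zeppelin-server]:[zeppelin-port]/api
function apiUrl(server, port, path) {
  const cleanPath = path.startsWith('/') ? path : '/' + path;
  return `http://${server}:${port}/api${cleanPath}`;
}

// True when a parsed body has the {"status":"OK"} shape shown in the
// delete and restart sample responses.
function isOk(body) {
  return body !== null && typeof body === 'object' && body.status === 'OK';
}

// Sends a request and parses the JSON reply. Success codes differ per
// operation (200 or 201), so any 2xx status is accepted here.
async function callApi(server, port, method, path, payload) {
  const options = { method, headers: { 'Content-Type': 'application/json' } };
  if (payload !== undefined) {
    options.body = JSON.stringify(payload);
  }
  const response = await fetch(apiUrl(server, port, path), options);
  if (!response.ok) {
    // The tables list 500 as the failure code.
    throw new Error(`${method} ${path} failed with HTTP ${response.status}`);
  }
  return response.json();
}
```

For example, `await callApi('localhost', 8080, 'GET', path)`, with a placeholder host and port and `path` taken from the table of the listing operation, would resolve to the parsed JSON response.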

@ -17,20 +17,20 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
### Zeppelin Tutorial
## Zeppelin Tutorial
We will assume you have Zeppelin installed already. If that's not the case, see [Install](../install/install.html).
This tutorial walks you through some of the fundamental Zeppelin concepts. We will assume you have already installed Zeppelin. If not, please see [here](../install/install.html) first.
Zeppelin's current main backend processing engine is [Apache Spark](https://spark.apache.org). If you're new to the system, you might want to start by getting an idea of how it processes data to get the most out of Zeppelin.
The current main backend processing engine of Zeppelin is [Apache Spark](https://spark.apache.org). If you're new to this system, you might want to start by getting an idea of how it processes data to get the most out of Zeppelin.
<br />
### Tutorial with Local File
## Tutorial with Local File
#### Data Refine
### 1. Data Refine
Before you start the Zeppelin tutorial, you will need to download [bank.zip](http://archive.ics.uci.edu/ml/machine-learning-databases/00222/bank.zip).
First, to transform data from csv format into RDD of `Bank` objects, run following script. This will also remove header using `filter` function.
First, to transform CSV data into an RDD of `Bank` objects, run the following script. It also removes the header using the `filter` function.
```scala
@ -38,7 +38,7 @@ val bankText = sc.textFile("yourPath/bank/bank-full.csv")
case class Bank(age:Integer, job:String, marital : String, education : String, balance : Integer)
// split each line, filter out header (starts with "age"), and map it into Bank case class
// split each line, filter out header (starts with "age"), and map it into Bank case class
val bank = bankText.map(s=>s.split(";")).filter(s=>s(0)!="\"age\"").map(
s=>Bank(s(0).toInt,
s(1).replaceAll("\"", ""),
@ -52,8 +52,7 @@ val bank = bankText.map(s=>s.split(";")).filter(s=>s(0)!="\"age\"").map(
bank.toDF().registerTempTable("bank")
```
<br />
#### Data Retrieval
### 2. Data Retrieval
Suppose we want to see the age distribution from `bank`. To do this, run:
@ -74,9 +73,9 @@ Now we want to see age distribution with certain marital status and add combo bo
```
<br />
### Tutorial with Streaming Data
## Tutorial with Streaming Data
#### Data Refine
### 1. Data Refine
Since this tutorial is based on Twitter's sample tweet stream, you must configure authentication with a Twitter account. To do this, take a look at [Twitter Credential Setup](https://databricks-training.s3.amazonaws.com/realtime-processing-with-spark-streaming.html#twitter-credential-setup). After you get the API keys, fill in the credential-related values (`apiKey`, `apiSecret`, `accessToken`, `accessTokenSecret`) with your API keys in the following script.
@ -136,12 +135,11 @@ twt.print
ssc.start()
```
<br />
#### Data Retrieval
### 2. Data Retrieval
For each of the following scripts, you will see a different result every time you click the run button, since it is based on real-time data.
Let's begin by extracting maximum 10 tweets which contain the word "girl".
Let's begin by extracting a maximum of 10 tweets that contain the word **girl**.
```sql
%sql select * from tweets where text like '%girl%' limit 10
@ -154,7 +152,7 @@ This time suppose we want to see how many tweets have been created per sec durin
```
You can make user-defined function and use it in Spark SQL. Let's try it by making function named `sentiment`. This function will return one of the three attitudes(positive, negative, neutral) towards the parameter.
You can create a user-defined function and use it in Spark SQL. Let's try it by making a function named `sentiment`. This function returns one of three attitudes ( positive, negative, neutral ) towards its parameter.
```scala
def sentiment(s:String) : String = {

@ -45,7 +45,7 @@ limitations under the License.
</img>
<div id="{{paragraph.id}}_error"
class="error"
class="error text"
ng-if="paragraph.status == 'ERROR'"
ng-bind="paragraph.errorMessage">
</div>

@ -796,8 +796,10 @@ angular.module('zeppelinWebApp')
$scope.moveUp();
} else if (keyEvent.ctrlKey && keyEvent.altKey && keyCode === 74) { // Ctrl + Alt + j
$scope.moveDown();
} else if (keyEvent.ctrlKey && keyEvent.altKey && keyCode === 65) { // Ctrl + Alt + a
$scope.insertNew('above');
} else if (keyEvent.ctrlKey && keyEvent.altKey && keyCode === 66) { // Ctrl + Alt + b
$scope.insertNew();
$scope.insertNew('below');
} else if (keyEvent.ctrlKey && keyEvent.altKey && keyCode === 79) { // Ctrl + Alt + o
$scope.toggleOutput();
} else if (keyEvent.ctrlKey && keyEvent.altKey && keyCode === 69) { // Ctrl + Alt + e

@ -78,6 +78,17 @@ limitations under the License.
</div>
</div>
<div class="row">
<div class="col-md-4">
<div class="keys">
<kbd class="kbd-dark">Ctrl</kbd> + <kbd class="kbd-dark">Alt</kbd> + <kbd class="kbd-dark">a</kbd>
</div>
</div>
<div class="col-md-8">
Insert new paragraph above
</div>
</div>
<div class="row">
<div class="col-md-4">
<div class="keys">