mirror of
https://github.com/apache/zeppelin
synced 2026-05-24 09:38:26 +00:00
### What is this PR for?
With this PR, users can specify external libraries to be loaded into a specific interpreter.
Note that the scope of this PR is downloading libraries to the local repository, not distributing them to other nodes. At the moment, only the Spark interpreter distributes loaded dependencies to worker nodes.
Here is a brief explanation of how the code works.
1. Receive the REST API request for the interpreter dependency setting from the front end.
2. Download the libraries into `ZEPPELIN_HOME/local-repo` and copy them to `ZEPPELIN_HOME/local-repo/{interpreterId}`.
3. Add the files matching `ZEPPELIN_HOME/local-repo/{interpreterId}/*.jar` to the interpreter classpath when the interpreter process starts.
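
Steps 2 and 3 can be illustrated with a minimal sketch. This is not the code from this PR: the object name `LocalRepoSketch` and both helper functions are hypothetical, and the sketch only assumes that the jars have already been downloaded to the local repository.

```scala
import java.io.File
import java.nio.file.{Files, Path, StandardCopyOption}

// Hypothetical sketch; names are illustrative and not taken from the Zeppelin code base.
object LocalRepoSketch {

  // Step 2 (second half): copy already-downloaded files into local-repo/{interpreterId}.
  def copyToInterpreterRepo(localRepo: Path, interpreterId: String, jars: Seq[Path]): Path = {
    val target = localRepo.resolve(interpreterId)
    Files.createDirectories(target)
    jars.foreach { jar =>
      Files.copy(jar, target.resolve(jar.getFileName), StandardCopyOption.REPLACE_EXISTING)
    }
    target
  }

  // Step 3: build a classpath string from every *.jar file in the interpreter directory.
  def classpathFor(interpreterDir: Path): String = {
    val files = Option(interpreterDir.toFile.listFiles()).getOrElse(Array.empty[File])
    files
      .filter(f => f.isFile && f.getName.endsWith(".jar"))
      .map(_.getAbsolutePath)
      .sorted
      .mkString(File.pathSeparator)
  }
}
```

Under these assumptions, the string returned by `classpathFor` is what would be appended to the interpreter process classpath at startup; files without a `.jar` extension are ignored.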
### What type of PR is it?
Improvement
### Todos
* [x] Add tests
* [x] Update docs
### Is there a relevant Jira issue?
https://issues.apache.org/jira/browse/ZEPPELIN-630
This PR also resolves [ZEPPELIN-194](https://issues.apache.org/jira/browse/ZEPPELIN-194), [ZEPPELIN-381](https://issues.apache.org/jira/browse/ZEPPELIN-381), and [ZEPPELIN-609](https://issues.apache.org/jira/browse/ZEPPELIN-609).
### How should this be tested?
1. Add a repository (in the interpreter menu, click the gear button at the top right)
```
id: spark-packages
url: http://dl.bintray.com/spark-packages/maven
snapshot: false
```
2. Set the dependency in the Spark interpreter (click the edit button of the Spark interpreter setting)
```
artifact: com.databricks:spark-csv_2.10:1.3.0
```
3. Download the example CSV file
```
$ wget https://github.com/databricks/spark-csv/raw/master/src/test/resources/cars.csv
```
4. Run the code below in a paragraph, then query the table from a `%sql` paragraph
```
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true") // Use first line of all files as header
  .option("inferSchema", "true") // Automatically infer data types
  .load("file:///your/download/path/cars.csv")
df.registerTempTable("cars")
```
```
%sql select * from cars
```
### Screenshots (if appropriate)
* Toggle repository list
<img width="1146" alt="screen shot 2016-01-25 at 12 24 44 pm" src="https://cloud.githubusercontent.com/assets/8503346/12563475/52f060ac-c35f-11e5-8621-d8eb97b4d6a1.png">
* Add new repository
<img width="1146" alt="screen shot 2016-01-25 at 12 25 23 pm" src="https://cloud.githubusercontent.com/assets/8503346/12563472/52eb545e-c35f-11e5-9050-a5306d2765f1.png">
* Show repository info
<img width="1146" alt="screen shot 2016-01-25 at 12 25 28 pm" src="https://cloud.githubusercontent.com/assets/8503346/12563473/52ebab84-c35f-11e5-9acb-3a356c855dc7.png">
* Interpreter dependency
<img width="1146" alt="screen shot 2016-01-25 at 12 27 27 pm" src="https://cloud.githubusercontent.com/assets/8503346/12563471/52eadd9e-c35f-11e5-8e1a-f583ea8800aa.png">
### Questions:
* Do the license files need an update? No
* Are there breaking changes for older versions?
  - Users who create or update interpreter settings through the REST API must add a `dependencies` object to the request payload.
  - The `%dep` interpreter is deprecated. The functionality is still there, but loading third-party dependencies via the interpreter menu is recommended.
* Does this need documentation? Yes
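
For REST API users, the fragment below sketches how the `dependencies` part of a request payload might be built. The PR text only states that a `dependencies` object is required; the list-of-entries shape and the field names `groupArtifactVersion` and `exclusions` are assumptions, so check them against `rest-interpreter.md` before relying on them. The object `DependencyPayloadSketch` and its helpers are hypothetical.

```scala
// Hypothetical sketch; the payload shape and field names are assumptions, not confirmed by this PR text.
object DependencyPayloadSketch {

  // One dependency entry: a Maven coordinate plus optional exclusions.
  case class Dependency(groupArtifactVersion: String, exclusions: Seq[String] = Seq.empty)

  // Minimal JSON string quoting (escapes backslashes and double quotes only).
  private def quote(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

  def toJson(d: Dependency): String =
    "{" + quote("groupArtifactVersion") + ": " + quote(d.groupArtifactVersion) + ", " +
      quote("exclusions") + ": " + d.exclusions.map(quote).mkString("[", ", ", "]") + "}"

  // Renders the "dependencies" member to embed in a create/update request body.
  def dependenciesField(deps: Seq[Dependency]): String =
    quote("dependencies") + ": " + deps.map(toJson).mkString("[", ", ", "]")
}
```

Calling `DependencyPayloadSketch.dependenciesField(Seq(DependencyPayloadSketch.Dependency("com.databricks:spark-csv_2.10:1.3.0")))` yields a member that can be placed alongside the other fields of the interpreter setting request.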
Author: Mina Lee <minalee@nflabs.com>
Closes #673 from minahlee/ZEPPELIN-630 and squashes the following commits: