mirror of
https://github.com/apache/zeppelin
synced 2026-05-24 09:38:26 +00:00
13 commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
5e75145ac8 |
[ZEPPELIN-1859] Add MongoNotebookRepo
### What is this PR for?
This PR adds MongoDB notebook storage. The motivation is HA (High Availability): as far as I know, S3 and Git storage are the only available methods for HA. I manage an Ambari cluster in my lab, and Zeppelin is its most vulnerable part because a single server holds all Zeppelin notes. By deploying a MongoDB [replica set](https://docs.mongodb.com/manual/replication/) and using it as Zeppelin notebook storage, I would like to achieve HA.
#### How to use MongoDB as notebook storage
```sh
export ZEPPELIN_NOTEBOOK_STORAGE=org.apache.zeppelin.notebook.repo.MongoNotebookRepo
```
or in `zeppelin-site.xml`:
```xml
<property>
  <name>zeppelin.notebook.storage</name>
  <value>org.apache.zeppelin.notebook.repo.MongoNotebookRepo</value>
  <description>notebook persistence layer implementation</description>
</property>
```
#### Configurable environment variables
* `ZEPPELIN_NOTEBOOK_MONGO_URI`: MongoDB connection URI
* `ZEPPELIN_NOTEBOOK_MONGO_DATABASE`: Database name
* `ZEPPELIN_NOTEBOOK_MONGO_COLLECTION`: Collection name
* `ZEPPELIN_NOTEBOOK_MONGO_AUTOIMPORT`: If `true`, automatically import your local notes. Default `false`

They can be configured in `zeppelin-site.xml` as well:
* `zeppelin.notebook.mongo.uri`
* `zeppelin.notebook.mongo.database`
* `zeppelin.notebook.mongo.collection`
* `zeppelin.notebook.mongo.autoimport`
#### Future work
If we use MongoDB's [oplog tailing](https://docs.mongodb.com/manual/core/replica-set-oplog/), a multi-server architecture may be possible.
### What type of PR is it?
[Feature]
### Todos
* [ ] - Write documentation for Mongo storage
### What is the Jira issue?
https://issues.apache.org/jira/browse/ZEPPELIN-1859
### How should this be tested?
#### Install MongoDB (if you don't have it)
```sh
brew update
brew install mongodb
```
#### Build Zeppelin
```sh
mvn clean package -DskipTests
```
#### Run Zeppelin with Mongo storage
```sh
export ZEPPELIN_NOTEBOOK_STORAGE=org.apache.zeppelin.notebook.repo.MongoNotebookRepo
export ZEPPELIN_NOTEBOOK_MONGO_AUTOIMPORT=true
bin/zeppelin-daemon.sh restart
```
The default database and collection names are `zeppelin` and `notes` respectively. The `ZEPPELIN_NOTEBOOK_MONGO_AUTOIMPORT` option automatically imports your `local notes` that don't exist in MongoDB.
#### Check whether a document in MongoDB is updated
Create, update, or remove a note and open the mongo shell:
```sh
mongo zeppelin
```
Then check that the state of the note is what you expect:
```sh
db.notes.findOne({_id: '<NOTE_ID_THAT_YOU_WANT_TO_SEE>'})
```
#### Confirm that the configuration works
```sh
export ZEPPELIN_NOTEBOOK_STORAGE=org.apache.zeppelin.notebook.repo.MongoNotebookRepo
export ZEPPELIN_NOTEBOOK_MONGO_AUTOIMPORT=true
export ZEPPELIN_NOTEBOOK_MONGO_DATABASE=otherdb
export ZEPPELIN_NOTEBOOK_MONGO_COLLECTION=mynotes
export ZEPPELIN_NOTEBOOK_MONGO_URI=mongodb://localhost:27017
bin/zeppelin-daemon.sh restart
```
The collection `mynotes` should be created in db `otherdb`. Let's check it!
```sh
mongo otherdb
db.mynotes.count()
```
The result should not be zero.
#### Confirm that configuration from `zeppelin-site.xml` works
Open your `conf/zeppelin-site.xml` file (copy it from `zeppelin-site.xml.template` if you don't have one), and comment out the lines below:
```xml
<!--
<property>
  <name>zeppelin.notebook.storage</name>
  <value>org.apache.zeppelin.notebook.repo.VFSNotebookRepo</value>
  <description>notebook persistence layer implementation</description>
</property>
-->
```
And add the lines below:
```xml
<property>
  <name>zeppelin.notebook.storage</name>
  <value>org.apache.zeppelin.notebook.repo.MongoNotebookRepo</value>
  <description>notebook persistence layer implementation</description>
</property>
<property>
  <name>zeppelin.notebook.mongo.uri</name>
  <value>mongodb://localhost</value>
  <description>MongoDB connection URI used to connect to a MongoDB database server</description>
</property>
<property>
  <name>zeppelin.notebook.mongo.database</name>
  <value>zepl</value>
  <description>database name for notebook storage</description>
</property>
<property>
  <name>zeppelin.notebook.mongo.collection</name>
  <value>notes</value>
  <description>collection name for notebook storage</description>
</property>
<property>
  <name>zeppelin.notebook.mongo.autoimport</name>
  <value>false</value>
  <description>import local notes into MongoDB automatically on startup</description>
</property>
```
This time we will import a note via `mongoimport`. I made it possible to import a note from JSON just in case.
```sh
cd $ZEPPELIN_HOME/notebook/<NOTE_ID_YOU_WANT_TO_IMPORT>
mongoimport --db zepl --collection notes --file note.json
```
Ensure that your environment variables are clean (just reopen your terminal if they are not), and restart Zeppelin:
```sh
bin/zeppelin-daemon.sh restart
```
Open a browser and go to `localhost:8080`. The note that you imported should be shown.
### Questions:
* Does the licenses files need update? Maybe...? I used [java-mongodb-driver](https://mvnrepository.com/artifact/org.mongodb/mongo-java-driver/3.4.1) which has *The Apache Software License, Version 2.0*
* Is there breaking changes for older versions? NO
* Does this needs documentation? YES

Author: Jun Kim <i2r.jun@gmail.com>
Closes #1826 from tae-jun/ZEPPELIN-1859 and squashes the following commits: |
||
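The environment-variable lookup and the auto-import rule described in the commit above can be sketched roughly as follows. This is a minimal Python illustration, not the commit's actual Java code: the names `MongoRepoConfig`, `resolve_mongo_config` and `notes_to_import` are hypothetical, and the fallback URI `mongodb://localhost` is an assumption (the commit message only states the defaults for database, collection and auto-import).

```python
import os
from typing import Iterable, List, Mapping, NamedTuple


class MongoRepoConfig(NamedTuple):
    uri: str
    database: str
    collection: str
    autoimport: bool


def resolve_mongo_config(env: Mapping[str, str] = os.environ) -> MongoRepoConfig:
    """Read the Mongo storage settings from environment variables.

    Defaults for database, collection and autoimport follow the commit
    message; the default URI is an assumption made for this sketch.
    """
    raw_autoimport = env.get("ZEPPELIN_NOTEBOOK_MONGO_AUTOIMPORT", "false")
    return MongoRepoConfig(
        uri=env.get("ZEPPELIN_NOTEBOOK_MONGO_URI", "mongodb://localhost"),
        database=env.get("ZEPPELIN_NOTEBOOK_MONGO_DATABASE", "zeppelin"),
        collection=env.get("ZEPPELIN_NOTEBOOK_MONGO_COLLECTION", "notes"),
        # Only the string "true" (case-insensitive) enables auto-import.
        autoimport=raw_autoimport.strip().lower() == "true",
    )


def notes_to_import(local_ids: Iterable[str], stored_ids: Iterable[str]) -> List[str]:
    """Return the local note IDs that do not exist in MongoDB yet, sorted."""
    return sorted(set(local_ids) - set(stored_ids))
```

Passing a plain dictionary instead of `os.environ` makes the lookup easy to try out without touching the real environment.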
|
|
5bb38c89ae |
[ZEPPELIN-1465] Add an option to allow S3 server-side encryption
### What is this PR for?
Provide a configuration option that will cause the S3 Notebook repo to request server-side encryption of saved notebooks.
### What type of PR is it?
Improvement
### What is the Jira issue?
https://issues.apache.org/jira/browse/ZEPPELIN-1465
### How should this be tested?
Enable the configuration option, save a notebook in zeppelin, and confirm in the AWS S3 Console that the related file was saved with AES-256 encryption on the server-side. (Properties tab, Detail section)
### Questions:
* Does the licenses files need update? No
* Is there breaking changes for older versions? No.
* Does this needs documentation? I added mentions of the new option in existing documentation.

Thank you!
Author: Jeff Plourde <jplourde@cyft.io>
Closes #1969 from jeff-cyft/s3_sse and squashes the following commits: |
||
|
|
982cc0d17e |
[DOCS] Reflect changed default storage to doc
### What is this PR for?
Reflect the effects of changing the default notebook storage from VFSNotebookRepo to GitNotebookRepo.
### What type of PR is it?
[Documentation]
### Questions:
* Does the licenses files need update? NO
* Is there breaking changes for older versions? NO
* Does this needs documentation? NO
Author: Jun Kim <i2r.jun@gmail.com>
Closes #1903 from tae-jun/patch-3 and squashes the following commits:
|
||
|
|
31085cc03e |
[ZEPPELIN-1848] add option for S3 KMS key region
### What is this PR for?
When using the S3 storage layer with encryption keys, currently only keys created in the `us-east-1` region can be used. This PR adds the ability to set the target region for AWS KMS keys.
### What type of PR is it?
Improvement
### Todos
* [x] - add region to awsClient
* [x] - add conf for region
* [x] - tested with aws account `us-west-2` region
### What is the Jira issue?
[ZEPPELIN-1848](https://issues.apache.org/jira/browse/ZEPPELIN-1848)
### How should this be tested?
1. set up S3 storage as in [here](https://zeppelin.apache.org/docs/0.7.0-SNAPSHOT/storage/storage.html#notebook-storage-in-s3)
2. add region variable with `export ZEPPELIN_NOTEBOOK_S3_KMS_KEY_REGION="us-west-2"` in `conf/zeppelin-env.sh`
3. start Zeppelin and read/write S3
### Screenshots (if appropriate)
### Questions:
* Does the licenses files need update? no
* Is there breaking changes for older versions? no
* Does this needs documentation? updated

Author: Khalid Huseynov <khalidhnv@gmail.com>
Closes #1860 from khalidhuseynov/feat/s3-repo-kms-region and squashes the following commits: |
||
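The region setting from the commit above amounts to a lookup with a fallback. The sketch below is a hypothetical Python illustration rather than the commit's Java code: the function name `kms_key_region` is invented, and falling back to `us-east-1` when the variable is unset is an assumption, inferred from the statement that only `us-east-1` keys worked before this change.

```python
import os
from typing import Mapping

KMS_REGION_VAR = "ZEPPELIN_NOTEBOOK_S3_KMS_KEY_REGION"


def kms_key_region(env: Mapping[str, str] = os.environ) -> str:
    """Return the configured KMS key region.

    Falls back to us-east-1 when the variable is unset or blank
    (an assumption for this sketch, see the lead-in).
    """
    region = env.get(KMS_REGION_VAR, "").strip()
    return region or "us-east-1"
```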
|
|
85d4df4f0c |
[ZEPPELIN-1219] Add searching feature to Zeppelin docs site
### What is this PR for?
As more and more document pages are added, it is getting really hard to find specific pages. So I added a search feature to the Zeppelin documentation site (a [jekyll](https://jekyllrb.com/) based site) using [lunr.js](http://lunrjs.com/).
- **How does it work?** I created [`search_data.json`]( |
||
|
|
5975125f18 |
[ZEPPELIN-1018] Apply auto "Table of Contents" generator to Zeppelin docs website
### What is this PR for?
I added an automatic TOC (Table of Contents) generator for the Zeppelin documentation website. A TOC helps people look through the whole contents at a glance and find what they want quickly.
I just added `<div id="toc"></div>` to each documentation header. [`toc`](https://github.com/apache/zeppelin/compare/master...AhyoungRyu:ZEPPELIN-1018?expand=1#diff-85af09fb498a5667ea455391533f945dR3) recognizes `<h2>` & `<h3>` as titles in the docs and automatically generates the TOC. So I set a rule for this work. (I'll write this rule in `docs/CONTRIBUTING.md` or [docs/howtocontributewebsite](https://zeppelin.apache.org/docs/0.6.0-SNAPSHOT/development/howtocontributewebsite.html)).
```
# Level-1 Heading <- Use only for the main title of the page
## Level-2 Heading <- Start with this one
### Level-3 heading <- Only use this one for child of Level-2
toc only recognize Level-2 & Level-3
```
Please see the screenshot attached below.
### What type of PR is it?
Improvement & Documentation
### Todos
* [x] - Add TOC generator
* [x] - Apply TOC(`<div id="toc"></div>`) to every documentation page and reorganize the headers (apply the above rule)
* [x] - Fix some broken code blocks in several docs
* [x] - Apply TOC to `r.md` (Currently R docs has some duplicated info since [this one]( |
||
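The heading rule from the commit above (only Level-2 and Level-3 headings appear in the TOC) can be sketched as follows. The commit's generator works on `<h2>` and `<h3>` elements in the rendered page; this Python sketch instead scans Markdown source, which is an assumption made to keep the example self-contained. The names `extract_toc` and `render_toc` are hypothetical.

```python
import re
from typing import List, Tuple

# Exactly two or three '#' characters, then whitespace, then the title.
HEADING = re.compile(r"^(#{2,3})\s+(.*\S)\s*$")
FENCE = "`" * 3


def extract_toc(markdown: str) -> List[Tuple[int, str]]:
    """Return (level, title) pairs for Level-2 and Level-3 headings only."""
    entries = []
    in_fence = False
    for line in markdown.splitlines():
        # Lines inside fenced code blocks are never headings.
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        match = HEADING.match(line)
        if match:
            entries.append((len(match.group(1)), match.group(2)))
    return entries


def render_toc(entries: List[Tuple[int, str]]) -> str:
    """Render the entries as a bullet list, nesting Level-3 under Level-2."""
    return "\n".join("  " * (level - 2) + "- " + title for level, title in entries)
```

Level-1 and Level-4 headings are skipped because the pattern requires exactly two or three `#` characters followed by whitespace.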
|
|
70ab1a376d |
[ZEPPELIN-952] Refine website style
### What is this PR for?
- update document style (font, line-spacing)
- apply same formats for documents
- fix broken document styles
### What type of PR is it?
Documentation
### What is the Jira issue?
[ZEPPELIN-952](https://issues.apache.org/jira/browse/ZEPPELIN-952)
### Screenshots (if appropriate)
**Before**
<img width="1184" alt="screen shot 2016-06-04 at 9 51 38 pm" src="https://cloud.githubusercontent.com/assets/8503346/15803667/d0dd5ac2-2a9f-11e6-9ed0-ddc369a97612.png">
**After**
<img width="1184" alt="screen shot 2016-06-04 at 9 15 08 pm" src="https://cloud.githubusercontent.com/assets/8503346/15803666/cd9212ea-2a9f-11e6-986e-17992a495ab6.png">
**Before**
<img width="1183" alt="screen shot 2016-06-04 at 10 08 53 pm" src="https://cloud.githubusercontent.com/assets/8503346/15803695/03e73126-2aa1-11e6-8675-3ca437aeb833.png">
**After**
<img width="1184" alt="screen shot 2016-06-04 at 10 08 18 pm" src="https://cloud.githubusercontent.com/assets/8503346/15803696/078ce866-2aa1-11e6-9044-4f5e16649eb4.png">
**Before**
<img width="1184" alt="screen shot 2016-06-04 at 10 10 47 pm" src="https://cloud.githubusercontent.com/assets/8503346/15803704/5787e9ba-2aa1-11e6-804c-076a8f3aa852.png">
**After**
<img width="1184" alt="screen shot 2016-06-04 at 10 11 22 pm" src="https://cloud.githubusercontent.com/assets/8503346/15803707/5afb5d0c-2aa1-11e6-98c7-7440db35bd2f.png">
**Before**
<img width="188" alt="screen shot 2016-06-04 at 10 12 36 pm" src="https://cloud.githubusercontent.com/assets/8503346/15803719/92e5cc3e-2aa1-11e6-9a9f-e12150e78733.png">
**After**
<img width="199" alt="screen shot 2016-06-04 at 10 12 55 pm" src="https://cloud.githubusercontent.com/assets/8503346/15803721/958e8c00-2aa1-11e6-8768-8350db6e7173.png">
### Questions:
* Does the licenses files need update? No
* Is there breaking changes for older versions? No
* Does this needs documentation? No

Author: Mina Lee <minalee@nflabs.com>
Closes #962 from minahlee/ZEPPELIN-952 and squashes the following commits: |
||
|
|
8cde5c9bd4 |
ZeppelinHub notebook storage/connection repository
### What is this PR for?
This adds a [ZeppelinHub](https://www.zeppelinhub.com) notebook storage/connection layer to Zeppelin.
### What type of PR is it?
Feature
### Todos
* [x] - NotebookRepo rest api
* [x] - ZeppelinHub websocket client
* [x] - Zeppelin websocket client
* [x] - Tests
* [x] - More QA (authentication consistency, etc.)
* [x] - Address review comments
### What is the Jira issue?
### How should this be tested?
First of all, you may need to create an account in [ZeppelinHub](https://www.zeppelinhub.com). Then you can set up the connection by following the guide [here](https://github.com/khalidhuseynov/incubator-zeppelin/blob/feat/zeppelinhub-storage/docs/storage/storage.md#notebook-storage-in-zeppelinhub--). Finally, you should be able to access and manipulate your notebooks from inside your ZeppelinHub account.
### Screenshots (if appropriate)
### Questions:
* Does the licenses files need update? No
* Is there breaking changes for older versions? No
* Does this needs documentation? Yes

Author: Khalid Huseynov <khalidhnv@nflabs.com>
Author: Anthony Corbacho <corbacho.anthony@gmail.com>
Closes #880 from khalidhuseynov/feat/zeppelinhub-storage and squashes the following commits: |
||
|
|
7d00af4daf |
Documentation for setting Azure notebook storage
### What is this PR for?
This PR adds general info and documentation on setting Azure storage in the `docs/storage.md` folder where we have info about all the supported pluggable storage layers.
### What type of PR is it?
Documentation
### Todos
* [x] - add docs
* [x] - change description and order in `zeppelin-site.xml.template`
### What is the Jira issue?
### How should this be tested?
Documentation follows the steps in `zeppelin-site.xml.template`. You may need an account to test.
### Screenshots (if appropriate)
### Questions:
* Does the licenses files need update? No
* Is there breaking changes for older versions? No
* Does this needs documentation? No

Author: Khalid Huseynov <khalidhnv@nflabs.com>
Closes #902 from khalidhuseynov/docs/azure-storage and squashes the following commits: |
||
|
|
db69e921b0 |
[ZEPPELIN-848] Add support for encrypted data stored in Amazon S3
### What is this PR for?
Adds support for using the AWS KMS or a custom encryption materials provider class to encrypt data stored in Amazon S3. Also a minor improvement to logic inside the S3 notebook repo when dealing with local files.
### What type of PR is it?
Improvement
### What is the Jira issue?
https://issues.apache.org/jira/browse/ZEPPELIN-848
### How should this be tested?
Running in EMR or another system in AWS is easiest. Make appropriate changes to the config and use an AWS KMS key
### Questions:
* Does the licenses files need update? -- NO
* Is there breaking changes for older versions? -- NO
* Does this needs documentation? -- YES, changes in storage.md and zeppelin-site.xml.template

Author: Nate Sammons <Nate.Sammons@nasdaq.com>
Author: Nate Sammons <nate.sammons@nasdaq.com>
Closes #886 from natesammons-nasdaq/master and squashes the following commits: |
||
|
|
a313e492c4 |
Fix typos in docs
### What is this PR for?
1. Fix some typos in docs.
2. Remove trailing white spaces for each line.
3. Remove leading white spaces if a line contains no content.
4. Add trailing new line for each file.
### What type of PR is it?
Improvement | Documentation
### Todos
None
### What is the Jira issue?
N/A
### How should this be tested?
Build the doc site and check.
### Screenshots (if appropriate)
N/A
### Questions:
* Does the licenses files need update? *no*
* Is there breaking changes for older versions? *no*
* Does this needs documentation? *no*
Author: Cheng-Yu Hsu <m@cyhsu.me>
Closes #852 from cyhsutw/fix-typos-in-docs and squashes the following commits:
|
||
|
|
b5e2e62f23 |
ZEPPELIN-143: Git as a versioned notebook storage
This is a very basic implementation of [ZEPPELIN-143](https://issues.apache.org/jira/browse/ZEPPELIN-143) at the backend.
It makes a local git repository out of your `/notebook` dir and commits a new revision for each save/update.
It does not:
- add any remotes to the git repo. It is entirely possible to do that manually, though. It would be interesting to add this later, to be able to push the notebook to hosting services like GH
- have any GUI modifications. It is left as further work to add the ability for a user to switch "versions" of the notebook, navigating between previous runs.

Feedback is very welcome!
Author: Alexander Bezzubov <bzz@apache.org>
Closes #497 from bzz/add-git-notebook-repo and squashes the following commits: |
||
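The behaviour described in the commit above (turn the notebook directory into a local git repository and record one revision per save) can be sketched with the `git` command line. This is a hypothetical Python illustration, not the commit's Java implementation: the helper names `commit_commands` and `save_note` are invented, the `<note_id>/note.json` layout is assumed, and the sketch requires the `git` CLI with a configured user identity.

```python
import subprocess
from pathlib import Path
from typing import List


def commit_commands(message: str) -> List[List[str]]:
    """Git commands that stage every change and record one revision."""
    return [
        ["git", "add", "--all"],
        # --allow-empty keeps "one revision per save" even if nothing changed.
        ["git", "commit", "--allow-empty", "-m", message],
    ]


def save_note(notebook_dir: Path, note_id: str, content: str) -> None:
    """Write <note_id>/note.json and commit the notebook directory."""
    notebook_dir.mkdir(parents=True, exist_ok=True)
    if not (notebook_dir / ".git").exists():
        # First save: turn the notebook directory into a local repository.
        subprocess.run(["git", "init"], cwd=notebook_dir, check=True)
    note_dir = notebook_dir / note_id
    note_dir.mkdir(parents=True, exist_ok=True)
    (note_dir / "note.json").write_text(content, encoding="utf-8")
    for command in commit_commands("Update note " + note_id):
        subprocess.run(command, cwd=notebook_dir, check=True)
```

As in the commit, no remote is configured here; one could be added manually with `git remote add` and pushed to afterwards.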
|
|
c2cbafd1d8 |
ZEPPELIN-412 Documentation based on Zeppelin version
https://issues.apache.org/jira/browse/ZEPPELIN-412
To provide documentation per Zeppelin version, as the Spark and Flink projects do, the documentation needs to be separated from the website.
* docs will be kept in the Zeppelin main source tree, and will be built and published under the 'docs' menu on the website with a specific version number.
* website will be kept in the gh-pages branch and provides a menu for multiple versions of the docs.

This PR removes unnecessary pages that are provided by the website (for example, the download page).
This is the screenshot after applying this PR
Author: Lee moon soo <moon@apache.org>
Closes #430 from Leemoonsoo/ZEPPELIN-412 and squashes the following commits: |