mirror of
https://github.com/apache/zeppelin
synced 2026-05-24 09:38:26 +00:00
### What is this PR for? This PR adds Mongo notebook storage. The reason that I made this feature is for HA(High Availability). S3 and Git storage are the only available method for HA as far as I know. I'm managing Ambari cluster in my lab, but Zeppelin is the most vulnerable part of it. Because one server contains all Zeppelin notes. Therefore, by deploying MongoDB's [replica set](https://docs.mongodb.com/manual/replication/) and using it as Zeppelin notebook storage, I would like to achieve HA. #### The way to use Mongo DB as notebook storage ```sh export ZEPPELIN_NOTEBOOK_STORAGE=org.apache.zeppelin.notebook.repo.MongoNotebookRepo ``` or at `zeppelin-site.xml`: ```xml <property> <name>zeppelin.notebook.storage</name> <value>org.apache.zeppelin.notebook.repo.MongoNotebookRepo</value> <description>notebook persistence layer implementation</description> </property> ``` #### Configurable environment variables * `ZEPPELIN_NOTEBOOK_MONGO_URI` MongoDB connection URI * `ZEPPELIN_NOTEBOOK_MONGO_DATABASE` Database name * `ZEPPELIN_NOTEBOOK_MONGO_COLLECTION` Collection name * `ZEPPELIN_NOTEBOOK_MONGO_AUTOIMPORT` If `true`, automatically import your local notes. Default `false` They can be configured at `zeppelin-site.xml` as well: * `zeppelin.notebook.mongo.uri` * `zeppelin.notebook.mongo.database` * `zeppelin.notebook.mongo.collection` * `zeppelin.notebook.mongo.autoimport` #### Future work If we use Mongo DB's [oplog tailing](https://docs.mongodb.com/manual/core/replica-set-oplog/), maybe multi-server architecture is possible. ### What type of PR is it? [Feature] ### Todos * [ ] - Write a documentation for Mongo storage ### What is the Jira issue? https://issues.apache.org/jira/browse/ZEPPELIN-1859 ### How should this be tested? #### Install MongoDB (if you don't have) ```sh brew update brew install mongodb ``` #### Build Zepppelin ```sh mvn clean package -DskipTests ``` #### Run Zeppelin wih Mongo storage ```sh export ZEPPELIN_NOTEBOOK_STORAGE=org.apache.zeppelin.notebook.repo.MongoNotebookRepo export ZEPPELIN_NOTEBOOK_MONGO_AUTOIMPORT=true bin/zeppelin-daemon.sh restart ``` The default database and collection names are `zeppelin`, `notes` respectively. And `ZEPPELIN_NOTEBOOK_MONGO_AUTOIMPORT` option will automatically import your `local notes` that don't exist in MongoDB. #### Check whether a document in MongoDB updated Create, update, remove a note and open mongo shell: ```sh mongo zeppelin ``` And check state of the note is the same as you think: ```sh db.notes.findOne({_id: '<NOTE_ID_THAT_YOU_WANT_TO_SEE>'}) ``` #### Confirm that configurations works ```sh export ZEPPELIN_NOTEBOOK_STORAGE=org.apache.zeppelin.notebook.repo.MongoNotebookRepo export ZEPPELIN_NOTEBOOK_MONGO_AUTOIMPORT=true export ZEPPELIN_NOTEBOOK_MONGO_DATABASE=otherdb export ZEPPELIN_NOTEBOOK_MONGO_COLLECTION=mynotes export ZEPPELIN_NOTEBOOK_MONGO_URI=mongodb://localhost:27017 bin/zeppelin-daemon.sh restart ``` The collection `mynotes` should be created in db `otherdb`. Let's check it! ```sh mongo otherdb db.mynotes.count() ``` The result should not be zero. #### Confirm that configurations from `zeppelin-site.xml` works Open your `conf/zeppelin-site.xml` file (copy from `zeppelin-site.xml.template` if you don't have one), and comment lines below: ```xml <!-- <property> <name>zeppelin.notebook.storage</name> <value>org.apache.zeppelin.notebook.repo.VFSNotebookRepo</value> <description>notebook persistence layer implementation</description> </property> --> ``` And add lines below: ```xml <property> <name>zeppelin.notebook.storage</name> <value>org.apache.zeppelin.notebook.repo.MongoNotebookRepo</value> <description>notebook persistence layer implementation</description> </property> <property> <name>zeppelin.notebook.mongo.uri</name> <value>mongodb://localhost</value> <description>MongoDB connection URI used to connect to a MongoDB database server</description> </property> <property> <name>zeppelin.notebook.mongo.database</name> <value>zepl</value> <description>database name for notebook storage</description> </property> <property> <name>zeppelin.notebook.mongo.collection</name> <value>notes</value> <description>collection name for notebook storage</description> </property> <property> <name>zeppelin.notebook.mongo.autoimport</name> <value>false</value> <description>import local notes into MongoDB automatically on startup</description> </property> ``` This time we will import a note via `mongoimport`. I made it possible to import a note from JSON just in case. ```sh cd $ZEPPELIN_HOME/notebook/<NOTE_ID_YOU_WANT_TO_IMPORT> mongoimport --db zepl --collection notes --file note.json ``` Ensure that your environment variables are clean(just reopen your terminal if you are not), and restart zeppelin: ```sh bin/zeppelin-daemon.sh restart ``` Open browser and go to `localhost:8080`. The note that you imported should be shown. ### Questions: * Does the licenses files need update? Maybe...? I used [java-mongodb-driver](https://mvnrepository.com/artifact/org.mongodb/mongo-java-driver/3.4.1) which has *The Apache Software License, Version 2.0* * Is there breaking changes for older versions? NO * Does this needs documentation? YES Author: Jun Kim <i2r.jun@gmail.com> Closes #1826 from tae-jun/ZEPPELIN-1859 and squashes the following commits: |
||
|---|---|---|
| .. | ||
| storage.md | ||