zeppelin/docs/storage
Jun Kim 5e75145ac8 [ZEPPELIN-1859] Add MongoNotebookRepo
### What is this PR for?
This PR adds Mongo notebook storage.

The reason that I made this feature is for HA(High Availability).
S3 and Git storage are the only available method for HA as far as I know.

I'm managing Ambari cluster in my lab, but Zeppelin is the most vulnerable part of it.
Because one server contains all Zeppelin notes.

Therefore, by deploying MongoDB's [replica set](https://docs.mongodb.com/manual/replication/) and using it as Zeppelin notebook storage, I would like to achieve HA.

#### The way to use Mongo DB as notebook storage
```sh
export ZEPPELIN_NOTEBOOK_STORAGE=org.apache.zeppelin.notebook.repo.MongoNotebookRepo
```

or at `zeppelin-site.xml`:
```xml
<property>
  <name>zeppelin.notebook.storage</name>
  <value>org.apache.zeppelin.notebook.repo.MongoNotebookRepo</value>
  <description>notebook persistence layer implementation</description>
</property>
```
#### Configurable environment variables
* `ZEPPELIN_NOTEBOOK_MONGO_URI` MongoDB connection URI
* `ZEPPELIN_NOTEBOOK_MONGO_DATABASE` Database name
* `ZEPPELIN_NOTEBOOK_MONGO_COLLECTION` Collection name
* `ZEPPELIN_NOTEBOOK_MONGO_AUTOIMPORT` If `true`, automatically import your local notes. Default `false`

They can be configured at `zeppelin-site.xml` as well:
* `zeppelin.notebook.mongo.uri`
* `zeppelin.notebook.mongo.database`
* `zeppelin.notebook.mongo.collection`
* `zeppelin.notebook.mongo.autoimport`

#### Future work
If we use Mongo DB's [oplog tailing](https://docs.mongodb.com/manual/core/replica-set-oplog/), maybe multi-server architecture is possible.

### What type of PR is it?
[Feature]

### Todos
* [ ] - Write a documentation for Mongo storage

### What is the Jira issue?
https://issues.apache.org/jira/browse/ZEPPELIN-1859

### How should this be tested?
#### Install MongoDB (if you don't have)
```sh
brew update
brew install mongodb
```
#### Build Zepppelin
```sh
mvn clean package -DskipTests
```
#### Run Zeppelin wih Mongo storage
```sh
export ZEPPELIN_NOTEBOOK_STORAGE=org.apache.zeppelin.notebook.repo.MongoNotebookRepo
export ZEPPELIN_NOTEBOOK_MONGO_AUTOIMPORT=true
bin/zeppelin-daemon.sh restart
```
The default database and collection names are `zeppelin`, `notes` respectively.
And `ZEPPELIN_NOTEBOOK_MONGO_AUTOIMPORT` option will automatically import your `local notes` that don't exist in MongoDB.
#### Check whether a document in MongoDB updated
Create, update, remove a note and open mongo shell:
```sh
mongo zeppelin
```
And check state of the note is the same as you think:
```sh
db.notes.findOne({_id: '<NOTE_ID_THAT_YOU_WANT_TO_SEE>'})
```
#### Confirm that configurations works
```sh
export ZEPPELIN_NOTEBOOK_STORAGE=org.apache.zeppelin.notebook.repo.MongoNotebookRepo
export ZEPPELIN_NOTEBOOK_MONGO_AUTOIMPORT=true
export ZEPPELIN_NOTEBOOK_MONGO_DATABASE=otherdb
export ZEPPELIN_NOTEBOOK_MONGO_COLLECTION=mynotes
export ZEPPELIN_NOTEBOOK_MONGO_URI=mongodb://localhost:27017
bin/zeppelin-daemon.sh restart
```
The collection `mynotes` should be created in db `otherdb`.
Let's check it!
```sh
mongo otherdb
db.mynotes.count()
```
The result should not be zero.

#### Confirm that configurations from `zeppelin-site.xml` works
Open your `conf/zeppelin-site.xml` file (copy from `zeppelin-site.xml.template` if you don't have one), and comment lines below:
```xml
<!--
<property>
  <name>zeppelin.notebook.storage</name>
  <value>org.apache.zeppelin.notebook.repo.VFSNotebookRepo</value>
  <description>notebook persistence layer implementation</description>
</property>
-->
```
And add lines below:
```xml
<property>
  <name>zeppelin.notebook.storage</name>
  <value>org.apache.zeppelin.notebook.repo.MongoNotebookRepo</value>
  <description>notebook persistence layer implementation</description>
</property>

<property>
  <name>zeppelin.notebook.mongo.uri</name>
  <value>mongodb://localhost</value>
  <description>MongoDB connection URI used to connect to a MongoDB database server</description>
</property>

<property>
  <name>zeppelin.notebook.mongo.database</name>
  <value>zepl</value>
  <description>database name for notebook storage</description>
</property>

<property>
  <name>zeppelin.notebook.mongo.collection</name>
  <value>notes</value>
  <description>collection name for notebook storage</description>
</property>

<property>
  <name>zeppelin.notebook.mongo.autoimport</name>
  <value>false</value>
  <description>import local notes into MongoDB automatically on startup</description>
</property>
```

This time we will import a note via `mongoimport`. I made it possible to import a note from JSON just in case.
```sh
cd $ZEPPELIN_HOME/notebook/<NOTE_ID_YOU_WANT_TO_IMPORT>
mongoimport --db zepl --collection notes --file note.json
```

Ensure that your environment variables are clean(just reopen your terminal if you are not), and restart zeppelin:
```sh
bin/zeppelin-daemon.sh restart
```
Open browser and go to `localhost:8080`. The note that you imported should be shown.

### Questions:
* Does the licenses files need update? Maybe...? I used [java-mongodb-driver](https://mvnrepository.com/artifact/org.mongodb/mongo-java-driver/3.4.1) which has *The Apache Software License, Version 2.0*
* Is there breaking changes for older versions? NO
* Does this needs documentation? YES

Author: Jun Kim <i2r.jun@gmail.com>

Closes #1826 from tae-jun/ZEPPELIN-1859 and squashes the following commits:

98282ae [Jun Kim] Add a documentation for MongoDB notebook storage
77947b8 [Jun Kim] Add license information of mongo-java-driver
08eee3d [Jun Kim] fix style check error
a4fba8c [Jun Kim] Add MongoNotebookRepo
2017-02-21 09:29:09 +09:00
..
storage.md [ZEPPELIN-1859] Add MongoNotebookRepo 2017-02-21 09:29:09 +09:00