zeppelin/docs/storage/storage.md
AhyoungRyu 85d4df4f0c [ZEPPELIN-1219] Add searching feature to Zeppelin docs site
### What is this PR for?
As more and more document pages are added, it's really hard to find specific pages. So I added searching feature to Zeppelin documentation site([jekyll](https://jekyllrb.com/) based site) using [lunr.js](http://lunrjs.com/).

 - **How does it work?**

  I created [`search_data.json`](6e02423f54/docs/search_data.json) which is used for docs info template. `lunr.js` combines all of the text from all of the docs in `docs/` into `_site/search_data.json`. It looks like below.
![screen shot 2016-08-03 at 4 49 59 am](https://cloud.githubusercontent.com/assets/10060731/17342828/f2908be8-5935-11e6-8eee-b189677c0531.png)
All the info are comes from [Jekyll YAML front matter](https://jekyllrb.com/docs/frontmatter/) variables. (i.e. title, group, description.. that's why I rewrote all docs' title and description.)
[search.js](6e02423f54/docs/assets/themes/zeppelin/js/search.js) will do this job using this data!

### What type of PR is it?
Improvement & Feature

### Todos
* [x] - Keep consistency for all docs pages' `Title`
* [x] - Add some overview sentences to all docs pages' `Description` section (this will be used as the result preview)
* [x] - Add apache license header to all docs page (some pages are missing the license header currently)
* [x] - Add LICENSE for `lunr.min.js`

### What is the Jira issue?
[ZEPPELIN-1219](https://issues.apache.org/jira/browse/ZEPPELIN-1219)

### How should this be tested?
1. Apply this patch and build `ZEPPELIN_HOME/docs` dir -> please see [docs/README.md#build-documentation](https://github.com/apache/zeppelin/tree/master/docs#build-documentation)
2. Click `search` icon in navbar and go to `search.html` page
3. Type anything you want to search in the search bar (i.e. type `python`, `spark`, `dynamic` ... )

### Screenshots (if appropriate)
![screen shot 2016-08-03 at 4 42 28 pm](https://cloud.githubusercontent.com/assets/10060731/17357851/d092e2ca-5999-11e6-9917-a3d4113e6e43.png)

![search](https://cloud.githubusercontent.com/assets/10060731/17357828/b2486cd6-5999-11e6-873b-121fac033b03.gif)

### Questions:
* Does the licenses files need update? Yes, for `lunr.min.js`
* Is there breaking changes for older versions? no
* Does this needs documentation? no

Author: AhyoungRyu <fbdkdud93@hanmail.net>

Closes #1266 from AhyoungRyu/ZEPPELIN-1219 and squashes the following commits:

7ec8854 [AhyoungRyu] Modify 'no result' sentence
91b71a7 [AhyoungRyu] Remove Apache license header since JSON doesn't allow comment
34afd5d [AhyoungRyu] Add Apache license header to search_data.json
6784282 [AhyoungRyu] Minor search page UI update
0389d28 [AhyoungRyu] Make index.md not to be searched
9f1ba42 [AhyoungRyu] Disable enterkey press & change icon
bd4956a [AhyoungRyu] Add docs.js & search.js to exclude list in pom.xml
624b051 [AhyoungRyu] Add Apache license header to search.js
1381152 [AhyoungRyu] Fix search result skipping issue
6e775f5 [AhyoungRyu] Make pleasecontribute.md not to be searched
ee11136 [AhyoungRyu] Fix some typos
fa01299 [AhyoungRyu] Refine 'description' in some docs as @bzz suggested
da0cff9 [AhyoungRyu] Exclude lunr.min.js
36ba7f1 [AhyoungRyu] Add lunr.min.js license info
f6a05a6 [AhyoungRyu] Apply css style for the search results
68eb997 [AhyoungRyu] Attach 'Apache Zeppelin ZEPPELIN_VERSION Documentation: ' to title
d908c37 [AhyoungRyu] Add searching page
a951fa6 [AhyoungRyu] Add search icon to navbar
0688a79 [AhyoungRyu] Keep consistency all docs' front matter for the right search result
040f532 [AhyoungRyu] Add template for storing docs info based on jekyll front matter
0705bd6 [AhyoungRyu] Add js files: lunr.min.js & search.js
2016-08-10 12:39:22 +09:00

8.4 KiB

layout title description group
page Notebook Storage for Apache Zeppelin Apache Zeppelin has a pluggable notebook storage mechanism controlled by zeppelin.notebook.storage configuration option with multiple implementations." storage

{% include JB/setup %}

Notebook storage options for Apache Zeppelin

Overview

Apache Zeppelin has a pluggable notebook storage mechanism controlled by zeppelin.notebook.storage configuration option with multiple implementations. There are few notebook storage systems available for a use out of the box:

  • (default) all notes are saved in the notebook folder in your local File System - VFSNotebookRepo
  • use local file system and version it using local Git repository - GitNotebookRepo
  • storage using Amazon S3 service - S3NotebookRepo
  • storage using Azure service - AzureNotebookRepo

Multiple storage systems can be used at the same time by providing a comma-separated list of the class-names in the configuration. By default, only first two of them will be automatically kept in sync by Zeppelin.


Notebook Storage in local Git repository

To enable versioning for all your local notebooks though a standard Git repository - uncomment the next property in zeppelin-site.xml in order to use GitNotebookRepo class:

<property>
  <name>zeppelin.notebook.storage</name>
  <value>org.apache.zeppelin.notebook.repo.GitNotebookRepo</value>
  <description>notebook persistence layer implementation</description>
</property>

Notebook Storage in S3

Notebooks may be stored in S3, and optionally encrypted. The DefaultAWSCredentialsProviderChain credentials provider is used for credentials and checks the following:

  • The AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables
  • The aws.accessKeyId and aws.secretKey Java System properties
  • Credential profiles file at the default location (~/.aws/credentials) used by the AWS CLI
  • Instance profile credentials delivered through the Amazon EC2 metadata service

The following folder structure will be created in S3:
s3://bucket_name/username/notebook-id/

Configure by setting environment variables in the file zeppelin-env.sh:

export ZEPPELIN_NOTEBOOK_S3_BUCKET = bucket_name
export ZEPPELIN_NOTEBOOK_S3_USER = username

Or using the file zeppelin-site.xml uncomment and complete the S3 settings:

<property>
  <name>zeppelin.notebook.s3.bucket</name>
  <value>bucket_name</value>
  <description>bucket name for notebook storage</description>
</property>
<property>
  <name>zeppelin.notebook.s3.user</name>
  <value>username</value>
  <description>user name for s3 folder structure</description>
</property>

Uncomment the next property for use S3NotebookRepo class:

<property>
  <name>zeppelin.notebook.storage</name>
  <value>org.apache.zeppelin.notebook.repo.S3NotebookRepo</value>
  <description>notebook persistence layer implementation</description>
</property>

Comment out the next property to disable local notebook storage (the default):

<property>
  <name>zeppelin.notebook.storage</name>
  <value>org.apache.zeppelin.notebook.repo.VFSNotebookRepo</value>
  <description>notebook persistence layer implementation</description>
</property>

Data Encryption in S3

AWS KMS encryption keys

To use an AWS KMS encryption key to encrypt notebooks, set the following environment variable in the file zeppelin-env.sh:

export ZEPPELIN_NOTEBOOK_S3_KMS_KEY_ID = kms-key-id

Or using the following setting in zeppelin-site.xml:

<property>
  <name>zeppelin.notebook.s3.kmsKeyID</name>
  <value>AWS-KMS-Key-UUID</value>
  <description>AWS KMS key ID used to encrypt notebook data in S3</description>
</property>

Custom Encryption Materials Provider class

You may use a custom EncryptionMaterialsProvider class as long as it is available in the classpath and able to initialize itself from system properties or another mechanism. To use this, set the following environment variable in the file zeppelin-env.sh:

export ZEPPELIN_NOTEBOOK_S3_EMP = class-name

Or using the following setting in zeppelin-site.xml:

<property>
  <name>zeppelin.notebook.s3.encryptionMaterialsProvider</name>
  <value>provider implementation class name</value>
  <description>Custom encryption materials provider used to encrypt notebook data in S3</description>

## Notebook Storage in Azure

Using AzureNotebookRepo you can connect your Zeppelin with your Azure account for notebook storage.

First of all, input your AccountName, AccountKey, and Share Name in the file zeppelin-site.xml by commenting out and completing the next properties:

<property>
  <name>zeppelin.notebook.azure.connectionString</name>
  <value>DefaultEndpointsProtocol=https;AccountName=<accountName>;AccountKey=<accountKey></value>
  <description>Azure account credentials</description>
</property>

<property>
  <name>zeppelin.notebook.azure.share</name>
  <value>zeppelin</value>
  <description>share name for notebook storage</description>
</property>

Secondly, you can initialize AzureNotebookRepo class in the file zeppelin-site.xml by commenting the next property:

<property>
  <name>zeppelin.notebook.storage</name>
  <value>org.apache.zeppelin.notebook.repo.VFSNotebookRepo</value>
  <description>notebook persistence layer implementation</description>
</property>

and commenting out:

<property>
  <name>zeppelin.notebook.storage</name>
  <value>org.apache.zeppelin.notebook.repo.AzureNotebookRepo</value>
  <description>notebook persistence layer implementation</description>
</property>

In case you want to use simultaneously your local storage with Azure storage use the following property instead:

<property>
 <name>zeppelin.notebook.storage</name>
 <value>org.apache.zeppelin.notebook.repo.VFSNotebookRepo, apache.zeppelin.notebook.repo.AzureNotebookRepo</value>
 <description>notebook persistence layer implementation</description>
</property>

Optionally, you can specify Azure folder structure name in the file zeppelin-site.xml by commenting out the next property:

<property>
 <name>zeppelin.notebook.azure.user</name>
 <value>user</value>
 <description>optional user name for Azure folder structure</description>
</property>

## Storage in ZeppelinHub

ZeppelinHub storage layer allows out of the box connection of Zeppelin instance with your ZeppelinHub account. First of all, you need to either comment out the following property in zeppelin-site.xml:

<!-- For connecting your Zeppelin with ZeppelinHub -->
<!--
<property>
  <name>zeppelin.notebook.storage</name>
  <value>org.apache.zeppelin.notebook.repo.VFSNotebookRepo, org.apache.zeppelin.notebook.repo.zeppelinhub.ZeppelinHubRepo</value>
  <description>two notebook persistence layers (local + ZeppelinHub)</description>
</property>
-->

or set the environment variable in the file zeppelin-env.sh:

export ZEPPELIN_NOTEBOOK_STORAGE="org.apache.zeppelin.notebook.repo.VFSNotebookRepo, org.apache.zeppelin.notebook.repo.zeppelinhub.ZeppelinHubRepo"

Secondly, you need to set the environment variables in the file zeppelin-env.sh:

export ZEPPELINHUB_API_TOKEN = ZeppelinHub token
export ZEPPELINHUB_API_ADDRESS = address of ZeppelinHub service (e.g. https://www.zeppelinhub.com)

You can get more information on generating token and using authentication on the corresponding help page.