---
layout: page
title: "Apache Zeppelin Tutorial"
description: "This page contains a short walk-through tutorial that uses the Apache Spark backend. Please note that this tutorial is valid for Spark 1.3 and higher."
group: quickstart
---
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
{% include JB/setup %}
# Zeppelin Tutorial
<div id="toc"></div>

This tutorial walks you through some of the fundamental Zeppelin concepts. It assumes you have already installed Zeppelin; if not, please see the [installation guide](./install.html) first.

The main backend processing engine of Zeppelin is currently [Apache Spark](https://spark.apache.org). If you're new to Spark, it helps to first get an idea of how it processes data so that you can get the most out of Zeppelin.
## Tutorial with Local File
### Data Refine
Before you start the Zeppelin tutorial, you will need to download [bank.zip](http://archive.ics.uci.edu/ml/machine-learning-databases/00222/bank.zip).
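
If you would rather unpack the archive from a Scala paragraph than by hand, the following is a minimal sketch. `extractZip` is a hypothetical helper written for this example, not a Zeppelin or Spark API; it relies only on the JDK's `java.util.zip` and `java.nio.file` classes, and makes no assumption about which files the archive contains.

```scala
import java.io.InputStream
import java.nio.file.{Files, Path, StandardCopyOption}
import java.util.zip.ZipInputStream

// Hypothetical helper (not part of Zeppelin or Spark): extracts every entry
// of a zip stream into targetDir and returns the paths of the files written.
def extractZip(in: InputStream, targetDir: Path): List[Path] = {
  val root = targetDir.toAbsolutePath.normalize()
  Files.createDirectories(root)
  val zip = new ZipInputStream(in)
  val written = scala.collection.mutable.ListBuffer.empty[Path]
  try {
    var entry = zip.getNextEntry
    while (entry != null) {
      val out = root.resolve(entry.getName).normalize()
      // Refuse entries that would land outside the target directory.
      require(out.startsWith(root), s"unsafe zip entry: ${entry.getName}")
      if (entry.isDirectory) {
        Files.createDirectories(out)
      } else {
        Files.createDirectories(out.getParent)
        // Copies only the current entry; the zip stream reports EOF at its end.
        Files.copy(zip, out, StandardCopyOption.REPLACE_EXISTING)
        written += out
      }
      entry = zip.getNextEntry
    }
  } finally {
    zip.close()
  }
  written.toList
}
```

Assuming the same placeholder directory used in this tutorial, a call could look like `extractZip(new java.net.URL("http://archive.ics.uci.edu/ml/machine-learning-databases/00222/bank.zip").openStream(), java.nio.file.Paths.get("yourPath/bank"))`. The returned list shows where each file was written, so you can check the path you pass to `sc.textFile`.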
First, to transform the CSV data into an RDD of `Bank` objects, run the following script. It also removes the header line using the `filter` function.
```scala
val bankText = sc.textFile("yourPath/bank/bank-full.csv")
case class Bank(age: Integer, job: String, marital: String, education: String, balance: Integer)
// split each line, filter out the header (starts with "age"), and map each row to the Bank case class
val bank = bankText.map(s => s.split(";")).filter(s => s(0) != "\"age\"").map(
  s => Bank(s(0).toInt,
    s(1).replaceAll("\"", ""),
    s(2).replaceAll("\"", ""),
    s(3).replaceAll("\"", ""),
    s(5).replaceAll("\"", "").toInt
  )
)

// convert to a DataFrame and register it as a temporary table
bank.toDF().registerTempTable("bank")
```
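The per-line parsing can also be tried without Spark. The following is a minimal sketch that applies the same split, header check, and field selection to plain strings. `BankRecord`, `BankParser`, `parseLine`, and the two sample lines are hypothetical names and data made up for illustration; they are not part of Zeppelin or of the dataset's documentation.

```scala
// Plain-Scala sketch of the parsing logic above (no Spark required).
// BankRecord, BankParser and the sample lines are illustrative assumptions.
case class BankRecord(age: Int, job: String, marital: String, education: String, balance: Int)

object BankParser {
  // Hypothetical sample lines, shortened to the first six columns.
  val sampleHeader = "\"age\";\"job\";\"marital\";\"education\";\"default\";\"balance\""
  val sampleRow = "58;\"management\";\"married\";\"tertiary\";\"no\";2143"

  private def unquote(s: String): String = s.replaceAll("\"", "")

  // Returns None for the header line and Some(record) for a data line.
  def parseLine(line: String): Option[BankRecord] = {
    val f = line.split(";")
    if (f(0) == "\"age\"") None
    else Some(BankRecord(f(0).toInt, unquote(f(1)), unquote(f(2)), unquote(f(3)), unquote(f(5)).toInt))
  }
}
```

Note that column index 4 is skipped, just as in the script above, because the balance is read from index 5.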
### Data Retrieval
Suppose we want to see the age distribution in `bank`. To do this, run:
```sql
%sql select age, count(1) from bank where age < 30 group by age order by age
```
You can add an input box for setting the age condition by replacing `30` with `${maxAge=30}`.
```sql
%sql select age, count(1) from bank where age < ${maxAge=30} group by age order by age
```
Now suppose we want to see the age distribution for a given marital status, with a combo box for selecting that status. Run:
```sql
%sql select age, count(1) from bank where marital="${marital=single,single|divorced|married}" group by age order by age
```
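To make the placeholder syntax concrete, here is a rough sketch of how a `${name=default}` or `${name=default,option1|option2}` placeholder could be resolved into query text. This is only an illustration of the idea, not Zeppelin's actual implementation; `FormSketch` and `render` are hypothetical names.

```scala
import scala.util.matching.Regex

// Illustrative only: resolves ${name=default} and ${name=default,opt1|opt2}
// placeholders. This is NOT how Zeppelin implements dynamic forms internally.
object FormSketch {
  // group 1: form name, group 2: default value; any option list is ignored here.
  private val Placeholder: Regex = """\$\{(\w+)=([^,}]*)(?:,[^}]*)?\}""".r

  // Replaces each placeholder with the supplied value, or its default if none is given.
  def render(template: String, values: Map[String, String]): String =
    Placeholder.replaceAllIn(template, m =>
      Regex.quoteReplacement(values.getOrElse(m.group(1), m.group(2))))
}
```

With no value supplied, the default is used, so the `maxAge` query above renders with `30`; supplying `Map("maxAge" -> "40")` would render it with `40` instead.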
<br />
## Tutorial with Streaming Data
### Data Refine
Since this tutorial is based on Twitter's sample tweet stream, you must configure authentication with a Twitter account. To do this, take a look at [Twitter Credential Setup](https://databricks-training.s3.amazonaws.com/realtime-processing-with-spark-streaming.html#twitter-credential-setup). After you get your API keys, fill in the credential-related values (`apiKey`, `apiSecret`, `accessToken`, `accessTokenSecret`) in the following script.
This will create an RDD of `Tweet` objects and register the stream data as a table:
```scala
import org.apache.spark.streaming._
import org.apache.spark.streaming.twitter._
import org.apache.spark.storage.StorageLevel
import scala.io.Source
import scala.collection.mutable.HashMap
import java.io.File
import org.apache.log4j.Logger
import org.apache.log4j.Level
import sys.process.stringSeqToProcess

/** Configures the OAuth credentials for accessing Twitter */
def configureTwitterCredentials(apiKey: String, apiSecret: String, accessToken: String, accessTokenSecret: String) {
  val configs = new HashMap[String, String] ++= Seq(
    "apiKey" -> apiKey, "apiSecret" -> apiSecret, "accessToken" -> accessToken, "accessTokenSecret" -> accessTokenSecret)
  println("Configuring Twitter OAuth")
  configs.foreach { case (key, value) =>
    if (value.trim.isEmpty) {
      throw new Exception("Error setting authentication - value for " + key + " not set")
    }
    val fullKey = "twitter4j.oauth." + key.replace("api", "consumer")
    System.setProperty(fullKey, value.trim)
    println("\tProperty " + fullKey + " set as [" + value.trim + "]")
  }
  println()
}

// Configure Twitter credentials
val apiKey = "xxxxxxxxxxxxxxxxxxxxxxxxx"
val apiSecret = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
val accessToken = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
val accessTokenSecret = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
configureTwitterCredentials(apiKey, apiSecret, accessToken, accessTokenSecret)

val ssc = new StreamingContext(sc, Seconds(2))
val tweets = TwitterUtils.createStream(ssc, None)
val twt = tweets.window(Seconds(60))

case class Tweet(createdAt: Long, text: String)
twt.map(status =>
  Tweet(status.getCreatedAt().getTime() / 1000, status.getText())
).foreachRDD(rdd =>
  // Below line works only in Spark 1.3.0.
  // For Spark 1.1.x and Spark 1.2.x,
  // use rdd.registerTempTable("tweets") instead.
  rdd.toDF().registerTempTable("tweets")
)
twt.print
ssc.start()
```
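The key rewrite inside `configureTwitterCredentials` is easy to miss, so here is a small standalone sketch of just that step. It shows which system property name each credential ends up under; `TwitterKeys` and `propertyName` are hypothetical names introduced for illustration only.

```scala
// Standalone sketch of the key rewrite performed in configureTwitterCredentials.
// TwitterKeys and propertyName are illustrative names, not part of any library.
object TwitterKeys {
  // "api" is replaced with "consumer"; keys without "api" are left unchanged.
  def propertyName(key: String): String =
    "twitter4j.oauth." + key.replace("api", "consumer")

  // The four credential names used in the script above.
  val credentialKeys = Seq("apiKey", "apiSecret", "accessToken", "accessTokenSecret")

  def main(args: Array[String]): Unit =
    credentialKeys.foreach(k => println(k + " -> " + propertyName(k)))
}
```

In other words, `apiKey` and `apiSecret` are stored under the `consumerKey` and `consumerSecret` property names, while the two access token keys keep their names.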
### Data Retrieval
Each of the following scripts runs against real-time data, so you will see a different result every time you click the run button.
Let's begin by extracting at most 10 tweets that contain the word **girl**.
```sql
%sql select * from tweets where text like '%girl%' limit 10
```
This time, suppose we want to see how many tweets were created per second during the last 60 seconds. To do this, run:
```sql
%sql select createdAt, count(1) from tweets group by createdAt order by createdAt
```
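This query yields per-second counts because `createdAt` was stored as whole seconds (the timestamp in milliseconds divided by 1000) when each `Tweet` was built. A plain-Scala sketch of the same bucketing, using made-up timestamps and the hypothetical names `PerSecond` and `countPerSecond`, might look like this:

```scala
// Plain-Scala sketch of the per-second grouping (no Spark required).
// PerSecond and countPerSecond are illustrative names.
object PerSecond {
  // Truncates millisecond timestamps to whole seconds, then counts the
  // entries in each second, ordered by second.
  def countPerSecond(millis: Seq[Long]): Seq[(Long, Int)] =
    millis
      .map(_ / 1000)
      .groupBy(identity)
      .map { case (second, entries) => second -> entries.size }
      .toSeq
      .sortBy(_._1)
}
```

For example, timestamps of 1000, 1500, and 2200 milliseconds fall into second 1 (two entries) and second 2 (one entry).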
You can define a user-defined function and use it in Spark SQL. Let's try it by writing a function named `sentiment`. It returns one of three attitudes (positive, negative, or neutral) for the text passed to it.
```scala
def sentiment(s: String): String = {
  val positive = Array("like", "love", "good", "great", "happy", "cool", "the", "one", "that")
  val negative = Array("hate", "bad", "stupid", "is")

  var st = 0
  val words = s.split(" ")
  // add one for every positive word found
  positive.foreach(p =>
    words.foreach(w =>
      if (p == w) st = st + 1
    )
  )
  // subtract one for every negative word found
  negative.foreach(p =>
    words.foreach(w =>
      if (p == w) st = st - 1
    )
  )
  if (st > 0)
    "positive"
  else if (st < 0)
    "negative"
  else
    "neutral"
}

// Below line works only in Spark 1.3.0.
// For Spark 1.1.x and Spark 1.2.x,
// use sqlc.registerFunction("sentiment", sentiment _) instead.
sqlc.udf.register("sentiment", sentiment _)
```
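Before registering such a function, it can help to check its scoring on a few strings outside Spark. The sketch below is a compact restatement of the same word-counting idea using the same word lists; `SentimentSketch` is a hypothetical wrapper name, and the sample sentences in the note that follows are made up.

```scala
// Compact, Spark-free restatement of the word-counting idea above.
// SentimentSketch is an illustrative name; the word lists are copied from the tutorial.
object SentimentSketch {
  val positive = Set("like", "love", "good", "great", "happy", "cool", "the", "one", "that")
  val negative = Set("hate", "bad", "stupid", "is")

  def sentiment(s: String): String = {
    val words = s.split(" ")
    // +1 per positive word occurrence, -1 per negative word occurrence
    val score = words.count(positive.contains) - words.count(negative.contains)
    if (score > 0) "positive" else if (score < 0) "negative" else "neutral"
  }
}
```

Because common words such as "the" and "is" are in the lists, a sentence like "the weather is bad" scores +1 - 2 = -1 and is labeled negative, which shows how rough this word-list approach is.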
To check how people feel about girls using the `sentiment` function defined above, run this:
```sql
%sql select sentiment(text), count(1) from tweets where text like '%girl%' group by sentiment(text)
```