zeppelin/docs/interpreter/python.md

224 lines
7.1 KiB
Markdown
Raw Normal View History

[ZEPPELIN-502] Python interpreter group ### What is this PR for? Adding a python 2 &3 interpreter. It's a basic implementation (no py4j for example), with a java ProcessBuilder object used to instantiate a python REPL. The interpreter doesn't bring it own python binary but uses the python specified by python.path configutation. Thus, you can still use your specific installed python modules (scikit-learn, matplotlib...) and the interpreter is able to work with python 2 & 3 without change. I had a python helper function (zeppelin_show() ) to easily display matplotlib graph as SVG. ### What type of PR is it? [Feature] ### Todos * [x] - Code review * [x] - Improve bootstrap.py : choose available helper functions and their names * [x] - Unit / IT tests ? * [x] documentation updates needed, that AhyoungRyu pointed out * [X] LICENSE needs to be updated to include all non-apache licensed dependencies (i.e AFAIK Py4j is BSD ) in bin-license * [x] double-check that code formatting conforms project style guide * [x] the branch need to be rebased on latest master. ### What is the Jira issue? [ZEPPELIN-502](https://issues.apache.org/jira/browse/ZEPPELIN-502?jql=project%20%3D%20ZEPPELIN%20AND%20text%20~%20%22python%22) ### How should this be tested? 1. In interpreter screen, in Python section, specify in python.path the python binary you want to use 2. In a paragraph, you can use the interpreter with **_%python_**. Calling help() will describe you the interpreter functionnalities. 3. Install py4j (pip install py4j) if you want to use input form ### Screenshots ![image](https://cloud.githubusercontent.com/assets/12515751/14936724/5108fb60-0ef4-11e6-93ea-232a037f7957.png) ![image](https://cloud.githubusercontent.com/assets/12515751/14943716/98a75c4a-0fe0-11e6-9d4b-e10c39d53a15.png) ![image](https://cloud.githubusercontent.com/assets/12515751/14936715/0eec90de-0ef4-11e6-811b-7ebe46f0d279.png) ![image](https://cloud.githubusercontent.com/assets/12515751/14943722/b89b7824-0fe0-11e6-9c73-c12f7372d487.png) ### Questions: * Does the licenses files need update? Yes, only bin-license (py4j) * Is there breaking changes for older versions? No * Does this needs documentation? Yes Author: Hervé RIVIERE <hriviere@users.noreply.github.com> Closes #869 from hriviere/PR_interpreter_python and squashes the following commits: 80b6e75 [Hervé RIVIERE] [ZEPPELIN-502] move BSD py4j license to zeppelin-distribution/src/bin_license/license a4b82a5 [Hervé RIVIERE] [ZEPPELIN-502]Improving doc following @AhyoungRyu review 3252353 [Hervé RIVIERE] [ZEPPELIN-502] Formatting code to respect project convention 54ec4f1 [Hervé RIVIERE] [ZEPPELIN-502]Improving doc following @AhyoungRyu review 6a831bc [Hervé RIVIERE] [ZEPPELIN-502] Add BSD py4j license 11e1b9c [Hervé RIVIERE] [ZEPPELIN-502] minor changes in python.md e5d0bdb [Hervé RIVIERE] [ZEPPELIN-502] change PYTHON_PATH to ZEPPELIN_PYTHON c62ac98 [Hervé RIVIERE] [ZEPPELIN-502] Improve python.md 5008125 [Hervé RIVIERE] [ZEPPELIN-502] Improve python.md with features not yet supported and technical description 7d533e1 [Hervé RIVIERE] [ZEPPELIN-502] Add tests and reformating code to help tests writing fecaf25 [Hervé RIVIERE] [ZEPPELIN-502] Rename python.path to python and default from /usr/bin/python to python 02d1320 [Hervé RIVIERE] [ZEPPELIN-502] Input form, change from simple input form to native (pyspark syntax) 60d2956 [Hervé RIVIERE] [ZEPPELIN-502] Indent as pep8 convention 9bdb192 [Hervé RIVIERE] [ZEPPELIN-502] Add python.md to _navigation.html 7142aa5 [Hervé RIVIERE] [ZEPPELIN-502] Catch exception in logger.error 1a86ad7 [Hervé RIVIERE] [ZEPPELIN-502] Python interpreter group
2016-05-30 20:07:26 +00:00
---
layout: page
[ZEPPELIN-1219] Add searching feature to Zeppelin docs site ### What is this PR for? As more and more document pages are added, it's really hard to find specific pages. So I added searching feature to Zeppelin documentation site([jekyll](https://jekyllrb.com/) based site) using [lunr.js](http://lunrjs.com/). - **How does it work?** I created [`search_data.json`](https://github.com/AhyoungRyu/zeppelin/blob/6e02423f541cc406e4e41031629609a276a9f481/docs/search_data.json) which is used for docs info template. `lunr.js` combines all of the text from all of the docs in `docs/` into `_site/search_data.json`. It looks like below. ![screen shot 2016-08-03 at 4 49 59 am](https://cloud.githubusercontent.com/assets/10060731/17342828/f2908be8-5935-11e6-8eee-b189677c0531.png) All the info are comes from [Jekyll YAML front matter](https://jekyllrb.com/docs/frontmatter/) variables. (i.e. title, group, description.. that's why I rewrote all docs' title and description.) [search.js](https://github.com/AhyoungRyu/zeppelin/blob/6e02423f541cc406e4e41031629609a276a9f481/docs/assets/themes/zeppelin/js/search.js) will do this job using this data! ### What type of PR is it? Improvement & Feature ### Todos * [x] - Keep consistency for all docs pages' `Title` * [x] - Add some overview sentences to all docs pages' `Description` section (this will be used as the result preview) * [x] - Add apache license header to all docs page (some pages are missing the license header currently) * [x] - Add LICENSE for `lunr.min.js` ### What is the Jira issue? [ZEPPELIN-1219](https://issues.apache.org/jira/browse/ZEPPELIN-1219) ### How should this be tested? 1. Apply this patch and build `ZEPPELIN_HOME/docs` dir -> please see [docs/README.md#build-documentation](https://github.com/apache/zeppelin/tree/master/docs#build-documentation) 2. Click `search` icon in navbar and go to `search.html` page 3. Type anything you want to search in the search bar (i.e. type `python`, `spark`, `dynamic` ... ) ### Screenshots (if appropriate) ![screen shot 2016-08-03 at 4 42 28 pm](https://cloud.githubusercontent.com/assets/10060731/17357851/d092e2ca-5999-11e6-9917-a3d4113e6e43.png) ![search](https://cloud.githubusercontent.com/assets/10060731/17357828/b2486cd6-5999-11e6-873b-121fac033b03.gif) ### Questions: * Does the licenses files need update? Yes, for `lunr.min.js` * Is there breaking changes for older versions? no * Does this needs documentation? no Author: AhyoungRyu <fbdkdud93@hanmail.net> Closes #1266 from AhyoungRyu/ZEPPELIN-1219 and squashes the following commits: 7ec8854 [AhyoungRyu] Modify 'no result' sentence 91b71a7 [AhyoungRyu] Remove Apache license header since JSON doesn't allow comment 34afd5d [AhyoungRyu] Add Apache license header to search_data.json 6784282 [AhyoungRyu] Minor search page UI update 0389d28 [AhyoungRyu] Make index.md not to be searched 9f1ba42 [AhyoungRyu] Disable enterkey press & change icon bd4956a [AhyoungRyu] Add docs.js & search.js to exclude list in pom.xml 624b051 [AhyoungRyu] Add Apache license header to search.js 1381152 [AhyoungRyu] Fix search result skipping issue 6e775f5 [AhyoungRyu] Make pleasecontribute.md not to be searched ee11136 [AhyoungRyu] Fix some typos fa01299 [AhyoungRyu] Refine 'description' in some docs as @bzz suggested da0cff9 [AhyoungRyu] Exclude lunr.min.js 36ba7f1 [AhyoungRyu] Add lunr.min.js license info f6a05a6 [AhyoungRyu] Apply css style for the search results 68eb997 [AhyoungRyu] Attach 'Apache Zeppelin ZEPPELIN_VERSION Documentation: ' to title d908c37 [AhyoungRyu] Add searching page a951fa6 [AhyoungRyu] Add search icon to navbar 0688a79 [AhyoungRyu] Keep consistency all docs' front matter for the right search result 040f532 [AhyoungRyu] Add template for storing docs info based on jekyll front matter 0705bd6 [AhyoungRyu] Add js files: lunr.min.js & search.js
2016-08-06 05:50:25 +00:00
title: "Python 2 & 3 Interpreter for Apache Zeppelin"
description: "Python is a programming language that lets you work quickly and integrate systems more effectively."
group: interpreter
[ZEPPELIN-502] Python interpreter group ### What is this PR for? Adding a python 2 &3 interpreter. It's a basic implementation (no py4j for example), with a java ProcessBuilder object used to instantiate a python REPL. The interpreter doesn't bring it own python binary but uses the python specified by python.path configutation. Thus, you can still use your specific installed python modules (scikit-learn, matplotlib...) and the interpreter is able to work with python 2 & 3 without change. I had a python helper function (zeppelin_show() ) to easily display matplotlib graph as SVG. ### What type of PR is it? [Feature] ### Todos * [x] - Code review * [x] - Improve bootstrap.py : choose available helper functions and their names * [x] - Unit / IT tests ? * [x] documentation updates needed, that AhyoungRyu pointed out * [X] LICENSE needs to be updated to include all non-apache licensed dependencies (i.e AFAIK Py4j is BSD ) in bin-license * [x] double-check that code formatting conforms project style guide * [x] the branch need to be rebased on latest master. ### What is the Jira issue? [ZEPPELIN-502](https://issues.apache.org/jira/browse/ZEPPELIN-502?jql=project%20%3D%20ZEPPELIN%20AND%20text%20~%20%22python%22) ### How should this be tested? 1. In interpreter screen, in Python section, specify in python.path the python binary you want to use 2. In a paragraph, you can use the interpreter with **_%python_**. Calling help() will describe you the interpreter functionnalities. 3. Install py4j (pip install py4j) if you want to use input form ### Screenshots ![image](https://cloud.githubusercontent.com/assets/12515751/14936724/5108fb60-0ef4-11e6-93ea-232a037f7957.png) ![image](https://cloud.githubusercontent.com/assets/12515751/14943716/98a75c4a-0fe0-11e6-9d4b-e10c39d53a15.png) ![image](https://cloud.githubusercontent.com/assets/12515751/14936715/0eec90de-0ef4-11e6-811b-7ebe46f0d279.png) ![image](https://cloud.githubusercontent.com/assets/12515751/14943722/b89b7824-0fe0-11e6-9c73-c12f7372d487.png) ### Questions: * Does the licenses files need update? Yes, only bin-license (py4j) * Is there breaking changes for older versions? No * Does this needs documentation? Yes Author: Hervé RIVIERE <hriviere@users.noreply.github.com> Closes #869 from hriviere/PR_interpreter_python and squashes the following commits: 80b6e75 [Hervé RIVIERE] [ZEPPELIN-502] move BSD py4j license to zeppelin-distribution/src/bin_license/license a4b82a5 [Hervé RIVIERE] [ZEPPELIN-502]Improving doc following @AhyoungRyu review 3252353 [Hervé RIVIERE] [ZEPPELIN-502] Formatting code to respect project convention 54ec4f1 [Hervé RIVIERE] [ZEPPELIN-502]Improving doc following @AhyoungRyu review 6a831bc [Hervé RIVIERE] [ZEPPELIN-502] Add BSD py4j license 11e1b9c [Hervé RIVIERE] [ZEPPELIN-502] minor changes in python.md e5d0bdb [Hervé RIVIERE] [ZEPPELIN-502] change PYTHON_PATH to ZEPPELIN_PYTHON c62ac98 [Hervé RIVIERE] [ZEPPELIN-502] Improve python.md 5008125 [Hervé RIVIERE] [ZEPPELIN-502] Improve python.md with features not yet supported and technical description 7d533e1 [Hervé RIVIERE] [ZEPPELIN-502] Add tests and reformating code to help tests writing fecaf25 [Hervé RIVIERE] [ZEPPELIN-502] Rename python.path to python and default from /usr/bin/python to python 02d1320 [Hervé RIVIERE] [ZEPPELIN-502] Input form, change from simple input form to native (pyspark syntax) 60d2956 [Hervé RIVIERE] [ZEPPELIN-502] Indent as pep8 convention 9bdb192 [Hervé RIVIERE] [ZEPPELIN-502] Add python.md to _navigation.html 7142aa5 [Hervé RIVIERE] [ZEPPELIN-502] Catch exception in logger.error 1a86ad7 [Hervé RIVIERE] [ZEPPELIN-502] Python interpreter group
2016-05-30 20:07:26 +00:00
---
[ZEPPELIN-1219] Add searching feature to Zeppelin docs site ### What is this PR for? As more and more document pages are added, it's really hard to find specific pages. So I added searching feature to Zeppelin documentation site([jekyll](https://jekyllrb.com/) based site) using [lunr.js](http://lunrjs.com/). - **How does it work?** I created [`search_data.json`](https://github.com/AhyoungRyu/zeppelin/blob/6e02423f541cc406e4e41031629609a276a9f481/docs/search_data.json) which is used for docs info template. `lunr.js` combines all of the text from all of the docs in `docs/` into `_site/search_data.json`. It looks like below. ![screen shot 2016-08-03 at 4 49 59 am](https://cloud.githubusercontent.com/assets/10060731/17342828/f2908be8-5935-11e6-8eee-b189677c0531.png) All the info are comes from [Jekyll YAML front matter](https://jekyllrb.com/docs/frontmatter/) variables. (i.e. title, group, description.. that's why I rewrote all docs' title and description.) [search.js](https://github.com/AhyoungRyu/zeppelin/blob/6e02423f541cc406e4e41031629609a276a9f481/docs/assets/themes/zeppelin/js/search.js) will do this job using this data! ### What type of PR is it? Improvement & Feature ### Todos * [x] - Keep consistency for all docs pages' `Title` * [x] - Add some overview sentences to all docs pages' `Description` section (this will be used as the result preview) * [x] - Add apache license header to all docs page (some pages are missing the license header currently) * [x] - Add LICENSE for `lunr.min.js` ### What is the Jira issue? [ZEPPELIN-1219](https://issues.apache.org/jira/browse/ZEPPELIN-1219) ### How should this be tested? 1. Apply this patch and build `ZEPPELIN_HOME/docs` dir -> please see [docs/README.md#build-documentation](https://github.com/apache/zeppelin/tree/master/docs#build-documentation) 2. Click `search` icon in navbar and go to `search.html` page 3. Type anything you want to search in the search bar (i.e. type `python`, `spark`, `dynamic` ... ) ### Screenshots (if appropriate) ![screen shot 2016-08-03 at 4 42 28 pm](https://cloud.githubusercontent.com/assets/10060731/17357851/d092e2ca-5999-11e6-9917-a3d4113e6e43.png) ![search](https://cloud.githubusercontent.com/assets/10060731/17357828/b2486cd6-5999-11e6-873b-121fac033b03.gif) ### Questions: * Does the licenses files need update? Yes, for `lunr.min.js` * Is there breaking changes for older versions? no * Does this needs documentation? no Author: AhyoungRyu <fbdkdud93@hanmail.net> Closes #1266 from AhyoungRyu/ZEPPELIN-1219 and squashes the following commits: 7ec8854 [AhyoungRyu] Modify 'no result' sentence 91b71a7 [AhyoungRyu] Remove Apache license header since JSON doesn't allow comment 34afd5d [AhyoungRyu] Add Apache license header to search_data.json 6784282 [AhyoungRyu] Minor search page UI update 0389d28 [AhyoungRyu] Make index.md not to be searched 9f1ba42 [AhyoungRyu] Disable enterkey press & change icon bd4956a [AhyoungRyu] Add docs.js & search.js to exclude list in pom.xml 624b051 [AhyoungRyu] Add Apache license header to search.js 1381152 [AhyoungRyu] Fix search result skipping issue 6e775f5 [AhyoungRyu] Make pleasecontribute.md not to be searched ee11136 [AhyoungRyu] Fix some typos fa01299 [AhyoungRyu] Refine 'description' in some docs as @bzz suggested da0cff9 [AhyoungRyu] Exclude lunr.min.js 36ba7f1 [AhyoungRyu] Add lunr.min.js license info f6a05a6 [AhyoungRyu] Apply css style for the search results 68eb997 [AhyoungRyu] Attach 'Apache Zeppelin ZEPPELIN_VERSION Documentation: ' to title d908c37 [AhyoungRyu] Add searching page a951fa6 [AhyoungRyu] Add search icon to navbar 0688a79 [AhyoungRyu] Keep consistency all docs' front matter for the right search result 040f532 [AhyoungRyu] Add template for storing docs info based on jekyll front matter 0705bd6 [AhyoungRyu] Add js files: lunr.min.js & search.js
2016-08-06 05:50:25 +00:00
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
[ZEPPELIN-502] Python interpreter group ### What is this PR for? Adding a python 2 &3 interpreter. It's a basic implementation (no py4j for example), with a java ProcessBuilder object used to instantiate a python REPL. The interpreter doesn't bring it own python binary but uses the python specified by python.path configutation. Thus, you can still use your specific installed python modules (scikit-learn, matplotlib...) and the interpreter is able to work with python 2 & 3 without change. I had a python helper function (zeppelin_show() ) to easily display matplotlib graph as SVG. ### What type of PR is it? [Feature] ### Todos * [x] - Code review * [x] - Improve bootstrap.py : choose available helper functions and their names * [x] - Unit / IT tests ? * [x] documentation updates needed, that AhyoungRyu pointed out * [X] LICENSE needs to be updated to include all non-apache licensed dependencies (i.e AFAIK Py4j is BSD ) in bin-license * [x] double-check that code formatting conforms project style guide * [x] the branch need to be rebased on latest master. ### What is the Jira issue? [ZEPPELIN-502](https://issues.apache.org/jira/browse/ZEPPELIN-502?jql=project%20%3D%20ZEPPELIN%20AND%20text%20~%20%22python%22) ### How should this be tested? 1. In interpreter screen, in Python section, specify in python.path the python binary you want to use 2. In a paragraph, you can use the interpreter with **_%python_**. Calling help() will describe you the interpreter functionnalities. 3. Install py4j (pip install py4j) if you want to use input form ### Screenshots ![image](https://cloud.githubusercontent.com/assets/12515751/14936724/5108fb60-0ef4-11e6-93ea-232a037f7957.png) ![image](https://cloud.githubusercontent.com/assets/12515751/14943716/98a75c4a-0fe0-11e6-9d4b-e10c39d53a15.png) ![image](https://cloud.githubusercontent.com/assets/12515751/14936715/0eec90de-0ef4-11e6-811b-7ebe46f0d279.png) ![image](https://cloud.githubusercontent.com/assets/12515751/14943722/b89b7824-0fe0-11e6-9c73-c12f7372d487.png) ### Questions: * Does the licenses files need update? Yes, only bin-license (py4j) * Is there breaking changes for older versions? No * Does this needs documentation? Yes Author: Hervé RIVIERE <hriviere@users.noreply.github.com> Closes #869 from hriviere/PR_interpreter_python and squashes the following commits: 80b6e75 [Hervé RIVIERE] [ZEPPELIN-502] move BSD py4j license to zeppelin-distribution/src/bin_license/license a4b82a5 [Hervé RIVIERE] [ZEPPELIN-502]Improving doc following @AhyoungRyu review 3252353 [Hervé RIVIERE] [ZEPPELIN-502] Formatting code to respect project convention 54ec4f1 [Hervé RIVIERE] [ZEPPELIN-502]Improving doc following @AhyoungRyu review 6a831bc [Hervé RIVIERE] [ZEPPELIN-502] Add BSD py4j license 11e1b9c [Hervé RIVIERE] [ZEPPELIN-502] minor changes in python.md e5d0bdb [Hervé RIVIERE] [ZEPPELIN-502] change PYTHON_PATH to ZEPPELIN_PYTHON c62ac98 [Hervé RIVIERE] [ZEPPELIN-502] Improve python.md 5008125 [Hervé RIVIERE] [ZEPPELIN-502] Improve python.md with features not yet supported and technical description 7d533e1 [Hervé RIVIERE] [ZEPPELIN-502] Add tests and reformating code to help tests writing fecaf25 [Hervé RIVIERE] [ZEPPELIN-502] Rename python.path to python and default from /usr/bin/python to python 02d1320 [Hervé RIVIERE] [ZEPPELIN-502] Input form, change from simple input form to native (pyspark syntax) 60d2956 [Hervé RIVIERE] [ZEPPELIN-502] Indent as pep8 convention 9bdb192 [Hervé RIVIERE] [ZEPPELIN-502] Add python.md to _navigation.html 7142aa5 [Hervé RIVIERE] [ZEPPELIN-502] Catch exception in logger.error 1a86ad7 [Hervé RIVIERE] [ZEPPELIN-502] Python interpreter group
2016-05-30 20:07:26 +00:00
{% include JB/setup %}
[ZEPPELIN-1018] Apply auto "Table of Contents" generator to Zeppelin docs website ### What is this PR for? I added auto TOC(Table of Contents) generator for Zeppelin documentation website. TOC can help people looking through whole contents at a glance and finding what they want quickly. I just added `<div id="toc"></div>` to the each documentation header. [`toc`](https://github.com/apache/zeppelin/compare/master...AhyoungRyu:ZEPPELIN-1018?expand=1#diff-85af09fb498a5667ea455391533f945dR3) recognize `<h2>` & `<h3>` as a title in the docs and it automatically generate TOC. So I set a rule for this work. (I'll write this rule on `docs/CONTRIBUTING.md` or [docs/howtocontributewebsite](https://zeppelin.apache.org/docs/0.6.0-SNAPSHOT/development/howtocontributewebsite.html)). ``` # Level-1 Heading <- Use only for the main title of the page ## Level-2 Heading <- Start with this one ### Level-3 heading <- Only use this one for child of Level-2 toc only recognize Level-2 & Level-3 ``` Please see the below attached screenshot image. ### What type of PR is it? Improvement & Documentation ### Todos * [x] - Add TOC generator * [x] - Apply TOC(`<div id="toc"></div>`) to every documentation and reorganize each headers(apply the above rule) * [x] - Fix some broken code block in several docs * [x] - Apply TOC to `r.md` (Currently R docs has some duplicated info since [this one](https://github.com/apache/zeppelin/commit/d5e87fb8ba98f08db5b0a4995104ce19f182c678) and [this one](https://github.com/apache/zeppelin/commit/7d6cc7e99154e2d337c11fdf8be1a874ed3e9ada) ) * [x] - Apply TOC to `install.md` after #1010 merged * [x] - Apply TOC to `interpreterinstallation.md` after #1042 merged ### What is the Jira issue? [ZEPPELIN-1018](https://issues.apache.org/jira/browse/ZEPPELIN-1018) ### How should this be tested? 1. Apply this patch and build `docs/` with [this guide](https://github.com/apache/zeppelin/tree/master/docs#build-documentation) 2. Visit some docs page. Then you can see TOC in the header of page. ### Screenshots (if appropriate) - Automatically generated TOC in Spark interpreter docs page <img width="831" alt="screen shot 2016-06-16 at 9 37 18 pm" src="https://cloud.githubusercontent.com/assets/10060731/16140902/945b9c7a-340a-11e6-91f3-b6174738bed0.png"> ### Questions: * Does the licenses files need update? No. Actually I used [jekyll-table-of-contents#copyright](https://github.com/ghiculescu/jekyll-table-of-contents#copyright). But I don't need to add a license for this :) * Is there breaking changes for older versions? No * Does this needs documentation? Maybe Author: AhyoungRyu <fbdkdud93@hanmail.net> Closes #1031 from AhyoungRyu/ZEPPELIN-1018 and squashes the following commits: e66397b [AhyoungRyu] Apply TOC to interpreterinstallation.md 009579b [AhyoungRyu] Add more info to 'What is the next?' in install.md 04cf501 [AhyoungRyu] Revert 'where to start' section b7cbe5f [AhyoungRyu] Fix typo cf0911c [AhyoungRyu] Rename license file 388f35a [AhyoungRyu] Add jekyll-table-of-contents license info 6394c70 [AhyoungRyu] Fix image path in python.md d00e4b1 [AhyoungRyu] Move interpreter/screenshot/ -> asset/../img/docs-img/ 3ffb383 [AhyoungRyu] Remove duplicated info in r.md & apply toc a03ca99 [AhyoungRyu] Exclude toc.js from pom.xml 3fae7df [AhyoungRyu] Apply auto generated toc to install.md d114a9d [AhyoungRyu] Address @felixcheung feedback 6a788fe [AhyoungRyu] Resize TOC tab indent 6760c00 [AhyoungRyu] Apply auto TOC to all of docs under docs/storage/ fbde57f [AhyoungRyu] Apply auto TOC to all of docs under docs/quickstart/ db76eb6 [AhyoungRyu] Apply auto TOC to all of docs under docs/install/ f35db47 [AhyoungRyu] Apply auto TOC to all of docs under docs/displaysystem/ b05365f [AhyoungRyu] Apply auto TOC to all of docs under docs/rest-api/ 163691c [AhyoungRyu] Apply auto TOC to all of docs under docs/manual/ bef398e [AhyoungRyu] Apply auto TOC to all of docs under docs/development/ 9c5f76b [AhyoungRyu] Apply auto TOC to all of docs under docs/interpreter/ 587d4ba [AhyoungRyu] Apply auto TOC to all of docs under docs/security/ 1f10b97 [AhyoungRyu] Change toc configuration 78dca9e [AhyoungRyu] Add toc.js for auto generating TOC
2016-06-25 19:44:53 +00:00
# Python 2 & 3 Interpreter for Apache Zeppelin
<div id="toc"></div>
[ZEPPELIN-502] Python interpreter group ### What is this PR for? Adding a python 2 &3 interpreter. It's a basic implementation (no py4j for example), with a java ProcessBuilder object used to instantiate a python REPL. The interpreter doesn't bring it own python binary but uses the python specified by python.path configutation. Thus, you can still use your specific installed python modules (scikit-learn, matplotlib...) and the interpreter is able to work with python 2 & 3 without change. I had a python helper function (zeppelin_show() ) to easily display matplotlib graph as SVG. ### What type of PR is it? [Feature] ### Todos * [x] - Code review * [x] - Improve bootstrap.py : choose available helper functions and their names * [x] - Unit / IT tests ? * [x] documentation updates needed, that AhyoungRyu pointed out * [X] LICENSE needs to be updated to include all non-apache licensed dependencies (i.e AFAIK Py4j is BSD ) in bin-license * [x] double-check that code formatting conforms project style guide * [x] the branch need to be rebased on latest master. ### What is the Jira issue? [ZEPPELIN-502](https://issues.apache.org/jira/browse/ZEPPELIN-502?jql=project%20%3D%20ZEPPELIN%20AND%20text%20~%20%22python%22) ### How should this be tested? 1. In interpreter screen, in Python section, specify in python.path the python binary you want to use 2. In a paragraph, you can use the interpreter with **_%python_**. Calling help() will describe you the interpreter functionnalities. 3. Install py4j (pip install py4j) if you want to use input form ### Screenshots ![image](https://cloud.githubusercontent.com/assets/12515751/14936724/5108fb60-0ef4-11e6-93ea-232a037f7957.png) ![image](https://cloud.githubusercontent.com/assets/12515751/14943716/98a75c4a-0fe0-11e6-9d4b-e10c39d53a15.png) ![image](https://cloud.githubusercontent.com/assets/12515751/14936715/0eec90de-0ef4-11e6-811b-7ebe46f0d279.png) ![image](https://cloud.githubusercontent.com/assets/12515751/14943722/b89b7824-0fe0-11e6-9c73-c12f7372d487.png) ### Questions: * Does the licenses files need update? Yes, only bin-license (py4j) * Is there breaking changes for older versions? No * Does this needs documentation? Yes Author: Hervé RIVIERE <hriviere@users.noreply.github.com> Closes #869 from hriviere/PR_interpreter_python and squashes the following commits: 80b6e75 [Hervé RIVIERE] [ZEPPELIN-502] move BSD py4j license to zeppelin-distribution/src/bin_license/license a4b82a5 [Hervé RIVIERE] [ZEPPELIN-502]Improving doc following @AhyoungRyu review 3252353 [Hervé RIVIERE] [ZEPPELIN-502] Formatting code to respect project convention 54ec4f1 [Hervé RIVIERE] [ZEPPELIN-502]Improving doc following @AhyoungRyu review 6a831bc [Hervé RIVIERE] [ZEPPELIN-502] Add BSD py4j license 11e1b9c [Hervé RIVIERE] [ZEPPELIN-502] minor changes in python.md e5d0bdb [Hervé RIVIERE] [ZEPPELIN-502] change PYTHON_PATH to ZEPPELIN_PYTHON c62ac98 [Hervé RIVIERE] [ZEPPELIN-502] Improve python.md 5008125 [Hervé RIVIERE] [ZEPPELIN-502] Improve python.md with features not yet supported and technical description 7d533e1 [Hervé RIVIERE] [ZEPPELIN-502] Add tests and reformating code to help tests writing fecaf25 [Hervé RIVIERE] [ZEPPELIN-502] Rename python.path to python and default from /usr/bin/python to python 02d1320 [Hervé RIVIERE] [ZEPPELIN-502] Input form, change from simple input form to native (pyspark syntax) 60d2956 [Hervé RIVIERE] [ZEPPELIN-502] Indent as pep8 convention 9bdb192 [Hervé RIVIERE] [ZEPPELIN-502] Add python.md to _navigation.html 7142aa5 [Hervé RIVIERE] [ZEPPELIN-502] Catch exception in logger.error 1a86ad7 [Hervé RIVIERE] [ZEPPELIN-502] Python interpreter group
2016-05-30 20:07:26 +00:00
## Configuration
<table class="table-configuration">
<tr>
<th>Property</th>
<th>Default</th>
<th>Description</th>
</tr>
<tr>
<td>zeppelin.python</td>
[ZEPPELIN-502] Python interpreter group ### What is this PR for? Adding a python 2 &3 interpreter. It's a basic implementation (no py4j for example), with a java ProcessBuilder object used to instantiate a python REPL. The interpreter doesn't bring it own python binary but uses the python specified by python.path configutation. Thus, you can still use your specific installed python modules (scikit-learn, matplotlib...) and the interpreter is able to work with python 2 & 3 without change. I had a python helper function (zeppelin_show() ) to easily display matplotlib graph as SVG. ### What type of PR is it? [Feature] ### Todos * [x] - Code review * [x] - Improve bootstrap.py : choose available helper functions and their names * [x] - Unit / IT tests ? * [x] documentation updates needed, that AhyoungRyu pointed out * [X] LICENSE needs to be updated to include all non-apache licensed dependencies (i.e AFAIK Py4j is BSD ) in bin-license * [x] double-check that code formatting conforms project style guide * [x] the branch need to be rebased on latest master. ### What is the Jira issue? [ZEPPELIN-502](https://issues.apache.org/jira/browse/ZEPPELIN-502?jql=project%20%3D%20ZEPPELIN%20AND%20text%20~%20%22python%22) ### How should this be tested? 1. In interpreter screen, in Python section, specify in python.path the python binary you want to use 2. In a paragraph, you can use the interpreter with **_%python_**. Calling help() will describe you the interpreter functionnalities. 3. Install py4j (pip install py4j) if you want to use input form ### Screenshots ![image](https://cloud.githubusercontent.com/assets/12515751/14936724/5108fb60-0ef4-11e6-93ea-232a037f7957.png) ![image](https://cloud.githubusercontent.com/assets/12515751/14943716/98a75c4a-0fe0-11e6-9d4b-e10c39d53a15.png) ![image](https://cloud.githubusercontent.com/assets/12515751/14936715/0eec90de-0ef4-11e6-811b-7ebe46f0d279.png) ![image](https://cloud.githubusercontent.com/assets/12515751/14943722/b89b7824-0fe0-11e6-9c73-c12f7372d487.png) ### Questions: * Does the licenses files need update? Yes, only bin-license (py4j) * Is there breaking changes for older versions? No * Does this needs documentation? Yes Author: Hervé RIVIERE <hriviere@users.noreply.github.com> Closes #869 from hriviere/PR_interpreter_python and squashes the following commits: 80b6e75 [Hervé RIVIERE] [ZEPPELIN-502] move BSD py4j license to zeppelin-distribution/src/bin_license/license a4b82a5 [Hervé RIVIERE] [ZEPPELIN-502]Improving doc following @AhyoungRyu review 3252353 [Hervé RIVIERE] [ZEPPELIN-502] Formatting code to respect project convention 54ec4f1 [Hervé RIVIERE] [ZEPPELIN-502]Improving doc following @AhyoungRyu review 6a831bc [Hervé RIVIERE] [ZEPPELIN-502] Add BSD py4j license 11e1b9c [Hervé RIVIERE] [ZEPPELIN-502] minor changes in python.md e5d0bdb [Hervé RIVIERE] [ZEPPELIN-502] change PYTHON_PATH to ZEPPELIN_PYTHON c62ac98 [Hervé RIVIERE] [ZEPPELIN-502] Improve python.md 5008125 [Hervé RIVIERE] [ZEPPELIN-502] Improve python.md with features not yet supported and technical description 7d533e1 [Hervé RIVIERE] [ZEPPELIN-502] Add tests and reformating code to help tests writing fecaf25 [Hervé RIVIERE] [ZEPPELIN-502] Rename python.path to python and default from /usr/bin/python to python 02d1320 [Hervé RIVIERE] [ZEPPELIN-502] Input form, change from simple input form to native (pyspark syntax) 60d2956 [Hervé RIVIERE] [ZEPPELIN-502] Indent as pep8 convention 9bdb192 [Hervé RIVIERE] [ZEPPELIN-502] Add python.md to _navigation.html 7142aa5 [Hervé RIVIERE] [ZEPPELIN-502] Catch exception in logger.error 1a86ad7 [Hervé RIVIERE] [ZEPPELIN-502] Python interpreter group
2016-05-30 20:07:26 +00:00
<td>python</td>
<td>Path of the already installed Python binary (could be python2 or python3).
If python is not in your $PATH you can set the absolute directory (example : /usr/bin/python)
</td>
</tr>
<tr>
<td>zeppelin.python.maxResult</td>
<td>1000</td>
<td>Max number of dataframe rows to display.</td>
</tr>
[ZEPPELIN-502] Python interpreter group ### What is this PR for? Adding a python 2 &3 interpreter. It's a basic implementation (no py4j for example), with a java ProcessBuilder object used to instantiate a python REPL. The interpreter doesn't bring it own python binary but uses the python specified by python.path configutation. Thus, you can still use your specific installed python modules (scikit-learn, matplotlib...) and the interpreter is able to work with python 2 & 3 without change. I had a python helper function (zeppelin_show() ) to easily display matplotlib graph as SVG. ### What type of PR is it? [Feature] ### Todos * [x] - Code review * [x] - Improve bootstrap.py : choose available helper functions and their names * [x] - Unit / IT tests ? * [x] documentation updates needed, that AhyoungRyu pointed out * [X] LICENSE needs to be updated to include all non-apache licensed dependencies (i.e AFAIK Py4j is BSD ) in bin-license * [x] double-check that code formatting conforms project style guide * [x] the branch need to be rebased on latest master. ### What is the Jira issue? [ZEPPELIN-502](https://issues.apache.org/jira/browse/ZEPPELIN-502?jql=project%20%3D%20ZEPPELIN%20AND%20text%20~%20%22python%22) ### How should this be tested? 1. In interpreter screen, in Python section, specify in python.path the python binary you want to use 2. In a paragraph, you can use the interpreter with **_%python_**. Calling help() will describe you the interpreter functionnalities. 3. Install py4j (pip install py4j) if you want to use input form ### Screenshots ![image](https://cloud.githubusercontent.com/assets/12515751/14936724/5108fb60-0ef4-11e6-93ea-232a037f7957.png) ![image](https://cloud.githubusercontent.com/assets/12515751/14943716/98a75c4a-0fe0-11e6-9d4b-e10c39d53a15.png) ![image](https://cloud.githubusercontent.com/assets/12515751/14936715/0eec90de-0ef4-11e6-811b-7ebe46f0d279.png) ![image](https://cloud.githubusercontent.com/assets/12515751/14943722/b89b7824-0fe0-11e6-9c73-c12f7372d487.png) ### Questions: * Does the licenses files need update? Yes, only bin-license (py4j) * Is there breaking changes for older versions? No * Does this needs documentation? Yes Author: Hervé RIVIERE <hriviere@users.noreply.github.com> Closes #869 from hriviere/PR_interpreter_python and squashes the following commits: 80b6e75 [Hervé RIVIERE] [ZEPPELIN-502] move BSD py4j license to zeppelin-distribution/src/bin_license/license a4b82a5 [Hervé RIVIERE] [ZEPPELIN-502]Improving doc following @AhyoungRyu review 3252353 [Hervé RIVIERE] [ZEPPELIN-502] Formatting code to respect project convention 54ec4f1 [Hervé RIVIERE] [ZEPPELIN-502]Improving doc following @AhyoungRyu review 6a831bc [Hervé RIVIERE] [ZEPPELIN-502] Add BSD py4j license 11e1b9c [Hervé RIVIERE] [ZEPPELIN-502] minor changes in python.md e5d0bdb [Hervé RIVIERE] [ZEPPELIN-502] change PYTHON_PATH to ZEPPELIN_PYTHON c62ac98 [Hervé RIVIERE] [ZEPPELIN-502] Improve python.md 5008125 [Hervé RIVIERE] [ZEPPELIN-502] Improve python.md with features not yet supported and technical description 7d533e1 [Hervé RIVIERE] [ZEPPELIN-502] Add tests and reformating code to help tests writing fecaf25 [Hervé RIVIERE] [ZEPPELIN-502] Rename python.path to python and default from /usr/bin/python to python 02d1320 [Hervé RIVIERE] [ZEPPELIN-502] Input form, change from simple input form to native (pyspark syntax) 60d2956 [Hervé RIVIERE] [ZEPPELIN-502] Indent as pep8 convention 9bdb192 [Hervé RIVIERE] [ZEPPELIN-502] Add python.md to _navigation.html 7142aa5 [Hervé RIVIERE] [ZEPPELIN-502] Catch exception in logger.error 1a86ad7 [Hervé RIVIERE] [ZEPPELIN-502] Python interpreter group
2016-05-30 20:07:26 +00:00
</table>
## Enabling Python Interpreter
In a notebook, to enable the **Python** interpreter, click on the **Gear** icon and select **Python**
## Using the Python Interpreter
In a paragraph, use **_%python_** to select the **Python** interpreter and then input all commands.
The interpreter can only work if you already have python installed (the interpreter doesn't bring it own python binaries).
To access the help, type **help()**
## Python environments
### Default
By default, PythonInterpreter will use python command defined in `zeppelin.python` property to run python process.
[ZEPPELIN-502] Python interpreter group ### What is this PR for? Adding a python 2 &3 interpreter. It's a basic implementation (no py4j for example), with a java ProcessBuilder object used to instantiate a python REPL. The interpreter doesn't bring it own python binary but uses the python specified by python.path configutation. Thus, you can still use your specific installed python modules (scikit-learn, matplotlib...) and the interpreter is able to work with python 2 & 3 without change. I had a python helper function (zeppelin_show() ) to easily display matplotlib graph as SVG. ### What type of PR is it? [Feature] ### Todos * [x] - Code review * [x] - Improve bootstrap.py : choose available helper functions and their names * [x] - Unit / IT tests ? * [x] documentation updates needed, that AhyoungRyu pointed out * [X] LICENSE needs to be updated to include all non-apache licensed dependencies (i.e AFAIK Py4j is BSD ) in bin-license * [x] double-check that code formatting conforms project style guide * [x] the branch need to be rebased on latest master. ### What is the Jira issue? [ZEPPELIN-502](https://issues.apache.org/jira/browse/ZEPPELIN-502?jql=project%20%3D%20ZEPPELIN%20AND%20text%20~%20%22python%22) ### How should this be tested? 1. In interpreter screen, in Python section, specify in python.path the python binary you want to use 2. In a paragraph, you can use the interpreter with **_%python_**. Calling help() will describe you the interpreter functionnalities. 3. Install py4j (pip install py4j) if you want to use input form ### Screenshots ![image](https://cloud.githubusercontent.com/assets/12515751/14936724/5108fb60-0ef4-11e6-93ea-232a037f7957.png) ![image](https://cloud.githubusercontent.com/assets/12515751/14943716/98a75c4a-0fe0-11e6-9d4b-e10c39d53a15.png) ![image](https://cloud.githubusercontent.com/assets/12515751/14936715/0eec90de-0ef4-11e6-811b-7ebe46f0d279.png) ![image](https://cloud.githubusercontent.com/assets/12515751/14943722/b89b7824-0fe0-11e6-9c73-c12f7372d487.png) ### Questions: * Does the licenses files need update? Yes, only bin-license (py4j) * Is there breaking changes for older versions? No * Does this needs documentation? Yes Author: Hervé RIVIERE <hriviere@users.noreply.github.com> Closes #869 from hriviere/PR_interpreter_python and squashes the following commits: 80b6e75 [Hervé RIVIERE] [ZEPPELIN-502] move BSD py4j license to zeppelin-distribution/src/bin_license/license a4b82a5 [Hervé RIVIERE] [ZEPPELIN-502]Improving doc following @AhyoungRyu review 3252353 [Hervé RIVIERE] [ZEPPELIN-502] Formatting code to respect project convention 54ec4f1 [Hervé RIVIERE] [ZEPPELIN-502]Improving doc following @AhyoungRyu review 6a831bc [Hervé RIVIERE] [ZEPPELIN-502] Add BSD py4j license 11e1b9c [Hervé RIVIERE] [ZEPPELIN-502] minor changes in python.md e5d0bdb [Hervé RIVIERE] [ZEPPELIN-502] change PYTHON_PATH to ZEPPELIN_PYTHON c62ac98 [Hervé RIVIERE] [ZEPPELIN-502] Improve python.md 5008125 [Hervé RIVIERE] [ZEPPELIN-502] Improve python.md with features not yet supported and technical description 7d533e1 [Hervé RIVIERE] [ZEPPELIN-502] Add tests and reformating code to help tests writing fecaf25 [Hervé RIVIERE] [ZEPPELIN-502] Rename python.path to python and default from /usr/bin/python to python 02d1320 [Hervé RIVIERE] [ZEPPELIN-502] Input form, change from simple input form to native (pyspark syntax) 60d2956 [Hervé RIVIERE] [ZEPPELIN-502] Indent as pep8 convention 9bdb192 [Hervé RIVIERE] [ZEPPELIN-502] Add python.md to _navigation.html 7142aa5 [Hervé RIVIERE] [ZEPPELIN-502] Catch exception in logger.error 1a86ad7 [Hervé RIVIERE] [ZEPPELIN-502] Python interpreter group
2016-05-30 20:07:26 +00:00
The interpreter can use all modules already installed (with pip, easy_install...)
### Conda
[Conda](http://conda.pydata.org/) is an package management system and environment management system for python.
`%python.conda` interpreter lets you change between environments.
#### Usage
List your environments
```
%python.conda
```
Activate an environment
```
%python.conda activate [ENVIRONMENT_NAME]
```
Deactivate
```
%python.conda deactivate
```
### Docker
`%python.docker` interpreter allows PythonInterpreter creates python process in a specified docker container.
#### Usage
Activate an environment
```
%python.docker activate [Repository]
%python.docker activate [Repository:Tag]
%python.docker activate [Image Id]
```
Deactivate
```
%python.docker deactivate
```
Example
```
# activate latest tensorflow image as a python environment
%python.docker activate gcr.io/tensorflow/tensorflow:latest
```
ZEPPELIN-1115: Python - interpreter for SQL over DataFrame ### What is this PR for? Add new interpreter to Python group: `%python.sql` for SQL over DataFrame support ### What type of PR is it? Improvement ### TODOs * [x] add new interpreter `%python.sql` * [x] add test * [x] make Python-dependant tests, excluded from CI * PythonInterpreterWithPythonInstalledTest * PythonPandasSqlInterpreterTest * run manually by `mvn -Dpython.test.exclude='' test -pl python -am` * [x] add docs `%python.sql` * [x] make `%python.sql` fail gracefully in case there is no Pandas or PandaSQL installed * [x] after #747 is merged - rebase and remove `-Dpython.test.exclude=''` from both profiles ### What is the Jira issue? [ZEPPELIN-1115](https://issues.apache.org/jira/browse/ZEPPELIN-1115) ### How should this be tested? `mvn -Dpython.test.exclude='' test -pl python -am` should pass or manually run - Given the DataFrame i.e ``` %python import pandas as pd rates = pd.read_csv("bank.csv", sep=";") ``` - SQL query it like ``` %python.sql SELECT * FROM rates LIMIT 10 ``` ### Screenshots (if appropriate) ![screen shot 2016-07-11 at 23 56 04](https://cloud.githubusercontent.com/assets/5582506/16735171/1ebb9354-47c3-11e6-9354-6364e9374a20.png) ### Questions: * Does the licenses files need update? No, no dependencies were included in source or binary release * Is there breaking changes for older versions? No * Does this needs documentation? Yes Author: Alexander Bezzubov <bzz@apache.org> Closes #1164 from bzz/ZEPPELIN-1115/python/add-sql-for-dataframes and squashes the following commits: 0f2f852 [Alexander Bezzubov] Fail SQL gracefully if no python dependencies installed aca2bdf [Alexander Bezzubov] Fix typos in docs :zap: 158ba6a [Alexander Bezzubov] Remove third-party dependant test from CI 5fe46fc [Alexander Bezzubov] Update Python Matplotlib notebook example 72884c8 [Alexander Bezzubov] Add docs for %python.sql feature e931dc4 [Alexander Bezzubov] Make test for PythonPandasSqlInterpreter usable 76bbb44 [Alexander Bezzubov] Complete implementation of the PythonPandasSqlInterpreter f6ca1eb [Alexander Bezzubov] Add %python.sql to interpreter menue 11ba490 [Alexander Bezzubov] Add draft implementation of %python.sql for DataFrames
2016-07-14 05:15:42 +00:00
## Using Zeppelin Dynamic Forms
[ZEPPELIN-502] Python interpreter group ### What is this PR for? Adding a python 2 &3 interpreter. It's a basic implementation (no py4j for example), with a java ProcessBuilder object used to instantiate a python REPL. The interpreter doesn't bring it own python binary but uses the python specified by python.path configutation. Thus, you can still use your specific installed python modules (scikit-learn, matplotlib...) and the interpreter is able to work with python 2 & 3 without change. I had a python helper function (zeppelin_show() ) to easily display matplotlib graph as SVG. ### What type of PR is it? [Feature] ### Todos * [x] - Code review * [x] - Improve bootstrap.py : choose available helper functions and their names * [x] - Unit / IT tests ? * [x] documentation updates needed, that AhyoungRyu pointed out * [X] LICENSE needs to be updated to include all non-apache licensed dependencies (i.e AFAIK Py4j is BSD ) in bin-license * [x] double-check that code formatting conforms project style guide * [x] the branch need to be rebased on latest master. ### What is the Jira issue? [ZEPPELIN-502](https://issues.apache.org/jira/browse/ZEPPELIN-502?jql=project%20%3D%20ZEPPELIN%20AND%20text%20~%20%22python%22) ### How should this be tested? 1. In interpreter screen, in Python section, specify in python.path the python binary you want to use 2. In a paragraph, you can use the interpreter with **_%python_**. Calling help() will describe you the interpreter functionnalities. 3. Install py4j (pip install py4j) if you want to use input form ### Screenshots ![image](https://cloud.githubusercontent.com/assets/12515751/14936724/5108fb60-0ef4-11e6-93ea-232a037f7957.png) ![image](https://cloud.githubusercontent.com/assets/12515751/14943716/98a75c4a-0fe0-11e6-9d4b-e10c39d53a15.png) ![image](https://cloud.githubusercontent.com/assets/12515751/14936715/0eec90de-0ef4-11e6-811b-7ebe46f0d279.png) ![image](https://cloud.githubusercontent.com/assets/12515751/14943722/b89b7824-0fe0-11e6-9c73-c12f7372d487.png) ### Questions: * Does the licenses files need update? Yes, only bin-license (py4j) * Is there breaking changes for older versions? No * Does this needs documentation? Yes Author: Hervé RIVIERE <hriviere@users.noreply.github.com> Closes #869 from hriviere/PR_interpreter_python and squashes the following commits: 80b6e75 [Hervé RIVIERE] [ZEPPELIN-502] move BSD py4j license to zeppelin-distribution/src/bin_license/license a4b82a5 [Hervé RIVIERE] [ZEPPELIN-502]Improving doc following @AhyoungRyu review 3252353 [Hervé RIVIERE] [ZEPPELIN-502] Formatting code to respect project convention 54ec4f1 [Hervé RIVIERE] [ZEPPELIN-502]Improving doc following @AhyoungRyu review 6a831bc [Hervé RIVIERE] [ZEPPELIN-502] Add BSD py4j license 11e1b9c [Hervé RIVIERE] [ZEPPELIN-502] minor changes in python.md e5d0bdb [Hervé RIVIERE] [ZEPPELIN-502] change PYTHON_PATH to ZEPPELIN_PYTHON c62ac98 [Hervé RIVIERE] [ZEPPELIN-502] Improve python.md 5008125 [Hervé RIVIERE] [ZEPPELIN-502] Improve python.md with features not yet supported and technical description 7d533e1 [Hervé RIVIERE] [ZEPPELIN-502] Add tests and reformating code to help tests writing fecaf25 [Hervé RIVIERE] [ZEPPELIN-502] Rename python.path to python and default from /usr/bin/python to python 02d1320 [Hervé RIVIERE] [ZEPPELIN-502] Input form, change from simple input form to native (pyspark syntax) 60d2956 [Hervé RIVIERE] [ZEPPELIN-502] Indent as pep8 convention 9bdb192 [Hervé RIVIERE] [ZEPPELIN-502] Add python.md to _navigation.html 7142aa5 [Hervé RIVIERE] [ZEPPELIN-502] Catch exception in logger.error 1a86ad7 [Hervé RIVIERE] [ZEPPELIN-502] Python interpreter group
2016-05-30 20:07:26 +00:00
You can leverage [Zeppelin Dynamic Form]({{BASE_PATH}}/manual/dynamicform.html) inside your Python code.
**Zeppelin Dynamic Form can only be used if py4j Python library is installed in your system. If not, you can install it with `pip install py4j`.**
Example :
```python
%python
### Input form
print (z.input("f1","defaultValue"))
### Select form
print (z.select("f1",[("o1","1"),("o2","2")],"2"))
### Checkbox form
print("".join(z.checkbox("f3", [("o1","1"), ("o2","2")],["1"])))
```
## Matplotlib integration
ZEPPELIN-1345 - Create a custom matplotlib backend that natively supports inline plotting in a python interpreter cell ### What is this PR for? This PR is the first of two major steps needed to improve matplotlib integration in Zeppelin (ZEPPELIN-1344). The latter, which is a plotting backend with fully interactive tools enabled, will be done afterwards in a separate PR. This PR specifically for automatically displaying output from calls to matplotlib plotting functions inline with each paragraph. Thanks to the addition of post-execute hooks (ZEPPELIN-1423), there is no need to call any `show()` function to display an inline plot, just like in Jupyter. ### What type of PR is it? Improvement ### Todos The main code has been written and anyone who reads this is encouraged to test it, but there are a few minor todos: - [x] - Add unit tests - [x] - Add documentation - [x] - Add screenshot showing iterative plotting with angular mode ### What is the Jira issue? [ZEPPELIN-1345](https://issues.apache.org/jira/browse/ZEPPELIN-1345) ### How should this be tested? In a pyspark or python paragraph, enter and run ``` python import matplotlib.pyplot as plt plt.plot([1, 2, 3]) ``` The plot should be displayed automatically without calling any `show()` function whatsoever. A special method called `configure_mpl()` can also be used to modify the inline plotting behavior. For example, ``` python z.configure_mpl(close=False, angular=True) plt.plot([1, 2, 3]) ``` allows for iterative updates to the plot provided you have PY4J installed for your python installation (which of course is always the case if you use pypsark). To clarify, this feature only currently works with pyspark (not python as there are no `angularBind()` and `angularUnbind()` methods yet). Doing something like: ``` plt.plot([3, 2, 1]) ``` will update the plot that was generated by the previous paragraph by leveraging Zeppelin's Angular Display System. However, by setting `close=False`, matplotlib will no longer automatically close figures so it is now up to the user to explicitly close each figure instance they create. There's quite a bit more options for `z.configure_mpl()`, but I will save that discussion for the documentation. ### Screenshots (if appropriate) ![img](http://i.imgur.com/e1xHKnV.gif) ### Questions: - Does the licenses files need update? No - Is there breaking changes for older versions? No - Does this needs documentation? Yes Author: Alex Goodman <agoodm@users.noreply.github.com> Closes #1534 from agoodm/ZEPPELIN-1345 and squashes the following commits: 9ef6ff7 [Alex Goodman] Move mpl backend files to /interpreter 24f89c6 [Alex Goodman] Catch potential NullPointerExceptions from hook registry bdb584e [Alex Goodman] Make sure expressions are printed when no plots are shown 22b6fe4 [Alex Goodman] Remove unused variable d3d1aa0 [Alex Goodman] Fix CI test failure c90d204 [Alex Goodman] Update spark.md bcf0bf3 [Alex Goodman] Update python.md for new matplotlib integration c9b65a5 [Alex Goodman] Add iterative plotting example image 8029a05 [Alex Goodman] Update python/README.md f2d9e86 [Alex Goodman] Exclude tests are excluded in python/pom.xml 86b1c90 [Alex Goodman] Fix tutorial notebook not loading c37b00f [Alex Goodman] Fix legend in tutorial notebook a321d79 [Alex Goodman] Update python.md 82350e3 [Alex Goodman] Update matplotlib tutorial notebook 9792f97 [Alex Goodman] Add unit tests 8b9b973 [Alex Goodman] Fix NullPointerExceptions in unit tests 82135ad [Alex Goodman] Removed unused variable f9c9498 [Alex Goodman] Added support for Angular Display System edf750a [Alex Goodman] Add new matplotlib backend for python/pyspark interpreters
2016-11-06 06:03:04 +00:00
The python interpreter can display matplotlib figures inline automatically using the `pyplot` module:
[ZEPPELIN-502] Python interpreter group ### What is this PR for? Adding a python 2 &3 interpreter. It's a basic implementation (no py4j for example), with a java ProcessBuilder object used to instantiate a python REPL. The interpreter doesn't bring it own python binary but uses the python specified by python.path configutation. Thus, you can still use your specific installed python modules (scikit-learn, matplotlib...) and the interpreter is able to work with python 2 & 3 without change. I had a python helper function (zeppelin_show() ) to easily display matplotlib graph as SVG. ### What type of PR is it? [Feature] ### Todos * [x] - Code review * [x] - Improve bootstrap.py : choose available helper functions and their names * [x] - Unit / IT tests ? * [x] documentation updates needed, that AhyoungRyu pointed out * [X] LICENSE needs to be updated to include all non-apache licensed dependencies (i.e AFAIK Py4j is BSD ) in bin-license * [x] double-check that code formatting conforms project style guide * [x] the branch need to be rebased on latest master. ### What is the Jira issue? [ZEPPELIN-502](https://issues.apache.org/jira/browse/ZEPPELIN-502?jql=project%20%3D%20ZEPPELIN%20AND%20text%20~%20%22python%22) ### How should this be tested? 1. In interpreter screen, in Python section, specify in python.path the python binary you want to use 2. In a paragraph, you can use the interpreter with **_%python_**. Calling help() will describe you the interpreter functionnalities. 3. Install py4j (pip install py4j) if you want to use input form ### Screenshots ![image](https://cloud.githubusercontent.com/assets/12515751/14936724/5108fb60-0ef4-11e6-93ea-232a037f7957.png) ![image](https://cloud.githubusercontent.com/assets/12515751/14943716/98a75c4a-0fe0-11e6-9d4b-e10c39d53a15.png) ![image](https://cloud.githubusercontent.com/assets/12515751/14936715/0eec90de-0ef4-11e6-811b-7ebe46f0d279.png) ![image](https://cloud.githubusercontent.com/assets/12515751/14943722/b89b7824-0fe0-11e6-9c73-c12f7372d487.png) ### Questions: * Does the licenses files need update? Yes, only bin-license (py4j) * Is there breaking changes for older versions? No * Does this needs documentation? Yes Author: Hervé RIVIERE <hriviere@users.noreply.github.com> Closes #869 from hriviere/PR_interpreter_python and squashes the following commits: 80b6e75 [Hervé RIVIERE] [ZEPPELIN-502] move BSD py4j license to zeppelin-distribution/src/bin_license/license a4b82a5 [Hervé RIVIERE] [ZEPPELIN-502]Improving doc following @AhyoungRyu review 3252353 [Hervé RIVIERE] [ZEPPELIN-502] Formatting code to respect project convention 54ec4f1 [Hervé RIVIERE] [ZEPPELIN-502]Improving doc following @AhyoungRyu review 6a831bc [Hervé RIVIERE] [ZEPPELIN-502] Add BSD py4j license 11e1b9c [Hervé RIVIERE] [ZEPPELIN-502] minor changes in python.md e5d0bdb [Hervé RIVIERE] [ZEPPELIN-502] change PYTHON_PATH to ZEPPELIN_PYTHON c62ac98 [Hervé RIVIERE] [ZEPPELIN-502] Improve python.md 5008125 [Hervé RIVIERE] [ZEPPELIN-502] Improve python.md with features not yet supported and technical description 7d533e1 [Hervé RIVIERE] [ZEPPELIN-502] Add tests and reformating code to help tests writing fecaf25 [Hervé RIVIERE] [ZEPPELIN-502] Rename python.path to python and default from /usr/bin/python to python 02d1320 [Hervé RIVIERE] [ZEPPELIN-502] Input form, change from simple input form to native (pyspark syntax) 60d2956 [Hervé RIVIERE] [ZEPPELIN-502] Indent as pep8 convention 9bdb192 [Hervé RIVIERE] [ZEPPELIN-502] Add python.md to _navigation.html 7142aa5 [Hervé RIVIERE] [ZEPPELIN-502] Catch exception in logger.error 1a86ad7 [Hervé RIVIERE] [ZEPPELIN-502] Python interpreter group
2016-05-30 20:07:26 +00:00
ZEPPELIN-1345 - Create a custom matplotlib backend that natively supports inline plotting in a python interpreter cell ### What is this PR for? This PR is the first of two major steps needed to improve matplotlib integration in Zeppelin (ZEPPELIN-1344). The latter, which is a plotting backend with fully interactive tools enabled, will be done afterwards in a separate PR. This PR specifically for automatically displaying output from calls to matplotlib plotting functions inline with each paragraph. Thanks to the addition of post-execute hooks (ZEPPELIN-1423), there is no need to call any `show()` function to display an inline plot, just like in Jupyter. ### What type of PR is it? Improvement ### Todos The main code has been written and anyone who reads this is encouraged to test it, but there are a few minor todos: - [x] - Add unit tests - [x] - Add documentation - [x] - Add screenshot showing iterative plotting with angular mode ### What is the Jira issue? [ZEPPELIN-1345](https://issues.apache.org/jira/browse/ZEPPELIN-1345) ### How should this be tested? In a pyspark or python paragraph, enter and run ``` python import matplotlib.pyplot as plt plt.plot([1, 2, 3]) ``` The plot should be displayed automatically without calling any `show()` function whatsoever. A special method called `configure_mpl()` can also be used to modify the inline plotting behavior. For example, ``` python z.configure_mpl(close=False, angular=True) plt.plot([1, 2, 3]) ``` allows for iterative updates to the plot provided you have PY4J installed for your python installation (which of course is always the case if you use pypsark). To clarify, this feature only currently works with pyspark (not python as there are no `angularBind()` and `angularUnbind()` methods yet). Doing something like: ``` plt.plot([3, 2, 1]) ``` will update the plot that was generated by the previous paragraph by leveraging Zeppelin's Angular Display System. However, by setting `close=False`, matplotlib will no longer automatically close figures so it is now up to the user to explicitly close each figure instance they create. There's quite a bit more options for `z.configure_mpl()`, but I will save that discussion for the documentation. ### Screenshots (if appropriate) ![img](http://i.imgur.com/e1xHKnV.gif) ### Questions: - Does the licenses files need update? No - Is there breaking changes for older versions? No - Does this needs documentation? Yes Author: Alex Goodman <agoodm@users.noreply.github.com> Closes #1534 from agoodm/ZEPPELIN-1345 and squashes the following commits: 9ef6ff7 [Alex Goodman] Move mpl backend files to /interpreter 24f89c6 [Alex Goodman] Catch potential NullPointerExceptions from hook registry bdb584e [Alex Goodman] Make sure expressions are printed when no plots are shown 22b6fe4 [Alex Goodman] Remove unused variable d3d1aa0 [Alex Goodman] Fix CI test failure c90d204 [Alex Goodman] Update spark.md bcf0bf3 [Alex Goodman] Update python.md for new matplotlib integration c9b65a5 [Alex Goodman] Add iterative plotting example image 8029a05 [Alex Goodman] Update python/README.md f2d9e86 [Alex Goodman] Exclude tests are excluded in python/pom.xml 86b1c90 [Alex Goodman] Fix tutorial notebook not loading c37b00f [Alex Goodman] Fix legend in tutorial notebook a321d79 [Alex Goodman] Update python.md 82350e3 [Alex Goodman] Update matplotlib tutorial notebook 9792f97 [Alex Goodman] Add unit tests 8b9b973 [Alex Goodman] Fix NullPointerExceptions in unit tests 82135ad [Alex Goodman] Removed unused variable f9c9498 [Alex Goodman] Added support for Angular Display System edf750a [Alex Goodman] Add new matplotlib backend for python/pyspark interpreters
2016-11-06 06:03:04 +00:00
```python
%python
import matplotlib.pyplot as plt
plt.plot([1, 2, 3])
```
This is the recommended method for using matplotlib from within a Zeppelin notebook. The output of this command will by default be converted to HTML by implicitly making use of the `%html` magic. Additional configuration can be achieved using the builtin `z.configure_mpl()` method. For example,
```python
z.configure_mpl(width=400, height=300, fmt='svg')
plt.plot([1, 2, 3])
```
Will produce a 400x300 image in SVG format, which by default are normally 600x400 and PNG respectively. In the future, another option called `angular` can be used to make it possible to update a plot produced from one paragraph directly from another (the output will be `%angular` instead of `%html`). However, this feature is already available in the `pyspark` interpreter. More details can be found in the included "Zeppelin Tutorial: Python - matplotlib basic" tutorial notebook.
If Zeppelin cannot find the matplotlib backend files (which should usually be found in `$ZEPPELIN_HOME/interpreter/lib/python`) in your `PYTHONPATH`, then the backend will automatically be set to agg, and the (otherwise deprecated) instructions below can be used for more limited inline plotting.
If you are unable to load the inline backend, use `z.show(plt)`:
[ZEPPELIN-502] Python interpreter group ### What is this PR for? Adding a python 2 &3 interpreter. It's a basic implementation (no py4j for example), with a java ProcessBuilder object used to instantiate a python REPL. The interpreter doesn't bring it own python binary but uses the python specified by python.path configutation. Thus, you can still use your specific installed python modules (scikit-learn, matplotlib...) and the interpreter is able to work with python 2 & 3 without change. I had a python helper function (zeppelin_show() ) to easily display matplotlib graph as SVG. ### What type of PR is it? [Feature] ### Todos * [x] - Code review * [x] - Improve bootstrap.py : choose available helper functions and their names * [x] - Unit / IT tests ? * [x] documentation updates needed, that AhyoungRyu pointed out * [X] LICENSE needs to be updated to include all non-apache licensed dependencies (i.e AFAIK Py4j is BSD ) in bin-license * [x] double-check that code formatting conforms project style guide * [x] the branch need to be rebased on latest master. ### What is the Jira issue? [ZEPPELIN-502](https://issues.apache.org/jira/browse/ZEPPELIN-502?jql=project%20%3D%20ZEPPELIN%20AND%20text%20~%20%22python%22) ### How should this be tested? 1. In interpreter screen, in Python section, specify in python.path the python binary you want to use 2. In a paragraph, you can use the interpreter with **_%python_**. Calling help() will describe you the interpreter functionnalities. 3. Install py4j (pip install py4j) if you want to use input form ### Screenshots ![image](https://cloud.githubusercontent.com/assets/12515751/14936724/5108fb60-0ef4-11e6-93ea-232a037f7957.png) ![image](https://cloud.githubusercontent.com/assets/12515751/14943716/98a75c4a-0fe0-11e6-9d4b-e10c39d53a15.png) ![image](https://cloud.githubusercontent.com/assets/12515751/14936715/0eec90de-0ef4-11e6-811b-7ebe46f0d279.png) ![image](https://cloud.githubusercontent.com/assets/12515751/14943722/b89b7824-0fe0-11e6-9c73-c12f7372d487.png) ### Questions: * Does the licenses files need update? Yes, only bin-license (py4j) * Is there breaking changes for older versions? No * Does this needs documentation? Yes Author: Hervé RIVIERE <hriviere@users.noreply.github.com> Closes #869 from hriviere/PR_interpreter_python and squashes the following commits: 80b6e75 [Hervé RIVIERE] [ZEPPELIN-502] move BSD py4j license to zeppelin-distribution/src/bin_license/license a4b82a5 [Hervé RIVIERE] [ZEPPELIN-502]Improving doc following @AhyoungRyu review 3252353 [Hervé RIVIERE] [ZEPPELIN-502] Formatting code to respect project convention 54ec4f1 [Hervé RIVIERE] [ZEPPELIN-502]Improving doc following @AhyoungRyu review 6a831bc [Hervé RIVIERE] [ZEPPELIN-502] Add BSD py4j license 11e1b9c [Hervé RIVIERE] [ZEPPELIN-502] minor changes in python.md e5d0bdb [Hervé RIVIERE] [ZEPPELIN-502] change PYTHON_PATH to ZEPPELIN_PYTHON c62ac98 [Hervé RIVIERE] [ZEPPELIN-502] Improve python.md 5008125 [Hervé RIVIERE] [ZEPPELIN-502] Improve python.md with features not yet supported and technical description 7d533e1 [Hervé RIVIERE] [ZEPPELIN-502] Add tests and reformating code to help tests writing fecaf25 [Hervé RIVIERE] [ZEPPELIN-502] Rename python.path to python and default from /usr/bin/python to python 02d1320 [Hervé RIVIERE] [ZEPPELIN-502] Input form, change from simple input form to native (pyspark syntax) 60d2956 [Hervé RIVIERE] [ZEPPELIN-502] Indent as pep8 convention 9bdb192 [Hervé RIVIERE] [ZEPPELIN-502] Add python.md to _navigation.html 7142aa5 [Hervé RIVIERE] [ZEPPELIN-502] Catch exception in logger.error 1a86ad7 [Hervé RIVIERE] [ZEPPELIN-502] Python interpreter group
2016-05-30 20:07:26 +00:00
```python
%python
import matplotlib.pyplot as plt
plt.figure()
(.. ..)
ZEPPELIN-1048: Pandas support for python interpreter ### What is this PR for? Display Pandas DataFrame using Zeppelin's Table Display system. ### What type of PR is it? Feature ### Todos * [x] fix NPE in logs on empty paragraph execution * [x] matplotlib: refactor `zeppelin_show(plt)` -> `z.show(plt)` * [x] pandas: support `z.show(df)` * [x] update docs ### What is the Jira issue? [ZEPPELIN-1048](https://issues.apache.org/jira/browse/ZEPPELIN-1048) ### How should this be tested? "Zeppelin Tutorial: Python - matplotlib basic" should work, and ```python import pandas as pd rates = pd.read_csv("bank.csv", sep=";") z.show(rates) ``` ### Screenshots (if appropriate) ![screen shot 2016-06-23 at 10 29 00](https://cloud.githubusercontent.com/assets/5582506/16289133/85f0ddbc-392d-11e6-86a3-28d10e73f68d.png) ### Questions: * Does the licenses files need update? No * Is there breaking changes for older versions? No * Does this needs documentation? Yes Author: Alexander Bezzubov <bzz@apache.org> Closes #1067 from bzz/python/pandas-support and squashes the following commits: 3b1ad36 [Alexander Bezzubov] Python: update docs to reffer new API ee6668b [Alexander Bezzubov] Python: update docs, add Pandas integration 71be418 [Alexander Bezzubov] Python: limit 1000 for table display system on DataFrame 52e787d [Alexander Bezzubov] Python: pandas DataFrame using Table display system bc91b86 [Alexander Bezzubov] Python: skip interpreting empty paragraphs a7248cd [Alexander Bezzubov] Python: draft of pandas support 15646a1 [Alexander Bezzubov] Python: refactoring to z.show()
2016-06-23 01:25:49 +00:00
z.show(plt)
[ZEPPELIN-502] Python interpreter group ### What is this PR for? Adding a python 2 &3 interpreter. It's a basic implementation (no py4j for example), with a java ProcessBuilder object used to instantiate a python REPL. The interpreter doesn't bring it own python binary but uses the python specified by python.path configutation. Thus, you can still use your specific installed python modules (scikit-learn, matplotlib...) and the interpreter is able to work with python 2 & 3 without change. I had a python helper function (zeppelin_show() ) to easily display matplotlib graph as SVG. ### What type of PR is it? [Feature] ### Todos * [x] - Code review * [x] - Improve bootstrap.py : choose available helper functions and their names * [x] - Unit / IT tests ? * [x] documentation updates needed, that AhyoungRyu pointed out * [X] LICENSE needs to be updated to include all non-apache licensed dependencies (i.e AFAIK Py4j is BSD ) in bin-license * [x] double-check that code formatting conforms project style guide * [x] the branch need to be rebased on latest master. ### What is the Jira issue? [ZEPPELIN-502](https://issues.apache.org/jira/browse/ZEPPELIN-502?jql=project%20%3D%20ZEPPELIN%20AND%20text%20~%20%22python%22) ### How should this be tested? 1. In interpreter screen, in Python section, specify in python.path the python binary you want to use 2. In a paragraph, you can use the interpreter with **_%python_**. Calling help() will describe you the interpreter functionnalities. 3. Install py4j (pip install py4j) if you want to use input form ### Screenshots ![image](https://cloud.githubusercontent.com/assets/12515751/14936724/5108fb60-0ef4-11e6-93ea-232a037f7957.png) ![image](https://cloud.githubusercontent.com/assets/12515751/14943716/98a75c4a-0fe0-11e6-9d4b-e10c39d53a15.png) ![image](https://cloud.githubusercontent.com/assets/12515751/14936715/0eec90de-0ef4-11e6-811b-7ebe46f0d279.png) ![image](https://cloud.githubusercontent.com/assets/12515751/14943722/b89b7824-0fe0-11e6-9c73-c12f7372d487.png) ### Questions: * Does the licenses files need update? Yes, only bin-license (py4j) * Is there breaking changes for older versions? No * Does this needs documentation? Yes Author: Hervé RIVIERE <hriviere@users.noreply.github.com> Closes #869 from hriviere/PR_interpreter_python and squashes the following commits: 80b6e75 [Hervé RIVIERE] [ZEPPELIN-502] move BSD py4j license to zeppelin-distribution/src/bin_license/license a4b82a5 [Hervé RIVIERE] [ZEPPELIN-502]Improving doc following @AhyoungRyu review 3252353 [Hervé RIVIERE] [ZEPPELIN-502] Formatting code to respect project convention 54ec4f1 [Hervé RIVIERE] [ZEPPELIN-502]Improving doc following @AhyoungRyu review 6a831bc [Hervé RIVIERE] [ZEPPELIN-502] Add BSD py4j license 11e1b9c [Hervé RIVIERE] [ZEPPELIN-502] minor changes in python.md e5d0bdb [Hervé RIVIERE] [ZEPPELIN-502] change PYTHON_PATH to ZEPPELIN_PYTHON c62ac98 [Hervé RIVIERE] [ZEPPELIN-502] Improve python.md 5008125 [Hervé RIVIERE] [ZEPPELIN-502] Improve python.md with features not yet supported and technical description 7d533e1 [Hervé RIVIERE] [ZEPPELIN-502] Add tests and reformating code to help tests writing fecaf25 [Hervé RIVIERE] [ZEPPELIN-502] Rename python.path to python and default from /usr/bin/python to python 02d1320 [Hervé RIVIERE] [ZEPPELIN-502] Input form, change from simple input form to native (pyspark syntax) 60d2956 [Hervé RIVIERE] [ZEPPELIN-502] Indent as pep8 convention 9bdb192 [Hervé RIVIERE] [ZEPPELIN-502] Add python.md to _navigation.html 7142aa5 [Hervé RIVIERE] [ZEPPELIN-502] Catch exception in logger.error 1a86ad7 [Hervé RIVIERE] [ZEPPELIN-502] Python interpreter group
2016-05-30 20:07:26 +00:00
plt.close()
```
ZEPPELIN-1318 - Add support for matplotlib displaying png images in python interpreter ### What is this PR for? This PR adds support for plotting png images using the matplotlib helper function within a python interpreter (eg `z.show()`). The primary motivation for this is due to the overhead incurred from svg images, which can lag the notebooks if multiple, complicated images are generated (for example, multiple filled contour plots). png images are more lightweight, but of course come at a cost of image quality due to them being raster rather than vector like svg. The support for png images is incorporated through the use of a new optional argument to `z.show` called `fmt` which can be one of `'svg'` or `'png'`. The same code that is currently used in show is used for svg images while the code for png images relies on converting the image directly to a byte array and then entering the decoded byte string directly into an HTML image tag. Currently `fmt` defaults to `'png'` but I think we should consider discussing the pros and cons of each option in this PR. ### What type of PR is it? Improvement ### What is the Jira issue? [ZEPPELIN-1318](https://issues.apache.org/jira/browse/ZEPPELIN-1318) ### How should this be tested? In a notebook cell, enter: ```python %python import matplotlib.pyplot as plt import numpy as np plt.figure() plt.plot(np.arange(10)) z.show(plt, fmt=fmt) ``` Where `fmt` may be one of `'svg'` or `'png'`, and any other input should result in a `ValueError`. I would also recommend testing the example in the screenshot below. ### Screenshots (if appropriate) ![zeppelin plot](https://puu.sh/qzc4t/b8fcfe856e.png) ### Questions: * Does the licenses files need update? No * Is there breaking changes for older versions? No * Does this needs documentation? Yes (if the changes to the `help()` docstring are not sufficient) Author: Alex Goodman <agoodm@users.noreply.github.com> Closes #1329 from agoodm/ZEPPELIN-1318 and squashes the following commits: 2e9ce4c [Alex Goodman] Update python.md 1efa0c9 [Alex Goodman] ZEPPELIN-1318 - Add support for png images in z.show()
2016-08-13 04:42:48 +00:00
The `z.show()` function can take optional parameters to adapt graph dimensions (width and height) as well as output format (png or optionally svg).
[ZEPPELIN-502] Python interpreter group ### What is this PR for? Adding a python 2 &3 interpreter. It's a basic implementation (no py4j for example), with a java ProcessBuilder object used to instantiate a python REPL. The interpreter doesn't bring it own python binary but uses the python specified by python.path configutation. Thus, you can still use your specific installed python modules (scikit-learn, matplotlib...) and the interpreter is able to work with python 2 & 3 without change. I had a python helper function (zeppelin_show() ) to easily display matplotlib graph as SVG. ### What type of PR is it? [Feature] ### Todos * [x] - Code review * [x] - Improve bootstrap.py : choose available helper functions and their names * [x] - Unit / IT tests ? * [x] documentation updates needed, that AhyoungRyu pointed out * [X] LICENSE needs to be updated to include all non-apache licensed dependencies (i.e AFAIK Py4j is BSD ) in bin-license * [x] double-check that code formatting conforms project style guide * [x] the branch need to be rebased on latest master. ### What is the Jira issue? [ZEPPELIN-502](https://issues.apache.org/jira/browse/ZEPPELIN-502?jql=project%20%3D%20ZEPPELIN%20AND%20text%20~%20%22python%22) ### How should this be tested? 1. In interpreter screen, in Python section, specify in python.path the python binary you want to use 2. In a paragraph, you can use the interpreter with **_%python_**. Calling help() will describe you the interpreter functionnalities. 3. Install py4j (pip install py4j) if you want to use input form ### Screenshots ![image](https://cloud.githubusercontent.com/assets/12515751/14936724/5108fb60-0ef4-11e6-93ea-232a037f7957.png) ![image](https://cloud.githubusercontent.com/assets/12515751/14943716/98a75c4a-0fe0-11e6-9d4b-e10c39d53a15.png) ![image](https://cloud.githubusercontent.com/assets/12515751/14936715/0eec90de-0ef4-11e6-811b-7ebe46f0d279.png) ![image](https://cloud.githubusercontent.com/assets/12515751/14943722/b89b7824-0fe0-11e6-9c73-c12f7372d487.png) ### Questions: * Does the licenses files need update? Yes, only bin-license (py4j) * Is there breaking changes for older versions? No * Does this needs documentation? Yes Author: Hervé RIVIERE <hriviere@users.noreply.github.com> Closes #869 from hriviere/PR_interpreter_python and squashes the following commits: 80b6e75 [Hervé RIVIERE] [ZEPPELIN-502] move BSD py4j license to zeppelin-distribution/src/bin_license/license a4b82a5 [Hervé RIVIERE] [ZEPPELIN-502]Improving doc following @AhyoungRyu review 3252353 [Hervé RIVIERE] [ZEPPELIN-502] Formatting code to respect project convention 54ec4f1 [Hervé RIVIERE] [ZEPPELIN-502]Improving doc following @AhyoungRyu review 6a831bc [Hervé RIVIERE] [ZEPPELIN-502] Add BSD py4j license 11e1b9c [Hervé RIVIERE] [ZEPPELIN-502] minor changes in python.md e5d0bdb [Hervé RIVIERE] [ZEPPELIN-502] change PYTHON_PATH to ZEPPELIN_PYTHON c62ac98 [Hervé RIVIERE] [ZEPPELIN-502] Improve python.md 5008125 [Hervé RIVIERE] [ZEPPELIN-502] Improve python.md with features not yet supported and technical description 7d533e1 [Hervé RIVIERE] [ZEPPELIN-502] Add tests and reformating code to help tests writing fecaf25 [Hervé RIVIERE] [ZEPPELIN-502] Rename python.path to python and default from /usr/bin/python to python 02d1320 [Hervé RIVIERE] [ZEPPELIN-502] Input form, change from simple input form to native (pyspark syntax) 60d2956 [Hervé RIVIERE] [ZEPPELIN-502] Indent as pep8 convention 9bdb192 [Hervé RIVIERE] [ZEPPELIN-502] Add python.md to _navigation.html 7142aa5 [Hervé RIVIERE] [ZEPPELIN-502] Catch exception in logger.error 1a86ad7 [Hervé RIVIERE] [ZEPPELIN-502] Python interpreter group
2016-05-30 20:07:26 +00:00
```python
%python
ZEPPELIN-1048: Pandas support for python interpreter ### What is this PR for? Display Pandas DataFrame using Zeppelin's Table Display system. ### What type of PR is it? Feature ### Todos * [x] fix NPE in logs on empty paragraph execution * [x] matplotlib: refactor `zeppelin_show(plt)` -> `z.show(plt)` * [x] pandas: support `z.show(df)` * [x] update docs ### What is the Jira issue? [ZEPPELIN-1048](https://issues.apache.org/jira/browse/ZEPPELIN-1048) ### How should this be tested? "Zeppelin Tutorial: Python - matplotlib basic" should work, and ```python import pandas as pd rates = pd.read_csv("bank.csv", sep=";") z.show(rates) ``` ### Screenshots (if appropriate) ![screen shot 2016-06-23 at 10 29 00](https://cloud.githubusercontent.com/assets/5582506/16289133/85f0ddbc-392d-11e6-86a3-28d10e73f68d.png) ### Questions: * Does the licenses files need update? No * Is there breaking changes for older versions? No * Does this needs documentation? Yes Author: Alexander Bezzubov <bzz@apache.org> Closes #1067 from bzz/python/pandas-support and squashes the following commits: 3b1ad36 [Alexander Bezzubov] Python: update docs to reffer new API ee6668b [Alexander Bezzubov] Python: update docs, add Pandas integration 71be418 [Alexander Bezzubov] Python: limit 1000 for table display system on DataFrame 52e787d [Alexander Bezzubov] Python: pandas DataFrame using Table display system bc91b86 [Alexander Bezzubov] Python: skip interpreting empty paragraphs a7248cd [Alexander Bezzubov] Python: draft of pandas support 15646a1 [Alexander Bezzubov] Python: refactoring to z.show()
2016-06-23 01:25:49 +00:00
z.show(plt, width='50px')
ZEPPELIN-1318 - Add support for matplotlib displaying png images in python interpreter ### What is this PR for? This PR adds support for plotting png images using the matplotlib helper function within a python interpreter (eg `z.show()`). The primary motivation for this is due to the overhead incurred from svg images, which can lag the notebooks if multiple, complicated images are generated (for example, multiple filled contour plots). png images are more lightweight, but of course come at a cost of image quality due to them being raster rather than vector like svg. The support for png images is incorporated through the use of a new optional argument to `z.show` called `fmt` which can be one of `'svg'` or `'png'`. The same code that is currently used in show is used for svg images while the code for png images relies on converting the image directly to a byte array and then entering the decoded byte string directly into an HTML image tag. Currently `fmt` defaults to `'png'` but I think we should consider discussing the pros and cons of each option in this PR. ### What type of PR is it? Improvement ### What is the Jira issue? [ZEPPELIN-1318](https://issues.apache.org/jira/browse/ZEPPELIN-1318) ### How should this be tested? In a notebook cell, enter: ```python %python import matplotlib.pyplot as plt import numpy as np plt.figure() plt.plot(np.arange(10)) z.show(plt, fmt=fmt) ``` Where `fmt` may be one of `'svg'` or `'png'`, and any other input should result in a `ValueError`. I would also recommend testing the example in the screenshot below. ### Screenshots (if appropriate) ![zeppelin plot](https://puu.sh/qzc4t/b8fcfe856e.png) ### Questions: * Does the licenses files need update? No * Is there breaking changes for older versions? No * Does this needs documentation? Yes (if the changes to the `help()` docstring are not sufficient) Author: Alex Goodman <agoodm@users.noreply.github.com> Closes #1329 from agoodm/ZEPPELIN-1318 and squashes the following commits: 2e9ce4c [Alex Goodman] Update python.md 1efa0c9 [Alex Goodman] ZEPPELIN-1318 - Add support for png images in z.show()
2016-08-13 04:42:48 +00:00
z.show(plt, height='150px', fmt='svg')
[ZEPPELIN-502] Python interpreter group ### What is this PR for? Adding a python 2 &3 interpreter. It's a basic implementation (no py4j for example), with a java ProcessBuilder object used to instantiate a python REPL. The interpreter doesn't bring it own python binary but uses the python specified by python.path configutation. Thus, you can still use your specific installed python modules (scikit-learn, matplotlib...) and the interpreter is able to work with python 2 & 3 without change. I had a python helper function (zeppelin_show() ) to easily display matplotlib graph as SVG. ### What type of PR is it? [Feature] ### Todos * [x] - Code review * [x] - Improve bootstrap.py : choose available helper functions and their names * [x] - Unit / IT tests ? * [x] documentation updates needed, that AhyoungRyu pointed out * [X] LICENSE needs to be updated to include all non-apache licensed dependencies (i.e AFAIK Py4j is BSD ) in bin-license * [x] double-check that code formatting conforms project style guide * [x] the branch need to be rebased on latest master. ### What is the Jira issue? [ZEPPELIN-502](https://issues.apache.org/jira/browse/ZEPPELIN-502?jql=project%20%3D%20ZEPPELIN%20AND%20text%20~%20%22python%22) ### How should this be tested? 1. In interpreter screen, in Python section, specify in python.path the python binary you want to use 2. In a paragraph, you can use the interpreter with **_%python_**. Calling help() will describe you the interpreter functionnalities. 3. Install py4j (pip install py4j) if you want to use input form ### Screenshots ![image](https://cloud.githubusercontent.com/assets/12515751/14936724/5108fb60-0ef4-11e6-93ea-232a037f7957.png) ![image](https://cloud.githubusercontent.com/assets/12515751/14943716/98a75c4a-0fe0-11e6-9d4b-e10c39d53a15.png) ![image](https://cloud.githubusercontent.com/assets/12515751/14936715/0eec90de-0ef4-11e6-811b-7ebe46f0d279.png) ![image](https://cloud.githubusercontent.com/assets/12515751/14943722/b89b7824-0fe0-11e6-9c73-c12f7372d487.png) ### Questions: * Does the licenses files need update? Yes, only bin-license (py4j) * Is there breaking changes for older versions? No * Does this needs documentation? Yes Author: Hervé RIVIERE <hriviere@users.noreply.github.com> Closes #869 from hriviere/PR_interpreter_python and squashes the following commits: 80b6e75 [Hervé RIVIERE] [ZEPPELIN-502] move BSD py4j license to zeppelin-distribution/src/bin_license/license a4b82a5 [Hervé RIVIERE] [ZEPPELIN-502]Improving doc following @AhyoungRyu review 3252353 [Hervé RIVIERE] [ZEPPELIN-502] Formatting code to respect project convention 54ec4f1 [Hervé RIVIERE] [ZEPPELIN-502]Improving doc following @AhyoungRyu review 6a831bc [Hervé RIVIERE] [ZEPPELIN-502] Add BSD py4j license 11e1b9c [Hervé RIVIERE] [ZEPPELIN-502] minor changes in python.md e5d0bdb [Hervé RIVIERE] [ZEPPELIN-502] change PYTHON_PATH to ZEPPELIN_PYTHON c62ac98 [Hervé RIVIERE] [ZEPPELIN-502] Improve python.md 5008125 [Hervé RIVIERE] [ZEPPELIN-502] Improve python.md with features not yet supported and technical description 7d533e1 [Hervé RIVIERE] [ZEPPELIN-502] Add tests and reformating code to help tests writing fecaf25 [Hervé RIVIERE] [ZEPPELIN-502] Rename python.path to python and default from /usr/bin/python to python 02d1320 [Hervé RIVIERE] [ZEPPELIN-502] Input form, change from simple input form to native (pyspark syntax) 60d2956 [Hervé RIVIERE] [ZEPPELIN-502] Indent as pep8 convention 9bdb192 [Hervé RIVIERE] [ZEPPELIN-502] Add python.md to _navigation.html 7142aa5 [Hervé RIVIERE] [ZEPPELIN-502] Catch exception in logger.error 1a86ad7 [Hervé RIVIERE] [ZEPPELIN-502] Python interpreter group
2016-05-30 20:07:26 +00:00
```
[ZEPPELIN-1018] Apply auto "Table of Contents" generator to Zeppelin docs website ### What is this PR for? I added auto TOC(Table of Contents) generator for Zeppelin documentation website. TOC can help people looking through whole contents at a glance and finding what they want quickly. I just added `<div id="toc"></div>` to the each documentation header. [`toc`](https://github.com/apache/zeppelin/compare/master...AhyoungRyu:ZEPPELIN-1018?expand=1#diff-85af09fb498a5667ea455391533f945dR3) recognize `<h2>` & `<h3>` as a title in the docs and it automatically generate TOC. So I set a rule for this work. (I'll write this rule on `docs/CONTRIBUTING.md` or [docs/howtocontributewebsite](https://zeppelin.apache.org/docs/0.6.0-SNAPSHOT/development/howtocontributewebsite.html)). ``` # Level-1 Heading <- Use only for the main title of the page ## Level-2 Heading <- Start with this one ### Level-3 heading <- Only use this one for child of Level-2 toc only recognize Level-2 & Level-3 ``` Please see the below attached screenshot image. ### What type of PR is it? Improvement & Documentation ### Todos * [x] - Add TOC generator * [x] - Apply TOC(`<div id="toc"></div>`) to every documentation and reorganize each headers(apply the above rule) * [x] - Fix some broken code block in several docs * [x] - Apply TOC to `r.md` (Currently R docs has some duplicated info since [this one](https://github.com/apache/zeppelin/commit/d5e87fb8ba98f08db5b0a4995104ce19f182c678) and [this one](https://github.com/apache/zeppelin/commit/7d6cc7e99154e2d337c11fdf8be1a874ed3e9ada) ) * [x] - Apply TOC to `install.md` after #1010 merged * [x] - Apply TOC to `interpreterinstallation.md` after #1042 merged ### What is the Jira issue? [ZEPPELIN-1018](https://issues.apache.org/jira/browse/ZEPPELIN-1018) ### How should this be tested? 1. Apply this patch and build `docs/` with [this guide](https://github.com/apache/zeppelin/tree/master/docs#build-documentation) 2. Visit some docs page. Then you can see TOC in the header of page. ### Screenshots (if appropriate) - Automatically generated TOC in Spark interpreter docs page <img width="831" alt="screen shot 2016-06-16 at 9 37 18 pm" src="https://cloud.githubusercontent.com/assets/10060731/16140902/945b9c7a-340a-11e6-91f3-b6174738bed0.png"> ### Questions: * Does the licenses files need update? No. Actually I used [jekyll-table-of-contents#copyright](https://github.com/ghiculescu/jekyll-table-of-contents#copyright). But I don't need to add a license for this :) * Is there breaking changes for older versions? No * Does this needs documentation? Maybe Author: AhyoungRyu <fbdkdud93@hanmail.net> Closes #1031 from AhyoungRyu/ZEPPELIN-1018 and squashes the following commits: e66397b [AhyoungRyu] Apply TOC to interpreterinstallation.md 009579b [AhyoungRyu] Add more info to 'What is the next?' in install.md 04cf501 [AhyoungRyu] Revert 'where to start' section b7cbe5f [AhyoungRyu] Fix typo cf0911c [AhyoungRyu] Rename license file 388f35a [AhyoungRyu] Add jekyll-table-of-contents license info 6394c70 [AhyoungRyu] Fix image path in python.md d00e4b1 [AhyoungRyu] Move interpreter/screenshot/ -> asset/../img/docs-img/ 3ffb383 [AhyoungRyu] Remove duplicated info in r.md & apply toc a03ca99 [AhyoungRyu] Exclude toc.js from pom.xml 3fae7df [AhyoungRyu] Apply auto generated toc to install.md d114a9d [AhyoungRyu] Address @felixcheung feedback 6a788fe [AhyoungRyu] Resize TOC tab indent 6760c00 [AhyoungRyu] Apply auto TOC to all of docs under docs/storage/ fbde57f [AhyoungRyu] Apply auto TOC to all of docs under docs/quickstart/ db76eb6 [AhyoungRyu] Apply auto TOC to all of docs under docs/install/ f35db47 [AhyoungRyu] Apply auto TOC to all of docs under docs/displaysystem/ b05365f [AhyoungRyu] Apply auto TOC to all of docs under docs/rest-api/ 163691c [AhyoungRyu] Apply auto TOC to all of docs under docs/manual/ bef398e [AhyoungRyu] Apply auto TOC to all of docs under docs/development/ 9c5f76b [AhyoungRyu] Apply auto TOC to all of docs under docs/interpreter/ 587d4ba [AhyoungRyu] Apply auto TOC to all of docs under docs/security/ 1f10b97 [AhyoungRyu] Change toc configuration 78dca9e [AhyoungRyu] Add toc.js for auto generating TOC
2016-06-25 19:44:53 +00:00
<img class="img-responsive" src="../assets/themes/zeppelin/img/docs-img/pythonMatplotlib.png" />
[ZEPPELIN-502] Python interpreter group ### What is this PR for? Adding a python 2 &3 interpreter. It's a basic implementation (no py4j for example), with a java ProcessBuilder object used to instantiate a python REPL. The interpreter doesn't bring it own python binary but uses the python specified by python.path configutation. Thus, you can still use your specific installed python modules (scikit-learn, matplotlib...) and the interpreter is able to work with python 2 & 3 without change. I had a python helper function (zeppelin_show() ) to easily display matplotlib graph as SVG. ### What type of PR is it? [Feature] ### Todos * [x] - Code review * [x] - Improve bootstrap.py : choose available helper functions and their names * [x] - Unit / IT tests ? * [x] documentation updates needed, that AhyoungRyu pointed out * [X] LICENSE needs to be updated to include all non-apache licensed dependencies (i.e AFAIK Py4j is BSD ) in bin-license * [x] double-check that code formatting conforms project style guide * [x] the branch need to be rebased on latest master. ### What is the Jira issue? [ZEPPELIN-502](https://issues.apache.org/jira/browse/ZEPPELIN-502?jql=project%20%3D%20ZEPPELIN%20AND%20text%20~%20%22python%22) ### How should this be tested? 1. In interpreter screen, in Python section, specify in python.path the python binary you want to use 2. In a paragraph, you can use the interpreter with **_%python_**. Calling help() will describe you the interpreter functionnalities. 3. Install py4j (pip install py4j) if you want to use input form ### Screenshots ![image](https://cloud.githubusercontent.com/assets/12515751/14936724/5108fb60-0ef4-11e6-93ea-232a037f7957.png) ![image](https://cloud.githubusercontent.com/assets/12515751/14943716/98a75c4a-0fe0-11e6-9d4b-e10c39d53a15.png) ![image](https://cloud.githubusercontent.com/assets/12515751/14936715/0eec90de-0ef4-11e6-811b-7ebe46f0d279.png) ![image](https://cloud.githubusercontent.com/assets/12515751/14943722/b89b7824-0fe0-11e6-9c73-c12f7372d487.png) ### Questions: * Does the licenses files need update? Yes, only bin-license (py4j) * Is there breaking changes for older versions? No * Does this needs documentation? Yes Author: Hervé RIVIERE <hriviere@users.noreply.github.com> Closes #869 from hriviere/PR_interpreter_python and squashes the following commits: 80b6e75 [Hervé RIVIERE] [ZEPPELIN-502] move BSD py4j license to zeppelin-distribution/src/bin_license/license a4b82a5 [Hervé RIVIERE] [ZEPPELIN-502]Improving doc following @AhyoungRyu review 3252353 [Hervé RIVIERE] [ZEPPELIN-502] Formatting code to respect project convention 54ec4f1 [Hervé RIVIERE] [ZEPPELIN-502]Improving doc following @AhyoungRyu review 6a831bc [Hervé RIVIERE] [ZEPPELIN-502] Add BSD py4j license 11e1b9c [Hervé RIVIERE] [ZEPPELIN-502] minor changes in python.md e5d0bdb [Hervé RIVIERE] [ZEPPELIN-502] change PYTHON_PATH to ZEPPELIN_PYTHON c62ac98 [Hervé RIVIERE] [ZEPPELIN-502] Improve python.md 5008125 [Hervé RIVIERE] [ZEPPELIN-502] Improve python.md with features not yet supported and technical description 7d533e1 [Hervé RIVIERE] [ZEPPELIN-502] Add tests and reformating code to help tests writing fecaf25 [Hervé RIVIERE] [ZEPPELIN-502] Rename python.path to python and default from /usr/bin/python to python 02d1320 [Hervé RIVIERE] [ZEPPELIN-502] Input form, change from simple input form to native (pyspark syntax) 60d2956 [Hervé RIVIERE] [ZEPPELIN-502] Indent as pep8 convention 9bdb192 [Hervé RIVIERE] [ZEPPELIN-502] Add python.md to _navigation.html 7142aa5 [Hervé RIVIERE] [ZEPPELIN-502] Catch exception in logger.error 1a86ad7 [Hervé RIVIERE] [ZEPPELIN-502] Python interpreter group
2016-05-30 20:07:26 +00:00
ZEPPELIN-1048: Pandas support for python interpreter ### What is this PR for? Display Pandas DataFrame using Zeppelin's Table Display system. ### What type of PR is it? Feature ### Todos * [x] fix NPE in logs on empty paragraph execution * [x] matplotlib: refactor `zeppelin_show(plt)` -> `z.show(plt)` * [x] pandas: support `z.show(df)` * [x] update docs ### What is the Jira issue? [ZEPPELIN-1048](https://issues.apache.org/jira/browse/ZEPPELIN-1048) ### How should this be tested? "Zeppelin Tutorial: Python - matplotlib basic" should work, and ```python import pandas as pd rates = pd.read_csv("bank.csv", sep=";") z.show(rates) ``` ### Screenshots (if appropriate) ![screen shot 2016-06-23 at 10 29 00](https://cloud.githubusercontent.com/assets/5582506/16289133/85f0ddbc-392d-11e6-86a3-28d10e73f68d.png) ### Questions: * Does the licenses files need update? No * Is there breaking changes for older versions? No * Does this needs documentation? Yes Author: Alexander Bezzubov <bzz@apache.org> Closes #1067 from bzz/python/pandas-support and squashes the following commits: 3b1ad36 [Alexander Bezzubov] Python: update docs to reffer new API ee6668b [Alexander Bezzubov] Python: update docs, add Pandas integration 71be418 [Alexander Bezzubov] Python: limit 1000 for table display system on DataFrame 52e787d [Alexander Bezzubov] Python: pandas DataFrame using Table display system bc91b86 [Alexander Bezzubov] Python: skip interpreting empty paragraphs a7248cd [Alexander Bezzubov] Python: draft of pandas support 15646a1 [Alexander Bezzubov] Python: refactoring to z.show()
2016-06-23 01:25:49 +00:00
## Pandas integration
Apache Zeppelin [Table Display System](../displaysystem/basicdisplaysystem.html#table) provides built-in data visualization capabilities. Python interpreter leverages it to visualize Pandas DataFrames though similar `z.show()` API, same as with [Matplotlib integration](#matplotlib-integration).
ZEPPELIN-1048: Pandas support for python interpreter ### What is this PR for? Display Pandas DataFrame using Zeppelin's Table Display system. ### What type of PR is it? Feature ### Todos * [x] fix NPE in logs on empty paragraph execution * [x] matplotlib: refactor `zeppelin_show(plt)` -> `z.show(plt)` * [x] pandas: support `z.show(df)` * [x] update docs ### What is the Jira issue? [ZEPPELIN-1048](https://issues.apache.org/jira/browse/ZEPPELIN-1048) ### How should this be tested? "Zeppelin Tutorial: Python - matplotlib basic" should work, and ```python import pandas as pd rates = pd.read_csv("bank.csv", sep=";") z.show(rates) ``` ### Screenshots (if appropriate) ![screen shot 2016-06-23 at 10 29 00](https://cloud.githubusercontent.com/assets/5582506/16289133/85f0ddbc-392d-11e6-86a3-28d10e73f68d.png) ### Questions: * Does the licenses files need update? No * Is there breaking changes for older versions? No * Does this needs documentation? Yes Author: Alexander Bezzubov <bzz@apache.org> Closes #1067 from bzz/python/pandas-support and squashes the following commits: 3b1ad36 [Alexander Bezzubov] Python: update docs to reffer new API ee6668b [Alexander Bezzubov] Python: update docs, add Pandas integration 71be418 [Alexander Bezzubov] Python: limit 1000 for table display system on DataFrame 52e787d [Alexander Bezzubov] Python: pandas DataFrame using Table display system bc91b86 [Alexander Bezzubov] Python: skip interpreting empty paragraphs a7248cd [Alexander Bezzubov] Python: draft of pandas support 15646a1 [Alexander Bezzubov] Python: refactoring to z.show()
2016-06-23 01:25:49 +00:00
Example:
```python
import pandas as pd
rates = pd.read_csv("bank.csv", sep=";")
z.show(rates)
```
ZEPPELIN-1115: Python - interpreter for SQL over DataFrame ### What is this PR for? Add new interpreter to Python group: `%python.sql` for SQL over DataFrame support ### What type of PR is it? Improvement ### TODOs * [x] add new interpreter `%python.sql` * [x] add test * [x] make Python-dependant tests, excluded from CI * PythonInterpreterWithPythonInstalledTest * PythonPandasSqlInterpreterTest * run manually by `mvn -Dpython.test.exclude='' test -pl python -am` * [x] add docs `%python.sql` * [x] make `%python.sql` fail gracefully in case there is no Pandas or PandaSQL installed * [x] after #747 is merged - rebase and remove `-Dpython.test.exclude=''` from both profiles ### What is the Jira issue? [ZEPPELIN-1115](https://issues.apache.org/jira/browse/ZEPPELIN-1115) ### How should this be tested? `mvn -Dpython.test.exclude='' test -pl python -am` should pass or manually run - Given the DataFrame i.e ``` %python import pandas as pd rates = pd.read_csv("bank.csv", sep=";") ``` - SQL query it like ``` %python.sql SELECT * FROM rates LIMIT 10 ``` ### Screenshots (if appropriate) ![screen shot 2016-07-11 at 23 56 04](https://cloud.githubusercontent.com/assets/5582506/16735171/1ebb9354-47c3-11e6-9354-6364e9374a20.png) ### Questions: * Does the licenses files need update? No, no dependencies were included in source or binary release * Is there breaking changes for older versions? No * Does this needs documentation? Yes Author: Alexander Bezzubov <bzz@apache.org> Closes #1164 from bzz/ZEPPELIN-1115/python/add-sql-for-dataframes and squashes the following commits: 0f2f852 [Alexander Bezzubov] Fail SQL gracefully if no python dependencies installed aca2bdf [Alexander Bezzubov] Fix typos in docs :zap: 158ba6a [Alexander Bezzubov] Remove third-party dependant test from CI 5fe46fc [Alexander Bezzubov] Update Python Matplotlib notebook example 72884c8 [Alexander Bezzubov] Add docs for %python.sql feature e931dc4 [Alexander Bezzubov] Make test for PythonPandasSqlInterpreter usable 76bbb44 [Alexander Bezzubov] Complete implementation of the PythonPandasSqlInterpreter f6ca1eb [Alexander Bezzubov] Add %python.sql to interpreter menue 11ba490 [Alexander Bezzubov] Add draft implementation of %python.sql for DataFrames
2016-07-14 05:15:42 +00:00
## SQL over Pandas DataFrames
There is a convenience `%python.sql` interpreter that matches Apache Spark experience in Zeppelin and enables usage of SQL language to query [Pandas DataFrames](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html) and visualization of results though built-in [Table Display System](../displaysystem/basicdisplaysystem.html#table).
ZEPPELIN-1115: Python - interpreter for SQL over DataFrame ### What is this PR for? Add new interpreter to Python group: `%python.sql` for SQL over DataFrame support ### What type of PR is it? Improvement ### TODOs * [x] add new interpreter `%python.sql` * [x] add test * [x] make Python-dependant tests, excluded from CI * PythonInterpreterWithPythonInstalledTest * PythonPandasSqlInterpreterTest * run manually by `mvn -Dpython.test.exclude='' test -pl python -am` * [x] add docs `%python.sql` * [x] make `%python.sql` fail gracefully in case there is no Pandas or PandaSQL installed * [x] after #747 is merged - rebase and remove `-Dpython.test.exclude=''` from both profiles ### What is the Jira issue? [ZEPPELIN-1115](https://issues.apache.org/jira/browse/ZEPPELIN-1115) ### How should this be tested? `mvn -Dpython.test.exclude='' test -pl python -am` should pass or manually run - Given the DataFrame i.e ``` %python import pandas as pd rates = pd.read_csv("bank.csv", sep=";") ``` - SQL query it like ``` %python.sql SELECT * FROM rates LIMIT 10 ``` ### Screenshots (if appropriate) ![screen shot 2016-07-11 at 23 56 04](https://cloud.githubusercontent.com/assets/5582506/16735171/1ebb9354-47c3-11e6-9354-6364e9374a20.png) ### Questions: * Does the licenses files need update? No, no dependencies were included in source or binary release * Is there breaking changes for older versions? No * Does this needs documentation? Yes Author: Alexander Bezzubov <bzz@apache.org> Closes #1164 from bzz/ZEPPELIN-1115/python/add-sql-for-dataframes and squashes the following commits: 0f2f852 [Alexander Bezzubov] Fail SQL gracefully if no python dependencies installed aca2bdf [Alexander Bezzubov] Fix typos in docs :zap: 158ba6a [Alexander Bezzubov] Remove third-party dependant test from CI 5fe46fc [Alexander Bezzubov] Update Python Matplotlib notebook example 72884c8 [Alexander Bezzubov] Add docs for %python.sql feature e931dc4 [Alexander Bezzubov] Make test for PythonPandasSqlInterpreter usable 76bbb44 [Alexander Bezzubov] Complete implementation of the PythonPandasSqlInterpreter f6ca1eb [Alexander Bezzubov] Add %python.sql to interpreter menue 11ba490 [Alexander Bezzubov] Add draft implementation of %python.sql for DataFrames
2016-07-14 05:15:42 +00:00
**Pre-requests**
- Pandas `pip install pandas`
- PandaSQL `pip install -U pandasql`
In case default binded interpreter is Python (first in the interpreter list, under the _Gear Icon_), you can just use it as `%sql` i.e
- first paragraph
```python
import pandas as pd
rates = pd.read_csv("bank.csv", sep=";")
```
- next paragraph
```sql
%sql
SELECT * FROM rates WHERE age < 40
```
Otherwise it can be referred to as `%python.sql`
## Technical description
[ZEPPELIN-502] Python interpreter group ### What is this PR for? Adding a python 2 &3 interpreter. It's a basic implementation (no py4j for example), with a java ProcessBuilder object used to instantiate a python REPL. The interpreter doesn't bring it own python binary but uses the python specified by python.path configutation. Thus, you can still use your specific installed python modules (scikit-learn, matplotlib...) and the interpreter is able to work with python 2 & 3 without change. I had a python helper function (zeppelin_show() ) to easily display matplotlib graph as SVG. ### What type of PR is it? [Feature] ### Todos * [x] - Code review * [x] - Improve bootstrap.py : choose available helper functions and their names * [x] - Unit / IT tests ? * [x] documentation updates needed, that AhyoungRyu pointed out * [X] LICENSE needs to be updated to include all non-apache licensed dependencies (i.e AFAIK Py4j is BSD ) in bin-license * [x] double-check that code formatting conforms project style guide * [x] the branch need to be rebased on latest master. ### What is the Jira issue? [ZEPPELIN-502](https://issues.apache.org/jira/browse/ZEPPELIN-502?jql=project%20%3D%20ZEPPELIN%20AND%20text%20~%20%22python%22) ### How should this be tested? 1. In interpreter screen, in Python section, specify in python.path the python binary you want to use 2. In a paragraph, you can use the interpreter with **_%python_**. Calling help() will describe you the interpreter functionnalities. 3. Install py4j (pip install py4j) if you want to use input form ### Screenshots ![image](https://cloud.githubusercontent.com/assets/12515751/14936724/5108fb60-0ef4-11e6-93ea-232a037f7957.png) ![image](https://cloud.githubusercontent.com/assets/12515751/14943716/98a75c4a-0fe0-11e6-9d4b-e10c39d53a15.png) ![image](https://cloud.githubusercontent.com/assets/12515751/14936715/0eec90de-0ef4-11e6-811b-7ebe46f0d279.png) ![image](https://cloud.githubusercontent.com/assets/12515751/14943722/b89b7824-0fe0-11e6-9c73-c12f7372d487.png) ### Questions: * Does the licenses files need update? Yes, only bin-license (py4j) * Is there breaking changes for older versions? No * Does this needs documentation? Yes Author: Hervé RIVIERE <hriviere@users.noreply.github.com> Closes #869 from hriviere/PR_interpreter_python and squashes the following commits: 80b6e75 [Hervé RIVIERE] [ZEPPELIN-502] move BSD py4j license to zeppelin-distribution/src/bin_license/license a4b82a5 [Hervé RIVIERE] [ZEPPELIN-502]Improving doc following @AhyoungRyu review 3252353 [Hervé RIVIERE] [ZEPPELIN-502] Formatting code to respect project convention 54ec4f1 [Hervé RIVIERE] [ZEPPELIN-502]Improving doc following @AhyoungRyu review 6a831bc [Hervé RIVIERE] [ZEPPELIN-502] Add BSD py4j license 11e1b9c [Hervé RIVIERE] [ZEPPELIN-502] minor changes in python.md e5d0bdb [Hervé RIVIERE] [ZEPPELIN-502] change PYTHON_PATH to ZEPPELIN_PYTHON c62ac98 [Hervé RIVIERE] [ZEPPELIN-502] Improve python.md 5008125 [Hervé RIVIERE] [ZEPPELIN-502] Improve python.md with features not yet supported and technical description 7d533e1 [Hervé RIVIERE] [ZEPPELIN-502] Add tests and reformating code to help tests writing fecaf25 [Hervé RIVIERE] [ZEPPELIN-502] Rename python.path to python and default from /usr/bin/python to python 02d1320 [Hervé RIVIERE] [ZEPPELIN-502] Input form, change from simple input form to native (pyspark syntax) 60d2956 [Hervé RIVIERE] [ZEPPELIN-502] Indent as pep8 convention 9bdb192 [Hervé RIVIERE] [ZEPPELIN-502] Add python.md to _navigation.html 7142aa5 [Hervé RIVIERE] [ZEPPELIN-502] Catch exception in logger.error 1a86ad7 [Hervé RIVIERE] [ZEPPELIN-502] Python interpreter group
2016-05-30 20:07:26 +00:00
ZEPPELIN-1115: Python - interpreter for SQL over DataFrame ### What is this PR for? Add new interpreter to Python group: `%python.sql` for SQL over DataFrame support ### What type of PR is it? Improvement ### TODOs * [x] add new interpreter `%python.sql` * [x] add test * [x] make Python-dependant tests, excluded from CI * PythonInterpreterWithPythonInstalledTest * PythonPandasSqlInterpreterTest * run manually by `mvn -Dpython.test.exclude='' test -pl python -am` * [x] add docs `%python.sql` * [x] make `%python.sql` fail gracefully in case there is no Pandas or PandaSQL installed * [x] after #747 is merged - rebase and remove `-Dpython.test.exclude=''` from both profiles ### What is the Jira issue? [ZEPPELIN-1115](https://issues.apache.org/jira/browse/ZEPPELIN-1115) ### How should this be tested? `mvn -Dpython.test.exclude='' test -pl python -am` should pass or manually run - Given the DataFrame i.e ``` %python import pandas as pd rates = pd.read_csv("bank.csv", sep=";") ``` - SQL query it like ``` %python.sql SELECT * FROM rates LIMIT 10 ``` ### Screenshots (if appropriate) ![screen shot 2016-07-11 at 23 56 04](https://cloud.githubusercontent.com/assets/5582506/16735171/1ebb9354-47c3-11e6-9354-6364e9374a20.png) ### Questions: * Does the licenses files need update? No, no dependencies were included in source or binary release * Is there breaking changes for older versions? No * Does this needs documentation? Yes Author: Alexander Bezzubov <bzz@apache.org> Closes #1164 from bzz/ZEPPELIN-1115/python/add-sql-for-dataframes and squashes the following commits: 0f2f852 [Alexander Bezzubov] Fail SQL gracefully if no python dependencies installed aca2bdf [Alexander Bezzubov] Fix typos in docs :zap: 158ba6a [Alexander Bezzubov] Remove third-party dependant test from CI 5fe46fc [Alexander Bezzubov] Update Python Matplotlib notebook example 72884c8 [Alexander Bezzubov] Add docs for %python.sql feature e931dc4 [Alexander Bezzubov] Make test for PythonPandasSqlInterpreter usable 76bbb44 [Alexander Bezzubov] Complete implementation of the PythonPandasSqlInterpreter f6ca1eb [Alexander Bezzubov] Add %python.sql to interpreter menue 11ba490 [Alexander Bezzubov] Add draft implementation of %python.sql for DataFrames
2016-07-14 05:15:42 +00:00
For in-depth technical details on current implementation please refer to [python/README.md](https://github.com/apache/zeppelin/blob/master/python/README.md).
### Some features not yet implemented in the Python Interpreter
* Interrupt a paragraph execution (`cancel()` method) is currently only supported in Linux and MacOs. If interpreter runs in another operating system (for instance MS Windows) , interrupt a paragraph will close the whole interpreter. A JIRA ticket ([ZEPPELIN-893](https://issues.apache.org/jira/browse/ZEPPELIN-893)) is opened to implement this feature in a next release of the interpreter.
* Progression bar in webUI (`getProgress()` method) is currently not implemented.
* Code-completion is currently not implemented.