ZEPPELIN-4437. Update python document

2026-05-24 09:38:26 +00:00 · 2019-11-08 17:22:43 +08:00 · 2019-11-08 17:22:43 +08:00 · 48163d0892
commit 48163d0892
parent f2d1d4f87e
5 changed files with 307 additions and 189 deletions
--- a/docs/assets/themes/zeppelin/img/docs-img/ipython_code_completion.png
+++ b/docs/assets/themes/zeppelin/img/docs-img/ipython_code_completion.png
--- a/docs/assets/themes/zeppelin/img/docs-img/ipython_error.png
+++ b/docs/assets/themes/zeppelin/img/docs-img/ipython_error.png
--- a/docs/assets/themes/zeppelin/img/docs-img/ipython_hvplot.png
+++ b/docs/assets/themes/zeppelin/img/docs-img/ipython_hvplot.png
--- a/docs/interpreter/python.md
+++ b/docs/interpreter/python.md
@ -23,6 +23,34 @@ limitations under the License.

 <div id="toc"></div>

+## Overview
+
+Zeppelin supports the Python language, which is widely used in data analytics and machine learning.
+
+<table class="table-configuration">
+  <tr>
+    <th>Name</th>
+    <th>Class</th>
+    <th>Description</th>
+  </tr>
+  <tr>
+    <td>%python</td>
+    <td>PythonInterpreter</td>
+    <td>Vanilla Python interpreter with the fewest dependencies; only an installed Python environment is required</td>
+  </tr>
+  <tr>
+    <td>%python.ipython</td>
+    <td>IPythonInterpreter</td>
+    <td>Provides a richer Python runtime via IPython, with almost the same experience as Jupyter. It has more prerequisites, but it is the recommended interpreter for using Python in Zeppelin; see below</td>
+  </tr>
+  <tr>
+    <td>%python.sql</td>
+    <td>PythonInterpreterPandasSql</td>
+    <td>Provides SQL capability to query data in Pandas DataFrames via <code>pandasql</code></td>
+  </tr>
+</table>
+
+
 ## Configuration
 <table class="table-configuration">
  <tr>
@ -33,8 +61,8 @@ limitations under the License.
  <tr>
    <td>zeppelin.python</td>
    <td>python</td>
-    <td>Path of the already installed Python binary (could be python2 or python3).
-    If python is not in your $PATH you can set the absolute directory (example : /usr/bin/python)
+    <td>Path of the installed Python binary (could be python2 or python3).
+    You should set this property explicitly if python is not in your <code>$PATH</code> (example: /usr/bin/python).
    </td>
  </tr>
  <tr>
@ -42,19 +70,282 @@ limitations under the License.
    <td>1000</td>
    <td>Max number of dataframe rows to display.</td>
  </tr>
+  <tr>
+    <td>zeppelin.python.useIPython</td>
+    <td>true</td>
+    <td>When this property is true, <code>%python</code> is delegated to <code>%python.ipython</code> if IPython is available. When it is false,
+    IPython is used only in <code>%python.ipython</code>.
+    </td>
+  </tr>
 </table>
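The delegation rule controlled by `zeppelin.python.useIPython` can be sketched as a small decision function. This is only an illustration of the rule described in the table above; `resolve_interpreter` and its parameters are hypothetical names and are not part of Zeppelin.

```python
def resolve_interpreter(requested, use_ipython=True, ipython_available=True):
    """Return the interpreter that would run a paragraph (illustrative only).

    requested         -- the magic written in the paragraph, e.g. "%python"
    use_ipython       -- stands in for the zeppelin.python.useIPython property
    ipython_available -- stands in for "the IPython prerequisites are installed"
    """
    # %python is delegated only when the property is true AND IPython is usable.
    if requested == "%python" and use_ipython and ipython_available:
        return "%python.ipython"
    # Every other case keeps the interpreter that was asked for.
    return requested


print(resolve_interpreter("%python"))
print(resolve_interpreter("%python", use_ipython=False))
```

Under this sketch, `%python.ipython` is always served by IPython, while `%python` falls back to the vanilla interpreter whenever the property is `false` or IPython is missing.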

-## Enabling Python Interpreter

-In a notebook, to enable the **Python** interpreter, click on the **Gear** icon and select **Python**
+## Vanilla Python Interpreter (`%python`)

-## Using the Python Interpreter
+The vanilla Python interpreter provides basic Python interpreter features; only an installed Python is required.

-In a paragraph, use **_%python_** to select the **Python** interpreter and then input all commands.
+### Matplotlib integration

-The interpreter can only work if you already have python installed (the interpreter doesn't bring it own python binaries).
+The vanilla Python interpreter can display matplotlib figures inline automatically using `matplotlib`:
+ 
+```python
+%python

-To access the help, type **help()**
+import matplotlib.pyplot as plt
+plt.plot([1, 2, 3])
+```
+
+The output of this command will by default be converted to HTML by implicitly making use of the `%html` magic. Additional configuration can be achieved using the builtin `z.configure_mpl()` method. For example, 
+
+```python
+
+z.configure_mpl(width=400, height=300, fmt='svg')
+plt.plot([1, 2, 3])
+```
+
+This produces a 400x300 image in SVG format; the defaults are 600x400 and PNG, respectively.
+In the future, another option called `angular` can be used to make it possible to update a plot produced from one paragraph directly from another 
+(the output will be `%angular` instead of `%html`). However, this feature is already available in the `pyspark` interpreter. 
+More details can be found in the included "Zeppelin Tutorial: Python - matplotlib basic" tutorial notebook. 
+
+If Zeppelin cannot find the matplotlib backend files (which should usually be found in `$ZEPPELIN_HOME/interpreter/lib/python`) in your `PYTHONPATH`, 
+then the backend will automatically be set to agg, and the (otherwise deprecated) instructions below can be used for more limited inline plotting.
+
+If you are unable to load the inline backend, use `z.show(plt)`:
+
+```python
+%python
+
+import matplotlib.pyplot as plt
+plt.figure()
+# (.. your plotting commands ..)
+z.show(plt)
+plt.close()
+```
+The `z.show()` function can take optional parameters to adapt graph dimensions (width and height) as well as output format (png or optionally svg).
+
+```python
+%python
+
+z.show(plt, width='50px')
+z.show(plt, height='150px', fmt='svg')
+```
+<img class="img-responsive" src="{{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/pythonMatplotlib.png" />
+
+
+
+## IPython Interpreter (`%python.ipython`) (recommended)
+
+IPython is more powerful than the vanilla Python interpreter and adds extra functionality. You can use IPython with Python 2 or Python 3, depending on which Python binary you set in `zeppelin.python`.
+
+For a non-Anaconda environment
+
+   **Prerequisites**
+   
+    - Jupyter `pip install jupyter`
+    - grpcio `pip install grpcio`
+    - protobuf `pip install protobuf`
+
+For an Anaconda environment (`zeppelin.python` points to the Python binary under Anaconda)
+
+   **Prerequisites**
+   
+    - grpcio `pip install grpcio`
+    - protobuf `pip install protobuf`
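Before switching to `%python.ipython`, it can help to check from the target Python whether the prerequisites above are importable. The following is a minimal sketch using only the standard library. The import names are assumptions: `grpc` and `google.protobuf` are the import names of the `grpcio` and `protobuf` packages, while `jupyter_client` and `ipykernel` are used here as stand-ins for "Jupyter is installed". `missing_modules` is a hypothetical helper, not a Zeppelin API.

```python
import importlib.util


def missing_modules(module_names):
    """Return the subset of module_names that cannot be found for import."""
    missing = []
    for name in module_names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec raises if a parent package (e.g. "google") is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


# Assumed import names for the packages listed in the prerequisites.
required = ["jupyter_client", "ipykernel", "grpc", "google.protobuf"]
print("Missing modules:", missing_modules(required))
```

Run it with the same binary that `zeppelin.python` points to; an empty list suggests the prerequisites are importable from that environment.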
+
+In addition to all the basic functions of the vanilla Python interpreter, you can use all of IPython's advanced features, just as you would in Jupyter Notebook.
+
+e.g. 
+
+### Use IPython magic
+
+```
+%python.ipython
+
+#python help
+range?
+
+#timeit
+%timeit range(100)
+```
+
+### Use matplotlib 
+
+```
+%python.ipython
+
+%matplotlib inline
+import matplotlib.pyplot as plt
+
+print("hello world")
+data=[1,2,3,4]
+plt.figure()
+plt.plot(data)
+```
+
+### Colored text output
+
+<img class="img-responsive" src="{{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/ipython_error.png" />
+
+### More types of visualization
+For example, IPython supports hvplot:
+<img class="img-responsive" src="{{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/ipython_hvplot.png" />
+
+### Better code completion
+<img class="img-responsive" src="{{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/ipython_code_completion.png" />
+
+
+By default, Zeppelin uses IPython in `%python` if the IPython prerequisites are met; otherwise it uses the vanilla Python interpreter in `%python`.
+If you don't want to use IPython via `%python`, set `zeppelin.python.useIPython` to `false` in the interpreter setting.
+
+
+## Pandas integration
+Apache Zeppelin [Table Display System](../usage/display_system/basic.html#table) provides built-in data visualization capabilities. 
+The Python interpreter leverages it to visualize Pandas DataFrames through a similar `z.show()` API, as with [Matplotlib integration](#matplotlib-integration).
+
+Example:
+
+```python
+%python
+
+import pandas as pd
+rates = pd.read_csv("bank.csv", sep=";")
+z.show(rates)
+```
+
+## SQL over Pandas DataFrames
+
+The convenience interpreter `%python.sql` matches the Apache Spark experience in Zeppelin. It
+lets you use SQL to query [Pandas DataFrames](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html) and
+visualize the results through the built-in [Table Display System](../usage/display_system/basic.html#table).
+
+ **Prerequisites**
+
+  - Pandas `pip install pandas`
+  - PandaSQL `pip install -U pandasql`
+
+Here's one example:
+
+ - first paragraph
+
+  ```python
+%python
+
+import pandas as pd
+rates = pd.read_csv("bank.csv", sep=";")
+  ```
+
+ - next paragraph
+
+  ```sql
+%python.sql
+
+SELECT * FROM rates WHERE age < 40
+  ```
+
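For readers without `pandasql` installed, the query semantics can be tried outside Zeppelin. `pandasql` runs queries against SQLite by default, so the sketch below swaps in the standard library `sqlite3` module directly instead of `%python.sql`. The table rows are made-up sample data, not the contents of `bank.csv`, and the `job` column is an assumption for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the DataFrame named `rates`.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rates (age INTEGER, job TEXT)")
conn.executemany(
    "INSERT INTO rates VALUES (?, ?)",
    [(30, "services"), (45, "management"), (38, "technician")],  # sample rows
)

# Same query as in the %python.sql paragraph above.
young = conn.execute("SELECT * FROM rates WHERE age < 40").fetchall()
print(young)
conn.close()
```

In Zeppelin itself, `%python.sql` resolves `rates` from the Python variables defined in earlier `%python` paragraphs, so no explicit table creation is needed.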
+
+## Using Zeppelin Dynamic Forms
+You can leverage [Zeppelin Dynamic Form]({{BASE_PATH}}/usage/dynamic_form/intro.html) inside your Python code.
+
+Example : 
+
+```python
+%python
+
+### Input form
+print(z.input("f1","defaultValue"))
+
+### Select form
+print(z.select("f2",[("o1","1"),("o2","2")],"o1"))
+
+### Checkbox form
+print("".join(z.checkbox("f3", [("o1","1"), ("o2","2")],["o1"])))
+```
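Each entry in `options` is a `(key, displayed value)` tuple, and the default argument refers to keys (`"o1"`), not to displayed values (`"1"`). The standalone sketch below illustrates that mapping with plain Python; `displayed_values` is a hypothetical helper and does not call the Zeppelin API.

```python
# Options in the same shape that z.select() and z.checkbox() expect.
options = [("o1", "1"), ("o2", "2")]


def displayed_values(options, selected_keys):
    """Map selected option keys to the values shown in the form."""
    labels = dict(options)  # key -> displayed value
    return [labels[key] for key in selected_keys]


print(displayed_values(options, ["o1"]))
```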
+
+## ZeppelinContext API
+
+The Python interpreter creates a variable `z` that represents `ZeppelinContext` for you. You can use it to do more advanced things in Zeppelin.
+
+<table class="table-configuration">
+  <tr>
+    <th>API</th>
+    <th>Description</th>
+  </tr>
+  <tr>
+    <td>z.put(key, value)</td>
+    <td>Put object <code>value</code> with identifier <code>key</code> into the distributed resource pool of Zeppelin,
+    so that it can be used by other interpreters</td>
+  </tr>
+  <tr>
+    <td>z.get(key)</td>
+    <td>Get the object with identifier <code>key</code> from the distributed resource pool of Zeppelin</td>
+  </tr>
+  <tr>
+    <td>z.remove(key)</td>
+    <td>Remove the object with identifier <code>key</code> from the distributed resource pool of Zeppelin</td>
+  </tr>
+  <tr>
+    <td>z.getAsDataFrame(key)</td>
+    <td>Get the object with identifier <code>key</code> from the distributed resource pool of Zeppelin and convert it into a pandas DataFrame.
+    The object in the distributed resource pool must be of table type, e.g. a JDBC interpreter result.
+    </td>
+  </tr>
+  <tr>
+    <td>z.angular(name, noteId = None, paragraphId = None)</td>
+    <td>Get the angular object with identifier <code>name</code></td>
+  </tr>
+  <tr>
+    <td>z.angularBind(name, value, noteId = None, paragraphId = None)</td>
+    <td>Bind value to angular object with identifier <code>name</code></td>
+  </tr>
+  <tr>
+    <td>z.angularUnbind(name, noteId = None)</td>
+    <td>Unbind value from angular object with identifier <code>name</code></td>
+  </tr>
+  <tr>
+    <td>z.show(p)</td>
+    <td>Show Python object <code>p</code> in Zeppelin. A pandas DataFrame is displayed in Zeppelin's table format;
+    other objects are converted to strings</td>
+  </tr>  
+  <tr>
+    <td>z.textbox(name, defaultValue="")</td>
+    <td>Create dynamic form Textbox <code>name</code> with defaultValue</td>
+  </tr>
+  <tr>
+    <td>z.select(name, options, defaultValue="")</td>
+    <td>Create dynamic form Select <code>name</code> with options and defaultValue. options should be a list of tuples (the first element is the key,
+    the second element is the displayed value), e.g. <code>z.select("f2",[("o1","1"),("o2","2")],"o1")</code></td>
+  </tr>
+  <tr>
+    <td>z.checkbox(name, options, defaultChecked=[])</td>
+    <td>Create dynamic form Checkbox <code>name</code> with options and defaultChecked. options should be a list of tuples (the first element is the key,
+    the second element is the displayed value), e.g. <code>z.checkbox("f3", [("o1","1"), ("o2","2")],["o1"])</code></td>
+  </tr>
+  <tr>
+    <td>z.noteTextbox(name, defaultValue="")</td>
+    <td>Create note level dynamic form Textbox</td>
+  </tr>
+  <tr>
+    <td>z.noteSelect(name, options, defaultValue="")</td>
+    <td>Create note level dynamic form Select</td>
+  </tr>
+  <tr>
+    <td>z.noteCheckbox(name, options, defaultChecked=[])</td>
+    <td>Create note level dynamic form Checkbox</td>
+  </tr>
+  <tr>
+    <td>z.run(paragraphId)</td>
+    <td>Run the paragraph with the given <code>paragraphId</code> in the current note</td>
+  </tr>
+  <tr>
+    <td>z.run(noteId, paragraphId)</td>
+    <td>Run the paragraph with the given <code>paragraphId</code> in the note <code>noteId</code></td>
+  </tr>
+  <tr>
+    <td>z.runNote(noteId)</td>
+    <td>Run the whole note</td>
+  </tr>
+</table>
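The `z.put` / `z.get` / `z.remove` entries above behave like a shared key-value store. Because `z` exists only inside a running Zeppelin interpreter, the sketch below uses a dict-backed stand-in to show the call pattern. `FakeZeppelinContext` is a hypothetical class for illustration only; it does not reproduce the real resource pool, and the behavior of the real `z.get` for a missing key is not specified here.

```python
class FakeZeppelinContext:
    """Dict-backed stand-in for the resource-pool part of ZeppelinContext."""

    def __init__(self):
        self._pool = {}  # plays the role of the distributed resource pool

    def put(self, key, value):
        self._pool[key] = value

    def get(self, key):
        return self._pool[key]

    def remove(self, key):
        self._pool.pop(key, None)


z = FakeZeppelinContext()          # in Zeppelin, `z` is created for you
z.put("threshold", 40)             # e.g. written in a %python paragraph
print(z.get("threshold"))          # e.g. read back in a later paragraph
z.remove("threshold")
```

In a real note, the value stored with `z.put` can be read by paragraphs running in other interpreters, which a local dict cannot demonstrate.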

 ## Python environments

@ -68,7 +359,7 @@ The interpreter can use all modules already installed (with pip, easy_install...

 #### Usage

- get the Conda Infomation: 
+- get the Conda Information: 

    ```
    %python.conda info
@ -144,187 +435,14 @@ Here is an example
 %python.docker activate gcr.io/tensorflow/tensorflow:latest
 ```

-## Using Zeppelin Dynamic Forms
-You can leverage [Zeppelin Dynamic Form]({{BASE_PATH}}/usage/dynamic_form/intro.html) inside your Python code.
-
-**Zeppelin Dynamic Form can only be used if py4j Python library is installed in your system. If not, you can install it with `pip install py4j`.**
-
-Example : 
-
-```python
-%python
-### Input form
-print (z.input("f1","defaultValue"))
-
-### Select form
-print (z.select("f1",[("o1","1"),("o2","2")],"2"))
-
-### Checkbox form
-print("".join(z.checkbox("f3", [("o1","1"), ("o2","2")],["1"])))
-```
-
-## Matplotlib integration
-
- The python interpreter can display matplotlib figures inline automatically using the `pyplot` module:
- 
-```python
-%python
-import matplotlib.pyplot as plt
-plt.plot([1, 2, 3])
-```
-This is the recommended method for using matplotlib from within a Zeppelin notebook. The output of this command will by default be converted to HTML by implicitly making use of the `%html` magic. Additional configuration can be achieved using the builtin `z.configure_mpl()` method. For example, 
-
-```python
-z.configure_mpl(width=400, height=300, fmt='svg')
-plt.plot([1, 2, 3])
-```
-
-Will produce a 400x300 image in SVG format, which by default are normally 600x400 and PNG respectively. 
-In the future, another option called `angular` can be used to make it possible to update a plot produced from one paragraph directly from another 
-(the output will be `%angular` instead of `%html`). However, this feature is already available in the `pyspark` interpreter. 
-More details can be found in the included "Zeppelin Tutorial: Python - matplotlib basic" tutorial notebook. 
-
-If Zeppelin cannot find the matplotlib backend files (which should usually be found in `$ZEPPELIN_HOME/interpreter/lib/python`) in your `PYTHONPATH`, 
-then the backend will automatically be set to agg, and the (otherwise deprecated) instructions below can be used for more limited inline plotting.
-
-If you are unable to load the inline backend, use `z.show(plt)`:
-
-```python
-%python
-import matplotlib.pyplot as plt
-plt.figure()
-(.. ..)
-z.show(plt)
-plt.close()
-```
-The `z.show()` function can take optional parameters to adapt graph dimensions (width and height) as well as output format (png or optionally svg).
-
- ```python
-%python
-z.show(plt, width='50px')
-z.show(plt, height='150px', fmt='svg')
-```
-<img class="img-responsive" src="{{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/pythonMatplotlib.png" />
-
-
-## Pandas integration
-Apache Zeppelin [Table Display System](../usage/display_system/basic.html#table) provides built-in data visualization capabilities. 
-Python interpreter leverages it to visualize Pandas DataFrames though similar `z.show()` API, 
-same as with [Matplotlib integration](#matplotlib-integration).
-
-Example:
-
-```python
-import pandas as pd
-rates = pd.read_csv("bank.csv", sep=";")
-z.show(rates)
-```
-
-## SQL over Pandas DataFrames
-
-There is a convenience `%python.sql` interpreter that matches Apache Spark experience in Zeppelin and 
-enables usage of SQL language to query [Pandas DataFrames](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html) and 
-visualization of results though built-in [Table Display System](../usage/display_system/basic.html#table).
-
- **Pre-requests**
-
-  - Pandas `pip install pandas`
-  - PandaSQL `pip install -U pandasql`
-
-In case default binded interpreter is Python (first in the interpreter list, under the _Gear Icon_), you can just use it as `%sql` i.e
-
- - first paragraph
-
-  ```python
-import pandas as pd
-rates = pd.read_csv("bank.csv", sep=";")
-  ```
-
- - next paragraph
-
-  ```sql
-%sql
-SELECT * FROM rates WHERE age < 40
-  ```
-
-Otherwise it can be referred to as `%python.sql`
-
-
-## IPython Support
-
-IPython is more powerful than the default python interpreter with extra functionality. You can use IPython with Python2 or Python3 which depends on which python you set `zeppelin.python`.
-
-   **Pre-requests**
-   
-    - Jupyter `pip install jupyter`
-    - grpcio `pip install grpcio`
-    - protobuf `pip install protobuf`
-
-If you already install anaconda, then you just need to install `grpcio` as Jupyter is already included in anaconda. For grpcio version >= 1.12.0 you'll also need to install protobuf separately.
-
-In addition to all basic functions of the python interpreter, you can use all the IPython advanced features as you use it in Jupyter Notebook.
-
-e.g. 
-
-Use IPython magic
-
-```
-%python.ipython
-
-#python help
-range?
-
-#timeit
-%timeit range(100)
-```
-
-Use matplotlib 
-
-```
-%python.ipython
-
-
-%matplotlib inline
-import matplotlib.pyplot as plt
-
-print("hello world")
-data=[1,2,3,4]
-plt.figure()
-plt.plot(data)
-```
-
-We also make `ZeppelinContext` available in IPython Interpreter. You can use `ZeppelinContext` to create dynamic forms and display pandas DataFrame.
-
-e.g.
-
-Create dynamic form
-
-```
-z.input(name='my_name', defaultValue='hello')
-```
-
-Show pandas dataframe
-
-```
-import pandas as pd
-df = pd.DataFrame({'id':[1,2,3], 'name':['a','b','c']})
-z.show(df)
-
-```
-
-By default, we would use IPython in `%python.python` if IPython is available. Otherwise it would fall back to the original Python implementation.
-If you don't want to use IPython, then you can set `zeppelin.python.useIPython` as `false` in interpreter setting.
-
 ## Technical description

 For in-depth technical details on current implementation please refer to [python/README.md](https://github.com/apache/zeppelin/blob/master/python/README.md).


-### Some features not yet implemented in the Python Interpreter
+## Some features not yet implemented in the vanilla Python interpreter

 * Interrupt a paragraph execution (`cancel()` method) is currently only supported in Linux and MacOs. 
 If interpreter runs in another operating system (for instance MS Windows) , interrupt a paragraph will close the whole interpreter. 
 A JIRA ticket ([ZEPPELIN-893](https://issues.apache.org/jira/browse/ZEPPELIN-893)) is opened to implement this feature in a next release of the interpreter.
 * Progression bar in webUI  (`getProgress()` method) is currently not implemented.
-* Code-completion is currently not implemented.
-
--- a/python/src/main/resources/python/zeppelin_context.py
+++ b/python/src/main/resources/python/zeppelin_context.py
@ -66,12 +66,12 @@ class PyZeppelinContext(object):
            print("fail to call getAsDataFrame as pandas is not installed")
        return pd.read_csv(StringIO(value), sep="\t")

-    def angular(self, key, noteId = None, paragraphId = None):
-        return self.z.angular(key, noteId, paragraphId)
-
    def remove(self, key):
        self.z.remove(key)

+    def angular(self, key, noteId = None, paragraphId = None):
+        return self.z.angular(key, noteId, paragraphId)
+
    def contains(self, key):
        return self.contains(key)

@ -120,11 +120,11 @@ class PyZeppelinContext(object):
    def runAll(self):
        return self.z.runAll()

-    def angular(self, key, noteId = None, paragraphId = None):
+    def angular(self, name, noteId = None, paragraphId = None):
        if noteId == None:
-            return self.z.angular(key, self.z.getInterpreterContext().getNoteId(), paragraphId)
+            return self.z.angular(name, self.z.getInterpreterContext().getNoteId(), paragraphId)
        else:
-            return self.z.angular(key, noteId, paragraphId)
+            return self.z.angular(name, noteId, paragraphId)

    def angularBind(self, name, value, noteId = None, paragraphId = None):
        if noteId == None: