Add docs for %python.sql feature

Alexander Bezzubov 2016-07-11 23:11:19 +09:00
parent e931dc4308
commit 72884c8cb1
4 changed files with 58 additions and 4 deletions


@ -46,7 +46,7 @@ To access the help, type **help()**
## Python modules
The interpreter can use all modules already installed (with pip, easy_install...)
## Use Zeppelin Dynamic Forms
## Using Zeppelin Dynamic Forms
You can leverage [Zeppelin Dynamic Form]({{BASE_PATH}}/manual/dynamicform.html) inside your Python code.
**Zeppelin Dynamic Forms can only be used if the py4j Python library is installed on your system. If it is not, you can install it with `pip install py4j`.**
@ -65,6 +65,7 @@ print (z.select("f1",[("o1","1"),("o2","2")],"2"))
print("".join(z.checkbox("f3", [("o1","1"), ("o2","2")],["1"])))
```
## Zeppelin features not fully supported by the Python Interpreter
* Interrupting a paragraph execution (`cancel()` method) is currently supported only on Linux and macOS. If the interpreter runs on another operating system (for instance MS Windows), interrupting a paragraph will close the whole interpreter. A JIRA ticket ([ZEPPELIN-893](https://issues.apache.org/jira/browse/ZEPPELIN-893)) has been opened to implement this feature in a future release of the interpreter.
@ -94,7 +95,7 @@ z.show(plt, height='150px')
## Pandas integration
[Zeppelin Display System]({{BASE_PATH}}/displaysystem/basicdisplaysystem.html#table) provides simple API to visualize data in Pandas DataFrames, same as in Matplotlib.
Apache Zeppelin [Table Display System]({{BASE_PATH}}/displaysystem/basicdisplaysystem.html#table) provides built-in data visualization capabilities. The Python interpreter leverages it to visualize Pandas DataFrames through a similar `z.show()` API, the same as with [Matplotlib integration](#matplotlib-integration).
Example:
@ -104,6 +105,34 @@ rates = pd.read_csv("bank.csv", sep=";")
z.show(rates)
```
## SQL over DataFrames
There is a convenience `%python.sql` interpreter that matches the Apache Spark experience in Zeppelin: it lets you query Pandas DataFrames with SQL and visualize the results through the built-in [Table Display System]({{BASE_PATH}}/displaysystem/basicdisplaysystem.html#table).
**Prerequisites**
- Pandas `pip install pandas`
- PandaSQL `pip install -U pandasql`
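
Before using `%python.sql`, it can help to confirm that both prerequisites are importable by the Python that Zeppelin launches. The following is a minimal sketch, not part of Zeppelin itself; the helper name `missing_modules` is hypothetical and it only relies on the standard library:

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Hypothetical check, not Zeppelin code: report what still needs `pip install`.
missing = missing_modules(["pandas", "pandasql"])
if missing:
    print("Install first: pip install " + " ".join(missing))
else:
    print("pandas and pandasql are available")
```

Running this in a `%python` paragraph checks the same Python environment the interpreter uses, which may differ from the one in your shell.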
If Python is the default bound interpreter (first in the interpreter list, under the _Gear Icon_), you can use it simply as `%sql`, for example:
- first paragraph
```python
import pandas as pd
rates = pd.read_csv("bank.csv", sep=";")
```
- next paragraph
```sql
%sql
SELECT * FROM rates WHERE age < 40
```
Otherwise it can be referred to as `%python.sql`.
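
For intuition about what running SQL over in-memory tabular data involves, here is a standard-library-only sketch that loads a few rows into an in-memory SQLite table and runs the same query. This is an illustration, not how `%python.sql` is implemented; the sample rows, the column names (`age`, `job`, `balance`), and the helper `query_rows` are all assumptions made for this example.

```python
import sqlite3

# Hypothetical sample rows standing in for the `rates` DataFrame.
ROWS = [
    (30, "unemployed", 1787),
    (33, "services", 4789),
    (59, "blue-collar", 0),
]


def query_rows(sql):
    """Load ROWS into an in-memory table named `rates` and run `sql` on it."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE rates (age INTEGER, job TEXT, balance INTEGER)")
        conn.executemany("INSERT INTO rates VALUES (?, ?, ?)", ROWS)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


# Same query as in the %sql paragraph above.
young = query_rows("SELECT * FROM rates WHERE age < 40")
for row in young:
    print(row)
```

With `%python.sql` none of this setup is needed: the DataFrame name is used directly as the table name in the query.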
## Technical description
For in-depth technical details on the current implementation, please refer to [python/README.md](https://github.com/apache/zeppelin/blob/master/python/README.md).


@ -40,3 +40,5 @@ Current interpreter implementation spawns new system python process through `Pro
* JavaBuilder can't send a SIGINT signal to interrupt paragraph execution. Therefore the interpreter directly sends `kill SIGINT PID` to the Python process to interrupt execution. The Python process catches the SIGINT signal with code defined in bootstrap.py.
* The Matplotlib display feature works by exporting the figure to SVG (as a string) and then displaying it with HTML code.
* `%python.sql` support for Pandas DataFrames is optional and is provided by https://github.com/yhat/pandasql if the user has it installed.
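
The SIGINT handling described in the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual code in bootstrap.py; the names `handle_sigint` and `install_handler` are hypothetical.

```python
import signal


def handle_sigint(signum, frame):
    """Turn SIGINT into an exception so the running code stops
    while the Python process itself stays alive."""
    raise KeyboardInterrupt("paragraph interrupted")


def install_handler():
    """Register the handler; must be called from the main thread."""
    signal.signal(signal.SIGINT, handle_sigint)


# Usage (assumption: called once when the interpreter process starts):
# install_handler()
```

Raising an exception from the handler unwinds only the currently executing code, which is why an external `kill` with SIGINT can cancel one paragraph without ending the whole session.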


@ -65,8 +65,8 @@ public class PythonPandasSqlInterpreter extends Interpreter {
public void open() {
LOG.info("Open Python SQL interpreter instance: {}", this.toString());
//TODO(bzz): check by importing and catching ImportError
//if (pandasAndNumpyAndPandasqlAreInstalled) {
//TODO(bzz): check i.e by importing and catching ImportError
//if (py4jAndPandasAndPandasqlAreInstalled) {
try {
LOG.info("Bootstrap {} interpreter with {}", this.toString(), SQL_BOOTSTRAP_FILE_PY);
PythonInterpreter python = getPythonInterpreter();


@ -72,6 +72,8 @@ plt.close()
print ('''<pre>z.show(plt,width='50px')
z.show(plt,height='150px') </pre></div>''')
print ('<h3>Pandas DataFrame</h3>')
print ('<div> You need to have the Pandas module installed ')
print ('to use this functionality (pip install pandas) !</div><br/>')
print """
<div>The interpreter can visualize Pandas DataFrame
with the function z.show()
@ -81,6 +83,27 @@ df = pd.read_csv("bank.csv", sep=";")
z.show(df)
</pre></div>
"""
print ('<h3>SQL over Pandas DataFrame</h3>')
print ('<div> You need to have the Pandas and pandasql modules installed ')
print ('to use this functionality (pip install pandas pandasql) !</div><br/>')
print """
<div>The Python interpreter group includes a %sql interpreter that can query
Pandas DataFrames using SQL and visualize the results using the Zeppelin Table Display System
<pre>
%python
import pandas as pd
df = pd.read_csv("bank.csv", sep=";")
</pre>
<br />
<pre>
%python.sql
%sql
SELECT * from df LIMIT 5
</pre></div>
"""
class PyZeppelinContext(object):
""" If py4j is detected, this class will be overridden