Mirror of https://github.com/apache/zeppelin, synced 2026-05-24 09:38:26 +00:00
Add docs for %python.sql feature
parent e931dc4308
commit 72884c8cb1
4 changed files with 58 additions and 4 deletions

@@ -46,7 +46,7 @@ To access the help, type **help()**
## Python modules
The interpreter can use all modules already installed (with pip, easy_install...)

## Use Zeppelin Dynamic Forms
## Using Zeppelin Dynamic Forms
You can leverage [Zeppelin Dynamic Form]({{BASE_PATH}}/manual/dynamicform.html) inside your Python code.

**Zeppelin Dynamic Form can only be used if the py4j Python library is installed on your system. If not, you can install it with `pip install py4j`.**

@@ -65,6 +65,7 @@ print (z.select("f1",[("o1","1"),("o2","2")],"2"))
print("".join(z.checkbox("f3", [("o1","1"), ("o2","2")],["1"])))
```


## Zeppelin features not fully supported by the Python Interpreter

* Interrupting a paragraph execution (`cancel()` method) is currently supported only on Linux and macOS. If the interpreter runs on another operating system (for instance, MS Windows), interrupting a paragraph will close the whole interpreter. A JIRA ticket ([ZEPPELIN-893](https://issues.apache.org/jira/browse/ZEPPELIN-893)) has been opened to implement this feature in a future release of the interpreter.

@@ -94,7 +95,7 @@ z.show(plt, height='150px')


## Pandas integration
[Zeppelin Display System]({{BASE_PATH}}/displaysystem/basicdisplaysystem.html#table) provides simple API to visualize data in Pandas DataFrames, same as in Matplotlib.
Apache Zeppelin [Table Display System]({{BASE_PATH}}/displaysystem/basicdisplaysystem.html#table) provides built-in data visualization capabilities. The Python interpreter leverages it to visualize Pandas DataFrames through a similar `z.show()` API, as with the [Matplotlib integration](#matplotlib-integration).

Example:


@@ -104,6 +105,34 @@ rates = pd.read_csv("bank.csv", sep=";")
z.show(rates)
```
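
The `%table` output that the Table Display System consumes is plain text: columns separated by tabs, rows separated by newlines, and the first row used as the header. The sketch below shows that format with a hypothetical helper, `to_zeppelin_table`, and made-up sample values; it illustrates the display format only and is not the interpreter's own `z.show()` code.

```python
def to_zeppelin_table(header, rows):
    """Render a header and rows as text for Zeppelin's %table display (hypothetical helper)."""

    def cell(value):
        # Tabs and newlines are the column and row separators, so blank them out in values.
        return str(value).replace("\t", " ").replace("\n", " ")

    lines = ["\t".join(cell(name) for name in header)]
    lines.extend("\t".join(cell(value) for value in row) for row in rows)
    return "%table " + "\n".join(lines)


# Made-up sample rows; printing the string from a paragraph lets Zeppelin render it as a table.
print(to_zeppelin_table(["age", "job"], [(30, "unemployed"), (33, "services")]))
```

With a Pandas DataFrame `df`, the same helper could be called as `to_zeppelin_table(df.columns, df.itertuples(index=False))`.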

## SQL over DataFrames

There is a convenience `%python.sql` interpreter that matches the Apache Spark experience in Zeppelin: it lets you query Pandas DataFrames with SQL and visualizes the results through the built-in [Table Display System]({{BASE_PATH}}/displaysystem/basicdisplaysystem.html#table).

**Prerequisites**

- Pandas `pip install pandas`
- PandaSQL `pip install -U pandasql`
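
Before using `%python.sql`, you can check that both prerequisites are importable by the Python process the interpreter runs. A minimal sketch using only the standard library, with a hypothetical helper name `missing_modules`:

```python
import importlib


def missing_modules(names):
    """Return the names from `names` that cannot be imported in this Python process."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:  # also covers ModuleNotFoundError on Python 3.6+
            missing.append(name)
    return missing


# The two prerequisites of %python.sql listed above.
missing = missing_modules(["pandas", "pandasql"])
if missing:
    print("Missing: " + ", ".join(missing) + " (install with pip)")
else:
    print("pandas and pandasql are importable")
```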

If the default bound interpreter is Python (the first one in the interpreter list, under the _Gear Icon_), you can simply use it as `%sql`, for example:

- first paragraph

```python
import pandas as pd
rates = pd.read_csv("bank.csv", sep=";")
```

- next paragraph

```sql
%sql
SELECT * FROM rates WHERE age < 40
```

Otherwise, it can be referred to as `%python.sql`.
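
pandasql evaluates queries with SQLite syntax, looking up the referenced DataFrames by variable name. To illustrate the idea without Pandas, the sketch below (standard library only, not pandasql's code) copies rows into an in-memory SQLite table named after the variable and runs the same query as the example above; `query_rows` and the sample rows are hypothetical.

```python
import sqlite3


def query_rows(table_name, columns, rows, sql):
    """Copy rows into an in-memory SQLite table and run `sql` against it (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    try:
        # Double-quote identifiers so column names that are SQL keywords stay valid.
        column_list = ", ".join('"{}"'.format(name) for name in columns)
        conn.execute('CREATE TABLE "{}" ({})'.format(table_name, column_list))
        placeholders = ", ".join("?" for _ in columns)
        conn.executemany(
            'INSERT INTO "{}" VALUES ({})'.format(table_name, placeholders), rows
        )
        cursor = conn.execute(sql)
        header = [description[0] for description in cursor.description]
        return header, cursor.fetchall()
    finally:
        conn.close()


# Made-up rows standing in for the `rates` DataFrame loaded from bank.csv.
header, result = query_rows(
    "rates",
    ["age", "job"],
    [(30, "unemployed"), (45, "services"), (35, "management")],
    "SELECT * FROM rates WHERE age < 40",
)
print(header, result)
```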


## Technical description

For in-depth technical details on the current implementation, please refer to [python/README.md](https://github.com/apache/zeppelin/blob/master/python/README.md).

@@ -40,3 +40,5 @@ Current interpreter implementation spawns new system python process through `Pro
* JavaBuilder can't send a SIGINT signal to interrupt paragraph execution. Therefore, the interpreter directly sends a `kill SIGINT PID` to the Python process to interrupt execution. The Python process catches the SIGINT signal with code defined in bootstrap.py.

* The Matplotlib display feature exports the figure as an SVG string and then displays it with HTML code.

* `%python.sql` support for Pandas DataFrames is optional and is provided by https://github.com/yhat/pandasql if the user has it installed.
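
The interrupt flow described above can be illustrated with a minimal, POSIX-only sketch. It assumes the Python side installs a SIGINT handler that raises an exception, so only the running paragraph stops and the process stays alive for the next one; `ParagraphInterrupted`, `on_sigint`, and `run_paragraph` are hypothetical names, and this is not the actual code in bootstrap.py. Python only allows installing signal handlers from the main thread.

```python
import signal


class ParagraphInterrupted(Exception):
    """Hypothetical exception marking an interrupted paragraph."""


def on_sigint(signum, frame):
    # Convert the signal into an exception raised inside the running code.
    raise ParagraphInterrupted()


def run_paragraph(code):
    """Execute `code`; return "interrupted" if a SIGINT arrives, else "finished"."""
    previous = signal.signal(signal.SIGINT, on_sigint)
    try:
        exec(code, {})
        return "finished"
    except ParagraphInterrupted:
        return "interrupted"
    finally:
        # Restore the handler that was installed before (normally KeyboardInterrupt).
        if previous is not None:
            signal.signal(signal.SIGINT, previous)


# Stand-in for the JVM side running `kill -SIGINT <pid>`: the paragraph
# signals its own process and would otherwise keep sleeping.
PARAGRAPH = "import os, signal, time\nos.kill(os.getpid(), signal.SIGINT)\ntime.sleep(2)"
print(run_paragraph(PARAGRAPH))
print(run_paragraph("x = 1 + 1"))
```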

@@ -65,8 +65,8 @@ public class PythonPandasSqlInterpreter extends Interpreter {
public void open() {
LOG.info("Open Python SQL interpreter instance: {}", this.toString());

//TODO(bzz): check by importing and catching ImportError
//if (pandasAndNumpyAndPandasqlAreInstalled) {
//TODO(bzz): check i.e by importing and catching ImportError
//if (py4jAndPandasAndPandasqlAreInstalled) {
try {
LOG.info("Bootstrap {} interpreter with {}", this.toString(), SQL_BOOTSTRAP_FILE_PY);
PythonInterpreter python = getPythonInterpreter();

@@ -72,6 +72,8 @@ plt.close()
print ('''<pre>z.show(plt,width='50px')
z.show(plt,height='150px') </pre></div>''')
print ('<h3>Pandas DataFrame</h3>')
print ('<div> You need to have Pandas module installed ')
print ('to use this functionality (pip install pandas) !</div><br/>')
print """
<div>The interpreter can visualize Pandas DataFrame
with the function z.show()

@@ -81,6 +83,27 @@ df = pd.read_csv("bank.csv", sep=";")
z.show(df)
</pre></div>
"""
print ('<h3>SQL over Pandas DataFrame</h3>')
print ('<div> You need to have Pandas&Pandasql modules installed ')
print ('to use this functionality (pip install pandas pandasql) !</div><br/>')
print """
<div>Python interpreter group includes %sql interpreter that can query
Pandas DataFrames using SQL and visualize results using Zeppelin Table Display System

<pre>
%python
import pandas as pd
df = pd.read_csv("bank.csv", sep=";")
</pre>
<br />

<pre>
%python.sql
%sql
SELECT * from df LIMIT 5
</pre></div>
"""


class PyZeppelinContext(object):
""" If py4j is detected, this class will be overridden