Add docs for %python.sql feature

2016-07-11 23:11:19 +09:00
commit 72884c8cb1
parent e931dc4308
4 changed files with 58 additions and 4 deletions
--- a/docs/interpreter/python.md
+++ b/docs/interpreter/python.md
@@ -46,7 +46,7 @@ To access the help, type **help()**
 ## Python modules
 The interpreter can use all modules already installed (with pip, easy_install...)

-## Use Zeppelin Dynamic Forms
+## Using Zeppelin Dynamic Forms
 You can leverage [Zeppelin Dynamic Form]({{BASE_PATH}}/manual/dynamicform.html) inside your Python code.

 **Zeppelin Dynamic Form can only be used if py4j Python library is installed in your system. If not, you can install it with `pip install py4j`.**
@@ -65,6 +65,7 @@ print (z.select("f1",[("o1","1"),("o2","2")],"2"))
 print("".join(z.checkbox("f3", [("o1","1"), ("o2","2")],["1"])))
 ```

+
 ## Zeppelin features not fully supported by the Python Interpreter

 * Interrupt a paragraph execution (`cancel()` method) is currently only supported in Linux and MacOs. If interpreter runs in another operating system (for instance MS Windows) , interrupt a paragraph will close the whole interpreter. A JIRA ticket ([ZEPPELIN-893](https://issues.apache.org/jira/browse/ZEPPELIN-893)) is opened to implement this feature in a next release of the interpreter.
@@ -94,7 +95,7 @@ z.show(plt, height='150px')


 ## Pandas integration
-[Zeppelin Display System]({{BASE_PATH}}/displaysystem/basicdisplaysystem.html#table) provides simple API to visualize data in Pandas DataFrames, same as in Matplotlib.
+Apache Zeppelin's [Table Display System]({{BASE_PATH}}/displaysystem/basicdisplaysystem.html#table) provides built-in data visualization capabilities. The Python interpreter leverages it to visualize Pandas DataFrames through a similar `z.show()` API, as with the [Matplotlib integration](#matplotlib-integration).

 Example:

@@ -104,6 +105,34 @@ rates = pd.read_csv("bank.csv", sep=";")
 z.show(rates)
 ```

+## SQL over DataFrames
+
+There is a convenience `%python.sql` interpreter that matches the Apache Spark experience in Zeppelin: it lets you query Pandas DataFrames with SQL and visualizes the results through the built-in [Table Display System]({{BASE_PATH}}/displaysystem/basicdisplaysystem.html#table).
+
+**Prerequisites**
+
+  - Pandas `pip install pandas`
+  - PandaSQL `pip install -U pandasql`
+
+If the default bound interpreter is Python (first in the interpreter list, under the _Gear Icon_), you can simply use it as `%sql`, e.g.:
+
+ - First paragraph:
+
+   ```python
+   import pandas as pd
+   rates = pd.read_csv("bank.csv", sep=";")
+   ```
+
+ - Next paragraph:
+
+   ```sql
+   %sql
+   SELECT * FROM rates WHERE age < 40
+   ```
+
+Otherwise, refer to it as `%python.sql`.
+
+
 ## Technical description

 For in-depth technical details on current implementation plese reffer [python/README.md](https://github.com/apache/zeppelin/blob/master/python/README.md).
--- a/python/README.md
+++ b/python/README.md
@@ -40,3 +40,5 @@ Current interpreter implementation spawns new system python process through `Pro
 * JavaBuilder can't send SIGINT signal to interrupt paragraph execution. Therefore interpreter directly  send a `kill SIGINT PID` to python process to interrupt execution. Python process catch SIGINT signal with some code defined in bootstrap.py

 * Matplotlib display feature is made with SVG export (in string) and then displays it with html code.
+
+ * `%python.sql` support for Pandas DataFrames is optional and is provided by https://github.com/yhat/pandasql if the user has it installed.
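As the README notes, `%python.sql` delegates to pandasql, which by default runs each query through SQLite: the DataFrames a query references are copied into an in-memory SQLite database and the result is read back as a DataFrame. A minimal sketch of that idea using only `sqlite3` and pandas follows; `query_dataframes` is a hypothetical helper (not pandasql's or Zeppelin's API), and the rows are made up to stand in for `bank.csv`.

```python
import sqlite3

import pandas as pd


def query_dataframes(sql, frames):
    """Hypothetical helper: run `sql` against pandas DataFrames.

    Each DataFrame in `frames` is copied into a fresh in-memory SQLite
    database under its dict key, the query runs there, and the result is
    read back as a new DataFrame.
    """
    con = sqlite3.connect(":memory:")
    try:
        for name, df in frames.items():
            # The dict key becomes the table name used in the SQL text.
            df.to_sql(name, con, index=False)
        return pd.read_sql_query(sql, con)
    finally:
        con.close()


# Made-up rows standing in for bank.csv.
rates = pd.DataFrame({
    "age": [25, 38, 45, 61],
    "job": ["admin.", "technician", "services", "retired"],
})
young = query_dataframes("SELECT * FROM rates WHERE age < 40", {"rates": rates})
print(young)
```

Copying the frames on every call keeps the sketch simple, but it costs time and memory for large DataFrames.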
--- a/python/src/main/java/org/apache/zeppelin/python/PythonPandasSqlInterpreter.java
+++ b/python/src/main/java/org/apache/zeppelin/python/PythonPandasSqlInterpreter.java
@@ -65,8 +65,8 @@ public class PythonPandasSqlInterpreter extends Interpreter {
  public void open() {
    LOG.info("Open Python SQL interpreter instance: {}", this.toString());

-  //TODO(bzz): check by importing and catching ImportError
-  //if (pandasAndNumpyAndPandasqlAreInstalled) {
+  //TODO(bzz): check e.g. by importing and catching ImportError
+  //if (py4jAndPandasAndPandasqlAreInstalled) {
    try {
      LOG.info("Bootstrap {} interpreter with {}", this.toString(), SQL_BOOTSTRAP_FILE_PY);
      PythonInterpreter python = getPythonInterpreter();
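The TODO above proposes detecting the prerequisites by importing them and catching `ImportError`. A minimal Python sketch of such a check, as it might run inside the spawned Python process; `missing_modules` is a hypothetical helper, not part of Zeppelin, and the module list mirrors the names in the `py4jAndPandasAndPandasqlAreInstalled` placeholder.

```python
import importlib


def missing_modules(names):
    """Hypothetical helper: return the names from `names` that fail to import."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            # ModuleNotFoundError (Python 3.6+) is a subclass of ImportError.
            missing.append(name)
    return missing


# Module names taken from the py4jAndPandasAndPandasqlAreInstalled placeholder.
required = ["py4j", "pandas", "pandasql"]
missing = missing_modules(required)
if missing:
    print("%python.sql prerequisites missing: " + ", ".join(missing))
else:
    print("%python.sql prerequisites are installed")
```

Returning the list of missing names rather than a single boolean lets the caller report exactly which package to `pip install`.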
--- a/python/src/main/resources/bootstrap.py
+++ b/python/src/main/resources/bootstrap.py
@@ -72,6 +72,8 @@ plt.close()
    print ('''<pre>z.show(plt,width='50px')
 z.show(plt,height='150px') </pre></div>''')
    print ('<h3>Pandas DataFrame</h3>')
+    print ('<div> You need to have the Pandas module installed ')
+    print ('to use this functionality (pip install pandas).</div><br/>')
    print """
 <div>The interpreter can visualize Pandas DataFrame
 with the function z.show()
@@ -81,6 +83,27 @@ df = pd.read_csv("bank.csv", sep=";")
 z.show(df)
 </pre></div>
 """
+    print ('<h3>SQL over Pandas DataFrame</h3>')
+    print ('<div> You need to have the Pandas and PandaSQL modules installed ')
+    print ('to use this functionality (pip install pandas pandasql).</div><br/>')
+    print """
+<div>The Python interpreter group includes a %python.sql interpreter that can query
+Pandas DataFrames using SQL and visualize the results using the Zeppelin Table Display System.
+
+<pre>
+%python
+import pandas as pd
+df = pd.read_csv("bank.csv", sep=";")
+</pre>
+<br />
+Then query it from another paragraph (or with %sql, if Python is the default interpreter):
+
+<pre>
+%python.sql
+SELECT * FROM df LIMIT 5
+</pre></div>
+"""
+

 class PyZeppelinContext(object):
    """ If py4j is detected, these class will be override