zeppelin/python
Lee moon soo 850fd81a51 [ZEPPELIN-212] Multiple paragraph results
### What is this PR for?
Currently a paragraph can display only a single type of result.
For example
```
print("""
%text textout
%html htmlout
""")
```

will display only last display system detected, "htmlout".

This pr implements multiple results supports, so not only the last, but also all the display systems in a paragraph outputs can be displayed.

To do that, Note.json format need to be changed. from

```
   paragraph : [
      {
          result : {
              code: "",
              type: "",
              msg: ""
          },
          config : {
             graph : {},
             ...
          },
          ...
      },
      ...
   ],
   ...
```

to

```
   paragraph : [
      {
          results : {
             {
                code: "",
                msg: [
                   {
                        type: "",
                        data: ""
                    },
                    {
                        type: "",
                        data: ""
                     },
                     ...
                ]
          },
          config : {
             results : [
                {
                    graph : {},
                },
                {
                    graph : {},
                },
                ...
             ]
          },
          ...
      },
      ...
   ],
   ...
```

### What type of PR is it?
Improvement

### Todos
* [x] - Make InterpreterResult and InterpreterOutput support multiple display system
* [x] - Automatic migration old format to new format
* [x] - Render multiple results in front-end
* [x] - Take care Importing old notebook format
* [x] - Make helium framework support multiple results

### What is the Jira issue?
https://issues.apache.org/jira/browse/ZEPPELIN-212

### How should this be tested?

run following code in a single spark paragraph.
```
%spark
// default output is text
(1 to 3).foreach{i=>
  println(new java.util.Date())
  Thread.sleep(1000)
}

// print something in html
println(s"""%html <h3><font style="color:blue">Some HTML</font></h3>""")

// create table
println(s"""%table key\tvalue
sun\t100
moon\t200

""")

// display text again
println("""%text""")
(1 to 3).foreach{i=>
  println(new java.util.Date())
  Thread.sleep(1000)
}

// another table
Thread.sleep(1000)
println(s"""%table key\tvalue
apple\t100
banana\t200
""")
```

### Screenshots (if appropriate)
![multipleout](https://cloud.githubusercontent.com/assets/1540981/20465902/23379948-af1d-11e6-85cf-4d70597fb94e.gif)

### Questions:
* Does the licenses files need update? no
* Is there breaking changes for older versions? yes
* Does this needs documentation? yes

Author: Lee moon soo <moon@apache.org>

Closes #1658 from Leemoonsoo/ZEPPELIN-212 and squashes the following commits:

0c6a13c [Lee moon soo] Merge remote-tracking branch 'origin/master' into ZEPPELIN-212
519cf23 [Lee moon soo] Update PythonInterpreterPandasSqlTest
e6268ac [Lee moon soo] Update PythonInterpreterMatplotlibTest.java
f5034b8 [Lee moon soo] Merge remote-tracking branch 'origin/master' into ZEPPELIN-212
d5981d5 [Lee moon soo] Merge remote-tracking branch 'origin/master' into ZEPPELIN-212
a1fe729 [Lee moon soo] document note format change
282504e [Lee moon soo] Merge remote-tracking branch 'origin/master' into ZEPPELIN-212
7ab6679 [Lee moon soo] update selenium test
6a897a5 [Lee moon soo] Update rest-api doc
cbbd58a [Lee moon soo] update unittest
e89c9b8 [Lee moon soo] restore tutoral note
3ba37b7 [Lee moon soo] Let old version import note without error
d09e03f [Lee moon soo] enable helium only in the last result
6682908 [Lee moon soo] Remove unnecessary listener method
0f1d28f [Lee moon soo] Merge branch 'master' into ZEPPELIN-212
73b3a81 [Lee moon soo] update selenium test
e69f1a1 [Lee moon soo] update test
f12230f [Lee moon soo] Remove unnecessary newline in test
26e201a [Lee moon soo] Update testcase
b01d70f [Lee moon soo] Make helium app work
0a5b4d1 [Lee moon soo] Update HeliumApplicationFactoryTest
d8ca07f [Lee moon soo] update NotebookTest
aaf9778 [Lee moon soo] fix r test
3e43df4 [Lee moon soo] make zeppelin-web test pass
1d6fd3e [Lee moon soo] fix compile errors
0156ffc [Lee moon soo] take care angular object
804768d [Lee moon soo] Take care output streaming
57d6b2f [Lee moon soo] Render multiple results
95b6037 [Lee moon soo] Multiple results
2016-11-30 17:23:57 -08:00
..
src [ZEPPELIN-212] Multiple paragraph results 2016-11-30 17:23:57 -08:00
pom.xml ZEPPELIN-1345 - Create a custom matplotlib backend that natively supports inline plotting in a python interpreter cell 2016-11-08 07:20:21 -08:00
README.md ZEPPELIN-1345 - Create a custom matplotlib backend that natively supports inline plotting in a python interpreter cell 2016-11-08 07:20:21 -08:00

Overview

Python interpreter for Apache Zeppelin

Architecture

Current interpreter implementation spawns new system python process through ProcessBuilder and re-directs it's stdin\strout to Zeppelin

Details

  • UnitTests

To run full suit of tests, including ones that depend on real Python interpreter AND external libraries installed (like Pandas, Pandasql, etc) do

mvn -Dpython.test.exclude='' test -pl python -am
  • Py4j support

Py4j enables Python programs to dynamically access Java objects in a JVM. It is required in order to use Zeppelin dynamic forms feature.

  • bootstrap process

Interpreter environment is setup with thex bootstrap.py It defines help() and z convenience functions

Dev prerequisites

  • Python 2 or 3 installed with py4j (0.9.2) and matplotlib (1.31 or later) installed on each

  • Tests only checks the interpreter logic and starts any Python process! Python process is mocked with a class that simply output it input.

  • Code wrote in bootstrap.py and bootstrap_input.py should always be Python 2 and 3 compliant.

  • Use PEP8 convention for python code.

Technical overview

  • When interpreter is starting it launches a python process inside a Java ProcessBuilder. Python is started with -i (interactive mode) and -u (unbuffered stdin, stdout and stderr) options. Thus the interpreter has a "sleeping" python process.

  • Interpreter sends command to python with a Java outputStreamWiter and read from an InputStreamReader. To know when stop reading stdout, interpreter sends print "*!?flush reader!?*"after each command and reads stdout until he receives back the *!?flush reader!?*.

  • When interpreter is starting, it sends some Python code (bootstrap.py and bootstrap_input.py) to initialize default behavior and functions (help(), z.input()...). bootstrap_input.py is sent only if py4j library is detected inside Python process.

  • Py4J python and java libraries is used to load Input zeppelin Java class into the python process (make java code with python code !). Therefore the interpreter can directly create Zeppelin input form inside the Python process (and eventually with some python variable already defined). JVM opens a random open port to be accessible from python process.

  • JavaBuilder can't send SIGINT signal to interrupt paragraph execution. Therefore interpreter directly send a kill SIGINT PID to python process to interrupt execution. Python process catch SIGINT signal with some code defined in bootstrap.py

  • Matplotlib figures are displayed inline with the notebook automatically using a built-in backend for zeppelin in conjunction with a post-execute hook.

  • %python.sql support for Pandas DataFrames is optional and provided using https://github.com/yhat/pandasql if user have one installed