zeppelin/python
astroshim 287ffd50e2 Rewrite PythonInterpreter.
### What is this PR for?
I've been testing the python interpreter and I found at least 4 major issues in the current python interpreter.

1. not working streaming output.
 - https://issues.apache.org/jira/browse/ZEPPELIN-2225

2. printed "..." when there is indent in the python code.
 - https://issues.apache.org/jira/browse/ZEPPELIN-1929

3. very slow output of matplotlib
 - https://issues.apache.org/jira/browse/ZEPPELIN-1894
 - https://issues.apache.org/jira/browse/ZEPPELIN-1360

4. Unexpected output of matplotlib.
 - https://issues.apache.org/jira/browse/ZEPPELIN-2107

so I changed python interpreter to use py4j  based on pyspark interpreter and would be fixed above issues.
and I am going to recreate conda, docker for python interpreter ASAP.

### What type of PR is it?
Bug Fix | Hot Fix | Refactoring

### How should this be tested?
1. not working streaming output.
```
import time
for x in range(0, 5):
    print x
    time.sleep(1)
```

2. printed "..." when there is indent in the python code.
```
def fn():
    print("hi")

fn()
```

3. very slow output of matplotlib.
```
import matplotlib
import sys
import matplotlib.pyplot as plt
plt.plot([1,2,3])
```

4. Unexpected output of matplotlib.
```

import matplotlib.pyplot as plt
import matplotlib as mpl

# Make a figure and axes with dimensions as desired.
fig = plt.figure(figsize=(8, 3))
ax1 = fig.add_axes([0.05, 0.80, 0.9, 0.15])
ax2 = fig.add_axes([0.05, 0.475, 0.9, 0.15])
ax3 = fig.add_axes([0.05, 0.15, 0.9, 0.15])

# Set the colormap and norm to correspond to the data for which
# the colorbar will be used.
cmap = mpl.cm.cool
norm = mpl.colors.Normalize(vmin=5, vmax=10)

# ColorbarBase derives from ScalarMappable and puts a colorbar
# in a specified axes, so it has everything needed for a
# standalone colorbar.  There are many more kwargs, but the
# following gives a basic continuous colorbar with ticks
# and labels.
cb1 = mpl.colorbar.ColorbarBase(ax1, cmap=cmap,
                                norm=norm,
                                orientation='horizontal')
cb1.set_label('Some Units')

# The second example illustrates the use of a ListedColormap, a
# BoundaryNorm, and extended ends to show the "over" and "under"
# value colors.
cmap = mpl.colors.ListedColormap(['r', 'g', 'b', 'c'])
cmap.set_over('0.25')
cmap.set_under('0.75')

# If a ListedColormap is used, the length of the bounds array must be
# one greater than the length of the color list.  The bounds must be
# monotonically increasing.
bounds = [1, 2, 4, 7, 8]
norm = mpl.colors.BoundaryNorm(bounds, cmap.N)
cb2 = mpl.colorbar.ColorbarBase(ax2, cmap=cmap,
                                norm=norm,
                                # to use 'extend', you must
                                # specify two extra boundaries:
                                boundaries=[0] + bounds + [13],
                                extend='both',
                                ticks=bounds,  # optional
                                spacing='proportional',
                                orientation='horizontal')
cb2.set_label('Discrete intervals, some other units')

# The third example illustrates the use of custom length colorbar
# extensions, used on a colorbar with discrete intervals.
cmap = mpl.colors.ListedColormap([[0., .4, 1.], [0., .8, 1.],
                                  [1., .8, 0.], [1., .4, 0.]])
cmap.set_over((1., 0., 0.))
cmap.set_under((0., 0., 1.))

bounds = [-1., -.5, 0., .5, 1.]
norm = mpl.colors.BoundaryNorm(bounds, cmap.N)
cb3 = mpl.colorbar.ColorbarBase(ax3, cmap=cmap,
                                norm=norm,
                                boundaries=[-10] + bounds + [10],
                                extend='both',
                                # Make the length of each extension
                                # the same as the length of the
                                # interior colors:
                                extendfrac='auto',
                                ticks=bounds,
                                spacing='uniform',
                                orientation='horizontal')
cb3.set_label('Custom extension lengths, some other units')

plt.show()
```

### Questions:
* Does the licenses files need update? no
* Is there breaking changes for older versions? no
* Does this needs documentation? no

Author: astroshim <hsshim@zepl.com>
Author: Lee moon soo <moon@apache.org>
Author: HyungSung <hsshim@nflabs.com>

Closes #2106 from astroshim/py4jPythonInterpreter and squashes the following commits:

c9b195b [HyungSung] Merge pull request #16 from Leemoonsoo/py4jdocker
e511ebe [Lee moon soo] add PythonDockerInterpreter to interpreter-setting.json
a76b0d8 [Lee moon soo] fix test on python3
2eb5de7 [Lee moon soo] Fix PythonDockerInterpreterTest.java test
9fcf144 [Lee moon soo] Make python docker interpreter work using py4j
8a016c9 [astroshim] Merge branch 'master' into py4jPythonInterpreter
aad7ee8 [astroshim] fix testcase
ac92cdb [astroshim] fix python interpreter testcase
e8570d2 [astroshim] fix ci for pandassql
be5db4d [astroshim] fix pandas sql testcase
f8e19be [astroshim] fix matplotlib testcase
046db88 [astroshim] add testcase
e49ad24 [astroshim] add pandas
60e9820 [astroshim] bug fix about copying library
574bd21 [astroshim] fix interpreter-setting error
a48df58 [astroshim] Merge branch 'master' into py4jPythonInterpreter
3c9585f [astroshim] update interpreter-setting.json
a50179e [astroshim] add conda interpreter
cbbc15c [astroshim] fix py4j path
5ae5120 [astroshim] fix interpreter-setting
f17bff4 [astroshim] fix testcase failure.
af097ac [astroshim] add testcase
c3f5b78 [astroshim] Merge branch 'master' of https://github.com/apache/zeppelin into py4jPythonInterpreter
1395875 [astroshim] removed unnecessary code.
276011e [astroshim] add py4j lib
7304919 [astroshim] initialize python interpreter using py4j
2017-03-18 19:50:42 +09:00
..
src Rewrite PythonInterpreter. 2017-03-18 19:50:42 +09:00
pom.xml Rewrite PythonInterpreter. 2017-03-18 19:50:42 +09:00
README.md ZEPPELIN-1345 - Create a custom matplotlib backend that natively supports inline plotting in a python interpreter cell 2016-11-08 07:20:21 -08:00

Overview

Python interpreter for Apache Zeppelin

Architecture

Current interpreter implementation spawns new system python process through ProcessBuilder and re-directs it's stdin\strout to Zeppelin

Details

  • UnitTests

To run full suit of tests, including ones that depend on real Python interpreter AND external libraries installed (like Pandas, Pandasql, etc) do

mvn -Dpython.test.exclude='' test -pl python -am
  • Py4j support

Py4j enables Python programs to dynamically access Java objects in a JVM. It is required in order to use Zeppelin dynamic forms feature.

  • bootstrap process

Interpreter environment is setup with thex bootstrap.py It defines help() and z convenience functions

Dev prerequisites

  • Python 2 or 3 installed with py4j (0.9.2) and matplotlib (1.31 or later) installed on each

  • Tests only checks the interpreter logic and starts any Python process! Python process is mocked with a class that simply output it input.

  • Code wrote in bootstrap.py and bootstrap_input.py should always be Python 2 and 3 compliant.

  • Use PEP8 convention for python code.

Technical overview

  • When interpreter is starting it launches a python process inside a Java ProcessBuilder. Python is started with -i (interactive mode) and -u (unbuffered stdin, stdout and stderr) options. Thus the interpreter has a "sleeping" python process.

  • Interpreter sends command to python with a Java outputStreamWiter and read from an InputStreamReader. To know when stop reading stdout, interpreter sends print "*!?flush reader!?*"after each command and reads stdout until he receives back the *!?flush reader!?*.

  • When interpreter is starting, it sends some Python code (bootstrap.py and bootstrap_input.py) to initialize default behavior and functions (help(), z.input()...). bootstrap_input.py is sent only if py4j library is detected inside Python process.

  • Py4J python and java libraries is used to load Input zeppelin Java class into the python process (make java code with python code !). Therefore the interpreter can directly create Zeppelin input form inside the Python process (and eventually with some python variable already defined). JVM opens a random open port to be accessible from python process.

  • JavaBuilder can't send SIGINT signal to interrupt paragraph execution. Therefore interpreter directly send a kill SIGINT PID to python process to interrupt execution. Python process catch SIGINT signal with some code defined in bootstrap.py

  • Matplotlib figures are displayed inline with the notebook automatically using a built-in backend for zeppelin in conjunction with a post-execute hook.

  • %python.sql support for Pandas DataFrames is optional and provided using https://github.com/yhat/pandasql if user have one installed