zeppelin/python
Alex Goodman 8b40268d16 ZEPPELIN-1318 - Add support for matplotlib displaying png images in python interpreter
### What is this PR for?
This PR adds support for plotting png images using the matplotlib helper function within a python interpreter (eg `z.show()`). The primary motivation for this is due to the overhead incurred from svg images, which can lag the notebooks if multiple, complicated images are generated (for example, multiple filled contour plots). png images are more lightweight, but of course come at a cost of image quality due to them being raster rather than vector like svg. The support for png images is incorporated through the use of a new optional argument to `z.show` called `fmt` which can be one of `'svg'` or `'png'`. The same code that is currently used in show is used for svg images while the code for png images relies on converting the image directly to a byte array and then entering the decoded byte string directly into an HTML image tag. Currently `fmt` defaults to `'png'` but I think we should consider discussing the pros and cons of each option in this PR.

### What type of PR is it?
Improvement

### What is the Jira issue?
[ZEPPELIN-1318](https://issues.apache.org/jira/browse/ZEPPELIN-1318)

### How should this be tested?
In a notebook cell, enter:
```python
%python
import matplotlib.pyplot as plt
import numpy as np

plt.figure()
plt.plot(np.arange(10))
z.show(plt, fmt=fmt)
```
Where `fmt` may be one of `'svg'` or `'png'`, and any other input should result in a `ValueError`. I would also recommend testing the example in the screenshot below.

### Screenshots (if appropriate)
![zeppelin plot](https://puu.sh/qzc4t/b8fcfe856e.png)

### Questions:
* Does the licenses files need update? No
* Is there breaking changes for older versions? No
* Does this needs documentation? Yes (if the changes to the `help()` docstring are not sufficient)

Author: Alex Goodman <agoodm@users.noreply.github.com>

Closes #1329 from agoodm/ZEPPELIN-1318 and squashes the following commits:

2e9ce4c [Alex Goodman] Update python.md
1efa0c9 [Alex Goodman] ZEPPELIN-1318 - Add support for png images in z.show()
2016-08-14 11:28:38 +09:00
..
src ZEPPELIN-1318 - Add support for matplotlib displaying png images in python interpreter 2016-08-14 11:28:38 +09:00
pom.xml [MINOR] Change url in pom.xml files 2016-07-31 16:41:56 +09:00
README.md ZEPPELIN-1115: Python - interpreter for SQL over DataFrame 2016-07-15 18:37:18 +09:00

Overview

Python interpreter for Apache Zeppelin

Architecture

Current interpreter implementation spawns new system python process through ProcessBuilder and re-directs it's stdin\strout to Zeppelin

Details

  • UnitTests

To run full suit of tests, including ones that depend on real Python interpreter AND external libraries installed (like Pandas, Pandasql, etc) do

mvn -Dpython.test.exclude='' test -pl python -am
  • Py4j support

Py4j enables Python programs to dynamically access Java objects in a JVM. It is required in order to use Zeppelin dynamic forms feature.

  • bootstrap process

Interpreter environment is setup with thex bootstrap.py It defines help() and z convenience functions

Dev prerequisites

  • Python 2 or 3 installed with py4j (0.9.2) and matplotlib (1.31 or later) installed on each

  • Tests only checks the interpreter logic and starts any Python process! Python process is mocked with a class that simply output it input.

  • Code wrote in bootstrap.py and bootstrap_input.py should always be Python 2 and 3 compliant.

  • Use PEP8 convention for python code.

Technical overview

  • When interpreter is starting it launches a python process inside a Java ProcessBuilder. Python is started with -i (interactive mode) and -u (unbuffered stdin, stdout and stderr) options. Thus the interpreter has a "sleeping" python process.

  • Interpreter sends command to python with a Java outputStreamWiter and read from an InputStreamReader. To know when stop reading stdout, interpreter sends print "*!?flush reader!?*"after each command and reads stdout until he receives back the *!?flush reader!?*.

  • When interpreter is starting, it sends some Python code (bootstrap.py and bootstrap_input.py) to initialize default behavior and functions (help(), z.input()...). bootstrap_input.py is sent only if py4j library is detected inside Python process.

  • Py4J python and java libraries is used to load Input zeppelin Java class into the python process (make java code with python code !). Therefore the interpreter can directly create Zeppelin input form inside the Python process (and eventually with some python variable already defined). JVM opens a random open port to be accessible from python process.

  • JavaBuilder can't send SIGINT signal to interrupt paragraph execution. Therefore interpreter directly send a kill SIGINT PID to python process to interrupt execution. Python process catch SIGINT signal with some code defined in bootstrap.py

  • Matplotlib display feature is made with SVG export (in string) and then displays it with html code.

  • %python.sql support for Pandas DataFrames is optional and provided using https://github.com/yhat/pandasql if user have one installed