### What is this PR for?
This PR is the first of two major steps needed to improve matplotlib integration in Zeppelin (ZEPPELIN-1344). The second step, a plotting backend with fully interactive tools enabled, will be done afterwards in a separate PR. This PR is specifically for automatically displaying output from calls to matplotlib plotting functions inline with each paragraph. Thanks to the addition of post-execute hooks (ZEPPELIN-1423), there is no need to call any `show()` function to display an inline plot, just like in Jupyter.

### What type of PR is it?
Improvement

### Todos
The main code has been written and anyone who reads this is encouraged to test it, but there are a few minor todos:
- [x] - Add unit tests
- [x] - Add documentation
- [x] - Add screenshot showing iterative plotting with angular mode

### What is the Jira issue?
[ZEPPELIN-1345](https://issues.apache.org/jira/browse/ZEPPELIN-1345)

### How should this be tested?
In a pyspark or python paragraph, enter and run

``` python
import matplotlib.pyplot as plt
plt.plot([1, 2, 3])
```

The plot should be displayed automatically without calling any `show()` function whatsoever. A special method called `configure_mpl()` can also be used to modify the inline plotting behavior. For example,

``` python
z.configure_mpl(close=False, angular=True)
plt.plot([1, 2, 3])
```

allows for iterative updates to the plot provided you have Py4J installed for your Python installation (which of course is always the case if you use pyspark). To clarify, this feature currently works only with pyspark (not python, as there are no `angularBind()` and `angularUnbind()` methods yet). Doing something like:

```
plt.plot([3, 2, 1])
```

will update the plot that was generated by the previous paragraph by leveraging Zeppelin's Angular Display System. However, with `close=False`, matplotlib no longer closes figures automatically, so it is up to the user to explicitly close each figure instance they create.
There are quite a few more options for `z.configure_mpl()`, but I will save that discussion for the documentation.

### Screenshots (if appropriate)

### Questions:
- Do the license files need an update? No
- Are there breaking changes for older versions? No
- Does this need documentation? Yes

Author: Alex Goodman <agoodm@users.noreply.github.com>

Closes #1534 from agoodm/ZEPPELIN-1345 and squashes the following commits:

9ef6ff7 [Alex Goodman] Move mpl backend files to /interpreter
24f89c6 [Alex Goodman] Catch potential NullPointerExceptions from hook registry
bdb584e [Alex Goodman] Make sure expressions are printed when no plots are shown
22b6fe4 [Alex Goodman] Remove unused variable
d3d1aa0 [Alex Goodman] Fix CI test failure
c90d204 [Alex Goodman] Update spark.md
bcf0bf3 [Alex Goodman] Update python.md for new matplotlib integration
c9b65a5 [Alex Goodman] Add iterative plotting example image
8029a05 [Alex Goodman] Update python/README.md
f2d9e86 [Alex Goodman] Exclude tests are excluded in python/pom.xml
86b1c90 [Alex Goodman] Fix tutorial notebook not loading
c37b00f [Alex Goodman] Fix legend in tutorial notebook
a321d79 [Alex Goodman] Update python.md
82350e3 [Alex Goodman] Update matplotlib tutorial notebook
9792f97 [Alex Goodman] Add unit tests
8b9b973 [Alex Goodman] Fix NullPointerExceptions in unit tests
82135ad [Alex Goodman] Removed unused variable
f9c9498 [Alex Goodman] Added support for Angular Display System
edf750a [Alex Goodman] Add new matplotlib backend for python/pyspark interpreters
Overview
Python interpreter for Apache Zeppelin
Architecture
The current interpreter implementation spawns a new system Python process through ProcessBuilder and redirects its stdin/stdout to Zeppelin.
Details
- UnitTests
To run the full suite of tests, including those that depend on a real Python interpreter and on installed external libraries (such as Pandas and Pandasql), run:
mvn -Dpython.test.exclude='' test -pl python -am
- Py4j support
Py4j enables Python programs to dynamically access Java objects in a JVM. It is required in order to use Zeppelin's dynamic forms feature.
- bootstrap process
The interpreter environment is set up with bootstrap.py.
It defines the help() and z convenience functions.
Dev prerequisites
- Python 2 or 3 installed, with py4j (0.9.2) and matplotlib (1.31 or later) installed on each.
- Tests only check the interpreter logic and do not start any Python process. The Python process is mocked with a class that simply outputs its input.
- Code written in bootstrap.py and bootstrap_input.py should always be compatible with both Python 2 and 3.
- Use the PEP8 convention for Python code.
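The mocked process mentioned in the list above can be pictured as an object that hands back whatever it is given. The following is only a minimal sketch of that idea in Python; the class name `EchoPythonProcess` and its method names are hypothetical and are not taken from the Zeppelin test code.

```python
class EchoPythonProcess(object):
    """Hypothetical stand-in for a Python process: it echoes its input.

    Written so that it runs unchanged on Python 2 and Python 3.
    """

    def __init__(self):
        self.running = False

    def open(self):
        # A real implementation would spawn a Python process here.
        self.running = True

    def send_and_get_result(self, statement):
        # No code is executed; the statement is returned untouched.
        if not self.running:
            raise RuntimeError("process is not open")
        return statement

    def close(self):
        self.running = False
```

With a mock of this kind, a test can verify what the interpreter sends to Python without needing a Python installation on the build machine.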
Technical overview
- When the interpreter starts, it launches a Python process through a Java ProcessBuilder. Python is started with the -i (interactive mode) and -u (unbuffered stdin, stdout and stderr) options, so the interpreter has a "sleeping" Python process.
- The interpreter sends commands to Python with a Java OutputStreamWriter and reads from an InputStreamReader. To know when to stop reading stdout, the interpreter sends print "*!?flush reader!?*" after each command and reads stdout until it receives *!?flush reader!?* back.
- When the interpreter starts, it sends some Python code (bootstrap.py and bootstrap_input.py) to initialize default behavior and functions (help(), z.input()...). bootstrap_input.py is sent only if the py4j library is detected inside the Python process.
- The Py4J Python and Java libraries are used to load Zeppelin's Input Java class into the Python process (mixing Java code with Python code!). The interpreter can therefore create Zeppelin input forms directly inside the Python process (possibly using Python variables that are already defined). The JVM opens a random free port so that it is reachable from the Python process.
- The Java ProcessBuilder can't send a SIGINT signal to interrupt paragraph execution. The interpreter therefore sends kill SIGINT PID directly to the Python process to interrupt execution. The Python process catches the SIGINT signal with code defined in bootstrap.py.
- Matplotlib figures are displayed inline with the notebook automatically, using a built-in backend for Zeppelin in conjunction with a post-execute hook.
- %python.sql support for Pandas DataFrames is optional and is provided by https://github.com/yhat/pandasql if the user has it installed.
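The command and sentinel exchange described in the technical overview can be sketched in a few lines. The interpreter itself does this in Java. The version below is only an illustration in Python 3, and the class name `PythonProcess` and its methods are hypothetical. It handles single-line statements only, and it uses the function form of print so that the sentinel works on both Python 2 and 3 children.

```python
import subprocess
import sys

SENTINEL = "*!?flush reader!?*"


class PythonProcess(object):
    """Hypothetical wrapper around a "sleeping" interactive Python child."""

    def __init__(self):
        # -i: interactive mode, -u: unbuffered stdin, stdout and stderr.
        # Prompts and the banner go to stderr, which is discarded here.
        self.proc = subprocess.Popen(
            [sys.executable, "-i", "-u"],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            universal_newlines=True,
        )

    def run(self, command):
        # Send the command, then ask the child to print the sentinel.
        self.proc.stdin.write(command + "\n")
        self.proc.stdin.write("print(%r)\n" % SENTINEL)
        self.proc.stdin.flush()
        lines = []
        while True:
            line = self.proc.stdout.readline()
            if not line:  # the child exited unexpectedly
                break
            line = line.rstrip("\n")
            if line == SENTINEL:  # everything before this is the result
                break
            lines.append(line)
        return "\n".join(lines)

    def close(self):
        # Closing stdin sends EOF, which ends the interactive session.
        self.proc.stdin.close()
        self.proc.wait()
        self.proc.stdout.close()
```

Because the child runs in interactive mode, state persists between calls: a variable assigned in one `run()` call is still available in the next one.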
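The SIGINT handling mentioned in the technical overview could look roughly like the following on the Python side. This is a sketch under assumptions, not the code from bootstrap.py: the function names `handle_sigint`, `interrupt` and `run_interruptibly` are hypothetical, and the example assumes a POSIX system and that it runs in the main thread.

```python
import os
import signal


def handle_sigint(signum, frame):
    # Turn the signal into an exception so that the running statement is
    # aborted while the Python process itself stays alive.
    raise KeyboardInterrupt("paragraph interrupted")


# Signal handlers can only be installed from the main thread.
signal.signal(signal.SIGINT, handle_sigint)


def interrupt(pid):
    # Plays the role of the kill command issued by the interpreter.
    os.kill(pid, signal.SIGINT)


def run_interruptibly(func):
    """Run func and report whether SIGINT cut it short."""
    try:
        func()
        return "finished"
    except KeyboardInterrupt:
        return "interrupted"
```

Raising an exception from the handler keeps the process alive after an interrupt, so variables defined by earlier paragraphs remain usable.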