### What is this PR for?
The Python interpreter fails on Windows because of a classpath issue.
### What type of PR is it?
Bug Fix
### What is the Jira issue?
* https://issues.apache.org/jira/browse/ZEPPELIN-4564#
THIS SOFTWARE IS CONTRIBUTED SUBJECT TO THE TERMS OF THE APACHE SOFTWARE FOUNDATION SOFTWARE GRANT AND CORPORATE CONTRIBUTOR LICENSE AGREEMENT VERSION R190612.
THIS SOFTWARE IS LICENSED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE AND ANY WARRANTY OF NON-INFRINGEMENT, ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. THIS SOFTWARE MAY BE REDISTRIBUTED TO OTHERS ONLY BY EFFECTIVELY USING THIS OR ANOTHER EQUIVALENT DISCLAIMER IN ADDITION TO ANY OTHER REQUIRED LICENSE TERMS.
Author: Muhammad Taufiq <Muhammad.Taufiq@morganstanley.com>
Closes #3604 from Muhammad-ms/zeppelin9_file_separator and squashes the following commits:
Overview
Python interpreter for Apache Zeppelin
Architecture
The current interpreter implementation spawns a new system Python process through ProcessBuilder and redirects its stdin/stdout to Zeppelin.
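The arrangement can be sketched in Python, with `subprocess` standing in for Java's ProcessBuilder. This is a minimal illustration under stated assumptions, not Zeppelin's code: the helper names `start_python` and `run_command` are hypothetical, stderr is discarded for brevity, and the end-of-output marker is the one described under Technical overview.

```python
import subprocess
import sys

# Marker printed after every command so the reader knows where output ends.
SENTINEL = "*!?flush reader!?*"


def start_python():
    """Start an interactive (-i), unbuffered (-u) Python child process."""
    return subprocess.Popen(
        [sys.executable, "-i", "-u"],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,  # prompts and tracebacks are dropped in this sketch
        text=True,
        bufsize=1,
    )


def run_command(proc, code):
    """Send one single-line statement and return the stdout it produced."""
    proc.stdin.write(code + "\n")
    proc.stdin.write('print("%s")\n' % SENTINEL)
    proc.stdin.flush()

    lines = []
    while True:
        line = proc.stdout.readline()
        if line == "":  # child exited before the marker arrived
            break
        line = line.rstrip("\n")
        if line == SENTINEL:
            break
        lines.append(line)
    return "\n".join(lines)
```

The child process stays alive between calls, so state such as variables carries over from one command to the next. The sketch handles single-line statements only; in interactive mode a compound statement (a `for` loop, a function definition) needs a trailing blank line before the marker is sent.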
Details
- UnitTests
To run the full suite of tests, including those that depend on a real Python interpreter and on external libraries being installed (such as Pandas and Pandasql), run:
mvn -Dpython.test.exclude='' test -pl python -am
- Py4j support
Py4j enables Python programs to dynamically access Java objects in a JVM. It is required in order to use Zeppelin's dynamic forms feature.
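Because py4j is an optional dependency of the Python side, a process can check whether the library is importable before relying on it. The sketch below shows one generic way to do that with the standard library; the function name `module_available` is hypothetical, and this is not necessarily the check Zeppelin itself performs.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module can be imported, without importing it."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # py4j may or may not be installed on a given machine.
    if module_available("py4j"):
        print("py4j found: dynamic forms can be enabled")
    else:
        print("py4j missing: dynamic forms are unavailable")
```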
Dev prerequisites
- Python 2 or 3 installed, each with py4j (0.9.2) and matplotlib (1.31 or later) installed.
- Tests only check the interpreter logic and do not start any Python process. The Python process is mocked with a class that simply outputs its input.
- Code written in `bootstrap.py` and `bootstrap_input.py` should always be Python 2 and 3 compliant.
- Use the PEP8 convention for Python code.
Technical overview
- When the interpreter starts, it launches a Python process through a Java ProcessBuilder. Python is started with the `-i` (interactive mode) and `-u` (unbuffered stdin, stdout and stderr) options, so the interpreter holds a "sleeping" Python process.
- The interpreter sends commands to Python with a Java `OutputStreamWriter` and reads the results from an `InputStreamReader`. To know when to stop reading stdout, the interpreter sends `print "*!?flush reader!?*"` after each command and reads stdout until it receives `*!?flush reader!?*` back.
- When the interpreter starts, it sends some Python code (bootstrap.py and bootstrap_input.py) to initialize default behavior and functions (`help(), z.input()...`). bootstrap_input.py is sent only if the py4j library is detected inside the Python process.
- The Py4J Python and Java libraries are used to load the Zeppelin input Java class into the Python process (calling Java code from Python code). The interpreter can therefore create Zeppelin input forms directly inside the Python process (possibly using Python variables that are already defined). The JVM opens a random free port so that it is reachable from the Python process.
- Java's ProcessBuilder cannot send a SIGINT signal to interrupt paragraph execution. The interpreter therefore sends a `kill SIGINT PID` directly to the Python process to interrupt execution. The Python process catches the SIGINT signal with code defined in bootstrap.py.
- Matplotlib figures are displayed inline with the notebook automatically, using a built-in backend for Zeppelin in conjunction with a post-execute hook.
- `%python.sql` support for Pandas DataFrames is optional; the required library can be downloaded if the user does not already have it installed.
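The SIGINT step above can be illustrated with the standard `signal` module. This is a minimal sketch assuming a POSIX system, not the code in bootstrap.py: the names `install_interrupt_handler` and `run_paragraph` are hypothetical, and `os.kill(pid, signal.SIGINT)` plays the role of the `kill SIGINT PID` command sent by the interpreter.

```python
import os
import signal
import time


def install_interrupt_handler():
    """Turn an incoming SIGINT into a KeyboardInterrupt in the main thread."""
    def handler(signum, frame):
        raise KeyboardInterrupt("paragraph interrupted")
    signal.signal(signal.SIGINT, handler)


def run_paragraph(work):
    """Run a callable and report whether it finished or was interrupted."""
    try:
        work()
        return "finished"
    except KeyboardInterrupt:
        return "interrupted"


if __name__ == "__main__":
    install_interrupt_handler()

    def long_running():
        # For a self-contained demo the process signals itself; in the setup
        # described above, the signal comes from the interpreter process.
        os.kill(os.getpid(), signal.SIGINT)
        time.sleep(5)

    print(run_paragraph(long_running))
```

Catching the interrupt inside the process lets the running paragraph stop while the Python process itself stays alive for the next command.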
IPython Overview
IPython interpreter for Apache Zeppelin
IPython Requirements
You need to install the following python packages to make the IPython interpreter work.
- jupyter 5.x
- IPython
- ipykernel
- grpcio
If you have installed anaconda, then you just need to install grpc.
IPython Architecture
The current interpreter delegates all of the work to the IPython kernel via jupyter_client. Zeppelin launches a Python process which hosts the IPython kernel.
The Zeppelin interpreter process communicates with the Python process via grpc. Ideally, every feature that works in IPython should work in Zeppelin as well.