zeppelin/python
astroshim b7307d49de [ZEPPELIN-1567] Let JDBC interpreter use user credential information.
### What is this PR for?

This PR adds multi-tenant support to the JDBC interpreter.

Users can create a username/password for a JDBC account on the [Credential page](http://zeppelin.apache.org/docs/0.7.0-SNAPSHOT/security/datasource_authorization.html).
The `Entity` of the `Credential` matches the JDBC interpreter group name.

If the JDBC account is not set in the `Interpreter property`, the one from `Credential` is used.
### What type of PR is it?

Improvement
### What is the Jira issue?

https://issues.apache.org/jira/browse/ZEPPELIN-1567
### How should this be tested?

Please refer to testMultiTenant() in JDBCInterpreterTest.
### Screenshots (if appropriate)
### Questions:
- Do the license files need an update? no
- Are there breaking changes for older versions? no
- Does this need documentation? no

Author: astroshim <hsshim@nflabs.com>

Closes #1539 from astroshim/jdbc-impersonation and squashes the following commits:

46fce31 [astroshim] add explanation of InterpreterGroup
7a92236 [astroshim] fix doc and remove persist value.
63f5ea7 [astroshim] Merge branch 'master' into jdbc-impersonation
267277a [astroshim] rebase
649ff6e [astroshim] rebase
872fb49 [astroshim] fix ScioInterpreterTestCase
4387a5b [astroshim] Merge branch 'master' into jdbc-impersonation
47c463f [astroshim] update doc and html
d4eb178 [astroshim] fix docs
59aa9ff [astroshim] Merge branch 'master' into jdbc-impersonation
bf61afd [astroshim] fix testcase
5c0f5d7 [astroshim] rebase
79ba25b [astroshim] Merge branch 'master' into jdbc-impersonation
1f9c2c0 [astroshim] clean redundant code
a2f5687 [astroshim] fix impersonation
9962181 [astroshim] fix InterpreterOutput of PySparkInterpreterTest case
b55aceb [astroshim] Merge branch 'master' into jdbc-impersonation
24a8226 [astroshim] fix doc
086dfda [astroshim] fix testcase
34fe0a6 [astroshim] fix code for more simple.
fee7086 [astroshim] fix build error.
a305eca [astroshim] Merge branch 'master' into jdbc-impersonation
df80741 [astroshim] documentation for credential.
df1b1dc [astroshim] rebase and entity name convention.
63d6a1c [astroshim] change thrift version to 0.9.2
6573c1c [astroshim] change variable name
f311f34 [astroshim] fix typo
722e333 [astroshim] change testcase name
9161937 [astroshim] clean code
3dafdf0 [astroshim] add testcase
373d5f1 [astroshim] pass replName to Interpreter and use credential info for jdbc auth.
2016-11-24 09:17:01 -08:00
src [ZEPPELIN-1567] Let JDBC interpreter use user credential information. 2016-11-24 09:17:01 -08:00
pom.xml ZEPPELIN-1345 - Create a custom matplotlib backend that natively supports inline plotting in a python interpreter cell 2016-11-08 07:20:21 -08:00
README.md ZEPPELIN-1345 - Create a custom matplotlib backend that natively supports inline plotting in a python interpreter cell 2016-11-08 07:20:21 -08:00

Overview

Python interpreter for Apache Zeppelin

Architecture

The current interpreter implementation spawns a new system Python process through ProcessBuilder and redirects its stdin/stdout to Zeppelin.

Details

  • UnitTests

To run the full suite of tests, including those that depend on a real Python interpreter and on external libraries being installed (such as Pandas, Pandasql, etc.), run:

mvn -Dpython.test.exclude='' test -pl python -am
  • Py4j support

Py4j enables Python programs to dynamically access Java objects in a JVM. It is required in order to use the Zeppelin dynamic forms feature.

  • bootstrap process

The interpreter environment is set up by bootstrap.py. It defines the help() and z convenience functions.

Dev prerequisites

  • Python 2 or 3 installed, each with py4j (0.9.2) and matplotlib (1.31 or later)

  • Tests check only the interpreter logic and do not start any Python process: the Python process is mocked by a class that simply outputs its input.

  • Code written in bootstrap.py and bootstrap_input.py should always be compatible with both Python 2 and 3.

  • Follow the PEP 8 conventions for Python code.
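
As a small illustration of the Python 2 and 3 compatibility rule above, the sketch below avoids constructs that differ between the two major versions, such as the Python 2 only `print "..."` statement. Both function names are hypothetical and are not taken from bootstrap.py.

```python
import sys


def format_version(version_info=sys.version_info):
    """Hypothetical helper: describe an interpreter version as text."""
    # %-formatting and tuple indexing behave the same on Python 2 and 3.
    return "Python %d.%d" % (version_info[0], version_info[1])


def emit(message, stream=None):
    """Hypothetical helper: write one line of text to a stream."""
    # stream.write() is identical on both major versions, whereas the
    # statement form `print "text"` is a syntax error on Python 3.
    target = stream if stream is not None else sys.stdout
    target.write(message + "\n")
```

Calling `emit(format_version())` prints the version of whichever interpreter runs the code, without any version-specific branches.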

Technical overview

  • When the interpreter starts, it launches a Python process through a Java ProcessBuilder. Python is started with the -i (interactive mode) and -u (unbuffered stdin, stdout and stderr) options, so the interpreter holds a "sleeping" Python process.

  • The interpreter sends commands to Python with a Java OutputStreamWriter and reads the results from an InputStreamReader. To know when to stop reading stdout, the interpreter sends print "*!?flush reader!?*" after each command and reads stdout until it receives *!?flush reader!?* back.

  • When the interpreter starts, it sends some Python code (bootstrap.py and bootstrap_input.py) to initialize default behavior and functions (help(), z.input()...). bootstrap_input.py is sent only if the py4j library is detected inside the Python process.

  • The Py4J Python and Java libraries are used to load the Zeppelin Input Java class into the Python process (calling Java code from Python code). The interpreter can therefore create Zeppelin input forms directly inside the Python process (possibly using Python variables that are already defined). The JVM opens a random free port so that it is reachable from the Python process.

  • Java's ProcessBuilder cannot send a SIGINT signal to interrupt paragraph execution. The interpreter therefore sends a kill SIGINT PID directly to the Python process to interrupt execution. The Python process catches the SIGINT signal with code defined in bootstrap.py.

  • Matplotlib figures are displayed inline in the notebook automatically, using a built-in backend for Zeppelin in conjunction with a post-execute hook.

  • %python.sql support for Pandas DataFrames is optional and is provided through https://github.com/yhat/pandasql if the user has it installed.
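
The process handling described in the first bullets of this overview can be sketched as follows. This is a minimal illustration written in Python with the subprocess module, not the actual Java implementation: the class name `PythonProcess`, its methods, and the `BOOTSTRAP` snippet are assumptions made for the example and are not taken from the Zeppelin sources or from bootstrap.py. The sketch starts a child with -i and -u, sends a command followed by a print of the flush marker, reads stdout until the marker comes back, and installs a SIGINT handler in the child. Signal delivery as shown assumes a POSIX system.

```python
import signal
import subprocess
import sys

FLUSH_MARKER = "*!?flush reader!?*"  # marker string quoted in the overview

# Hypothetical stand-in for bootstrap.py: installs a SIGINT handler in the
# child. The blank line ends the "def" block, as the interactive prompt
# requires for compound statements.
BOOTSTRAP = (
    "import signal\n"
    "def _on_sigint(signum, frame):\n"
    "    print('Paragraph interrupted')\n"
    "    raise KeyboardInterrupt()\n"
    "\n"
    "_ = signal.signal(signal.SIGINT, _on_sigint)"
)


class PythonProcess(object):
    """Keeps one interactive Python child process alive between commands."""

    def __init__(self):
        self.proc = subprocess.Popen(
            [sys.executable, "-i", "-u"],   # interactive, unbuffered
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,      # banner, prompts and tracebacks
            universal_newlines=True,
        )
        self.run(BOOTSTRAP)                 # initialize the child once

    def run(self, command):
        """Send a command, then read stdout until the marker comes back."""
        self.proc.stdin.write(command + "\n")
        self.proc.stdin.write("print(%r)\n" % FLUSH_MARKER)
        self.proc.stdin.flush()
        output = []
        # readline() returns "" at end of file, i.e. if the child has exited.
        for line in iter(self.proc.stdout.readline, ""):
            if line.rstrip("\r\n") == FLUSH_MARKER:
                break
            output.append(line)
        return "".join(output)

    def interrupt(self):
        """Ask the child to stop the running command (POSIX assumed)."""
        self.proc.send_signal(signal.SIGINT)

    def close(self):
        """Close the pipes; end of input makes the interactive child exit."""
        self.proc.stdin.close()
        self.proc.stdout.close()
        self.proc.wait()
```

With this sketch, `PythonProcess().run("print(6 * 7)")` returns the text the child printed before the marker, and variables defined by one command remain available to the next because the same child process is reused. Because stderr is discarded here, error tracebacks are lost; a fuller version would capture them as well.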