zeppelin/python/src/main/resources/bootstrap.py
Alex Goodman 438dbca686 ZEPPELIN-1345 - Create a custom matplotlib backend that natively supports inline plotting in a python interpreter cell
### What is this PR for?

This PR is the first of two major steps needed to improve matplotlib integration in Zeppelin (ZEPPELIN-1344). The latter, which is a plotting backend with fully interactive tools enabled, will be done afterwards in a separate PR. This PR specifically for automatically displaying output from calls to matplotlib plotting functions inline with each paragraph. Thanks to the addition of post-execute hooks (ZEPPELIN-1423), there is no need to call any `show()` function to display an inline plot, just like in Jupyter.
### What type of PR is it?

Improvement
### Todos

The main code has been written and anyone who reads this is encouraged to test it, but there are a few minor todos:
- [x] - Add unit tests
- [x] - Add documentation
- [x] - Add screenshot showing iterative plotting with angular mode
### What is the Jira issue?

[ZEPPELIN-1345](https://issues.apache.org/jira/browse/ZEPPELIN-1345)
### How should this be tested?

In a pyspark or python paragraph, enter and run

``` python
import matplotlib.pyplot as plt
plt.plot([1, 2, 3])
```

The plot should be displayed automatically without calling any `show()` function whatsoever. A special method called `configure_mpl()` can also be used to modify the inline plotting behavior. For example,

``` python
z.configure_mpl(close=False, angular=True)
plt.plot([1, 2, 3])
```

allows for iterative updates to the plot provided you have PY4J installed for your python installation (which of course is always the case if you use pypsark). To clarify, this feature only currently works with pyspark (not python as there are no `angularBind()` and `angularUnbind()` methods yet). Doing something like:

```
plt.plot([3, 2, 1])
```

will update the plot that was generated by the previous paragraph by leveraging Zeppelin's Angular Display System. However, by setting `close=False`, matplotlib will no longer automatically close figures so it is now up to the user to explicitly close each figure instance they create. There's quite a bit more options for `z.configure_mpl()`, but I will save that discussion for the documentation.
### Screenshots (if appropriate)
![img](http://i.imgur.com/e1xHKnV.gif)

### Questions:
- Does the licenses files need update? No
- Is there breaking changes for older versions? No
- Does this needs documentation? Yes

Author: Alex Goodman <agoodm@users.noreply.github.com>

Closes #1534 from agoodm/ZEPPELIN-1345 and squashes the following commits:

9ef6ff7 [Alex Goodman] Move mpl backend files to /interpreter
24f89c6 [Alex Goodman] Catch potential NullPointerExceptions from hook registry
bdb584e [Alex Goodman] Make sure expressions are printed when no plots are shown
22b6fe4 [Alex Goodman] Remove unused variable
d3d1aa0 [Alex Goodman] Fix CI test failure
c90d204 [Alex Goodman] Update spark.md
bcf0bf3 [Alex Goodman] Update python.md for new matplotlib integration
c9b65a5 [Alex Goodman] Add iterative plotting example image
8029a05 [Alex Goodman] Update python/README.md
f2d9e86 [Alex Goodman] Exclude tests are excluded in python/pom.xml
86b1c90 [Alex Goodman] Fix tutorial notebook not loading
c37b00f [Alex Goodman] Fix legend in tutorial notebook
a321d79 [Alex Goodman] Update python.md
82350e3 [Alex Goodman] Update matplotlib tutorial notebook
9792f97 [Alex Goodman] Add unit tests
8b9b973 [Alex Goodman] Fix NullPointerExceptions in unit tests
82135ad [Alex Goodman] Removed unused variable
f9c9498 [Alex Goodman] Added support for Angular Display System
edf750a [Alex Goodman] Add new matplotlib backend for python/pyspark interpreters
2016-11-08 07:20:21 -08:00

227 lines
7.8 KiB
Python

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# PYTHON 2 / 3 compatibility :
# bootstrap.py must be runnable with Python 2 or 3
import os
import sys
import signal
import base64
import warnings
from io import BytesIO
try:
from StringIO import StringIO
except ImportError:
from io import StringIO
def intHandler(signum, frame): # Set the signal handler
print ("Paragraph interrupted")
raise KeyboardInterrupt()
signal.signal(signal.SIGINT, intHandler)
# set prompt as empty string so that java side don't need to remove the prompt.
sys.ps1=""
def help():
print("""%html
<h2>Python Interpreter help</h2>
<h3>Python 2 & 3 compatibility</h3>
<p>The interpreter is compatible with Python 2 & 3.<br/>
To change Python version,
change in the interpreter configuration the python to the
desired version (example : python=/usr/bin/python3)</p>
<h3>Python modules</h3>
<p>The interpreter can use all modules already installed
(with pip, easy_install, etc)</p>
<h3>Forms</h3>
You must install py4j in order to use
the form feature (pip install py4j)
<h4>Input form</h4>
<pre>print (z.input("f1","defaultValue"))</pre>
<h4>Selection form</h4>
<pre>print(z.select("f2", [("o1","1"), ("o2","2")],2))</pre>
<h4>Checkbox form</h4>
<pre> print("".join(z.checkbox("f3", [("o1","1"), ("o2","2")],["1"])))</pre>')
<h3>Matplotlib graph</h3>
<div>The interpreter can display matplotlib graph with
the function z.show()</div>
<div> You need to already have matplotlib module installed
to use this functionality !</div><br/>
<pre>import matplotlib.pyplot as plt
plt.figure()
(.. ..)
z.show(plt)
plt.close()
</pre>
<div><br/> z.show function can take optional parameters
to adapt graph dimensions (width and height) and format (png or svg)</div>
<div><b>example </b>:
<pre>z.show(plt,width='50px
z.show(plt,height='150px', fmt='svg') </pre></div>
<h3>Pandas DataFrame</h3>
<div> You need to have Pandas module installed
to use this functionality (pip install pandas) !</div><br/>
<div>The interpreter can visualize Pandas DataFrame
with the function z.show()
<pre>
import pandas as pd
df = pd.read_csv("bank.csv", sep=";")
z.show(df)
</pre></div>
<h3>SQL over Pandas DataFrame</h3>
<div> You need to have Pandas&Pandasql modules installed
to use this functionality (pip install pandas pandasql) !</div><br/>
<div>Python interpreter group includes %sql interpreter that can query
Pandas DataFrames using SQL and visualize results using Zeppelin Table Display System
<pre>
%python
import pandas as pd
df = pd.read_csv("bank.csv", sep=";")
</pre>
<br />
<pre>
%python.sql
%sql
SELECT * from df LIMIT 5
</pre>
</div>
""")
class PyZeppelinContext(object):
""" If py4j is detected, these class will be override
with the implementation in bootstrap_input.py
"""
errorMsg = "You must install py4j Python module " \
"(pip install py4j) to use Zeppelin dynamic forms features"
def __init__(self):
self.max_result = 1000
self._displayhook = lambda *args: None
def input(self, name, defaultValue=""):
print(self.errorMsg)
def select(self, name, options, defaultValue=""):
print(self.errorMsg)
def checkbox(self, name, options, defaultChecked=[]):
print(self.errorMsg)
def show(self, p, **kwargs):
if hasattr(p, '__name__') and p.__name__ == "matplotlib.pyplot":
self.show_matplotlib(p, **kwargs)
elif type(p).__name__ == "DataFrame": # does not play well with sub-classes
# `isinstance(p, DataFrame)` would req `import pandas.core.frame.DataFrame`
# and so a dependency on pandas
self.show_dataframe(p, **kwargs)
elif hasattr(p, '__call__'):
p() #error reporting
def show_dataframe(self, df, **kwargs):
"""Pretty prints DF using Table Display System
"""
limit = len(df) > self.max_result
header_buf = StringIO("")
header_buf.write(str(df.columns[0]))
for col in df.columns[1:]:
header_buf.write("\t")
header_buf.write(str(col))
header_buf.write("\n")
body_buf = StringIO("")
rows = df.head(self.max_result).values if limit else df.values
for row in rows:
body_buf.write(str(row[0]))
for cell in row[1:]:
body_buf.write("\t")
body_buf.write(str(cell))
body_buf.write("\n")
body_buf.seek(0); header_buf.seek(0)
#TODO(bzz): fix it, so it shows red notice, as in Spark
print("%table " + header_buf.read() + body_buf.read()) # +
# ("\n<font color=red>Results are limited by {}.</font>" \
# .format(self.max_result) if limit else "")
#)
body_buf.close(); header_buf.close()
def show_matplotlib(self, p, fmt="png", width="auto", height="auto",
**kwargs):
"""Matplotlib show function
"""
if fmt == "png":
img = BytesIO()
p.savefig(img, format=fmt)
img_str = b"data:image/png;base64,"
img_str += base64.b64encode(img.getvalue().strip())
img_tag = "<img src={img} style='width={width};height:{height}'>"
# Decoding is necessary for Python 3 compability
img_str = img_str.decode("ascii")
img_str = img_tag.format(img=img_str, width=width, height=height)
elif fmt == "svg":
img = StringIO()
p.savefig(img, format=fmt)
img_str = img.getvalue()
else:
raise ValueError("fmt must be 'png' or 'svg'")
html = "%html <div style='width:{width};height:{height}'>{img}<div>"
print(html.format(width=width, height=height, img=img_str))
img.close()
def configure_mpl(self, **kwargs):
import mpl_config
mpl_config.configure(**kwargs)
def _setup_matplotlib(self):
# If we don't have matplotlib installed don't bother continuing
try:
import matplotlib
except ImportError:
pass
# Make sure custom backends are available in the PYTHONPATH
rootdir = os.environ.get('ZEPPELIN_HOME', os.getcwd())
mpl_path = os.path.join(rootdir, 'interpreter', 'lib', 'python')
if mpl_path not in sys.path:
sys.path.append(mpl_path)
# Finally check if backend exists, and if so configure as appropriate
try:
matplotlib.use('module://backend_zinline')
import backend_zinline
# Everything looks good so make config assuming that we are using
# an inline backend
self._displayhook = backend_zinline.displayhook
self.configure_mpl(width=600, height=400, dpi=72,
fontsize=10, interactive=True, format='png')
except ImportError:
# Fall back to Agg if no custom backend installed
matplotlib.use('Agg')
warnings.warn("Unable to load inline matplotlib backend, "
"falling back to Agg")
z = PyZeppelinContext()
z._setup_matplotlib()