mirror of
https://github.com/apache/zeppelin
synced 2026-05-24 09:38:26 +00:00
### What is this PR for?
Add a new interpreter to the Python group, `%python.sql`, for SQL over DataFrame support.

### What type of PR is it?
Improvement

### TODOs
* [x] add new interpreter `%python.sql`
* [x] add test
* [x] exclude Python-dependent tests from CI
  * PythonInterpreterWithPythonInstalledTest
  * PythonPandasSqlInterpreterTest
  * run them manually with `mvn -Dpython.test.exclude='' test -pl python -am`
* [x] add docs for `%python.sql`
* [x] make `%python.sql` fail gracefully when Pandas or PandaSQL is not installed
* [x] after #747 is merged, rebase and remove `-Dpython.test.exclude=''` from both profiles

### What is the Jira issue?
[ZEPPELIN-1115](https://issues.apache.org/jira/browse/ZEPPELIN-1115)

### How should this be tested?
`mvn -Dpython.test.exclude='' test -pl python -am` should pass. Or test manually:
- Given a DataFrame, e.g.
```
%python
import pandas as pd
rates = pd.read_csv("bank.csv", sep=";")
```
- query it with SQL, e.g.
```
%python.sql
SELECT * FROM rates LIMIT 10
```

### Screenshots (if appropriate)

### Questions:
* Do the license files need an update? No, no dependencies were included in the source or binary release.
* Are there breaking changes for older versions? No
* Does this need documentation? Yes

Author: Alexander Bezzubov <bzz@apache.org>

Closes #1164 from bzz/ZEPPELIN-1115/python/add-sql-for-dataframes and squashes the following commits:

0f2f852 [Alexander Bezzubov] Fail SQL gracefully if no python dependencies installed
aca2bdf [Alexander Bezzubov] Fix typos in docs
158ba6a [Alexander Bezzubov] Remove third-party dependent test from CI
5fe46fc [Alexander Bezzubov] Update Python Matplotlib notebook example
72884c8 [Alexander Bezzubov] Add docs for %python.sql feature
e931dc4 [Alexander Bezzubov] Make test for PythonPandasSqlInterpreter usable
76bbb44 [Alexander Bezzubov] Complete implementation of the PythonPandasSqlInterpreter
f6ca1eb [Alexander Bezzubov] Add %python.sql to interpreter menu
11ba490 [Alexander Bezzubov] Add draft implementation of %python.sql for DataFrames
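The `%python.sql` paragraph in the manual test hands the query to pandasql's `sqldf`. Roughly, pandasql copies the DataFrames named in the query into an in-memory SQLite database and runs the SQL there. The sketch below illustrates that idea using only the standard library, so it runs without pandas. It is not pandasql's implementation. `mini_sqldf` and its "column dict" tables (`{"col": [values, ...]}`, standing in for DataFrames) are hypothetical. The FROM/JOIN regex is a deliberately naive way to find table names.

```python
import re
import sqlite3

# Naive heuristic: the identifier right after FROM or JOIN is a table name.
_TABLE_RE = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_]\w*)", re.IGNORECASE)


def mini_sqldf(query, env):
    """Run `query` in a fresh in-memory SQLite database.

    Tables are hypothetical column dicts ({"col": [values, ...]}) looked up
    by name in `env`; they stand in for pandas DataFrames.
    """
    conn = sqlite3.connect(":memory:")
    try:
        for name in set(_TABLE_RE.findall(query)):
            table = env.get(name)
            if not isinstance(table, dict):
                continue  # unknown name: let SQLite raise "no such table"
            columns = list(table)
            # Transpose the column lists into row tuples for executemany().
            rows = list(zip(*(table[c] for c in columns)))
            quoted = ", ".join('"%s"' % c for c in columns)
            marks = ", ".join("?" for _ in columns)
            conn.execute('CREATE TABLE "%s" (%s)' % (name, quoted))
            conn.executemany('INSERT INTO "%s" VALUES (%s)' % (name, marks), rows)
        cursor = conn.execute(query)
        header = [col[0] for col in cursor.description]
        return header, cursor.fetchall()
    finally:
        conn.close()


# Usage: the `rates` name from the PR, with made-up data.
rates = {"age": [30, 33, 35], "job": ["admin", "services", "management"]}
header, rows = mini_sqldf("SELECT job FROM rates WHERE age > 31 ORDER BY age", globals())
print(header, rows)  # prints ['job'] [('services',), ('management',)]
```

Passing `globals()` as the lookup environment mirrors how the bootstrap code below calls `sqldf(q, globals())`. The query can only see variables in the namespace it is given.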
28 lines
No EOL
1.2 KiB
Python
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Setup SQL over Pandas DataFrames
# It requires the following dependencies to be installed:
# - pandas
# - pandasql

from __future__ import print_function

try:
    from pandasql import sqldf
    # globals() is evaluated on every call, so DataFrames defined later in
    # the same namespace are visible to the query by name.
    pysqldf = lambda q: sqldf(q, globals())
except ImportError:
    # Fail gracefully: explain what is missing instead of raising a NameError.
    pysqldf = lambda q: print("Cannot run SQL over Pandas DataFrame. " +
                              "Make sure 'pandas' and 'pandasql' libraries are installed")
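In the bootstrap code, `globals()` sits inside the lambda body. It is therefore evaluated each time `pysqldf` is called, not once when the file is loaded. Assuming the bootstrap runs in the same namespace as the user's paragraphs, a DataFrame created after it (like `rates` in the PR) is still found. The minimal sketch below shows that late binding. `fake_sqldf` is a hypothetical stand-in for `pandasql.sqldf` that only reports which words of the query name variables in the namespace it receives.

```python
def fake_sqldf(query, env):
    """Hypothetical stand-in for sqldf: return query words that are names in env."""
    return sorted(word for word in set(query.split()) if word in env)


# Same shape as the bootstrap: globals() runs on every call, not at definition time.
pysqldf = lambda q: fake_sqldf(q, globals())

print(pysqldf("SELECT * FROM rates"))  # prints [] because `rates` does not exist yet
rates = "stand-in for a DataFrame"
print(pysqldf("SELECT * FROM rates"))  # prints ['rates']
```

If `globals()` were called once outside the lambda and the result stored, the same dictionary object would still be shared. Late lookup is the simplest way to make the dependency on the live namespace explicit.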