mirror of https://github.com/apache/zeppelin synced 2026-05-24 09:38:26 +00:00

README.md

Overview

BigQuery interpreter for Apache Zeppelin

Prerequisites

You can follow the instructions at Apache Zeppelin on Dataproc to bring up Zeppelin on Google Cloud Dataproc. Alternatively, you can install and bring up Zeppelin on Google Compute Engine.

Unit Tests

The BigQuery unit tests are excluded from the default build because they depend on the external BigQuery service, which does not have a local mock at this point.

To run these tests manually, follow these steps:

1. Create a new project.
2. Create a Google Compute Engine instance.
3. Copy the ID of the project that you created and add it to the property "projectId" in resources/constants.json.
4. Run the command mvn -Dbigquery.text.exclude='' test -pl bigquery -am

Interpreter Configuration

Configure the following properties during Interpreter creation.

Name	Default Value	Description
zeppelin.bigquery.project_id		Google project ID
zeppelin.bigquery.wait_time	5000	Query timeout in milliseconds
zeppelin.bigquery.max_no_of_rows	100000	Maximum number of rows in the result set
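
As a rough illustration of how these properties and their defaults fit together, the following Java sketch reads them from a `java.util.Properties` object. The class name `BigQuerySettings` and its method names are hypothetical and are not taken from the interpreter's source; only the property names and default values come from the table above.

```java
import java.util.Properties;

/** Hypothetical helper: reads the three interpreter properties with the documented defaults. */
public class BigQuerySettings {
    static final String PROJECT_ID = "zeppelin.bigquery.project_id";
    static final String WAIT_TIME = "zeppelin.bigquery.wait_time";
    static final String MAX_ROWS = "zeppelin.bigquery.max_no_of_rows";

    /** Returns the configured project ID, or an empty string when none is set (no default exists). */
    public static String projectId(Properties props) {
        return props.getProperty(PROJECT_ID, "").trim();
    }

    /** Returns the query timeout in milliseconds, falling back to the documented default of 5000. */
    public static long waitTimeMillis(Properties props) {
        return Long.parseLong(props.getProperty(WAIT_TIME, "5000").trim());
    }

    /** Returns the maximum result set size, falling back to the documented default of 100000. */
    public static long maxRows(Properties props) {
        return Long.parseLong(props.getProperty(MAX_ROWS, "100000").trim());
    }
}
```

In this sketch, a value that is not a valid number raises `NumberFormatException` instead of silently falling back to the default; whether the real interpreter behaves the same way is not stated here.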

Connection

The Interpreter opens a connection with the BigQuery Service using the supplied Google project ID and the compute environment variables.

Google BigQuery API Javadoc

API Javadocs [Source](http://central.maven.org/maven2/com/google/apis/google-api-services-bigquery/v2-rev265-1.21.0/google-api-services-bigquery-v2-rev265-1.21.0-sources.jar)

We used the curated veneer version of the Java APIs rather than the [idiomatic Java client](https://github.com/GoogleCloudPlatform/gcloud-java/tree/master/gcloud-java-bigquery) to build the interpreter, mainly for usability reasons.

Enabling the BigQuery Interpreter

To enable the BigQuery interpreter in a notebook, click the gear icon and select bigquery.

Using the BigQuery Interpreter

In a paragraph, use %bigquery.sql to select the BigQuery interpreter, then enter SQL statements against your datasets stored in BigQuery. Refer to the BigQuery SQL Reference to build your own queries.

For example, the following SQL finds the 10 airports with the most departure delays in the flights public dataset:

%bigquery.sql
SELECT departure_airport, SUM(CASE WHEN departure_delay > 0 THEN 1 ELSE 0 END) AS no_of_delays
FROM [bigquery-samples:airline_ontime_data.flights]
GROUP BY departure_airport
ORDER BY 2 DESC
LIMIT 10
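
To make the aggregation in this query concrete, here is a minimal Java sketch of the result it is meant to produce: for each airport, the number of flights with a positive departure delay, sorted in descending order and cut off at a limit. The `DelayCounter` class and the `Flight` type are hypothetical and exist only for illustration; they are not part of the interpreter or of the public dataset.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory equivalent of the "top delays per airport" aggregation. */
public class DelayCounter {
    /** One flight row: departure airport code and departure delay in minutes. */
    public static final class Flight {
        final String departureAirport;
        final int departureDelay;

        public Flight(String departureAirport, int departureDelay) {
            this.departureAirport = departureAirport;
            this.departureDelay = departureDelay;
        }
    }

    /** Counts delayed flights per airport and returns the top entries, highest count first. */
    public static List<Map.Entry<String, Integer>> topDelays(List<Flight> flights, int limit) {
        Map<String, Integer> counts = new HashMap<>();
        for (Flight f : flights) {
            // Every airport gets a group; only flights with a positive delay add 1 to it.
            counts.merge(f.departureAirport, f.departureDelay > 0 ? 1 : 0, Integer::sum);
        }
        List<Map.Entry<String, Integer>> rows = new ArrayList<>(counts.entrySet());
        rows.sort((a, b) -> b.getValue().compareTo(a.getValue()));
        return rows.subList(0, Math.min(limit, rows.size()));
    }
}
```

The conditional contributes 0 for on-time flights, so an airport with no delays still appears in the output with a count of 0 instead of being dropped.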

As another example, the following SQL finds the most commonly used Java packages in the GitHub data hosted in BigQuery:

%bigquery.sql
SELECT
  package,
  COUNT(*) count
FROM (
  SELECT
    REGEXP_EXTRACT(line, r' ([a-z0-9\._]*)\.') package,
    id
  FROM (
    SELECT
      SPLIT(content, '\n') line,
      id
    FROM
      [bigquery-public-data:github_repos.sample_contents]
    WHERE
      content CONTAINS 'import'
      AND sample_path LIKE '%.java'
    HAVING
      LEFT(line, 6)='import' )
  GROUP BY
    package,
    id )
GROUP BY
  1
ORDER BY
  count DESC
LIMIT
  40
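
The innermost steps of this query split each file into lines, keep the lines that start with "import", and extract the package name with a regular expression. The Java sketch below applies the same pattern to a single line so the extraction is easier to follow. The class and method names are hypothetical, and `java.util.regex` stands in for BigQuery's regular expression engine, so edge cases may differ.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that mimics the query's per-line package extraction. */
public class PackageExtractor {
    // Same pattern as the REGEXP_EXTRACT call in the query, escaped for a Java string literal.
    private static final Pattern PACKAGE = Pattern.compile(" ([a-z0-9\\._]*)\\.");

    /** Returns the package part of an import line, or empty if the line is not an import. */
    public static Optional<String> extractPackage(String line) {
        // Mirrors the LEFT(line, 6)='import' filter in the query.
        if (!line.startsWith("import")) {
            return Optional.empty();
        }
        Matcher m = PACKAGE.matcher(line);
        // group(1) is the lowercase dotted prefix before the final dot, i.e. the package name.
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

Because the character class only allows lowercase letters, digits, dots, and underscores, the capture stops before a capitalized class name: `import java.util.List;` yields `java.util`.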