zeppelin/bigquery
Tinkoff DWH 155a55b560 [ZEPPELIN-2403] interpreter property widgets
### What is this PR for?
I spoiled the previous PR #2251

Added widgets (string, text, url, password, checkbox) to interpreter properties so that each property can customize how it is displayed (for example, a password field).

### What type of PR is it?
Feature

### What is the Jira issue?
https://issues.apache.org/jira/browse/ZEPPELIN-2403

### How should this be tested?
- remove conf/interpreter.json
- Try new form (create, edit) of interpreter settings

### Screenshots (if appropriate)
edit
![edit](https://cloud.githubusercontent.com/assets/25951039/25130228/e2a28060-245a-11e7-895a-d7c1571f885f.png)

view
![view](https://cloud.githubusercontent.com/assets/25951039/25130227/e2a10906-245a-11e7-9ea3-0bd070219f42.png)

### Questions:
* Do the license files need an update? no
* Are there breaking changes for older versions? no
* Does this need documentation? no

Author: Tinkoff DWH <tinkoff.dwh@gmail.com>
Author: isys.mreshetov <m.reshetov@i-sys.ru>

Closes #2268 from tinkoff-dwh/ZEPPELIN-2403 and squashes the following commits:

75a10464 [isys.mreshetov] ZEPPELIN-2403 imports fix
7be8ddff [isys.mreshetov] Merge remote-tracking branch 'upstream/master' into ZEPPELIN-2403
585fc364 [isys.mreshetov] ZEPPELIN-2403 documentation fix
4b633993 [isys.mreshetov] Merge remote-tracking branch 'upstream/master' into ZEPPELIN-2403
726c1f31 [isys.mreshetov] Merge remote-tracking branch 'upstream/master' into ZEPPELIN-2403
b17dfb59 [isys.mreshetov] Merge remote-tracking branch 'upstream/master' into ZEPPELIN-2403
098fbd14 [Tinkoff DWH] Merge remote-tracking branch 'upstream/master' into ZEPPELIN-2403
a5f13272 [Tinkoff DWH] [ZEPPELIN-2403] checkstyle fix
fd25c467 [Tinkoff DWH] Merge remote-tracking branch 'upstream/master' into ZEPPELIN-2403
e35ff58f [Tinkoff DWH] [ZEPPELIN-2403] fix checkstyle
7c25b6db [Tinkoff DWH] Merge remote-tracking branch 'upstream/master' into ZEPPELIN-2403
10ce996a [Tinkoff DWH] [ZEPPELIN-2403] merge widget and type
ca1e2bf7 [Tinkoff DWH] Merge remote-tracking branch 'upstream/master' into ZEPPELIN-2403
99daca6d [Tinkoff DWH] [ZEPPELIN-2403] fix rest api test
f735c0a9 [Tinkoff DWH] [ZEPPELIN-2403] fix test
c6d24c4c [Tinkoff DWH] [ZEPPELIN-2403] converter for old settings to new (with widgets)
76a98083 [Tinkoff DWH] Merge remote-tracking branch 'origin/master' into ZEPPELIN-2403
b41e7a3f [Tinkoff DWH] ZEPPELIN-2403 checkstyle
637cb0a1 [Tinkoff DWH] Merge remote-tracking branch 'upstream/master' into ZEPPELIN-2403
e92713c7 [Tinkoff DWH] [ZEPPELIN-2403] generalized types, added new types
07160e00 [Tinkoff DWH] Merge remote-tracking branch 'upstream/master' into ZEPPELIN-2403
a495137f [Tinkoff DWH] ZEPPELIN-2403 eslint fix
fd8d2781 [Tinkoff DWH] Merge remote-tracking branch 'origin/master' into ZEPPELIN-2403_backup
4f271d9b [Tinkoff DWH] ZEPPELIN-2403  rename to widget  added new widgets  string,  number,  url
dd5d6c80 [Tinkoff DWH] ZEPPELIN-2403 did properties immutable, added new type 'checkbox'
14353b12 [Tinkoff DWH] Merge remote-tracking branch 'upstream/master' into ZEPPELIN-2403
12499ae1 [Tinkoff DWH] Merge remote-tracking branch 'upstream/master' into ZEPPELIN-2403
45f5f627 [Tinkoff DWH] ZEPPELIN-2403 added interpreter property types
2017-07-06 15:54:55 +09:00
src [ZEPPELIN-2403] interpreter property widgets 2017-07-06 15:54:55 +09:00
pom.xml Bump up version to 0.8.0-SNAPSHOT 2017-01-19 02:04:24 +09:00
README.md BigQuery Interpreter for Apache Zeppelin[ZEPPELIN-1153] 2016-07-31 01:14:21 +09:00

Overview

BigQuery interpreter for Apache Zeppelin

Prerequisites

You can follow the instructions at Apache Zeppelin on Dataproc to bring up Zeppelin on Google Cloud Dataproc. Alternatively, you can install and bring up Zeppelin on Google Compute Engine.

Unit Tests

The BigQuery unit tests are excluded from the default build because they depend on the external BigQuery service, which does not have a local mock at this point.

If you would like to run these tests manually, follow these steps:

Interpreter Configuration

Configure the following properties during Interpreter creation.

| Name | Default Value | Description |
|------|---------------|-------------|
| zeppelin.bigquery.project_id | | Google Project Id |
| zeppelin.bigquery.wait_time | 5000 | Query Timeout in Milliseconds |
| zeppelin.bigquery.max_no_of_rows | 100000 | Max result set size |
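
As a rough illustration of how these three properties combine with their defaults, here is a minimal Python sketch. It is not the interpreter's actual (Java) code: the `resolve_settings` helper and the `DEFAULTS` mapping are hypothetical names introduced only for this example, and rejecting a missing project ID is an assumption based on that property having no default.

```python
# Hypothetical helper, not part of Zeppelin: merges user-supplied
# interpreter properties with the defaults listed above.
DEFAULTS = {
    "zeppelin.bigquery.wait_time": "5000",         # query timeout in ms
    "zeppelin.bigquery.max_no_of_rows": "100000",  # max result set size
}


def resolve_settings(properties):
    """Return typed settings from a dict of property name -> string value."""
    merged = {**DEFAULTS, **properties}
    project_id = merged.get("zeppelin.bigquery.project_id", "").strip()
    if not project_id:
        # Assumption: the project ID has no default, so it must be supplied.
        raise ValueError("zeppelin.bigquery.project_id must be set")
    return {
        "project_id": project_id,
        "wait_time_ms": int(merged["zeppelin.bigquery.wait_time"]),
        "max_rows": int(merged["zeppelin.bigquery.max_no_of_rows"]),
    }


settings = resolve_settings({"zeppelin.bigquery.project_id": "my-project"})
print(settings)
```

Only the project ID is supplied here, so the timeout and row limit fall back to the documented defaults.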

Connection

The Interpreter opens a connection with the BigQuery Service using the supplied Google project ID and the compute environment variables.

Google BigQuery API Javadoc

API Javadocs [Source](http://central.maven.org/maven2/com/google/apis/google-api-services-bigquery/v2-rev265-1.21.0/google-api-services-bigquery-v2-rev265-1.21.0-sources.jar)

We used the curated veneer version of the Java APIs, rather than the [idiomatic Java client](https://github.com/GoogleCloudPlatform/gcloud-java/tree/master/gcloud-java-bigquery), to build the interpreter, mainly for usability reasons.

Enabling the BigQuery Interpreter

In a notebook, to enable the BigQuery interpreter, click the Gear icon and select bigquery.

Using the BigQuery Interpreter

In a paragraph, use %bigquery.sql to select the BigQuery interpreter, then enter SQL statements against your datasets stored in BigQuery. You can use the BigQuery SQL Reference to build your own queries.

For example, the following SQL finds the 10 airports with the most departure delays in the flights public dataset:

%bigquery.sql
SELECT departure_airport, sum(case when departure_delay>0 then 1 else 0 end) as no_of_delays
FROM [bigquery-samples:airline_ontime_data.flights]
group by departure_airport
order by 2 desc
limit 10
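
One detail worth checking in queries of this kind: `COUNT` counts every non-NULL value, so a `CASE` expression that returns 0 for on-time flights is still counted, whereas `SUM` adds up only the 1s. The sketch below reproduces the effect locally with Python's built-in `sqlite3` module rather than BigQuery. The rows are made-up sample data, and SQLite is only a stand-in assumed to follow the same standard `COUNT`/`SUM` semantics on this point.

```python
import sqlite3

# In-memory database with a few made-up flights (not the real public dataset).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (departure_airport TEXT, departure_delay REAL)")
conn.executemany(
    "INSERT INTO flights VALUES (?, ?)",
    [("SFO", 12.0), ("SFO", -3.0), ("SFO", 0.0), ("JFK", 5.0), ("JFK", 40.0)],
)

rows = conn.execute(
    """
    SELECT departure_airport,
           COUNT(CASE WHEN departure_delay > 0 THEN 1 ELSE 0 END) AS counted_all,
           SUM(CASE WHEN departure_delay > 0 THEN 1 ELSE 0 END)   AS delayed_only
    FROM flights
    GROUP BY departure_airport
    ORDER BY departure_airport
    """
).fetchall()

# counted_all equals the number of flights; delayed_only counts delays > 0.
print(rows)
```

For SFO, `counted_all` is 3 (every flight) while `delayed_only` is 1, which is the number a "delays" column is meant to report.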

Another example: SQL to find the most commonly used Java packages in the GitHub data hosted in BigQuery

%bigquery.sql
SELECT
  package,
  COUNT(*) count
FROM (
  SELECT
    REGEXP_EXTRACT(line, r' ([a-z0-9\._]*)\.') package,
    id
  FROM (
    SELECT
      SPLIT(content, '\n') line,
      id
    FROM
      [bigquery-public-data:github_repos.sample_contents]
    WHERE
      content CONTAINS 'import'
      AND sample_path LIKE '%.java'
    HAVING
      LEFT(line, 6)='import' )
  GROUP BY
    package,
    id )
GROUP BY
  1
ORDER BY
  count DESC
LIMIT
  40
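
To see what the `REGEXP_EXTRACT` pattern in this query captures, the following minimal sketch applies the same regular expression with Python's `re` module. BigQuery evaluates regular expressions with its own engine, so Python is only a local stand-in that should behave the same for this simple pattern. The `extract_package` helper and the sample import lines are hypothetical and exist only for illustration.

```python
import re

# Same pattern as in the REGEXP_EXTRACT call in the query above.
PACKAGE_PATTERN = re.compile(r' ([a-z0-9\._]*)\.')


def extract_package(line):
    """Return the package part of a Java import line, or None if no match."""
    match = PACKAGE_PATTERN.search(line)
    return match.group(1) if match else None


samples = [
    "import java.util.List;",
    "import java.util.*;",
    "import static org.junit.Assert.assertEquals;",
]
for line in samples:
    print(line, "->", extract_package(line))
```

Because the character class admits only lowercase letters, digits, dots, and underscores, the capture stops before the capitalized class name, leaving just the package (for example `java.util`).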

Sample Screenshot

Zeppelin BigQuery