zeppelin/bigquery/README.md
Babu Prasad Elumalai 57c264da2d BigQuery Interpreter for Apazhe Zeppelin[ZEPPELIN-1153]
### What is this PR for?
Google BigQuery is a popular no-ops data warehouse. This commit enables Apache Zeppelin users to perform BI and analytics on their datasets in BigQuery.

### What type of PR is it?
Feature

### Todos
* Make bigquery interpreter appear in the interpreters section in the UI
* Build SQL completion
* Authorization of non-gcp

### What is the Jira issue?
https://issues.apache.org/jira/browse/ZEPPELIN-1153

### How should this be tested?
* Copy conf/zeppelin-site.xml.template to conf/zeppelin-site.xml
* Add org.apache.zeppelin.bigquery.BigQueryInterpreter to the property zeppelin.interpreters in zeppelin-site.xml
* Start Zeppelin
* Add the BigQuery interpreter with your project ID
* Create a new note with %bsql.sql and run your SQL against public datasets in BigQuery.

### Screenshots (if appropriate)
![screenshot from 2016-07-12 14 27 30](https://cloud.githubusercontent.com/assets/4242273/16785302/31b104e2-4842-11e6-87c0-b79763dd85c0.png)

### Questions:
* Do the license files need an update? No
* Are there breaking changes for older versions? No
* Does this need documentation? No

Author: Babu Prasad Elumalai <babupe@google.com>
Author: babupe <babupe@google.com>
Author: Alexander Bezzubov <bzz@apache.org>

Closes #1170 from babupe/babupe-bigquery and squashes the following commits:

ffed801 [Babu Prasad Elumalai] pushing BQ Exception to logs and Interpreter error output
d3c2316 [babupe] Merge pull request #2 from bzz/babupe-add-auth-docs
64525b8 [Alexander Bezzubov] Fix typos in docs
03a777f [Alexander Bezzubov] add docs for BigQuery auth outside of GCE
fcab6b7 [babupe] Merge pull request #1 from bzz/babupe-final
6a95333 [Alexander Bezzubov] Rename Apach2.0 license for google's code to adhere naming conventions
7d4f40b [Alexander Bezzubov] Add exidentaly removed licenses due to merge conflict
3be1912 [Babu Prasad Elumalai] New changes
41e076e [Babu Prasad Elumalai] Fixed formatting with readme file
97874a4 [Babu Prasad Elumalai] Pushing cropped screenshots
64affbb [babupe] Added cropped interpreter screenshot
4a1d29c [Babu Prasad Elumalai] Removed unnecessary dependencies in pom.xml
e520b7b [Babu Prasad Elumalai] Exclude constants.json file for rat plugin since its static config file
69cb724 [Babu Prasad Elumalai] Fixed license header and added manual unit test documentation
bbf26cc [Babu Prasad Elumalai] Added path and specific wording
4a3153f [Babu Prasad Elumalai] removed bad package from import
d0c8e01 [Babu Prasad Elumalai] Added technical description to bigquery.md
b6d181c [Babu Prasad Elumalai] Trying to add screenshot in README
569757f [Babu Prasad Elumalai] Incorporated feedback
764385c [Babu Prasad Elumalai] Interpreter modification, License, doc changes
d85abd2 [Babu Prasad Elumalai] Modified code and license
17f6d89 [Babu Prasad Elumalai] ZEPPELIN-1153 comments committed
8fa647b [Babu Prasad Elumalai] BigQuery Interpreter for Apazhe Zeppelin
22e3487 [babupe] Update LICENSE
e88b017 [babupe] Created a new license file
d90e10f [babupe] Removed BigQuery from notice
aa52553 [Babu Prasad Elumalai] Merge branch 'master' of https://github.com/apache/zeppelin
ae096d2 [Babu Prasad Elumalai] License changes
20962d2 [Babu Prasad Elumalai] Pushing license changes
3d5f8e7 [Babu Prasad Elumalai] Modified license header
5a2e674 [Babu Prasad Elumalai] Added license info for Jackson library and added BQ API source
4db74c1 [Babu Prasad Elumalai] Adding license stuff
31c373f [Babu Prasad Elumalai] Fixed formatting with readme file
287744c [Babu Prasad Elumalai] Merge branch 'babupe-bigquery' of https://github.com/babupe/zeppelin into babupe-bigquery
f318b20 [Babu Prasad Elumalai] Pushing cropped screenshots
17fd4e8 [babupe] Added cropped interpreter screenshot
f872aa0 [Babu Prasad Elumalai] Removed unnecessary dependencies in pom.xml
5983e36 [Babu Prasad Elumalai] Exclude constants.json file for rat plugin since its static config file
11e88dc [Babu Prasad Elumalai] Replaced license header with formatting
4b82abd [Babu Prasad Elumalai] Fixed license header and added manual unit test documentation
87f5efe [Babu Prasad Elumalai] Added path and specific wording
6132d78 [Babu Prasad Elumalai] Fixing License and skipping failing tests
2254a49 [Babu Prasad Elumalai] removed bad package from import
73e3f6d [Babu Prasad Elumalai] Added technical description to bigquery.md
089820b [Babu Prasad Elumalai] Trying to add screenshot in README
a00b48e [Babu Prasad Elumalai] Incorporated feedback
17846f1 [Babu Prasad Elumalai] Interpreter modification, License, doc changes
50c41fc [Babu Prasad Elumalai] Modified code and license
75d8ee6 [Babu Prasad Elumalai] ZEPPELIN-1153 comments committed
2a2bedc [Babu Prasad Elumalai] BigQuery Interpreter for Apazhe Zeppelin
2016-07-31 01:14:21 +09:00


Overview

BigQuery interpreter for Apache Zeppelin

Prerequisites

You can follow the instructions at Apache Zeppelin on Dataproc to bring up Zeppelin on Google Cloud Dataproc. You can also install and bring up Zeppelin on Google Compute Engine.

Unit Tests

The BigQuery unit tests are excluded because they depend on the external BigQuery service, which does not have a local mock at this point.

If you would like to run these tests manually, follow these steps:

Interpreter Configuration

Configure the following properties during Interpreter creation.

| Name | Default Value | Description |
| --- | --- | --- |
| zeppelin.bigquery.project_id | | Google Project Id |
| zeppelin.bigquery.wait_time | 5000 | Query Timeout in Milliseconds |
| zeppelin.bigquery.max_no_of_rows | 100000 | Max result set size |
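
As a rough illustration of how these properties and their defaults fit together, here is a minimal Python sketch. It is not the interpreter's actual code (the interpreter is written in Java). The `resolve_settings` helper, the output key names, and the choice to treat the project ID as required are assumptions made for the example.

```python
# Hypothetical sketch: merge user-supplied interpreter properties with the
# defaults listed in the table. Not the interpreter's real implementation.

DEFAULTS = {
    "zeppelin.bigquery.wait_time": "5000",         # query timeout in milliseconds
    "zeppelin.bigquery.max_no_of_rows": "100000",  # max result set size
}

# Assumption: the project ID has no default, so it is treated as required here.
REQUIRED = ("zeppelin.bigquery.project_id",)


def resolve_settings(properties):
    """Return typed settings, falling back to the documented defaults."""
    merged = {**DEFAULTS, **properties}
    missing = [key for key in REQUIRED if not merged.get(key)]
    if missing:
        raise ValueError("missing required properties: " + ", ".join(missing))
    return {
        "project_id": merged["zeppelin.bigquery.project_id"],
        "wait_time_ms": int(merged["zeppelin.bigquery.wait_time"]),
        "max_rows": int(merged["zeppelin.bigquery.max_no_of_rows"]),
    }


if __name__ == "__main__":
    # "my-project" is a placeholder project ID.
    print(resolve_settings({"zeppelin.bigquery.project_id": "my-project"}))
```

Supplying only the project ID leaves the timeout and row limit at the default values shown in the table.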

Connection

The Interpreter opens a connection with the BigQuery Service using the supplied Google project ID and the compute environment variables.

Google BigQuery API Javadoc

API Javadocs [Source](http://central.maven.org/maven2/com/google/apis/google-api-services-bigquery/v2-rev265-1.21.0/google-api-services-bigquery-v2-rev265-1.21.0-sources.jar)

We built the interpreter with the curated veneer version of the Java APIs rather than the [idiomatic Java client](https://github.com/GoogleCloudPlatform/gcloud-java/tree/master/gcloud-java-bigquery), mainly for usability reasons.

Enabling the BigQuery Interpreter

In a notebook, to enable the BigQuery interpreter, click the Gear icon and select bigquery.

Using the BigQuery Interpreter

In a paragraph, use %bigquery.sql to select the BigQuery interpreter and then enter SQL statements against your datasets stored in BigQuery. You can use the BigQuery SQL Reference to build your own SQL.

For example, this SQL queries the flights public dataset for the 10 departure airports with the most delayed departures:

%bigquery.sql
SELECT departure_airport,sum(case when departure_delay>0 then 1 else 0 end) as no_of_delays
FROM [bigquery-samples:airline_ontime_data.flights]
group by departure_airport
order by 2 desc
limit 10
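
A note on conditional counts such as `no_of_delays`: `COUNT` counts every non-NULL value, including 0, so only `SUM` over a 1/0 flag (or `COUNT` over a `CASE` without `ELSE`) counts just the delayed flights. The sketch below illustrates the difference. It is an illustration only: it uses Python's built-in `sqlite3` module as a stand-in for BigQuery, and the `flights` rows and the `compare_delay_counts` helper are made up for the example.

```python
import sqlite3


def compare_delay_counts():
    """Compare three ways of counting delayed flights per airport."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE flights (departure_airport TEXT, departure_delay REAL)"
    )
    # Made-up sample rows: SFO has one delayed flight out of three.
    conn.executemany(
        "INSERT INTO flights VALUES (?, ?)",
        [("SFO", 12.0), ("SFO", 0.0), ("SFO", -3.0), ("JFK", 5.0)],
    )
    rows = conn.execute(
        """
        SELECT departure_airport,
               -- counts every row, because 0 is not NULL
               COUNT(CASE WHEN departure_delay > 0 THEN 1 ELSE 0 END),
               -- adds up the 1/0 flags, so only delayed flights count
               SUM(CASE WHEN departure_delay > 0 THEN 1 ELSE 0 END),
               -- CASE without ELSE yields NULL, which COUNT skips
               COUNT(CASE WHEN departure_delay > 0 THEN 1 END)
        FROM flights
        GROUP BY departure_airport
        ORDER BY departure_airport
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in compare_delay_counts():
        print(row)
```

For the sample rows, the first expression reports 3 for SFO (all of its flights), while the other two report 1 (the single delayed flight).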

Another example: SQL to find the most commonly used Java packages in the GitHub data hosted in BigQuery

%bigquery.sql
SELECT
  package,
  COUNT(*) count
FROM (
  SELECT
    REGEXP_EXTRACT(line, r' ([a-z0-9\._]*)\.') package,
    id
  FROM (
    SELECT
      SPLIT(content, '\n') line,
      id
    FROM
      [bigquery-public-data:github_repos.sample_contents]
    WHERE
      content CONTAINS 'import'
      AND sample_path LIKE '%.java'
    HAVING
      LEFT(line, 6)='import' )
  GROUP BY
    package,
    id )
GROUP BY
  1
ORDER BY
  count DESC
LIMIT
  40
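
To make the query's logic easier to follow, here is a small Python sketch of the same steps on in-memory data: split each file into lines, keep lines that start with `import`, extract the package with the same regular expression, count each package once per file, and rank the totals. It is an approximation under stated assumptions: Python's `re` module stands in for BigQuery's regex engine, the `extract_package` and `count_packages` helpers and the sample files are hypothetical, and lines where the pattern does not match are skipped here rather than grouped under NULL.

```python
import re
from collections import Counter

# Same pattern as in the query: a space, then the longest run of lowercase
# letters, digits, dots and underscores that is followed by a dot.
PACKAGE_PATTERN = re.compile(r" ([a-z0-9\._]*)\.")


def extract_package(line):
    """Return the package part of an import line, or None if no match."""
    match = PACKAGE_PATTERN.search(line)
    return match.group(1) if match else None


def count_packages(files):
    """Count, per package, the number of files that import it.

    `files` maps a file id to its full text content.
    """
    counts = Counter()
    for content in files.values():
        packages = set()  # mirrors GROUP BY package, id (once per file)
        for line in content.split("\n"):
            if line[:6] == "import":  # mirrors LEFT(line, 6)='import'
                package = extract_package(line)
                if package is not None:
                    packages.add(package)
        counts.update(packages)
    return counts


if __name__ == "__main__":
    sample_files = {
        "a": "import java.util.List;\nimport java.util.Map;\nclass A {}",
        "b": "import java.util.List;\nimport org.junit.Test;",
    }
    # most_common(40) mirrors ORDER BY count DESC LIMIT 40
    for package, count in count_packages(sample_files).most_common(40):
        print(package, count)
```

Because the character class only allows lowercase letters, the capture stops before the capitalized class name, so `import java.util.List;` yields `java.util`.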

Sample Screenshot

Zeppelin BigQuery