### What is this PR for?
Google BigQuery is a popular no-ops datawarehouse. This commit will enable Apache Zeppelin users to perform BI and Analytics on their datasets in BigQuery.

### What type of PR is it?
Feature

### Todos
* Make bigquery interpreter appear in the interpreters section in the UI
* Build SQL completion
* Authorization of non-gcp

### What is the Jira issue?
https://issues.apache.org/jira/browse/ZEPPELIN-1153

### How should this be tested?
* copy conf/zeppelin-site.xml.template to conf/zeppelin-site.xml
* Add org.apache.zeppelin.bigquery.bigQueryInterpreter to property zeppelin.interpreters in zeppelin-site.xml
* Start Zeppelin
* Add BigQuery Interpreter with your project ID
* Create new note with %bsql.sql and run your SQL against public datasets in bigquery.

### Screenshots (if appropriate)

### Questions:
* Does the licenses files need update? No
* Is there breaking changes for older versions? No
* Does this needs documentation? No

Author: Babu Prasad Elumalai <babupe@google.com>
Author: babupe <babupe@google.com>
Author: Alexander Bezzubov <bzz@apache.org>

Closes #1170 from babupe/babupe-bigquery and squashes the following commits:

ffed801 [Babu Prasad Elumalai] pushing BQ Exception to logs and Interpreter error output
d3c2316 [babupe] Merge pull request #2 from bzz/babupe-add-auth-docs
64525b8 [Alexander Bezzubov] Fix typos in docs
03a777f [Alexander Bezzubov] add docs for BigQuery auth outside of GCE
fcab6b7 [babupe] Merge pull request #1 from bzz/babupe-final
6a95333 [Alexander Bezzubov] Rename Apach2.0 license for google's code to adhere naming conventions
7d4f40b [Alexander Bezzubov] Add exidentaly removed licenses due to merge conflict
3be1912 [Babu Prasad Elumalai] New changes
41e076e [Babu Prasad Elumalai] Fixed formatting with readme file
97874a4 [Babu Prasad Elumalai] Pushing cropped screenshots
64affbb [babupe] Added cropped interpreter screenshot
4a1d29c [Babu Prasad Elumalai] Removed unnecessary dependencies in pom.xml
e520b7b [Babu Prasad Elumalai] Exclude constants.json file for rat plugin since its static config file
69cb724 [Babu Prasad Elumalai] Fixed license header and added manual unit test documentation
bbf26cc [Babu Prasad Elumalai] Added path and specific wording
4a3153f [Babu Prasad Elumalai] removed bad package from import
d0c8e01 [Babu Prasad Elumalai] Added technical description to bigquery.md
b6d181c [Babu Prasad Elumalai] Trying to add screenshot in README
569757f [Babu Prasad Elumalai] Incorporated feedback
764385c [Babu Prasad Elumalai] Interpreter modification, License, doc changes
d85abd2 [Babu Prasad Elumalai] Modified code and license
17f6d89 [Babu Prasad Elumalai] ZEPPELIN-1153 comments committed
8fa647b [Babu Prasad Elumalai] BigQuery Interpreter for Apazhe Zeppelin
22e3487 [babupe] Update LICENSE
e88b017 [babupe] Created a new license file
d90e10f [babupe] Removed BigQuery from notice
aa52553 [Babu Prasad Elumalai] Merge branch 'master' of https://github.com/apache/zeppelin
ae096d2 [Babu Prasad Elumalai] License changes
20962d2 [Babu Prasad Elumalai] Pushing license changes
3d5f8e7 [Babu Prasad Elumalai] Modified license header
5a2e674 [Babu Prasad Elumalai] Added license info for Jackson library and added BQ API source
4db74c1 [Babu Prasad Elumalai] Adding license stuff
31c373f [Babu Prasad Elumalai] Fixed formatting with readme file
287744c [Babu Prasad Elumalai] Merge branch 'babupe-bigquery' of https://github.com/babupe/zeppelin into babupe-bigquery
f318b20 [Babu Prasad Elumalai] Pushing cropped screenshots
17fd4e8 [babupe] Added cropped interpreter screenshot
f872aa0 [Babu Prasad Elumalai] Removed unnecessary dependencies in pom.xml
5983e36 [Babu Prasad Elumalai] Exclude constants.json file for rat plugin since its static config file
11e88dc [Babu Prasad Elumalai] Replaced license header with formatting
4b82abd [Babu Prasad Elumalai] Fixed license header and added manual unit test documentation
87f5efe [Babu Prasad Elumalai] Added path and specific wording
6132d78 [Babu Prasad Elumalai] Fixing License and skipping failing tests
2254a49 [Babu Prasad Elumalai] removed bad package from import
73e3f6d [Babu Prasad Elumalai] Added technical description to bigquery.md
089820b [Babu Prasad Elumalai] Trying to add screenshot in README
a00b48e [Babu Prasad Elumalai] Incorporated feedback
17846f1 [Babu Prasad Elumalai] Interpreter modification, License, doc changes
50c41fc [Babu Prasad Elumalai] Modified code and license
75d8ee6 [Babu Prasad Elumalai] ZEPPELIN-1153 comments committed
2a2bedc [Babu Prasad Elumalai] BigQuery Interpreter for Apazhe Zeppelin
Overview
BigQuery interpreter for Apache Zeppelin
Prerequisites
Follow the instructions at Apache Zeppelin on Dataproc to bring up Zeppelin on Google Cloud Dataproc. Alternatively, install and run Zeppelin on Google Compute Engine.
Unit Tests
The BigQuery unit tests are excluded by default because they depend on the external BigQuery service, for which no local mock exists at this point.
To run these tests manually, follow these steps:
- Create a new project
- Create a Google Compute Engine instance
- Copy the project ID that you created and add it to the property "projectId" in resources/constants.json
- Run the command mvn -Dbigquery.text.exclude='' test -pl bigquery -am
Interpreter Configuration
Configure the following properties during Interpreter creation.
| Name | Default Value | Description |
|---|---|---|
| zeppelin.bigquery.project_id | | Google Project ID |
| zeppelin.bigquery.wait_time | 5000 | Query timeout in milliseconds |
| zeppelin.bigquery.max_no_of_rows | 100000 | Maximum number of rows in the result set |
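As a rough illustration of how these properties and their defaults could be read, here is a minimal Java sketch. The class `BigQuerySettings` and its field names are hypothetical and are not taken from the interpreter's source; treating the project ID as required is likewise an assumption, based only on the table listing no default for it.

```java
import java.util.Properties;

// Hypothetical helper for illustration only; not part of the interpreter's source.
final class BigQuerySettings {
    final String projectId;
    final long waitTimeMillis;
    final long maxRows;

    BigQuerySettings(Properties props) {
        // Assumption: the project ID has no default, so it must be supplied.
        String id = props.getProperty("zeppelin.bigquery.project_id", "").trim();
        if (id.isEmpty()) {
            throw new IllegalArgumentException("zeppelin.bigquery.project_id is required");
        }
        this.projectId = id;
        // Fall back to the documented defaults when a property is absent.
        this.waitTimeMillis =
            Long.parseLong(props.getProperty("zeppelin.bigquery.wait_time", "5000").trim());
        this.maxRows =
            Long.parseLong(props.getProperty("zeppelin.bigquery.max_no_of_rows", "100000").trim());
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("zeppelin.bigquery.project_id", "my-sample-project"); // placeholder ID
        props.setProperty("zeppelin.bigquery.wait_time", "10000");              // override the default
        BigQuerySettings settings = new BigQuerySettings(props);
        System.out.println(settings.projectId + " " + settings.waitTimeMillis + " " + settings.maxRows);
    }
}
```

Parsing the numeric values once, up front, means a malformed timeout or row limit fails at configuration time rather than in the middle of a query.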
Connection
The interpreter opens a connection to the BigQuery service using the supplied Google project ID and the compute environment variables.
Google BigQuery API Javadoc
API Javadocs [Source](http://central.maven.org/maven2/com/google/apis/google-api-services-bigquery/v2-rev265-1.21.0/google-api-services-bigquery-v2-rev265-1.21.0-sources.jar)
We built the interpreter on the curated veneer version of the Java APIs rather than the [idiomatic Java client](https://github.com/GoogleCloudPlatform/gcloud-java/tree/master/gcloud-java-bigquery), mainly for usability reasons.
Enabling the BigQuery Interpreter
To enable the BigQuery interpreter in a notebook, click the gear icon and select bigquery.
Using the BigQuery Interpreter
In a paragraph, use %bigquery.sql to select the BigQuery interpreter and then input SQL statements against your datasets stored in BigQuery.
Refer to the BigQuery SQL Reference when writing your own queries.
For example, the following query lists the 10 departure airports with the most delayed departures in the flights public dataset:
%bigquery.sql
SELECT departure_airport, SUM(CASE WHEN departure_delay > 0 THEN 1 ELSE 0 END) AS no_of_delays
FROM [bigquery-samples:airline_ontime_data.flights]
GROUP BY departure_airport
ORDER BY 2 DESC
LIMIT 10
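One detail worth checking when counting rows that meet a condition: COUNT counts every non-NULL value, so a CASE expression with ELSE 0 inside COUNT counts all rows, while SUM over the same expression, or COUNT over a CASE with no ELSE branch, counts only the matching rows. The Java sketch below mimics the three variants on a handful of made-up delay values. The class and method names are hypothetical, and the numbers are not taken from the dataset.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; the delay values are invented for the example.
final class ConditionalCountDemo {
    // Models COUNT(CASE WHEN delay > 0 THEN 1 ELSE 0 END):
    // each row yields 1 or 0, never NULL, so every row is counted.
    static int countWithElseZero(List<Integer> delays) {
        int count = 0;
        for (Integer delay : delays) {
            Integer value = delay > 0 ? Integer.valueOf(1) : Integer.valueOf(0);
            if (value != null) count++;
        }
        return count;
    }

    // Models COUNT(CASE WHEN delay > 0 THEN 1 END):
    // rows that do not match yield NULL and are skipped.
    static int countWithoutElse(List<Integer> delays) {
        int count = 0;
        for (Integer delay : delays) {
            Integer value = delay > 0 ? Integer.valueOf(1) : null;
            if (value != null) count++;
        }
        return count;
    }

    // Models SUM(CASE WHEN delay > 0 THEN 1 ELSE 0 END).
    static int sumWithElseZero(List<Integer> delays) {
        int sum = 0;
        for (Integer delay : delays) {
            sum += delay > 0 ? 1 : 0;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> delays = Arrays.asList(12, 0, -3, 45); // two delayed flights out of four
        System.out.println(countWithElseZero(delays)); // 4
        System.out.println(countWithoutElse(delays));  // 2
        System.out.println(sumWithElseZero(delays));   // 2
    }
}
```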
As another example, the following query finds the most commonly used Java packages in the GitHub data hosted in BigQuery:
%bigquery.sql
SELECT
  package,
  COUNT(*) count
FROM (
  SELECT
    REGEXP_EXTRACT(line, r' ([a-z0-9\._]*)\.') package,
    id
  FROM (
    SELECT
      SPLIT(content, '\n') line,
      id
    FROM
      [bigquery-public-data:github_repos.sample_contents]
    WHERE
      content CONTAINS 'import'
      AND sample_path LIKE '%.java'
    HAVING
      LEFT(line, 6)='import' )
  GROUP BY
    package,
    id )
GROUP BY
  1
ORDER BY
  count DESC
LIMIT
  40
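To see what the regular expression in this query extracts from a single source line, the following Java sketch applies the same pattern locally. It is only an approximation for illustration: `ImportPackageExtractor` and `packageOf` are hypothetical names, and `java.util.regex` stands in for BigQuery's regex engine, which should behave the same for a pattern this simple.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; it mirrors the query's logic on one line of text.
final class ImportPackageExtractor {
    // Same pattern as the REGEXP_EXTRACT call: a space, then lowercase
    // package characters (captured), then a literal dot.
    private static final Pattern PACKAGE = Pattern.compile(" ([a-z0-9\\._]*)\\.");

    static Optional<String> packageOf(String line) {
        // Mirrors the HAVING LEFT(line, 6)='import' filter.
        if (!line.startsWith("import")) {
            return Optional.empty();
        }
        Matcher m = PACKAGE.matcher(line);
        // Like REGEXP_EXTRACT, return the capture group of the first match.
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(packageOf("import java.util.List;"));
        System.out.println(packageOf("import static org.junit.Assert.assertEquals;"));
        System.out.println(packageOf("// not an import line"));
    }
}
```

Because the character class contains only lowercase letters, digits, dots, and underscores, the capture stops just before the first capitalized segment, which by Java naming convention is the class name. The result is therefore the package rather than the fully qualified class.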
