---
layout: page
title: "BigQuery Interpreter for Apache Zeppelin"
description: "BigQuery is a highly scalable no-ops data warehouse in the Google Cloud Platform."
group: interpreter
---
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
{% include JB/setup %}

# BigQuery Interpreter for Apache Zeppelin

<div id="toc"></div>

## Overview
[BigQuery](https://cloud.google.com/bigquery/what-is-bigquery) is a highly scalable no-ops data warehouse in the Google Cloud Platform. Querying massive datasets can be time-consuming and expensive without the right hardware and infrastructure. Google BigQuery solves this problem by enabling fast SQL queries against append-only tables using the processing power of Google's infrastructure. You move your data into BigQuery and Google handles the infrastructure. You can control access to both the project and your data based on your business needs, such as giving others the ability to view or query your data.

## Configuration
<table class="table-configuration">
  <tr>
    <th>Name</th>
    <th>Default Value</th>
    <th>Description</th>
  </tr>
  <tr>
    <td>zeppelin.bigquery.project_id</td>
    <td> </td>
    <td>Google Project ID</td>
  </tr>
  <tr>
    <td>zeppelin.bigquery.wait_time</td>
    <td>5000</td>
    <td>Query timeout in milliseconds</td>
  </tr>
  <tr>
    <td>zeppelin.bigquery.max_no_of_rows</td>
    <td>100000</td>
    <td>Maximum result set size (number of rows)</td>
  </tr>
</table>

## BigQuery API
Zeppelin is built against BigQuery API version v2-rev265-1.21.0. See the [API Javadocs](https://developers.google.com/resources/api-libraries/documentation/bigquery/v2/java/latest/).

## Enabling the BigQuery Interpreter

In a notebook, to enable the **BigQuery** interpreter, click the **Gear** icon and select **bigquery**.

### Setup service account credentials

In order to run the BigQuery interpreter outside of Google Compute Engine, you need to provide authentication credentials
by [following these instructions](https://developers.google.com/identity/protocols/application-default-credentials):

- Go to the [API Console Credentials page](https://console.developers.google.com/project/_/apis/credentials).
- From the project drop-down, select your project.
- On the `Credentials` page, select the `Create credentials` drop-down, then select `Service account key`.
- From the Service account drop-down, select an existing service account or create a new one.
- For `Key type`, select the `JSON` key option, then select `Create`. The file automatically downloads to your computer.
- Put the `*.json` file you just downloaded in a directory of your choosing. This directory must be private (no one else should have access to it), but accessible to your Zeppelin instance.
- Set the environment variable `GOOGLE_APPLICATION_CREDENTIALS` to the path of the downloaded JSON file,
  * either through the GUI: on the interpreter configuration page, properties whose names are in CAPITAL_CASE are set as environment variables,
  * or through `zeppelin-env.sh`: just add it to the end of the file.

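
The last two steps above can be sketched as a few shell commands. This is a minimal sketch under stated assumptions, not an official procedure: the directory `~/.zeppelin-keys` and the file name `bigquery-key.json` are hypothetical placeholders, and only the variable name `GOOGLE_APPLICATION_CREDENTIALS` comes from the instructions above.

```bash
# Hypothetical private directory for the key; any location readable by the
# user that runs Zeppelin should work.
KEY_DIR="${KEY_DIR:-$HOME/.zeppelin-keys}"
KEY_FILE="$KEY_DIR/bigquery-key.json"   # placeholder file name

# Create the directory and restrict it to the current user.
mkdir -p "$KEY_DIR"
chmod 700 "$KEY_DIR"

# Once the downloaded key has been moved to $KEY_FILE, make it owner-only.
if [ -f "$KEY_FILE" ]; then
  chmod 600 "$KEY_FILE"
fi

# Point Google's application default credentials at the key file.
export GOOGLE_APPLICATION_CREDENTIALS="$KEY_FILE"
```

If you take the `zeppelin-env.sh` route, the `export` line (with your real path) is the part to append to that file. Zeppelin typically needs a restart before the interpreter process sees the new variable.
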
## Using the BigQuery Interpreter

In a paragraph, use `%bigquery.sql` to select the **BigQuery** interpreter, then enter SQL statements against your datasets stored in BigQuery.
Refer to the [BigQuery SQL Reference](https://cloud.google.com/bigquery/query-reference) when writing your own queries.

For example, the following SQL finds the 10 departure airports with the most delayed departures in the public flights dataset:

```sql
%bigquery.sql
SELECT departure_airport, SUM(CASE WHEN departure_delay > 0 THEN 1 ELSE 0 END) AS no_of_delays
FROM [bigquery-samples:airline_ontime_data.flights]
GROUP BY departure_airport
ORDER BY no_of_delays DESC
LIMIT 10
```

As another example, the following SQL finds the most commonly used Java packages in the GitHub data hosted in BigQuery:

```sql
%bigquery.sql
SELECT
  package,
  COUNT(*) count
FROM (
  SELECT
    REGEXP_EXTRACT(line, r' ([a-z0-9\._]*)\.') package,
    id
  FROM (
    SELECT
      SPLIT(content, '\n') line,
      id
    FROM
      [bigquery-public-data:github_repos.sample_contents]
    WHERE
      content CONTAINS 'import'
      AND sample_path LIKE '%.java'
    HAVING
      LEFT(line, 6)='import' )
  GROUP BY
    package,
    id )
GROUP BY
  1
ORDER BY
  count DESC
LIMIT
  40
```

## Technical description

For in-depth technical details on the current implementation, please refer to [bigquery/README.md](https://github.com/apache/zeppelin/blob/master/bigquery/README.md).