Mirror of https://github.com/apache/zeppelin, synced 2026-05-24 09:38:26 +00:00
Increase readability and delete useless <br> tag tutorial.md
parent e867df3a35
commit c63e59fbf3
1 changed files with 13 additions and 15 deletions
@@ -17,20 +17,20 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
 -->
-### Zeppelin Tutorial
+## Zeppelin Tutorial

-We will assume you have Zeppelin installed already. If that's not the case, see [Install](../install/install.html).
+This tutorial walks you through some of the fundamental Zeppelin concepts. We will assume you have already installed Zeppelin. If not, please see [here](../install/install.html) first.

-Zeppelin's current main backend processing engine is [Apache Spark](https://spark.apache.org). If you're new to the system, you might want to start by getting an idea of how it processes data to get the most out of Zeppelin.
+Current main backend processing engine of Zeppelin is [Apache Spark](https://spark.apache.org). If you're new to this system, you might want to start by getting an idea of how it processes data to get the most out of Zeppelin.

 <br />
-### Tutorial with Local File
+## Tutorial with Local File

-#### Data Refine
+### 1. Data Refine

 Before you start Zeppelin tutorial, you will need to download [bank.zip](http://archive.ics.uci.edu/ml/machine-learning-databases/00222/bank.zip).

-First, to transform data from csv format into RDD of `Bank` objects, run following script. This will also remove header using `filter` function.
+First, to transform csv format data into RDD of `Bank` objects, run following script. This will also remove header using `filter` function.

 ```scala

@@ -38,7 +38,7 @@ val bankText = sc.textFile("yourPath/bank/bank-full.csv")

 case class Bank(age:Integer, job:String, marital : String, education : String, balance : Integer)

-// split each line, filter out header (starts with "age"), and map it into Bank case class
+// split each line, filter out header (starts with "age"), and map it into Bank case class
 val bank = bankText.map(s=>s.split(";")).filter(s=>s(0)!="\"age\"").map(
 s=>Bank(s(0).toInt,
 s(1).replaceAll("\"", ""),
@@ -52,8 +52,7 @@ val bank = bankText.map(s=>s.split(";")).filter(s=>s(0)!="\"age\"").map(
 bank.toDF().registerTempTable("bank")
 ```

-<br />
-#### Data Retrieval
+### 2. Data Retrieval

 Suppose we want to see age distribution from `bank`. To do this, run:

@@ -74,9 +73,9 @@ Now we want to see age distribution with certain marital status and add combo bo
 ```

 <br />
-### Tutorial with Streaming Data
+## Tutorial with Streaming Data

-#### Data Refine
+### 1. Data Refine

 Since this tutorial is based on Twitter's sample tweet stream, you must configure authentication with a Twitter account. To do this, take a look at [Twitter Credential Setup](https://databricks-training.s3.amazonaws.com/realtime-processing-with-spark-streaming.html#twitter-credential-setup). After you get API keys, you should fill out credential related values(`apiKey`, `apiSecret`, `accessToken`, `accessTokenSecret`) with your API keys on following script.

@@ -136,12 +135,11 @@ twt.print
 ssc.start()
 ```

-<br />
-#### Data Retrieval
+### 2. Data Retrieval

 For each following script, every time you click run button you will see different result since it is based on real-time data.

-Let's begin by extracting maximum 10 tweets which contain the word "girl".
+Let's begin by extracting maximum 10 tweets which contain the word **girl**.

 ```sql
 %sql select * from tweets where text like '%girl%' limit 10
@@ -154,7 +152,7 @@ This time suppose we want to see how many tweets have been created per sec durin
 ```


-You can make user-defined function and use it in Spark SQL. Let's try it by making function named `sentiment`. This function will return one of the three attitudes(positive, negative, neutral) towards the parameter.
+You can make user-defined function and use it in Spark SQL. Let's try it by making function named `sentiment`. This function will return one of the three attitudes( positive, negative, neutral ) towards the parameter.

 ```scala
 def sentiment(s:String) : String = {
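
The last hunk ends before the body of `sentiment` is shown, so the tutorial's actual implementation is not visible here. As a rough illustration only, a function that returns one of the three attitudes could look like the sketch below. The object name `SentimentSketch`, the keyword sets, and the word-counting rule are all assumptions made for this example. None of them come from the commit.

```scala
// Hypothetical sketch, not the tutorial's implementation.
object SentimentSketch {
  // Assumed keyword lists, chosen only for illustration.
  val positiveWords: Set[String] = Set("love", "great", "happy")
  val negativeWords: Set[String] = Set("hate", "bad", "sad")

  // Classifies a text as "positive", "negative", or "neutral"
  // by comparing how many words fall into each keyword set.
  def sentiment(s: String): String = {
    // Lowercase, split on runs of non-letters, and drop empty tokens.
    val words = s.toLowerCase.split("[^a-z]+").filter(_.nonEmpty)
    val pos = words.count(positiveWords.contains)
    val neg = words.count(negativeWords.contains)
    if (pos > neg) "positive"
    else if (neg > pos) "negative"
    else "neutral"
  }
}
```

To call a function like this from a `%sql` paragraph, it would first have to be registered as a UDF on the SQL context, for example with `udf.register`. The registration code in the tutorial itself lies outside the lines shown in this diff.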