Increase readability and delete useless <br> tag tutorial.md

Ryu Ah young 2015-12-29 02:04:41 -08:00
parent e867df3a35
commit c63e59fbf3

@@ -17,20 +17,20 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
### Zeppelin Tutorial
## Zeppelin Tutorial
We will assume you have Zeppelin installed already. If that's not the case, see [Install](../install/install.html).
This tutorial walks you through some of the fundamental Zeppelin concepts. We will assume you have already installed Zeppelin. If not, please see [here](../install/install.html) first.
Zeppelin's current main backend processing engine is [Apache Spark](https://spark.apache.org). If you're new to the system, you might want to start by getting an idea of how it processes data to get the most out of Zeppelin.
The current main backend processing engine of Zeppelin is [Apache Spark](https://spark.apache.org). If you're new to this system, you might want to start by getting an idea of how it processes data to get the most out of Zeppelin.
<br />
### Tutorial with Local File
## Tutorial with Local File
#### Data Refine
### 1. Data Refine
Before you start the Zeppelin tutorial, you will need to download [bank.zip](http://archive.ics.uci.edu/ml/machine-learning-databases/00222/bank.zip).
First, to transform data from CSV format into an RDD of `Bank` objects, run the following script. This will also remove the header using the `filter` function.
First, to transform CSV-format data into an RDD of `Bank` objects, run the following script. This will also remove the header using the `filter` function.
```scala
@@ -38,7 +38,7 @@ val bankText = sc.textFile("yourPath/bank/bank-full.csv")
case class Bank(age:Integer, job:String, marital : String, education : String, balance : Integer)
// split each line, filter out header (starts with "age"), and map it into Bank case class
// split each line, filter out header (starts with "age"), and map it into Bank case class
val bank = bankText.map(s=>s.split(";")).filter(s=>s(0)!="\"age\"").map(
s=>Bank(s(0).toInt,
s(1).replaceAll("\"", ""),
@@ -52,8 +52,7 @@ val bank = bankText.map(s=>s.split(";")).filter(s=>s(0)!="\"age\"").map(
bank.toDF().registerTempTable("bank")
```
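
The split, filter, and map steps in the script above can be tried without Spark. The following is a minimal sketch for a Scala REPL or notebook paragraph, not the tutorial's own code: a plain `Seq` stands in for the RDD, `Int` replaces `Integer`, the helper names `unquote` and `parseBank` are hypothetical, the sample lines are made up in the same shape as the CSV, and reading the balance from column index 5 is an assumption about the file layout.

```scala
// Plain-Scala stand-in for the Spark pipeline: a Seq replaces the RDD.
case class Bank(age: Int, job: String, marital: String, education: String, balance: Int)

// Strip the double quotes that wrap string fields in the CSV.
def unquote(field: String): String = field.replaceAll("\"", "")

// Split each line on ';', drop the header row (first field is "age" in quotes),
// and map the remaining rows into Bank objects.
def parseBank(lines: Seq[String]): Seq[Bank] =
  lines
    .map(_.split(";"))
    .filter(cols => cols(0) != "\"age\"")
    .map(cols =>
      Bank(
        cols(0).toInt,
        unquote(cols(1)),
        unquote(cols(2)),
        unquote(cols(3)),
        unquote(cols(5)).toInt // assumed position of the balance column
      )
    )

// Made-up sample lines: one header row followed by two data rows.
val sample = Seq(
  "\"age\";\"job\";\"marital\";\"education\";\"default\";\"balance\"",
  "58;\"management\";\"married\";\"tertiary\";\"no\";2143",
  "44;\"technician\";\"single\";\"secondary\";\"no\";29"
)

val banks = parseBank(sample)
```

Here `banks` holds one `Bank` per data row, with the header row filtered out before any field is converted to a number.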
<br />
#### Data Retrieval
### 2. Data Retrieval
Suppose we want to see the age distribution from `bank`. To do this, run:
@@ -74,9 +73,9 @@ Now we want to see age distribution with certain marital status and add combo bo
```
<br />
### Tutorial with Streaming Data
## Tutorial with Streaming Data
#### Data Refine
### 1. Data Refine
Since this tutorial is based on Twitter's sample tweet stream, you must configure authentication with a Twitter account. To do this, take a look at [Twitter Credential Setup](https://databricks-training.s3.amazonaws.com/realtime-processing-with-spark-streaming.html#twitter-credential-setup). After you get the API keys, fill in the credential-related values (`apiKey`, `apiSecret`, `accessToken`, `accessTokenSecret`) in the following script.
@@ -136,12 +135,11 @@ twt.print
ssc.start()
```
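
The four credential values named in this section have to reach the Twitter client somehow. The sketch below shows one possible way, assuming the streaming script relies on twitter4j, which reads OAuth settings from `twitter4j.oauth.*` JVM system properties. The helper names `twitter4jKey` and `setTwitterCredentials` and the placeholder values are hypothetical and are not taken from the tutorial's script.

```scala
// Hypothetical helper: map the tutorial's credential names to twitter4j property keys,
// e.g. "apiKey" becomes "twitter4j.oauth.consumerKey".
def twitter4jKey(name: String): String =
  "twitter4j.oauth." + name.replace("api", "consumer")

// Hypothetical helper: reject a blank value, otherwise set the JVM system property.
def setTwitterCredentials(credentials: Map[String, String]): Unit =
  credentials.foreach { case (name, value) =>
    require(value.trim.nonEmpty, s"Twitter credential '$name' is empty")
    System.setProperty(twitter4jKey(name), value.trim)
  }

// Placeholder values only; substitute the keys from your own Twitter app.
val placeholderCredentials = Map(
  "apiKey" -> "xxxx",
  "apiSecret" -> "xxxx",
  "accessToken" -> "xxxx",
  "accessTokenSecret" -> "xxxx"
)
```

Calling `setTwitterCredentials(placeholderCredentials)` before the stream is created would make the values visible to twitter4j under this assumption.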
<br />
#### Data Retrieval
### 2. Data Retrieval
For each of the following scripts, you will see a different result every time you click the run button, since the results are based on real-time data.
Let's begin by extracting at most 10 tweets that contain the word "girl".
Let's begin by extracting at most 10 tweets that contain the word **girl**.
```sql
%sql select * from tweets where text like '%girl%' limit 10
@@ -154,7 +152,7 @@ This time suppose we want to see how many tweets have been created per sec durin
```
You can make a user-defined function and use it in Spark SQL. Let's try it by making a function named `sentiment`. This function will return one of the three attitudes(positive, negative, neutral) towards the parameter.
You can make a user-defined function and use it in Spark SQL. Let's try it by making a function named `sentiment`. This function will return one of the three attitudes( positive, negative, neutral ) towards the parameter.
```scala
def sentiment(s:String) : String = {