We will assume you have Zeppelin installed already. If that's not the case, see [Install](../install/install.html).
Zeppelin's current main backend processing engine is [Apache Spark](https://spark.apache.org). If you're new to Spark, you might want to start by getting an idea of how it processes data, so that you can get the most out of Zeppelin.
<br/>
### Tutorial with Local File
#### Data Refine
Before you start the Zeppelin tutorial, download [bank.zip](http://archive.ics.uci.edu/ml/machine-learning-databases/00222/bank.zip).
First, to transform the data from CSV format into an RDD of `Bank` objects, run the following script. It also removes the header row using the `filter` function.
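The parsing step can be sketched in plain Scala as follows. This is a minimal illustration under stated assumptions, not the tutorial's own script: it assumes the CSV is semicolon-delimited with quoted string fields and a header line whose first field is `"age"`; the `Bank` fields, the `unquote` helper, and the sample rows are made up for the example; and it works on an in-memory `Seq` rather than an RDD. In a Zeppelin Spark paragraph, the same `map`/`filter` chain would be applied to the RDD returned by `sc.textFile(...)`.

```scala
// Hypothetical subset of columns; the real file has more fields.
case class Bank(age: Int, job: String, marital: String, education: String, balance: Int)

// Made-up sample lines in the assumed format (one header row, two records).
val lines = Seq(
  "\"age\";\"job\";\"marital\";\"education\";\"default\";\"balance\"",
  "58;\"management\";\"married\";\"tertiary\";\"no\";2143",
  "26;\"technician\";\"single\";\"secondary\";\"no\";29"
)

// Strip the double quotes that surround string fields.
def unquote(s: String): String = s.replaceAll("\"", "")

val bank = lines
  .map(_.split(";"))
  .filter(fields => fields(0) != "\"age\"") // drop the header row
  .map(f => Bank(f(0).toInt, unquote(f(1)), unquote(f(2)), unquote(f(3)), unquote(f(5)).toInt))
```

Filtering on the first field is a simple way to drop the header without tracking line positions, which matters on an RDD where the data is partitioned.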
Suppose we want to see the age distribution in `bank`. To do this, run:
```sql
%sql select age, count(1) from bank where age < 30 group by age order by age
```
You can add an input box for setting the age condition by replacing `30` with `${maxAge=30}`.
```sql
%sql select age, count(1) from bank where age < ${maxAge=30} group by age order by age
```
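To make the substitution concrete, here is a toy sketch of how a `${name=default}` placeholder could be expanded before a query runs. It is not Zeppelin's implementation: `fillTemplate`, `placeholder`, and the regex are hypothetical, and only the simple text-input form is handled.

```scala
import scala.util.matching.Regex

// Matches ${name=default}; group 1 is the name, group 2 the default value.
val placeholder: Regex = """\$\{(\w+)=([^}]*)\}""".r

// Replace each placeholder with the user-supplied value, or with its default if none was given.
def fillTemplate(query: String, inputs: Map[String, String]): String =
  placeholder.replaceAllIn(query, m =>
    Regex.quoteReplacement(inputs.getOrElse(m.group(1), m.group(2))))

val template = "select age, count(1) from bank where age < ${maxAge=30} group by age order by age"
val withDefault = fillTemplate(template, Map.empty)          // no input: falls back to 30
val withInput = fillTemplate(template, Map("maxAge" -> "45")) // user typed 45 in the input box
```

With no input the query keeps the original condition `age < 30`; once a value is entered, only the placeholder changes and the rest of the query is untouched.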
Now we want to see the age distribution for a given marital status, with a combo box for selecting that status. Run:
```sql
%sql select age, count(1) from bank where marital="${marital=single,single|divorced|married}" group by age order by age
```
<br/>
### Tutorial with Streaming Data
#### Data Refine
Since this tutorial is based on Twitter's sample tweet stream, you must configure authentication with a Twitter account. To do this, take a look at [Twitter Credential Setup](https://databricks-training.s3.amazonaws.com/realtime-processing-with-spark-streaming.html#twitter-credential-setup). After you get your API keys, fill in the credential-related values (`apiKey`, `apiSecret`, `accessToken`, `accessTokenSecret`) in the following script.
This will create an RDD of `Tweet` objects and register the stream data as a table:
```scala
import org.apache.spark.streaming._
import org.apache.spark.streaming.twitter._
import org.apache.spark.storage.StorageLevel
import scala.io.Source
import scala.collection.mutable.HashMap
import java.io.File
import org.apache.log4j.Logger
import org.apache.log4j.Level
import sys.process.stringSeqToProcess
/** Configures the OAuth Credentials for accessing Twitter */
```
Each of the following scripts may show a different result every time you click the run button, since they are based on real-time data.
Let's begin by extracting at most 10 tweets that contain the word "girl".
```sql
%sql select * from tweets where text like '%girl%' limit 10
```
This time, suppose we want to see how many tweets were created per second during the last 60 seconds. To do this, run:
```sql
%sql select createdAt, count(1) from tweets group by createdAt order by createdAt
```
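The grouping that this query performs can be sketched on an in-memory collection. This is an illustration only: the `Tweet` shape shown here is an assumption (in particular, that `createdAt` holds a timestamp in whole seconds), and the sample tweets are made up.

```scala
// Hypothetical Tweet shape; createdAt is assumed to be a timestamp in whole seconds.
case class Tweet(createdAt: Long, text: String)

// Made-up sample data: two tweets in second 100, one in second 101.
val tweets = Seq(
  Tweet(101L, "third"),
  Tweet(100L, "first"),
  Tweet(100L, "second")
)

// Same idea as: select createdAt, count(1) from tweets group by createdAt order by createdAt
val perSecond: Seq[(Long, Int)] =
  tweets
    .groupBy(_.createdAt)                           // one bucket per second
    .map { case (second, ts) => (second, ts.size) } // count the tweets in each bucket
    .toSeq
    .sortBy(_._1)                                   // order by createdAt
```

Because each timestamp is already truncated to a second, grouping on `createdAt` directly yields per-second counts without any extra bucketing.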
You can define a user-defined function and use it in Spark SQL. Let's try it by writing a function named `sentiment`. This function returns one of three attitudes (positive, negative, or neutral) for the text passed to it.
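A minimal sketch of such a function is shown below. The word lists and the scoring rule are assumptions chosen for illustration, not a real sentiment model: the function counts the positive and negative words in the text and labels it by the sign of the difference.

```scala
// Hypothetical word lists, chosen only for illustration.
val positiveWords = Set("like", "love", "good", "great", "happy", "cool")
val negativeWords = Set("hate", "bad", "stupid", "awful")

// Score = (number of positive words) - (number of negative words); the sign decides the label.
def sentiment(text: String): String = {
  val words = text.toLowerCase.split("\\s+")
  val score = words.count(positiveWords.contains) - words.count(negativeWords.contains)
  if (score > 0) "positive" else if (score < 0) "negative" else "neutral"
}

// In a Zeppelin Spark paragraph on Spark 1.3 or later, the function could then be
// registered for use in SQL, for example:
//   sqlContext.udf.register("sentiment", sentiment _)
// (this assumes the SQLContext variable is named `sqlContext` in your environment).
```

Once registered, the function can be called from a `%sql` paragraph like any built-in function, for example to group tweets by `sentiment(text)`.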