mirror of https://github.com/apache/zeppelin synced 2026-05-24 09:38:26 +00:00

Alexander Shoshin 0da08d1d72 [ZEPPELIN-1787] Add an example of Flink Notebook

### What is this PR for?
This PR will add an example of batch processing with Flink to Zeppelin tutorial notebooks. There are no any Flink notebooks in the tutorial at the moment.

### What type of PR is it?
Improvement

### What is the Jira issue?
[ZEPPELIN-1787](https://issues.apache.org/jira/browse/ZEPPELIN-1787)

### How should this be tested?
You should open `Using Flink for batch processing` notebook from the `Zeppelin Tutorial` folder and run all paragraphs one by one

### Questions:
* Does the licenses files need update? - **no**
* Is there breaking changes for older versions? - **no**
* Does this needs documentation? - **no**

Author: Alexander Shoshin <Alexander_Shoshin@epam.com>

Closes #1758 from AlexanderShoshin/ZEPPELIN-1787 and squashes the following commits:

83cbffb [Alexander Shoshin] remove localhost url
5255e17 [Alexander Shoshin] Merge branch 'master' into ZEPPELIN-1787
0b9df56 [Alexander Shoshin] add a link for this notebook to Zeppelin documentation
593c47d [Alexander Shoshin] convert notebook to 0.7.0 format
9013620 [Alexander Shoshin] convert notebook to 0.6.2 format
fe2a39e [Alexander Shoshin] add download instruction, change "wget" to "curl"
f64b60a [Alexander Shoshin] [ZEPPELIN-1787] Add an example of Flink Notebook

2017-01-12 12:06:01 +09:00

3.2 KiB

Raw Blame History

layout	title	description	group
page	Flink Interpreter for Apache Zeppelin	Apache Flink is an open source platform for distributed stream and batch data processing.	interpreter

{% include JB/setup %}

Flink interpreter for Apache Zeppelin

Overview

Apache Flink is an open source platform for distributed stream and batch data processing. Flink’s core is a streaming dataflow engine that provides data distribution, communication, and fault tolerance for distributed computations over data streams. Flink also builds batch processing on top of the streaming engine, overlaying native iteration support, managed memory, and program optimization.

How to start local Flink cluster, to test the interpreter

Zeppelin comes with pre-configured flink-local interpreter, which starts Flink in a local mode on your machine, so you do not need to install anything.

How to configure interpreter to point to Flink cluster

At the "Interpreters" menu, you have to create a new Flink interpreter and provide next properties:

property	value	Description
host	local	host name of running JobManager. 'local' runs flink in local mode (default)
port	6123	port of running JobManager

For more information about Flink configuration, you can find it here.

How to test it's working

You can find an example of Flink usage in the Zeppelin Tutorial folder or try the following word count example, by using the Zeppelin notebook from Till Rohrmann's presentation Interactive data analysis with Apache Flink for Apache Flink Meetup.

%sh
rm 10.txt.utf-8
wget http://www.gutenberg.org/ebooks/10.txt.utf-8

{% highlight scala %} %flink case class WordCount(word: String, frequency: Int) val bible:DataSet[String] = benv.readTextFile("10.txt.utf-8") val partialCounts: DataSet[WordCount] = bible.flatMap{ line => """\b\w+\b""".r.findAllIn(line).map(word => WordCount(word, 1)) // line.split(" ").map(word => WordCount(word, 1)) } val wordCounts = partialCounts.groupBy("word").reduce{ (left, right) => WordCount(left.word, left.frequency + right.frequency) } val result10 = wordCounts.first(10).collect() {% endhighlight %}

3.2 KiB Raw Blame History Unescape Escape