mirror of
https://github.com/apache/zeppelin
synced 2026-05-24 09:38:26 +00:00
Add SparRInterpreter implementation
This commit is contained in:
parent
617eb947bd
commit
15375ebccb
18 changed files with 985 additions and 200 deletions
264
README.md
264
README.md
|
|
@ -1,234 +1,102 @@
|
|||
#Zeppelin
|
||||
# Apache Zeppelin R
|
||||
|
||||
**Documentation:** [User Guide](http://zeppelin.incubator.apache.org/docs/latest/index.html)<br/>
|
||||
**Mailing Lists:** [User and Dev mailing list](http://zeppelin.incubator.apache.org/community.html)<br/>
|
||||
**Continuous Integration:** [](https://travis-ci.org/apache/incubator-zeppelin) <br/>
|
||||
**Contributing:** [Contribution Guide](https://github.com/apache/incubator-zeppelin/blob/master/CONTRIBUTING.md)<br/>
|
||||
**Issue Tracker:** [Jira](https://issues.apache.org/jira/browse/ZEPPELIN)<br/>
|
||||
**License:** [Apache 2.0](https://github.com/apache/incubator-zeppelin/blob/master/LICENSE)
|
||||
This adds [R](http://cran.r-project.org) interpeter to the [Apache Zeppelin notebook](http://zeppelin.incubator.apache.org).
|
||||
|
||||
It supports:
|
||||
|
||||
**Zeppelin**, a web-based notebook that enables interactive data analytics. You can make beautiful data-driven, interactive and collaborative documents with SQL, Scala and more.
|
||||
+ R code.
|
||||
+ SparkR code.
|
||||
+ Cross paragraph R variables.
|
||||
+ Scala to R binding (passing basic Scala data structure to R).
|
||||
+ R to Scala binding (passing basic R data structure to Scala).
|
||||
+ R plot (ggplot2...).
|
||||
|
||||
Core feature:
|
||||
* Web based notebook style editor.
|
||||
* Built-in Apache Spark support
|
||||
## Simple R
|
||||
|
||||
[](https://raw.githubusercontent.com/datalayer/zeppelin-R/rscala/_Rimg/simple-r.png)
|
||||
|
||||
To know more about Zeppelin, visit our web site [http://zeppelin.incubator.apache.org](http://zeppelin.incubator.apache.org)
|
||||
## Plot
|
||||
|
||||
## Requirements
|
||||
* Java 1.7
|
||||
* Tested on Mac OSX, Ubuntu 14.X, CentOS 6.X
|
||||
* Maven (if you want to build from the source code)
|
||||
* Node.js Package Manager
|
||||
[](https://raw.githubusercontent.com/datalayer/zeppelin-R/rscala/_Rimg/plot.png)
|
||||
|
||||
## Getting Started
|
||||
## Scala R Binding
|
||||
|
||||
### Before Build
|
||||
If you don't have requirements prepared, install it.
|
||||
(The installation method may vary according to your environment, example is for Ubuntu.)
|
||||
[](https://raw.githubusercontent.com/datalayer/zeppelin-R/rscala/_Rimg/scala-r.png)
|
||||
|
||||
## R Scala Binding
|
||||
|
||||
[](https://raw.githubusercontent.com/datalayer/zeppelin-R/rscala/_Rimg/r-scala.png)
|
||||
|
||||
## SparkR
|
||||
|
||||
[](https://raw.githubusercontent.com/datalayer/zeppelin-R/rscala/_Rimg/sparkr.png)
|
||||
|
||||
# Prerequisite
|
||||
|
||||
You need R available on the host running the notebook.
|
||||
|
||||
+ For Centos: `yum install R R-devel`
|
||||
+ For Ubuntu: `apt-get install r-base r-cran-rserve`
|
||||
|
||||
Install additional R packages:
|
||||
|
||||
```
|
||||
sudo apt-get update
|
||||
sudo apt-get install git
|
||||
sudo apt-get install openjdk-7-jdk
|
||||
sudo apt-get install npm
|
||||
sudo apt-get install libfontconfig
|
||||
|
||||
# install maven
|
||||
wget http://www.eu.apache.org/dist/maven/maven-3/3.3.3/binaries/apache-maven-3.3.3-bin.tar.gz
|
||||
sudo tar -zxf apache-maven-3.3.3-bin.tar.gz -C /usr/local/
|
||||
sudo ln -s /usr/local/apache-maven-3.3.3/bin/mvn /usr/local/bin/mvn
|
||||
curl https://cran.r-project.org/src/contrib/Archive/rscala/rscala_1.0.6.tar.gz -o /tmp/rscala_1.0.6.tar.gz
|
||||
R CMD INSTALL /tmp/rscala_1.0.6.tar.gz
|
||||
R -e "install.packages('ggplot2', repos = 'http://cran.us.r-project.org')"
|
||||
R -e install.packages('knitr', repos = 'http://cran.us.r-project.org')
|
||||
```
|
||||
|
||||
_Notes:_
|
||||
- Ensure node is installed by running `node --version`
|
||||
- Ensure maven is running version 3.1.x or higher with `mvn -version`
|
||||
- Configure maven to use more memory than usual by ```export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=1024m"```
|
||||
You also need a compiled version of Spark 1.5.0. Download [the binary distribution](http://archive.apache.org/dist/spark/spark-1.5.0/spark-1.5.0-bin-hadoop2.6.tgz) and untar to make it accessible in `/opt/spark` folder.
|
||||
|
||||
### Build
|
||||
If you want to build Zeppelin from the source, please first clone this repository, then:
|
||||
# Build and Run
|
||||
|
||||
```
|
||||
mvn clean package -DskipTests [Options]
|
||||
mvn clean install -Pspark-1.5 -Dspark.version=1.5.0 \
|
||||
-Dhadoop.version=2.7.1 -Phadoop-2.6 -Ppyspark \
|
||||
-Dmaven.findbugs.enable=false -Drat.skip=true -Dcheckstyle.skip=true \
|
||||
-DskipTests \
|
||||
-pl '!flink,!ignite,!phoenix,!postgresql,!tajo,!hive,!cassandra,!lens,!kylin'
|
||||
```
|
||||
|
||||
Each Interpreter requires different Options.
|
||||
|
||||
|
||||
#### Spark Interpreter
|
||||
|
||||
To build with a specific Spark version, Hadoop version or specific features, define one or more of the following profiles and options:
|
||||
|
||||
##### -Pspark-[version]
|
||||
|
||||
Set spark major version
|
||||
|
||||
Available profiles are
|
||||
|
||||
```
|
||||
-Pspark-1.6
|
||||
-Pspark-1.5
|
||||
-Pspark-1.4
|
||||
-Pspark-1.3
|
||||
-Pspark-1.2
|
||||
-Pspark-1.1
|
||||
-Pcassandra-spark-1.5
|
||||
-Pcassandra-spark-1.4
|
||||
-Pcassandra-spark-1.3
|
||||
-Pcassandra-spark-1.2
|
||||
-Pcassandra-spark-1.1
|
||||
SPARK_HOME=/opt/spark ./bin/zeppelin.sh
|
||||
```
|
||||
|
||||
minor version can be adjusted by `-Dspark.version=x.x.x`
|
||||
Go to [http://localhost:8080](http://localhost:8080) and test the `R Tutorial` note.
|
||||
|
||||
## Get the image from the Docker Repository
|
||||
|
||||
##### -Phadoop-[version]
|
||||
For your convenience, [Datalayer](http://datalayer.io) provides an up-to-date Docker image for [Apache Zeppelin](http://zeppelin.incubator.apache.org), the WEB Notebook for Big Data Science.
|
||||
|
||||
set hadoop major version
|
||||
In order to get the image, you can run with the appropriate rights:
|
||||
|
||||
Available profiles are
|
||||
`docker pull datalayer/zeppelin-rscala`
|
||||
|
||||
```
|
||||
-Phadoop-0.23
|
||||
-Phadoop-1
|
||||
-Phadoop-2.2
|
||||
-Phadoop-2.3
|
||||
-Phadoop-2.4
|
||||
-Phadoop-2.6
|
||||
```
|
||||
Run the Zeppelin notebook with:
|
||||
|
||||
minor version can be adjusted by `-Dhadoop.version=x.x.x`
|
||||
`docker run -it -p 2222:22 -p 8080:8080 -p 4040:4040 datalayer/zeppelin-rscala`
|
||||
|
||||
##### -Pyarn (optional)
|
||||
and go to [http://localhost:8080](http://localhost:8080) to test the `R Tutorial` note.
|
||||
|
||||
enable YARN support for local mode
|
||||
# License
|
||||
|
||||
Copyright 2015 Datalayer http://datalayer.io
|
||||
|
||||
##### -Ppyspark (optional)
|
||||
Licensed under the Apache License, Version 2.0 (the "License");
|
||||
you may not use this file except in compliance with the License.
|
||||
You may obtain a copy of the License at
|
||||
|
||||
enable PySpark support for local mode
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
|
||||
##### -Pvendor-repo (optional)
|
||||
[](http://cran.r-project.org)
|
||||
|
||||
enable 3rd party vendor repository (cloudera)
|
||||
[](http://zeppelin.incubator.apache.org)
|
||||
|
||||
|
||||
##### -Pmapr[version] (optional)
|
||||
|
||||
For the MapR Hadoop Distribution, these profiles will handle the Hadoop version. As MapR allows different versions
|
||||
of Spark to be installed, you should specify which version of Spark is installed on the cluster by adding a Spark profile (-Pspark-1.2, -Pspark-1.3, etc.) as needed. For Hive, check the hive/pom.xml and adjust the version installed as well. The correct Maven
|
||||
artifacts can be found for every version of MapR at http://doc.mapr.com
|
||||
|
||||
Available profiles are
|
||||
|
||||
```
|
||||
-Pmapr3
|
||||
-Pmapr40
|
||||
-Pmapr41
|
||||
-Pmapr50
|
||||
```
|
||||
|
||||
|
||||
Here're some examples:
|
||||
|
||||
```
|
||||
# basic build
|
||||
mvn clean package -Pspark-1.6 -Phadoop-2.4 -Pyarn -Ppyspark
|
||||
|
||||
# spark-cassandra integration
|
||||
mvn clean package -Pcassandra-spark-1.5 -Dhadoop.version=2.6.0 -Phadoop-2.6 -DskipTests
|
||||
|
||||
# with CDH
|
||||
mvn clean package -Pspark-1.5 -Dhadoop.version=2.6.0-cdh5.5.0 -Phadoop-2.6 -Pvendor-repo -DskipTests
|
||||
|
||||
# with MapR
|
||||
mvn clean package -Pspark-1.5 -Pmapr50 -DskipTests
|
||||
```
|
||||
|
||||
|
||||
#### Ignite Interpreter
|
||||
|
||||
```
|
||||
mvn clean package -Dignite.version=1.1.0-incubating -DskipTests
|
||||
```
|
||||
|
||||
#### Scalding Interpreter
|
||||
|
||||
```
|
||||
mvn clean package -Pscalding -DskipTests
|
||||
```
|
||||
|
||||
### Configure
|
||||
If you wish to configure Zeppelin option (like port number), configure the following files:
|
||||
|
||||
```
|
||||
./conf/zeppelin-env.sh
|
||||
./conf/zeppelin-site.xml
|
||||
```
|
||||
(You can copy ```./conf/zeppelin-env.sh.template``` into ```./conf/zeppelin-env.sh```.
|
||||
Same for ```zeppelin-site.xml```.)
|
||||
|
||||
|
||||
#### Setting SPARK_HOME and HADOOP_HOME
|
||||
|
||||
Without SPARK_HOME and HADOOP_HOME, Zeppelin uses embedded Spark and Hadoop binaries that you have specified with mvn build option.
|
||||
If you want to use system provided Spark and Hadoop, export SPARK_HOME and HADOOP_HOME in zeppelin-env.sh
|
||||
You can use any supported version of spark without rebuilding Zeppelin.
|
||||
|
||||
```
|
||||
# ./conf/zeppelin-env.sh
|
||||
export SPARK_HOME=...
|
||||
export HADOOP_HOME=...
|
||||
```
|
||||
|
||||
#### External cluster configuration
|
||||
Mesos
|
||||
|
||||
# ./conf/zeppelin-env.sh
|
||||
export MASTER=mesos://...
|
||||
export ZEPPELIN_JAVA_OPTS="-Dspark.executor.uri=/path/to/spark-*.tgz" or SPARK_HOME="/path/to/spark_home"
|
||||
export MESOS_NATIVE_LIBRARY=/path/to/libmesos.so
|
||||
|
||||
If you set `SPARK_HOME`, you should deploy spark binary on the same location to all worker nodes. And if you set `spark.executor.uri`, every worker can read that file on its node.
|
||||
|
||||
Yarn
|
||||
|
||||
# ./conf/zeppelin-env.sh
|
||||
export SPARK_HOME=/path/to/spark_dir
|
||||
|
||||
### Run
|
||||
./bin/zeppelin-daemon.sh start
|
||||
|
||||
browse localhost:8080 in your browser.
|
||||
|
||||
|
||||
For configuration details check __./conf__ subdirectory.
|
||||
|
||||
### Package
|
||||
To package the final distribution including the compressed archive, run:
|
||||
|
||||
mvn clean package -Pbuild-distr
|
||||
|
||||
To build a distribution with specific profiles, run:
|
||||
|
||||
mvn clean package -Pbuild-distr -Pspark-1.5 -Phadoop-2.4 -Pyarn -Ppyspark
|
||||
|
||||
The profiles `-Pspark-1.5 -Phadoop-2.4 -Pyarn -Ppyspark` can be adjusted if you wish to build to a specific spark versions, or omit support such as `yarn`.
|
||||
|
||||
The archive is generated under _zeppelin-distribution/target_ directory
|
||||
|
||||
###Run end-to-end tests
|
||||
Zeppelin comes with a set of end-to-end acceptance tests driving headless selenium browser
|
||||
|
||||
#assumes zeppelin-server running on localhost:8080 (use -Durl=.. to override)
|
||||
mvn verify
|
||||
|
||||
#or take care of starting\stoping zeppelin-server from packaged _zeppelin-distribuion/target_
|
||||
mvn verify -P using-packaged-distr
|
||||
|
||||
|
||||
|
||||
[](https://github.com/igrigorik/ga-beacon)
|
||||
[](http://datalayer.io)
|
||||
|
|
|
|||
BIN
_Rimg/plot.png
Normal file
BIN
_Rimg/plot.png
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 59 KiB |
BIN
_Rimg/r-scala.png
Normal file
BIN
_Rimg/r-scala.png
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 49 KiB |
BIN
_Rimg/scala-r.png
Normal file
BIN
_Rimg/scala-r.png
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 136 KiB |
BIN
_Rimg/simple-r.png
Normal file
BIN
_Rimg/simple-r.png
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 42 KiB |
BIN
_Rimg/sparkr.png
Normal file
BIN
_Rimg/sparkr.png
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 74 KiB |
|
|
@ -129,6 +129,7 @@ if [[ "${INTERPRETER_ID}" == "spark" ]]; then
|
|||
|
||||
export SPARK_CLASSPATH+=":${ZEPPELIN_CLASSPATH}"
|
||||
fi
|
||||
CLASSPATH+=":${ZEPPELIN_CLASSPATH}":"${ZEPPELIN_HOME}/interpreter/spark/r/rscala_2.10-1.0.6.jar"
|
||||
fi
|
||||
|
||||
addJarInDir "${LOCAL_INTERPRETER_REPO}"
|
||||
|
|
|
|||
|
|
@ -138,7 +138,7 @@
|
|||
|
||||
<property>
|
||||
<name>zeppelin.interpreters</name>
|
||||
<value>org.apache.zeppelin.spark.SparkInterpreter,org.apache.zeppelin.spark.PySparkInterpreter,org.apache.zeppelin.spark.SparkSqlInterpreter,org.apache.zeppelin.spark.DepInterpreter,org.apache.zeppelin.markdown.Markdown,org.apache.zeppelin.angular.AngularInterpreter,org.apache.zeppelin.shell.ShellInterpreter,org.apache.zeppelin.hive.HiveInterpreter,org.apache.zeppelin.tajo.TajoInterpreter,org.apache.zeppelin.flink.FlinkInterpreter,org.apache.zeppelin.lens.LensInterpreter,org.apache.zeppelin.ignite.IgniteInterpreter,org.apache.zeppelin.ignite.IgniteSqlInterpreter,org.apache.zeppelin.cassandra.CassandraInterpreter,org.apache.zeppelin.geode.GeodeOqlInterpreter,org.apache.zeppelin.postgresql.PostgreSqlInterpreter,org.apache.zeppelin.jdbc.JDBCInterpreter,org.apache.zeppelin.phoenix.PhoenixInterpreter,org.apache.zeppelin.kylin.KylinInterpreter,org.apache.zeppelin.elasticsearch.ElasticsearchInterpreter,org.apache.zeppelin.scalding.ScaldingInterpreter,org.apache.zeppelin.tachyon.TachyonInterpreter,org.apache.zeppelin.hbase.HbaseInterpreter</value>
|
||||
<value>org.apache.zeppelin.spark.SparkInterpreter,org.apache.zeppelin.spark.PySparkInterpreter,org.apache.zeppelin.spark.SparkRInterpreter,org.apache.zeppelin.spark.SparkSqlInterpreter,org.apache.zeppelin.spark.DepInterpreter,org.apache.zeppelin.markdown.Markdown,org.apache.zeppelin.angular.AngularInterpreter,org.apache.zeppelin.shell.ShellInterpreter,org.apache.zeppelin.hive.HiveInterpreter,org.apache.zeppelin.tajo.TajoInterpreter,org.apache.zeppelin.flink.FlinkInterpreter,org.apache.zeppelin.lens.LensInterpreter,org.apache.zeppelin.ignite.IgniteInterpreter,org.apache.zeppelin.ignite.IgniteSqlInterpreter,org.apache.zeppelin.cassandra.CassandraInterpreter,org.apache.zeppelin.geode.GeodeOqlInterpreter,org.apache.zeppelin.postgresql.PostgreSqlInterpreter,org.apache.zeppelin.phoenix.PhoenixInterpreter,org.apache.zeppelin.kylin.KylinInterpreter</value>
|
||||
<description>Comma separated interpreter configurations. First interpreter become a default</description>
|
||||
</property>
|
||||
|
||||
|
|
|
|||
555
notebook/r/note.json
Normal file
555
notebook/r/note.json
Normal file
File diff suppressed because one or more lines are too long
|
|
@ -782,7 +782,9 @@
|
|||
<target>
|
||||
<delete dir="../interpreter/spark/pyspark"/>
|
||||
<copy todir="../interpreter/spark/pyspark"
|
||||
file="${project.build.directory}/spark-dist/spark-${spark.version}/python/lib/py4j-${py4j.version}-src.zip"/>
|
||||
file="${project.build.directory}/spark-dist/spark-${spark.version}/python/lib/py4j-${py4j.version}-src.zip"/>
|
||||
<copy todir="../interpreter/spark/r"
|
||||
file="../spark/lib/rscala_2.10-1.0.6.jar"/>
|
||||
<zip destfile="${project.build.directory}/../../interpreter/spark/pyspark/pyspark.zip"
|
||||
basedir="${project.build.directory}/spark-dist/spark-${spark.version}/python"
|
||||
includes="pyspark/*.py,pyspark/**/*.py"/>
|
||||
|
|
|
|||
BIN
spark/lib/rscala_2.10-1.0.6.jar
Normal file
BIN
spark/lib/rscala_2.10-1.0.6.jar
Normal file
Binary file not shown.
|
|
@ -35,6 +35,7 @@
|
|||
<url>http://zeppelin.incubator.apache.org</url>
|
||||
|
||||
<properties>
|
||||
<rscala.version>1.0.6</rscala.version>
|
||||
<spark.version>1.4.1</spark.version>
|
||||
<scala.version>2.10.4</scala.version>
|
||||
<scala.binary.version>2.10</scala.binary.version>
|
||||
|
|
@ -231,6 +232,13 @@
|
|||
<scope>provided</scope>
|
||||
</dependency>
|
||||
|
||||
<dependency>
|
||||
<groupId>org.ddahl</groupId>
|
||||
<artifactId>rscala</artifactId>
|
||||
<version>${rscala.version}</version>
|
||||
<scope>system</scope>
|
||||
<systemPath>${basedir}/lib/rscala_2.10-${rscala.version}.jar</systemPath>
|
||||
</dependency>
|
||||
|
||||
<!--TEST-->
|
||||
<dependency>
|
||||
|
|
|
|||
|
|
@ -0,0 +1,130 @@
|
|||
/*
|
||||
* Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
* contributor license agreements. See the NOTICE file distributed with
|
||||
* this work for additional information regarding copyright ownership.
|
||||
* The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
* (the "License"); you may not use this file except in compliance with
|
||||
* the License. You may obtain a copy of the License at
|
||||
*
|
||||
* http://www.apache.org/licenses/LICENSE-2.0
|
||||
*
|
||||
* Unless required by applicable law or agreed to in writing, software
|
||||
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
* See the License for the specific language governing permissions and
|
||||
* limitations under the License.
|
||||
*/
|
||||
|
||||
package org.apache.zeppelin.spark;
|
||||
|
||||
import org.apache.zeppelin.interpreter.*;
|
||||
import org.apache.zeppelin.scheduler.Scheduler;
|
||||
import org.apache.zeppelin.scheduler.SchedulerFactory;
|
||||
import org.slf4j.Logger;
|
||||
import org.slf4j.LoggerFactory;
|
||||
|
||||
import java.util.ArrayList;
|
||||
import java.util.List;
|
||||
import java.util.Properties;
|
||||
import java.io.BufferedWriter;
|
||||
|
||||
/**
|
||||
* R and SparkR interpreter.
|
||||
*/
|
||||
public class SparkRInterpreter extends Interpreter {
|
||||
private static final Logger logger = LoggerFactory.getLogger(SparkRInterpreter.class);
|
||||
|
||||
static {
|
||||
Interpreter.register(
|
||||
"r",
|
||||
"spark",
|
||||
SparkRInterpreter.class.getName(),
|
||||
new InterpreterPropertyBuilder()
|
||||
.add("spark.master",
|
||||
SparkInterpreter.getSystemDefault("MASTER", "spark.master", "local[*]"),
|
||||
"Spark master uri. ex) spark://masterhost:7077")
|
||||
.add("spark.home",
|
||||
SparkInterpreter.getSystemDefault("SPARK_HOME", "spark.home", "/opt/spark"),
|
||||
"Spark distribution location")
|
||||
.build());
|
||||
}
|
||||
|
||||
public SparkRInterpreter(Properties property) {
|
||||
super(property);
|
||||
}
|
||||
|
||||
@Override
|
||||
public void open() {
|
||||
ZeppelinR.open(getProperty("spark.master"), getProperty("spark.home"), getSparkInterpreter());
|
||||
}
|
||||
|
||||
@Override
|
||||
public InterpreterResult interpret(String lines, InterpreterContext contextInterpreter) {
|
||||
|
||||
try {
|
||||
|
||||
ZeppelinR.set(".zcmd", "\n```{r comment=NA, echo=FALSE}\n" + lines + "\n```");
|
||||
ZeppelinR.eval(".zres <- knit2html(text=.zcmd)");
|
||||
String html = ZeppelinR.getS0(".zres");
|
||||
|
||||
// Only keep the bare results.
|
||||
String htmlOut = html.substring(html.indexOf("<body>") + 7, html.indexOf("</body>") - 1)
|
||||
.replaceAll("<code>", "").replaceAll("</code>", "")
|
||||
.replaceAll("\n\n", "")
|
||||
.replaceAll("\n", "<br>")
|
||||
.replaceAll("<pre>", "<p class='text'>").replaceAll("</pre>", "</p>");
|
||||
|
||||
return new InterpreterResult(InterpreterResult.Code.SUCCESS, "%html\n" + htmlOut);
|
||||
} catch (Exception e) {
|
||||
logger.error("Exception while connecting to R", e);
|
||||
return new InterpreterResult(InterpreterResult.Code.ERROR, e.getMessage());
|
||||
}
|
||||
|
||||
}
|
||||
|
||||
@Override
|
||||
public void close() {
|
||||
ZeppelinR.close();
|
||||
}
|
||||
|
||||
@Override
|
||||
public void cancel(InterpreterContext context) {}
|
||||
|
||||
@Override
|
||||
public FormType getFormType() {
|
||||
return FormType.NONE;
|
||||
}
|
||||
|
||||
@Override
|
||||
public int getProgress(InterpreterContext context) {
|
||||
return 0;
|
||||
}
|
||||
|
||||
@Override
|
||||
public Scheduler getScheduler() {
|
||||
return SchedulerFactory.singleton().createOrGetFIFOScheduler(
|
||||
SparkRInterpreter.class.getName() + this.hashCode());
|
||||
}
|
||||
|
||||
@Override
|
||||
public List<String> completion(String buf, int cursor) {
|
||||
return new ArrayList<String>();
|
||||
}
|
||||
|
||||
private SparkInterpreter getSparkInterpreter() {
|
||||
for (Interpreter intp : getInterpreterGroup()) {
|
||||
if (intp.getClassName().equals(SparkInterpreter.class.getName())) {
|
||||
Interpreter p = intp;
|
||||
while (p instanceof WrappedInterpreter) {
|
||||
if (p instanceof LazyOpenInterpreter) {
|
||||
p.open();
|
||||
}
|
||||
p = ((WrappedInterpreter) p).getInnerInterpreter();
|
||||
}
|
||||
return (SparkInterpreter) p;
|
||||
}
|
||||
}
|
||||
return null;
|
||||
}
|
||||
|
||||
}
|
||||
|
|
@ -0,0 +1,55 @@
|
|||
/*
|
||||
* Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
* contributor license agreements. See the NOTICE file distributed with
|
||||
* this work for additional information regarding copyright ownership.
|
||||
* The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
* (the "License"); you may not use this file except in compliance with
|
||||
* the License. You may obtain a copy of the License at
|
||||
*
|
||||
* http://www.apache.org/licenses/LICENSE-2.0
|
||||
*
|
||||
* Unless required by applicable law or agreed to in writing, software
|
||||
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
* See the License for the specific language governing permissions and
|
||||
* limitations under the License.
|
||||
*/
|
||||
|
||||
package org.apache.zeppelin.spark;
|
||||
|
||||
import org.apache.spark.SparkContext;
|
||||
import org.apache.spark.sql.SQLContext;
|
||||
|
||||
/**
|
||||
* Contains the Spark and Zeppelin Contexts made available to SparkR.
|
||||
*/
|
||||
public class ZeppelinRContext {
|
||||
private static SparkContext sparkContext;
|
||||
private static SQLContext sqlContext;
|
||||
private static ZeppelinContext zeppelinContext;
|
||||
|
||||
public static void setSparkContext(SparkContext sparkContext) {
|
||||
ZeppelinRContext.sparkContext = sparkContext;
|
||||
}
|
||||
|
||||
public static void setZepplinContext(ZeppelinContext zeppelinContext) {
|
||||
ZeppelinRContext.zeppelinContext = zeppelinContext;
|
||||
}
|
||||
|
||||
public static void setSqlContext(SQLContext sqlContext) {
|
||||
ZeppelinRContext.sqlContext = sqlContext;
|
||||
}
|
||||
|
||||
public static SparkContext getSparkContext() {
|
||||
return sparkContext;
|
||||
}
|
||||
|
||||
public static SQLContext getSqlContext() {
|
||||
return sqlContext;
|
||||
}
|
||||
|
||||
public static ZeppelinContext getZeppelinContext() {
|
||||
return zeppelinContext;
|
||||
}
|
||||
|
||||
}
|
||||
36
spark/src/main/scala/org/apache/spark/SparkRBackend.scala
Normal file
36
spark/src/main/scala/org/apache/spark/SparkRBackend.scala
Normal file
|
|
@ -0,0 +1,36 @@
|
|||
/*
|
||||
* Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
* contributor license agreements. See the NOTICE file distributed with
|
||||
* this work for additional information regarding copyright ownership.
|
||||
* The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
* (the "License"); you may not use this file except in compliance with
|
||||
* the License. You may obtain a copy of the License at
|
||||
*
|
||||
* http://www.apache.org/licenses/LICENSE-2.0
|
||||
*
|
||||
* Unless required by applicable law or agreed to in writing, software
|
||||
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
* See the License for the specific language governing permissions and
|
||||
* limitations under the License.
|
||||
*/
|
||||
package org.apache.spark
|
||||
|
||||
import org.apache.spark.api.r.RBackend
|
||||
|
||||
object SparkRBackend {
|
||||
val backend : RBackend = new RBackend()
|
||||
|
||||
val backendThread : Thread = new Thread("SparkRBackend") {
|
||||
override def run() {
|
||||
backend.run()
|
||||
}
|
||||
}
|
||||
|
||||
def init() : Int = backend.init()
|
||||
|
||||
def start() : Unit = backendThread.start()
|
||||
|
||||
def close() : Unit = backend.close()
|
||||
|
||||
}
|
||||
123
spark/src/main/scala/org/apache/zeppelin/spark/ZeppelinR.scala
Normal file
123
spark/src/main/scala/org/apache/zeppelin/spark/ZeppelinR.scala
Normal file
|
|
@ -0,0 +1,123 @@
|
|||
/*
|
||||
* Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
* contributor license agreements. See the NOTICE file distributed with
|
||||
* this work for additional information regarding copyright ownership.
|
||||
* The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
* (the "License"); you may not use this file except in compliance with
|
||||
* the License. You may obtain a copy of the License at
|
||||
*
|
||||
* http://www.apache.org/licenses/LICENSE-2.0
|
||||
*
|
||||
* Unless required by applicable law or agreed to in writing, software
|
||||
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
* See the License for the specific language governing permissions and
|
||||
* limitations under the License.
|
||||
*/
|
||||
|
||||
package org.apache.zeppelin.spark
|
||||
|
||||
import org.apache.spark.SparkRBackend
|
||||
import org.ddahl.rscala.callback._
|
||||
|
||||
object ZeppelinR {
|
||||
|
||||
private val R = RClient()
|
||||
|
||||
def open(master: String = "local[*]", sparkHome: String = "/opt/spark", sparkInterpreter: SparkInterpreter): Unit = {
|
||||
|
||||
eval(
|
||||
s"""
|
||||
|Sys.setenv(SPARK_HOME="$sparkHome")
|
||||
|.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
|
||||
|library(SparkR)
|
||||
""".stripMargin
|
||||
)
|
||||
|
||||
// See ./core/src/main/scala/org/apache/spark/deploy/RRunner.scala for RBackend usage
|
||||
val port = SparkRBackend.init()
|
||||
SparkRBackend.start()
|
||||
eval(
|
||||
s"""
|
||||
|SparkR:::connectBackend("localhost", ${port})
|
||||
|""".stripMargin)
|
||||
|
||||
// scStartTime is needed by R/pkg/R/sparkR.R
|
||||
eval(
|
||||
"""
|
||||
|assign(".scStartTime", as.integer(Sys.time()), envir = SparkR:::.sparkREnv)
|
||||
""".stripMargin)
|
||||
|
||||
ZeppelinRContext.setSparkContext(sparkInterpreter.getSparkContext())
|
||||
eval(
|
||||
"""
|
||||
|assign(".sc", SparkR:::callJStatic("org.apache.zeppelin.spark.ZeppelinRContext", "getSparkContext"), envir = SparkR:::.sparkREnv)
|
||||
|""".stripMargin)
|
||||
eval(
|
||||
"""
|
||||
|assign("sc", get(".sc", envir = SparkR:::.sparkREnv), envir=.GlobalEnv)
|
||||
""".stripMargin)
|
||||
|
||||
ZeppelinRContext.setSqlContext(sparkInterpreter.getSQLContext())
|
||||
eval(
|
||||
"""
|
||||
|assign(".sqlc", SparkR:::callJStatic("org.apache.zeppelin.spark.ZeppelinRContext", "getSqlContext"), envir = SparkR:::.sparkREnv)
|
||||
|""".stripMargin)
|
||||
eval(
|
||||
"""
|
||||
|assign("sqlContext", get(".sqlc", envir = SparkR:::.sparkREnv), envir = .GlobalEnv)
|
||||
|""".stripMargin)
|
||||
|
||||
ZeppelinRContext.setZepplinContext(sparkInterpreter.getZeppelinContext())
|
||||
eval(
|
||||
"""
|
||||
|assign(".zeppelinContext", SparkR:::callJStatic("org.apache.zeppelin.spark.ZeppelinRContext", "getZeppelinContext"), envir = .GlobalEnv)
|
||||
|""".stripMargin
|
||||
)
|
||||
|
||||
eval(
|
||||
"""
|
||||
|z.put <- function(name, object) {
|
||||
| SparkR:::callJMethod(.zeppelinContext, "put", name, object)
|
||||
|}
|
||||
|z.get <- function(name) {
|
||||
| SparkR:::callJMethod(.zeppelinContext, "get", name)
|
||||
|}
|
||||
|""".stripMargin
|
||||
)
|
||||
|
||||
eval(
|
||||
"""
|
||||
|library("knitr")
|
||||
""".stripMargin)
|
||||
|
||||
}
|
||||
|
||||
def eval(command: String): Any = {
|
||||
try {
|
||||
R.eval(command)
|
||||
} catch {
|
||||
case e: Exception => throw new RuntimeException(e.getMessage + " - Given R command=" + command)
|
||||
}
|
||||
}
|
||||
|
||||
def set(key: String, value: AnyRef): Unit = {
|
||||
R.set(key, value)
|
||||
}
|
||||
|
||||
def get(key: String): Any = {
|
||||
R.get(key)._1
|
||||
}
|
||||
|
||||
def getS0(key: String): String = {
|
||||
R.getS0(key)
|
||||
}
|
||||
|
||||
def close():Unit = {
|
||||
R.eval("""
|
||||
|sparkR.stop()
|
||||
""".stripMargin
|
||||
)
|
||||
}
|
||||
|
||||
}
|
||||
|
|
@ -56,6 +56,12 @@
|
|||
<groupId>com.amazonaws</groupId>
|
||||
<artifactId>aws-java-sdk-s3</artifactId>
|
||||
<version>1.10.1</version>
|
||||
<exclusions>
|
||||
<exclusion>
|
||||
<groupId>com.fasterxml.jackson.core</groupId>
|
||||
<artifactId>*</artifactId>
|
||||
</exclusion>
|
||||
</exclusions>
|
||||
</dependency>
|
||||
|
||||
<dependency>
|
||||
|
|
|
|||
|
|
@ -439,6 +439,7 @@ public class ZeppelinConfiguration extends XMLConfiguration {
|
|||
ZEPPELIN_WAR_TEMPDIR("zeppelin.war.tempdir", "webapps"),
|
||||
ZEPPELIN_INTERPRETERS("zeppelin.interpreters", "org.apache.zeppelin.spark.SparkInterpreter,"
|
||||
+ "org.apache.zeppelin.spark.PySparkInterpreter,"
|
||||
+ "org.apache.zeppelin.spark.SparkRInterpreter,"
|
||||
+ "org.apache.zeppelin.spark.SparkSqlInterpreter,"
|
||||
+ "org.apache.zeppelin.spark.DepInterpreter,"
|
||||
+ "org.apache.zeppelin.markdown.Markdown,"
|
||||
|
|
|
|||
Loading…
Reference in a new issue