Add SparkRInterpreter implementation

Eric Charles 2015-11-20 14:01:47 +01:00
parent 617eb947bd
commit 15375ebccb
18 changed files with 985 additions and 200 deletions

264
README.md

@ -1,234 +1,102 @@
#Zeppelin
# Apache Zeppelin R
**Documentation:** [User Guide](http://zeppelin.incubator.apache.org/docs/latest/index.html)<br/>
**Mailing Lists:** [User and Dev mailing list](http://zeppelin.incubator.apache.org/community.html)<br/>
**Continuous Integration:** [![Build Status](https://secure.travis-ci.org/apache/incubator-zeppelin.png?branch=master)](https://travis-ci.org/apache/incubator-zeppelin) <br/>
**Contributing:** [Contribution Guide](https://github.com/apache/incubator-zeppelin/blob/master/CONTRIBUTING.md)<br/>
**Issue Tracker:** [Jira](https://issues.apache.org/jira/browse/ZEPPELIN)<br/>
**License:** [Apache 2.0](https://github.com/apache/incubator-zeppelin/blob/master/LICENSE)
This adds an [R](http://cran.r-project.org) interpreter to the [Apache Zeppelin notebook](http://zeppelin.incubator.apache.org).
It supports:
**Zeppelin**, a web-based notebook that enables interactive data analytics. You can make beautiful data-driven, interactive and collaborative documents with SQL, Scala and more.
+ R code.
+ SparkR code.
+ Cross paragraph R variables.
+ Scala to R binding (passing basic Scala data structure to R).
+ R to Scala binding (passing basic R data structure to Scala).
+ R plot (ggplot2...).
Core feature:
* Web based notebook style editor.
* Built-in Apache Spark support
## Simple R
[![Simple R](https://raw.githubusercontent.com/datalayer/zeppelin-R/rscala/_Rimg/simple-r.png)](https://raw.githubusercontent.com/datalayer/zeppelin-R/rscala/_Rimg/simple-r.png)
To know more about Zeppelin, visit our web site [http://zeppelin.incubator.apache.org](http://zeppelin.incubator.apache.org)
## Plot
## Requirements
* Java 1.7
* Tested on Mac OSX, Ubuntu 14.X, CentOS 6.X
* Maven (if you want to build from the source code)
* Node.js Package Manager
[![Plot](https://raw.githubusercontent.com/datalayer/zeppelin-R/rscala/_Rimg/plot.png)](https://raw.githubusercontent.com/datalayer/zeppelin-R/rscala/_Rimg/plot.png)
## Getting Started
## Scala R Binding
### Before Build
If you don't have the requirements prepared, install them.
(The installation method may vary with your environment; the example is for Ubuntu.)
[![Scala R Binding](https://raw.githubusercontent.com/datalayer/zeppelin-R/rscala/_Rimg/scala-r.png)](https://raw.githubusercontent.com/datalayer/zeppelin-R/rscala/_Rimg/scala-r.png)
## R Scala Binding
[![R Scala Binding](https://raw.githubusercontent.com/datalayer/zeppelin-R/rscala/_Rimg/r-scala.png)](https://raw.githubusercontent.com/datalayer/zeppelin-R/rscala/_Rimg/r-scala.png)
## SparkR
[![SparkR](https://raw.githubusercontent.com/datalayer/zeppelin-R/rscala/_Rimg/sparkr.png)](https://raw.githubusercontent.com/datalayer/zeppelin-R/rscala/_Rimg/sparkr.png)
# Prerequisite
You need R available on the host running the notebook.
+ For CentOS: `yum install R R-devel`
+ For Ubuntu: `apt-get install r-base r-cran-rserve`
Install additional R packages:
```
sudo apt-get update
sudo apt-get install git
sudo apt-get install openjdk-7-jdk
sudo apt-get install npm
sudo apt-get install libfontconfig
# install maven
wget http://www.eu.apache.org/dist/maven/maven-3/3.3.3/binaries/apache-maven-3.3.3-bin.tar.gz
sudo tar -zxf apache-maven-3.3.3-bin.tar.gz -C /usr/local/
sudo ln -s /usr/local/apache-maven-3.3.3/bin/mvn /usr/local/bin/mvn
curl https://cran.r-project.org/src/contrib/Archive/rscala/rscala_1.0.6.tar.gz -o /tmp/rscala_1.0.6.tar.gz
R CMD INSTALL /tmp/rscala_1.0.6.tar.gz
R -e "install.packages('ggplot2', repos = 'http://cran.us.r-project.org')"
R -e "install.packages('knitr', repos = 'http://cran.us.r-project.org')"
```
_Notes:_
- Ensure node is installed by running `node --version`
- Ensure maven is running version 3.1.x or higher with `mvn -version`
- Configure maven to use more memory than usual by ```export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=1024m"```
You also need a compiled version of Spark 1.5.0. Download [the binary distribution](http://archive.apache.org/dist/spark/spark-1.5.0/spark-1.5.0-bin-hadoop2.6.tgz) and untar it so that it is accessible in the `/opt/spark` folder.
### Build
If you want to build Zeppelin from the source, please first clone this repository, then:
# Build and Run
```
mvn clean package -DskipTests [Options]
mvn clean install -Pspark-1.5 -Dspark.version=1.5.0 \
-Dhadoop.version=2.7.1 -Phadoop-2.6 -Ppyspark \
-Dmaven.findbugs.enable=false -Drat.skip=true -Dcheckstyle.skip=true \
-DskipTests \
-pl '!flink,!ignite,!phoenix,!postgresql,!tajo,!hive,!cassandra,!lens,!kylin'
```
Each Interpreter requires different Options.
#### Spark Interpreter
To build with a specific Spark version, Hadoop version or specific features, define one or more of the following profiles and options:
##### -Pspark-[version]
Set the Spark major version.
Available profiles are
```
-Pspark-1.6
-Pspark-1.5
-Pspark-1.4
-Pspark-1.3
-Pspark-1.2
-Pspark-1.1
-Pcassandra-spark-1.5
-Pcassandra-spark-1.4
-Pcassandra-spark-1.3
-Pcassandra-spark-1.2
-Pcassandra-spark-1.1
SPARK_HOME=/opt/spark ./bin/zeppelin.sh
```
The minor version can be adjusted with `-Dspark.version=x.x.x`.
Go to [http://localhost:8080](http://localhost:8080) and test the `R Tutorial` note.
## Get the image from the Docker Repository
##### -Phadoop-[version]
For your convenience, [Datalayer](http://datalayer.io) provides an up-to-date Docker image for [Apache Zeppelin](http://zeppelin.incubator.apache.org), the WEB Notebook for Big Data Science.
Set the Hadoop major version.
In order to get the image, you can run with the appropriate rights:
Available profiles are
`docker pull datalayer/zeppelin-rscala`
```
-Phadoop-0.23
-Phadoop-1
-Phadoop-2.2
-Phadoop-2.3
-Phadoop-2.4
-Phadoop-2.6
```
Run the Zeppelin notebook with:
The minor version can be adjusted with `-Dhadoop.version=x.x.x`.
`docker run -it -p 2222:22 -p 8080:8080 -p 4040:4040 datalayer/zeppelin-rscala`
##### -Pyarn (optional)
and go to [http://localhost:8080](http://localhost:8080) to test the `R Tutorial` note.
Enable YARN support for local mode.
# License
Copyright 2015 Datalayer http://datalayer.io
##### -Ppyspark (optional)
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
Enable PySpark support for local mode.
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
##### -Pvendor-repo (optional)
[![R](http://datalayer.io/ext/images/logo-R-200.png)](http://cran.r-project.org)
Enable the 3rd-party vendor repository (Cloudera).
[![Apache Zeppelin](http://datalayer.io/ext/images/logo-zeppelin-small.png)](http://zeppelin.incubator.apache.org)
##### -Pmapr[version] (optional)
For the MapR Hadoop Distribution, these profiles will handle the Hadoop version. As MapR allows different versions
of Spark to be installed, you should specify which version of Spark is installed on the cluster by adding a Spark profile (-Pspark-1.2, -Pspark-1.3, etc.) as needed. For Hive, check the hive/pom.xml and adjust the version installed as well. The correct Maven
artifacts can be found for every version of MapR at http://doc.mapr.com
Available profiles are
```
-Pmapr3
-Pmapr40
-Pmapr41
-Pmapr50
```
Here are some examples:
```
# basic build
mvn clean package -Pspark-1.6 -Phadoop-2.4 -Pyarn -Ppyspark
# spark-cassandra integration
mvn clean package -Pcassandra-spark-1.5 -Dhadoop.version=2.6.0 -Phadoop-2.6 -DskipTests
# with CDH
mvn clean package -Pspark-1.5 -Dhadoop.version=2.6.0-cdh5.5.0 -Phadoop-2.6 -Pvendor-repo -DskipTests
# with MapR
mvn clean package -Pspark-1.5 -Pmapr50 -DskipTests
```
#### Ignite Interpreter
```
mvn clean package -Dignite.version=1.1.0-incubating -DskipTests
```
#### Scalding Interpreter
```
mvn clean package -Pscalding -DskipTests
```
### Configure
To configure Zeppelin options (such as the port number), edit the following files:
```
./conf/zeppelin-env.sh
./conf/zeppelin-site.xml
```
(You can copy ```./conf/zeppelin-env.sh.template``` into ```./conf/zeppelin-env.sh```.
Same for ```zeppelin-site.xml```.)
#### Setting SPARK_HOME and HADOOP_HOME
Without SPARK_HOME and HADOOP_HOME, Zeppelin uses the embedded Spark and Hadoop binaries that you specified with the mvn build options.
To use system-provided Spark and Hadoop, export SPARK_HOME and HADOOP_HOME in zeppelin-env.sh.
You can use any supported version of Spark without rebuilding Zeppelin.
```
# ./conf/zeppelin-env.sh
export SPARK_HOME=...
export HADOOP_HOME=...
```
#### External cluster configuration
Mesos
# ./conf/zeppelin-env.sh
export MASTER=mesos://...
export ZEPPELIN_JAVA_OPTS="-Dspark.executor.uri=/path/to/spark-*.tgz" or SPARK_HOME="/path/to/spark_home"
export MESOS_NATIVE_LIBRARY=/path/to/libmesos.so
If you set `SPARK_HOME`, deploy the Spark binary to the same location on all worker nodes. If you set `spark.executor.uri`, every worker should be able to read that file on its node.
Yarn
# ./conf/zeppelin-env.sh
export SPARK_HOME=/path/to/spark_dir
### Run
./bin/zeppelin-daemon.sh start
Open localhost:8080 in your browser.
For configuration details check __./conf__ subdirectory.
### Package
To package the final distribution including the compressed archive, run:
mvn clean package -Pbuild-distr
To build a distribution with specific profiles, run:
mvn clean package -Pbuild-distr -Pspark-1.5 -Phadoop-2.4 -Pyarn -Ppyspark
The profiles `-Pspark-1.5 -Phadoop-2.4 -Pyarn -Ppyspark` can be adjusted if you wish to build for a specific Spark version or omit support for features such as `yarn`.
The archive is generated under the _zeppelin-distribution/target_ directory.
### Run end-to-end tests
Zeppelin comes with a set of end-to-end acceptance tests that drive a headless Selenium browser.
#assumes zeppelin-server running on localhost:8080 (use -Durl=.. to override)
mvn verify
#or take care of starting/stopping zeppelin-server from packaged _zeppelin-distribution/target_
mvn verify -P using-packaged-distr
[![Analytics](https://ga-beacon.appspot.com/UA-45176241-4/apache/incubator-zeppelin/README.md?pixel)](https://github.com/igrigorik/ga-beacon)
[![Datalayer](http://datalayer.io/ext/images/logo_horizontal_072ppi.png)](http://datalayer.io)

BIN
_Rimg/plot.png Normal file

Binary file not shown. Size: 59 KiB

BIN
_Rimg/r-scala.png Normal file

Binary file not shown. Size: 49 KiB

BIN
_Rimg/scala-r.png Normal file

Binary file not shown. Size: 136 KiB

BIN
_Rimg/simple-r.png Normal file

Binary file not shown. Size: 42 KiB

BIN
_Rimg/sparkr.png Normal file

Binary file not shown. Size: 74 KiB


@ -129,6 +129,7 @@ if [[ "${INTERPRETER_ID}" == "spark" ]]; then
export SPARK_CLASSPATH+=":${ZEPPELIN_CLASSPATH}"
fi
CLASSPATH+=":${ZEPPELIN_CLASSPATH}":"${ZEPPELIN_HOME}/interpreter/spark/r/rscala_2.10-1.0.6.jar"
fi
addJarInDir "${LOCAL_INTERPRETER_REPO}"


@ -138,7 +138,7 @@
<property>
<name>zeppelin.interpreters</name>
<value>org.apache.zeppelin.spark.SparkInterpreter,org.apache.zeppelin.spark.PySparkInterpreter,org.apache.zeppelin.spark.SparkSqlInterpreter,org.apache.zeppelin.spark.DepInterpreter,org.apache.zeppelin.markdown.Markdown,org.apache.zeppelin.angular.AngularInterpreter,org.apache.zeppelin.shell.ShellInterpreter,org.apache.zeppelin.hive.HiveInterpreter,org.apache.zeppelin.tajo.TajoInterpreter,org.apache.zeppelin.flink.FlinkInterpreter,org.apache.zeppelin.lens.LensInterpreter,org.apache.zeppelin.ignite.IgniteInterpreter,org.apache.zeppelin.ignite.IgniteSqlInterpreter,org.apache.zeppelin.cassandra.CassandraInterpreter,org.apache.zeppelin.geode.GeodeOqlInterpreter,org.apache.zeppelin.postgresql.PostgreSqlInterpreter,org.apache.zeppelin.jdbc.JDBCInterpreter,org.apache.zeppelin.phoenix.PhoenixInterpreter,org.apache.zeppelin.kylin.KylinInterpreter,org.apache.zeppelin.elasticsearch.ElasticsearchInterpreter,org.apache.zeppelin.scalding.ScaldingInterpreter,org.apache.zeppelin.tachyon.TachyonInterpreter,org.apache.zeppelin.hbase.HbaseInterpreter</value>
<value>org.apache.zeppelin.spark.SparkInterpreter,org.apache.zeppelin.spark.PySparkInterpreter,org.apache.zeppelin.spark.SparkRInterpreter,org.apache.zeppelin.spark.SparkSqlInterpreter,org.apache.zeppelin.spark.DepInterpreter,org.apache.zeppelin.markdown.Markdown,org.apache.zeppelin.angular.AngularInterpreter,org.apache.zeppelin.shell.ShellInterpreter,org.apache.zeppelin.hive.HiveInterpreter,org.apache.zeppelin.tajo.TajoInterpreter,org.apache.zeppelin.flink.FlinkInterpreter,org.apache.zeppelin.lens.LensInterpreter,org.apache.zeppelin.ignite.IgniteInterpreter,org.apache.zeppelin.ignite.IgniteSqlInterpreter,org.apache.zeppelin.cassandra.CassandraInterpreter,org.apache.zeppelin.geode.GeodeOqlInterpreter,org.apache.zeppelin.postgresql.PostgreSqlInterpreter,org.apache.zeppelin.phoenix.PhoenixInterpreter,org.apache.zeppelin.kylin.KylinInterpreter</value>
<description>Comma separated interpreter configurations. First interpreter become a default</description>
</property>

555
notebook/r/note.json Normal file

File diff suppressed because one or more lines are too long


@ -782,7 +782,9 @@
<target>
<delete dir="../interpreter/spark/pyspark"/>
<copy todir="../interpreter/spark/pyspark"
file="${project.build.directory}/spark-dist/spark-${spark.version}/python/lib/py4j-${py4j.version}-src.zip"/>
file="${project.build.directory}/spark-dist/spark-${spark.version}/python/lib/py4j-${py4j.version}-src.zip"/>
<copy todir="../interpreter/spark/r"
file="../spark/lib/rscala_2.10-1.0.6.jar"/>
<zip destfile="${project.build.directory}/../../interpreter/spark/pyspark/pyspark.zip"
basedir="${project.build.directory}/spark-dist/spark-${spark.version}/python"
includes="pyspark/*.py,pyspark/**/*.py"/>

Binary file not shown.


@ -35,6 +35,7 @@
<url>http://zeppelin.incubator.apache.org</url>
<properties>
<rscala.version>1.0.6</rscala.version>
<spark.version>1.4.1</spark.version>
<scala.version>2.10.4</scala.version>
<scala.binary.version>2.10</scala.binary.version>
@ -231,6 +232,13 @@
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.ddahl</groupId>
<artifactId>rscala</artifactId>
<version>${rscala.version}</version>
<scope>system</scope>
<systemPath>${basedir}/lib/rscala_2.10-${rscala.version}.jar</systemPath>
</dependency>
<!--TEST-->
<dependency>


@ -0,0 +1,130 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.zeppelin.spark;
import org.apache.zeppelin.interpreter.*;
import org.apache.zeppelin.scheduler.Scheduler;
import org.apache.zeppelin.scheduler.SchedulerFactory;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.io.BufferedWriter;
/**
* R and SparkR interpreter.
*/
public class SparkRInterpreter extends Interpreter {
private static final Logger logger = LoggerFactory.getLogger(SparkRInterpreter.class);
static {
Interpreter.register(
"r",
"spark",
SparkRInterpreter.class.getName(),
new InterpreterPropertyBuilder()
.add("spark.master",
SparkInterpreter.getSystemDefault("MASTER", "spark.master", "local[*]"),
"Spark master uri. ex) spark://masterhost:7077")
.add("spark.home",
SparkInterpreter.getSystemDefault("SPARK_HOME", "spark.home", "/opt/spark"),
"Spark distribution location")
.build());
}
public SparkRInterpreter(Properties property) {
super(property);
}
@Override
public void open() {
ZeppelinR.open(getProperty("spark.master"), getProperty("spark.home"), getSparkInterpreter());
}
@Override
public InterpreterResult interpret(String lines, InterpreterContext contextInterpreter) {
try {
ZeppelinR.set(".zcmd", "\n```{r comment=NA, echo=FALSE}\n" + lines + "\n```");
ZeppelinR.eval(".zres <- knit2html(text=.zcmd)");
String html = ZeppelinR.getS0(".zres");
// Only keep the bare results.
String htmlOut = html.substring(html.indexOf("<body>") + 7, html.indexOf("</body>") - 1)
.replaceAll("<code>", "").replaceAll("</code>", "")
.replaceAll("\n\n", "")
.replaceAll("\n", "<br>")
.replaceAll("<pre>", "<p class='text'>").replaceAll("</pre>", "</p>");
return new InterpreterResult(InterpreterResult.Code.SUCCESS, "%html\n" + htmlOut);
} catch (Exception e) {
logger.error("Exception while connecting to R", e);
return new InterpreterResult(InterpreterResult.Code.ERROR, e.getMessage());
}
}
@Override
public void close() {
ZeppelinR.close();
}
@Override
public void cancel(InterpreterContext context) {}
@Override
public FormType getFormType() {
return FormType.NONE;
}
@Override
public int getProgress(InterpreterContext context) {
return 0;
}
@Override
public Scheduler getScheduler() {
return SchedulerFactory.singleton().createOrGetFIFOScheduler(
SparkRInterpreter.class.getName() + this.hashCode());
}
@Override
public List<String> completion(String buf, int cursor) {
return new ArrayList<String>();
}
private SparkInterpreter getSparkInterpreter() {
for (Interpreter intp : getInterpreterGroup()) {
if (intp.getClassName().equals(SparkInterpreter.class.getName())) {
Interpreter p = intp;
while (p instanceof WrappedInterpreter) {
if (p instanceof LazyOpenInterpreter) {
p.open();
}
p = ((WrappedInterpreter) p).getInnerInterpreter();
}
return (SparkInterpreter) p;
}
}
return null;
}
}
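
The post-processing step in `interpret` above (keeping only what knitr puts between `<body>` and `</body>`, then rewriting it for display) can be sketched in isolation. This is a minimal illustration, not the committed implementation: the class name `KnitrHtml` and the method `extractBody` are hypothetical, and it assumes the knitr output holds at most one `<body>` element. Instead of fixed `+ 7` / `- 1` offsets it trims surrounding whitespace, and it falls back to the whole input when no body element is found.

```java
// Hypothetical helper, for illustration only; not part of this commit.
final class KnitrHtml {
    private KnitrHtml() {}

    /** Returns the content between <body> and </body>, rewritten for display. */
    static String extractBody(String html) {
        String open = "<body>";
        String close = "</body>";
        int start = html.indexOf(open);
        int end = html.indexOf(close);
        // Fall back to the full input when the body element is missing.
        String body = (start >= 0 && end > start)
            ? html.substring(start + open.length(), end)
            : html;
        // Same literal rewrites as in interpret(), using non-regex replace().
        return body.trim()
            .replace("<code>", "").replace("</code>", "")
            .replace("\n\n", "")
            .replace("\n", "<br>")
            .replace("<pre>", "<p class='text'>").replace("</pre>", "</p>");
    }

    public static void main(String[] args) {
        String html = "<html><body>\n<pre><code>[1] 2</code></pre>\n</body></html>";
        System.out.println(extractBody(html)); // prints <p class='text'>[1] 2</p>
    }
}
```

Using `String.replace` rather than `replaceAll` avoids regex interpretation of the search strings; for the literal tags used here the result is the same.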


@ -0,0 +1,55 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.zeppelin.spark;
import org.apache.spark.SparkContext;
import org.apache.spark.sql.SQLContext;
/**
* Contains the Spark and Zeppelin Contexts made available to SparkR.
*/
public class ZeppelinRContext {
private static SparkContext sparkContext;
private static SQLContext sqlContext;
private static ZeppelinContext zeppelinContext;
public static void setSparkContext(SparkContext sparkContext) {
ZeppelinRContext.sparkContext = sparkContext;
}
public static void setZepplinContext(ZeppelinContext zeppelinContext) {
ZeppelinRContext.zeppelinContext = zeppelinContext;
}
public static void setSqlContext(SQLContext sqlContext) {
ZeppelinRContext.sqlContext = sqlContext;
}
public static SparkContext getSparkContext() {
return sparkContext;
}
public static SQLContext getSqlContext() {
return sqlContext;
}
public static ZeppelinContext getZeppelinContext() {
return zeppelinContext;
}
}


@ -0,0 +1,36 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.spark
import org.apache.spark.api.r.RBackend
object SparkRBackend {
val backend : RBackend = new RBackend()
val backendThread : Thread = new Thread("SparkRBackend") {
override def run(): Unit = {
backend.run()
}
}
def init() : Int = backend.init()
def start() : Unit = backendThread.start()
def close() : Unit = backend.close()
}


@ -0,0 +1,123 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.zeppelin.spark
import org.apache.spark.SparkRBackend
import org.ddahl.rscala.callback._
object ZeppelinR {
private val R = RClient()
def open(master: String = "local[*]", sparkHome: String = "/opt/spark", sparkInterpreter: SparkInterpreter): Unit = {
eval(
s"""
|Sys.setenv(SPARK_HOME="$sparkHome")
|.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
|library(SparkR)
""".stripMargin
)
// See ./core/src/main/scala/org/apache/spark/deploy/RRunner.scala for RBackend usage
val port = SparkRBackend.init()
SparkRBackend.start()
eval(
s"""
|SparkR:::connectBackend("localhost", ${port})
|""".stripMargin)
// scStartTime is needed by R/pkg/R/sparkR.R
eval(
"""
|assign(".scStartTime", as.integer(Sys.time()), envir = SparkR:::.sparkREnv)
""".stripMargin)
ZeppelinRContext.setSparkContext(sparkInterpreter.getSparkContext())
eval(
"""
|assign(".sc", SparkR:::callJStatic("org.apache.zeppelin.spark.ZeppelinRContext", "getSparkContext"), envir = SparkR:::.sparkREnv)
|""".stripMargin)
eval(
"""
|assign("sc", get(".sc", envir = SparkR:::.sparkREnv), envir=.GlobalEnv)
""".stripMargin)
ZeppelinRContext.setSqlContext(sparkInterpreter.getSQLContext())
eval(
"""
|assign(".sqlc", SparkR:::callJStatic("org.apache.zeppelin.spark.ZeppelinRContext", "getSqlContext"), envir = SparkR:::.sparkREnv)
|""".stripMargin)
eval(
"""
|assign("sqlContext", get(".sqlc", envir = SparkR:::.sparkREnv), envir = .GlobalEnv)
|""".stripMargin)
ZeppelinRContext.setZepplinContext(sparkInterpreter.getZeppelinContext())
eval(
"""
|assign(".zeppelinContext", SparkR:::callJStatic("org.apache.zeppelin.spark.ZeppelinRContext", "getZeppelinContext"), envir = .GlobalEnv)
|""".stripMargin
)
eval(
"""
|z.put <- function(name, object) {
| SparkR:::callJMethod(.zeppelinContext, "put", name, object)
|}
|z.get <- function(name) {
| SparkR:::callJMethod(.zeppelinContext, "get", name)
|}
|""".stripMargin
)
eval(
"""
|library("knitr")
""".stripMargin)
}
def eval(command: String): Any = {
try {
R.eval(command)
} catch {
case e: Exception => throw new RuntimeException(e.getMessage + " - Given R command=" + command, e)
}
}
def set(key: String, value: AnyRef): Unit = {
R.set(key, value)
}
def get(key: String): Any = {
R.get(key)._1
}
def getS0(key: String): String = {
R.getS0(key)
}
def close():Unit = {
R.eval("""
|sparkR.stop()
""".stripMargin
)
}
}
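
The `eval` wrapper above attaches the failing R command to the error message. The same pattern can be sketched in plain Java as follows. This is a hedged illustration under assumptions: `CommandRunner` and `evalWithContext` are hypothetical names, and the evaluator is passed in as a function so the sketch runs without rscala. It also passes the original exception as the cause, so the stack trace of the underlying failure is kept.

```java
import java.util.function.Function;

// Hypothetical helper, for illustration only; not part of this commit.
final class CommandRunner {
    private CommandRunner() {}

    /** Runs the evaluator and, on failure, rethrows with the command in the message. */
    static <T> T evalWithContext(String command, Function<String, T> evaluator) {
        try {
            return evaluator.apply(command);
        } catch (RuntimeException e) {
            // Keep the original exception as the cause so its stack trace survives.
            throw new RuntimeException(e.getMessage() + " - Given R command=" + command, e);
        }
    }

    public static void main(String[] args) {
        // A stand-in evaluator; a real one would forward the command to R.
        String result = evalWithContext("1 + 1", cmd -> "evaluated: " + cmd);
        System.out.println(result); // prints evaluated: 1 + 1
    }
}
```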


@ -56,6 +56,12 @@
<groupId>com.amazonaws</groupId>
<artifactId>aws-java-sdk-s3</artifactId>
<version>1.10.1</version>
<exclusions>
<exclusion>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>*</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>


@ -439,6 +439,7 @@ public class ZeppelinConfiguration extends XMLConfiguration {
ZEPPELIN_WAR_TEMPDIR("zeppelin.war.tempdir", "webapps"),
ZEPPELIN_INTERPRETERS("zeppelin.interpreters", "org.apache.zeppelin.spark.SparkInterpreter,"
+ "org.apache.zeppelin.spark.PySparkInterpreter,"
+ "org.apache.zeppelin.spark.SparkRInterpreter,"
+ "org.apache.zeppelin.spark.SparkSqlInterpreter,"
+ "org.apache.zeppelin.spark.DepInterpreter,"
+ "org.apache.zeppelin.markdown.Markdown,"