mirror of
https://github.com/apache/zeppelin
synced 2026-05-24 09:38:26 +00:00
Remove duplicated info in r.md & apply toc
This commit is contained in:
parent
a03ca99f03
commit
3ffb3835cd
1 changed files with 80 additions and 96 deletions
|
|
@ -6,104 +6,11 @@ group: manual
|
|||
---
|
||||
{% include JB/setup %}
|
||||
|
||||
## R Interpreter
|
||||
# R Interpreter for Apache Zeppelin
|
||||
|
||||
This is a the Apache Zeppelin project, with the addition of support for the R programming language and R-spark integration.
|
||||
<div id="toc"></div>
|
||||
|
||||
### Requirements
|
||||
|
||||
Additional requirements for the R interpreter are:
|
||||
|
||||
* R 3.1 or later (earlier versions may work, but have not been tested)
|
||||
* The `evaluate` R package.
|
||||
|
||||
For full R support, you will also need the following R packages:
|
||||
|
||||
* `knitr`
|
||||
* `repr` -- available with `devtools::install_github("IRkernel/repr")`
|
||||
* `htmltools` -- required for some interactive plotting
|
||||
* `base64enc` -- required to view R base plots
|
||||
|
||||
### Configuration
|
||||
|
||||
To run Zeppelin with the R Interpreter, the SPARK_HOME environment variable must be set. The best way to do this is by editing `conf/zeppelin-env.sh`.
|
||||
|
||||
If it is not set, the R Interpreter will not be able to interface with Spark.
|
||||
|
||||
You should also copy `conf/zeppelin-site.xml.template` to `conf/zeppelin-site.xml`. That will ensure that Zeppelin sees the R Interpreter the first time it starts up.
|
||||
|
||||
### Using the R Interpreter
|
||||
|
||||
By default, the R Interpreter appears as two Zeppelin Interpreters, `%r` and `%knitr`.
|
||||
|
||||
`%r` will behave like an ordinary REPL. You can execute commands as in the CLI.
|
||||
|
||||
[](screenshots/repl2plus2.png)
|
||||
|
||||
R base plotting is fully supported
|
||||
|
||||
[](screenshots/replhist.png)
|
||||
|
||||
If you return a data.frame, Zeppelin will attempt to display it using Zeppelin's built-in visualizations.
|
||||
|
||||
[](screenshots/replhead.png)
|
||||
|
||||
`%knitr` interfaces directly against `knitr`, with chunk options on the first line:
|
||||
|
||||
[](screenshots/knitgeo.png)
|
||||
[](screenshots/knitstock.png)
|
||||
[](screenshots/knitmotion.png)
|
||||
|
||||
The two interpreters share the same environment. If you define a variable from `%r`, it will be within-scope if you then make a call using `knitr`.
|
||||
|
||||
### Using SparkR & Moving Between Languages
|
||||
|
||||
If `SPARK_HOME` is set, the `SparkR` package will be loaded automatically:
|
||||
|
||||
[](screenshots/sparkrfaithful.png)
|
||||
|
||||
The Spark Context and SQL Context are created and injected into the local environment automatically as `sc` and `sql`.
|
||||
|
||||
The same context are shared with the `%spark`, `%sql` and `%pyspark` interpreters:
|
||||
|
||||
[](screenshots/backtoscala.png)
|
||||
|
||||
You can also make an ordinary R variable accessible in scala and Python:
|
||||
|
||||
[](screenshots/varr1.png)
|
||||
|
||||
And vice versa:
|
||||
|
||||
[](screenshots/varscala.png)
|
||||
[](screenshots/varr2.png)
|
||||
|
||||
### Caveats & Troubleshooting
|
||||
|
||||
* Almost all issues with the R interpreter turned out to be caused by an incorrectly set `SPARK_HOME`. The R interpreter must load a version of the `SparkR` package that matches the running version of Spark, and it does this by searching `SPARK_HOME`. If Zeppelin isn't configured to interface with Spark in `SPARK_HOME`, the R interpreter will not be able to connect to Spark.
|
||||
|
||||
* The `knitr` environment is persistent. If you run a chunk from Zeppelin that changes a variable, then run the same chunk again, the variable has already been changed. Use immutable variables.
|
||||
|
||||
* (Note that `%spark.r` and `$r` are two different ways of calling the same interpreter, as are `%spark.knitr` and `%knitr`. By default, Zeppelin puts the R interpreters in the `%spark.` Interpreter Group.
|
||||
|
||||
* Using the `%r` interpreter, if you return a data.frame, HTML, or an image, it will dominate the result. So if you execute three commands, and one is `hist()`, all you will see is the histogram, not the results of the other commands. This is a Zeppelin limitation.
|
||||
|
||||
* If you return a data.frame (for instance, from calling `head()`) from the `%spark.r` interpreter, it will be parsed by Zeppelin's built-in data visualization system.
|
||||
|
||||
* Why `knitr` Instead of `rmarkdown`? Why no `htmlwidgets`? In order to support `htmlwidgets`, which has indirect dependencies, `rmarkdown` uses `pandoc`, which requires writing to and reading from disc. This makes it many times slower than `knitr`, which can operate entirely in RAM.
|
||||
|
||||
* Why no `ggvis` or `shiny`? Supporting `shiny` would require integrating a reverse-proxy into Zeppelin, which is a task.
|
||||
|
||||
* Max OS X & case-insensitive filesystem. If you try to install on a case-insensitive filesystem, which is the Mac OS X default, maven can unintentionally delete the install directory because `r` and `R` become the same subdirectory.
|
||||
|
||||
* Error `unable to start device X11` with the repl interpreter. Check your shell login scripts to see if they are adjusting the `DISPLAY` environment variable. This is common on some operating systems as a workaround for ssh issues, but can interfere with R plotting.
|
||||
|
||||
* akka Library Version or `TTransport` errors. This can happen if you try to run Zeppelin with a SPARK_HOME that has a version of Spark other than the one specified with `-Pspark-1.x` when Zeppelin was compiled.
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
## R Interpreter for Apache Zeppelin
|
||||
## Overview
|
||||
|
||||
[R](https://www.r-project.org) is a free software environment for statistical computing and graphics.
|
||||
|
||||
|
|
@ -135,3 +42,80 @@ We recommend you to also install the following optional R libraries for happy da
|
|||
+ caret
|
||||
+ sqldf
|
||||
+ wordcloud
|
||||
|
||||
## Configuration
|
||||
|
||||
To run Zeppelin with the R Interpreter, the `SPARK_HOME` environment variable must be set. The best way to do this is by editing `conf/zeppelin-env.sh`.
|
||||
If it is not set, the R Interpreter will not be able to interface with Spark.
|
||||
|
||||
You should also copy `conf/zeppelin-site.xml.template` to `conf/zeppelin-site.xml`. That will ensure that Zeppelin sees the R Interpreter the first time it starts up.
|
||||
|
||||
## Using the R Interpreter
|
||||
|
||||
By default, the R Interpreter appears as two Zeppelin Interpreters, `%r` and `%knitr`.
|
||||
|
||||
`%r` will behave like an ordinary REPL. You can execute commands as in the CLI.
|
||||
|
||||
<img class="img-responsive" src="../assets/themes/zeppelin/img/docs-img/repl2plus2.png" width="700px"/>
|
||||
|
||||
R base plotting is fully supported
|
||||
|
||||
<img class="img-responsive" src="../assets/themes/zeppelin/img/docs-img/replhist.png" width="550px"/>
|
||||
|
||||
If you return a data.frame, Zeppelin will attempt to display it using Zeppelin's built-in visualizations.
|
||||
|
||||
<img class="img-responsive" src="../assets/themes/zeppelin/img/docs-img/replhead.png" width="550px"/>
|
||||
|
||||
`%knitr` interfaces directly against `knitr`, with chunk options on the first line:
|
||||
|
||||
<img class="img-responsive" src="../assets/themes/zeppelin/img/docs-img/knitgeo.png" width="550px"/>
|
||||
|
||||
<img class="img-responsive" src="../assets/themes/zeppelin/img/docs-img/knitstock.png" width="550px"/>
|
||||
|
||||
<img class="img-responsive" src="../assets/themes/zeppelin/img/docs-img/knitmotion.png" width="550px"/>
|
||||
|
||||
The two interpreters share the same environment. If you define a variable from `%r`, it will be within-scope if you then make a call using `knitr`.
|
||||
|
||||
## Using SparkR & Moving Between Languages
|
||||
|
||||
If `SPARK_HOME` is set, the `SparkR` package will be loaded automatically:
|
||||
|
||||
<img class="img-responsive" src="../assets/themes/zeppelin/img/docs-img/sparkrfaithful.png" width="550px"/>
|
||||
|
||||
The Spark Context and SQL Context are created and injected into the local environment automatically as `sc` and `sql`.
|
||||
|
||||
The same context are shared with the `%spark`, `%sql` and `%pyspark` interpreters:
|
||||
|
||||
<img class="img-responsive" src="../assets/themes/zeppelin/img/docs-img/backtoscala.png" width="700px"/>
|
||||
|
||||
You can also make an ordinary R variable accessible in scala and Python:
|
||||
|
||||
<img class="img-responsive" src="../assets/themes/zeppelin/img/docs-img/varr1.png" width="550px"/>
|
||||
|
||||
And vice versa:
|
||||
|
||||
<img class="img-responsive" src="../assets/themes/zeppelin/img/docs-img/varscala.png" width="550px"/>
|
||||
|
||||
<img class="img-responsive" src="../assets/themes/zeppelin/img/docs-img/varr2.png" width="550px"/>
|
||||
|
||||
## Caveats & Troubleshooting
|
||||
|
||||
* Almost all issues with the R interpreter turned out to be caused by an incorrectly set `SPARK_HOME`. The R interpreter must load a version of the `SparkR` package that matches the running version of Spark, and it does this by searching `SPARK_HOME`. If Zeppelin isn't configured to interface with Spark in `SPARK_HOME`, the R interpreter will not be able to connect to Spark.
|
||||
|
||||
* The `knitr` environment is persistent. If you run a chunk from Zeppelin that changes a variable, then run the same chunk again, the variable has already been changed. Use immutable variables.
|
||||
|
||||
* (Note that `%spark.r` and `$r` are two different ways of calling the same interpreter, as are `%spark.knitr` and `%knitr`. By default, Zeppelin puts the R interpreters in the `%spark.` Interpreter Group.
|
||||
|
||||
* Using the `%r` interpreter, if you return a data.frame, HTML, or an image, it will dominate the result. So if you execute three commands, and one is `hist()`, all you will see is the histogram, not the results of the other commands. This is a Zeppelin limitation.
|
||||
|
||||
* If you return a data.frame (for instance, from calling `head()`) from the `%spark.r` interpreter, it will be parsed by Zeppelin's built-in data visualization system.
|
||||
|
||||
* Why `knitr` Instead of `rmarkdown`? Why no `htmlwidgets`? In order to support `htmlwidgets`, which has indirect dependencies, `rmarkdown` uses `pandoc`, which requires writing to and reading from disc. This makes it many times slower than `knitr`, which can operate entirely in RAM.
|
||||
|
||||
* Why no `ggvis` or `shiny`? Supporting `shiny` would require integrating a reverse-proxy into Zeppelin, which is a task.
|
||||
|
||||
* Max OS X & case-insensitive filesystem. If you try to install on a case-insensitive filesystem, which is the Mac OS X default, maven can unintentionally delete the install directory because `r` and `R` become the same subdirectory.
|
||||
|
||||
* Error `unable to start device X11` with the repl interpreter. Check your shell login scripts to see if they are adjusting the `DISPLAY` environment variable. This is common on some operating systems as a workaround for ssh issues, but can interfere with R plotting.
|
||||
|
||||
* akka Library Version or `TTransport` errors. This can happen if you try to run Zeppelin with a SPARK_HOME that has a version of Spark other than the one specified with `-Pspark-1.x` when Zeppelin was compiled.
|
||||
|
|
|
|||
Loading…
Reference in a new issue