Remove duplicated info in r.md & apply toc

2026-05-24 09:38:26 +00:00 · 2016-06-20 23:04:17 -07:00 · 2016-06-20 23:04:17 -07:00 · 3ffb3835cd
commit 3ffb3835cd
parent a03ca99f03
1 changed file with 80 additions and 96 deletions
--- a/docs/interpreter/r.md
+++ b/docs/interpreter/r.md
@@ -6,104 +6,11 @@ group: manual
 ---
 {% include JB/setup %}

-## R Interpreter
+# R Interpreter for Apache Zeppelin

-This is a the Apache Zeppelin project, with the addition of support for the R programming language and R-spark integration.
+<div id="toc"></div>

-### Requirements
-
-Additional requirements for the R interpreter are:
-
- * R 3.1 or later (earlier versions may work, but have not been tested)
- * The `evaluate` R package.
-
-For full R support, you will also need the following R packages:
-
- * `knitr`
- * `repr` -- available with `devtools::install_github("IRkernel/repr")`
- * `htmltools` -- required for some interactive plotting
- * `base64enc` -- required to view R base plots
-
-### Configuration
-
-To run Zeppelin with the R Interpreter, the SPARK_HOME environment variable must be set. The best way to do this is by editing `conf/zeppelin-env.sh`.
-
-If it is not set, the R Interpreter will not be able to interface with Spark.
-
-You should also copy `conf/zeppelin-site.xml.template` to `conf/zeppelin-site.xml`.  That will ensure that Zeppelin sees the R Interpreter the first time it starts up.
-
-### Using the R Interpreter
-
-By default, the R Interpreter appears as two Zeppelin Interpreters, `%r` and `%knitr`.
-
-`%r` will behave like an ordinary REPL.  You can execute commands as in the CLI.   
-
-[![2+2](screenshots/repl2plus2.png)](screenshots/repl2plus2.png)
-
-R base plotting is fully supported
-
-[![replhist](screenshots/replhist.png)](screenshots/replhist.png)
-
-If you return a data.frame, Zeppelin will attempt to display it using Zeppelin's built-in visualizations.
-
-[![replhist](screenshots/replhead.png)](screenshots/replhead.png)
-
-`%knitr` interfaces directly against `knitr`, with chunk options on the first line:
-
-[![knitgeo](screenshots/knitgeo.png)](screenshots/knitgeo.png)
-[![knitstock](screenshots/knitstock.png)](screenshots/knitstock.png)
-[![knitmotion](screenshots/knitmotion.png)](screenshots/knitmotion.png)
-
-The two interpreters share the same environment.  If you define a variable from `%r`, it will be within-scope if you then make a call using `knitr`.
-
-### Using SparkR & Moving Between Languages
-
-If `SPARK_HOME` is set, the `SparkR` package will be loaded automatically:
-
-[![sparkrfaithful](screenshots/sparkrfaithful.png)](screenshots/sparkrfaithful.png)
-
-The Spark Context and SQL Context are created and injected into the local environment automatically as `sc` and `sql`.
-
-The same context are shared with the `%spark`, `%sql` and `%pyspark` interpreters:
-
-[![backtoscala](screenshots/backtoscala.png)](screenshots/backtoscala.png)
-
-You can also make an ordinary R variable accessible in scala and Python:
-
-[![varr1](screenshots/varr1.png)](screenshots/varr1.png)
-
-And vice versa:
-
-[![varscala](screenshots/varscala.png)](screenshots/varscala.png)
-[![varr2](screenshots/varr2.png)](screenshots/varr2.png)
-
-### Caveats & Troubleshooting
-
-* Almost all issues with the R interpreter turned out to be caused by an incorrectly set `SPARK_HOME`.  The R interpreter must load a version of the `SparkR` package that matches the running version of Spark, and it does this by searching `SPARK_HOME`. If Zeppelin isn't configured to interface with Spark in `SPARK_HOME`, the R interpreter will not be able to connect to Spark.
-
-* The `knitr` environment is persistent. If you run a chunk from Zeppelin that changes a variable, then run the same chunk again, the variable has already been changed.  Use immutable variables.
-
-* (Note that `%spark.r` and `$r` are two different ways of calling the same interpreter, as are `%spark.knitr` and `%knitr`. By default, Zeppelin puts the R interpreters in the `%spark.` Interpreter Group.
-
-* Using the `%r` interpreter, if you return a data.frame, HTML, or an image, it will dominate the result. So if you execute three commands, and one is `hist()`, all you will see is the histogram, not the results of the other commands. This is a Zeppelin limitation.
-
-* If you return a data.frame (for instance, from calling `head()`) from the `%spark.r` interpreter, it will be parsed by Zeppelin's built-in data visualization system.  
-
-* Why `knitr` Instead of `rmarkdown`?  Why no `htmlwidgets`?  In order to support `htmlwidgets`, which has indirect dependencies, `rmarkdown` uses `pandoc`, which requires writing to and reading from disc.  This makes it many times slower than `knitr`, which can operate entirely in RAM.
-
-* Why no `ggvis` or `shiny`?  Supporting `shiny` would require integrating a reverse-proxy into Zeppelin, which is a task.
-
-* Max OS X & case-insensitive filesystem.  If you try to install on a case-insensitive filesystem, which is the Mac OS X default, maven can unintentionally delete the install directory because `r` and `R` become the same subdirectory.
-
-* Error `unable to start device X11` with the repl interpreter.  Check your shell login scripts to see if they are adjusting the `DISPLAY` environment variable.  This is common on some operating systems as a workaround for ssh issues, but can interfere with R plotting.
-
-* akka Library Version or `TTransport` errors.  This can happen if you try to run Zeppelin with a SPARK_HOME that has a version of Spark other than the one specified with `-Pspark-1.x` when Zeppelin was compiled.
-
-
-
-
-
-## R Interpreter for Apache Zeppelin
+## Overview

 [R](https://www.r-project.org) is a free software environment for statistical computing and graphics.

@@ -135,3 +42,80 @@ We recommend you to also install the following optional R libraries for happy da
 + caret
 + sqldf
 + wordcloud
+
+## Configuration
+
+To run Zeppelin with the R Interpreter, the `SPARK_HOME` environment variable must be set. The best way to do this is by editing `conf/zeppelin-env.sh`.
+If it is not set, the R Interpreter will not be able to interface with Spark.
+
+You should also copy `conf/zeppelin-site.xml.template` to `conf/zeppelin-site.xml`. That will ensure that Zeppelin sees the R Interpreter the first time it starts up.
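+
+As a minimal sketch of the two steps above, the setup might look like the following. The Spark path is a placeholder (an assumption, not a default location); substitute the directory of your own Spark installation.
+
+```shell
+# 1. In conf/zeppelin-env.sh: tell Zeppelin where Spark lives.
+#    /path/to/spark is a hypothetical placeholder.
+export SPARK_HOME=/path/to/spark
+
+# 2. From the Zeppelin installation directory: create the site configuration
+#    from the template so the R Interpreter is seen on first startup.
+cp conf/zeppelin-site.xml.template conf/zeppelin-site.xml
+```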
+
+## Using the R Interpreter
+
+By default, the R Interpreter appears as two Zeppelin Interpreters, `%r` and `%knitr`.
+
+`%r` will behave like an ordinary REPL.  You can execute commands as in the CLI.   
+
+<img class="img-responsive" src="../assets/themes/zeppelin/img/docs-img/repl2plus2.png" width="700px"/>
+
+R base plotting is fully supported:
+
+<img class="img-responsive" src="../assets/themes/zeppelin/img/docs-img/replhist.png" width="550px"/>
+
+If you return a data.frame, Zeppelin will attempt to display it using Zeppelin's built-in visualizations.
+
+<img class="img-responsive" src="../assets/themes/zeppelin/img/docs-img/replhead.png" width="550px"/>
+
+`%knitr` interfaces directly against `knitr`, with chunk options on the first line:
+
+<img class="img-responsive" src="../assets/themes/zeppelin/img/docs-img/knitgeo.png" width="550px"/>
+
+<img class="img-responsive" src="../assets/themes/zeppelin/img/docs-img/knitstock.png" width="550px"/>
+
+<img class="img-responsive" src="../assets/themes/zeppelin/img/docs-img/knitmotion.png" width="550px"/>
+
+The two interpreters share the same environment.  If you define a variable from `%r`, it will be in scope if you then make a call using `%knitr`.
+
+## Using SparkR & Moving Between Languages
+
+If `SPARK_HOME` is set, the `SparkR` package will be loaded automatically:
+
+<img class="img-responsive" src="../assets/themes/zeppelin/img/docs-img/sparkrfaithful.png" width="550px"/>
+
+The Spark Context and SQL Context are created and injected into the local environment automatically as `sc` and `sql`.
+
+The same contexts are shared with the `%spark`, `%sql` and `%pyspark` interpreters:
+
+<img class="img-responsive" src="../assets/themes/zeppelin/img/docs-img/backtoscala.png" width="700px"/>
+
+You can also make an ordinary R variable accessible in Scala and Python:
+
+<img class="img-responsive" src="../assets/themes/zeppelin/img/docs-img/varr1.png" width="550px"/>
+
+And vice versa:
+
+<img class="img-responsive" src="../assets/themes/zeppelin/img/docs-img/varscala.png" width="550px"/>
+
+<img class="img-responsive" src="../assets/themes/zeppelin/img/docs-img/varr2.png" width="550px"/>
+
+## Caveats & Troubleshooting
+
+* Almost all issues with the R interpreter turned out to be caused by an incorrectly set `SPARK_HOME`.  The R interpreter must load a version of the `SparkR` package that matches the running version of Spark, and it does this by searching `SPARK_HOME`. If Zeppelin isn't configured to interface with Spark in `SPARK_HOME`, the R interpreter will not be able to connect to Spark.
+
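+* The `knitr` environment is persistent. If you run a chunk from Zeppelin that changes a variable, then run the same chunk again, the variable has already been changed.  To keep chunks safe to re-run, treat variables as immutable: assign results to new names instead of overwriting existing ones.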
+
+* Note that `%spark.r` and `%r` are two different ways of calling the same interpreter, as are `%spark.knitr` and `%knitr`. By default, Zeppelin puts the R interpreters in the `%spark.` Interpreter Group.
+
+* Using the `%r` interpreter, if you return a data.frame, HTML, or an image, it will dominate the result. So if you execute three commands, and one is `hist()`, all you will see is the histogram, not the results of the other commands. This is a Zeppelin limitation.
+
+* If you return a data.frame (for instance, from calling `head()`) from the `%spark.r` interpreter, it will be parsed by Zeppelin's built-in data visualization system.  
+
+* Why `knitr` instead of `rmarkdown`?  Why no `htmlwidgets`?  In order to support `htmlwidgets`, which has indirect dependencies, `rmarkdown` uses `pandoc`, which requires writing to and reading from disk.  This makes it many times slower than `knitr`, which can operate entirely in RAM.
+
+* Why no `ggvis` or `shiny`?  Supporting `shiny` would require integrating a reverse-proxy into Zeppelin, which is a substantial undertaking.
+
+* Mac OS X & case-insensitive filesystem.  If you try to install on a case-insensitive filesystem, which is the Mac OS X default, Maven can unintentionally delete the install directory because `r` and `R` become the same subdirectory.
+
+* Error `unable to start device X11` with the repl interpreter.  Check your shell login scripts to see if they are adjusting the `DISPLAY` environment variable.  This is common on some operating systems as a workaround for ssh issues, but can interfere with R plotting.
+
+* Akka library version or `TTransport` errors.  This can happen if you try to run Zeppelin with a `SPARK_HOME` that has a version of Spark other than the one specified with `-Pspark-1.x` when Zeppelin was compiled.
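+
+The first caveat above can be checked from a shell before starting Zeppelin. The following is a minimal sketch, not part of Zeppelin: `check_spark_home` is a hypothetical helper, and it assumes the standard Spark binary layout, in which the SparkR package sits under `R/lib/SparkR`.
+
+```shell
+# Report whether a Spark directory looks usable by the R interpreter.
+# Assumption: SparkR is installed under <spark home>/R/lib/SparkR.
+check_spark_home() {
+  spark_home="$1"
+  if [ -z "$spark_home" ]; then
+    echo "SPARK_HOME is not set"
+    return 1
+  fi
+  if [ ! -d "$spark_home/R/lib/SparkR" ]; then
+    echo "No SparkR package found under $spark_home/R/lib"
+    return 1
+  fi
+  echo "SparkR found under $spark_home/R/lib/SparkR"
+}
+
+# Check the current environment; do not abort the shell on failure.
+check_spark_home "${SPARK_HOME:-}" || true
+```
+
+If the check fails, correct `SPARK_HOME` in `conf/zeppelin-env.sh` and restart Zeppelin before investigating other causes.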