Mirror of https://github.com/apache/zeppelin (synced 2026-05-24 09:38:26 +00:00)
### What is this PR for?
This is the first version of yarn-cluster support for `SparkInterpreter`. It delegates all of the work to `spark-submit`: yarn-cluster mode is natively supported by Spark, so there is no need to reinvent the wheel. There is still room for improvement, e.g. some Spark-specific logic currently lives in `InterpreterSetting`, which is not good practice. I plan to improve this when I refactor the `Interpreter` class (ZEPPELIN-2685).
Besides that, I also added `MiniHadoopCluster` and `MiniZeppelin`, which support integration tests of yarn-client and yarn-cluster mode. Without them, both modes would have to be verified manually, which would easily lead to regressions in the future.
Note:
* SPARK_HOME must be specified for yarn-cluster mode
* HADOOP_CONF_DIR must be specified for yarn-cluster mode
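
The two requirements above, together with the delegation to `spark-submit`, can be sketched as follows. This is only an illustration and not the code added by this PR: the function name `build_yarn_cluster_command` and its parameters are hypothetical, and the sketch assumes nothing beyond the standard `spark-submit` options `--master`, `--deploy-mode`, and `--class`.

```python
import os

# Environment variables that the notes above list as mandatory for yarn-cluster mode.
REQUIRED_ENV = ("SPARK_HOME", "HADOOP_CONF_DIR")


def build_yarn_cluster_command(env, app_jar, main_class):
    """Return a spark-submit argv for yarn-cluster mode.

    Hypothetical helper for illustration only; it is not Zeppelin's launcher.
    Raises ValueError if a required environment variable is missing or empty.
    """
    missing = [name for name in REQUIRED_ENV if not env.get(name)]
    if missing:
        raise ValueError("yarn-cluster mode requires: " + ", ".join(missing))

    # spark-submit ships in $SPARK_HOME/bin; HADOOP_CONF_DIR is not passed as a
    # flag because spark-submit reads it from the process environment.
    spark_submit = os.path.join(env["SPARK_HOME"], "bin", "spark-submit")
    return [
        spark_submit,
        "--master", "yarn",
        "--deploy-mode", "cluster",
        "--class", main_class,
        app_jar,
    ]
```

A caller would pass `os.environ` (or a copy of it) as `env` and hand the returned list to `subprocess.run` with the same environment, so that `HADOOP_CONF_DIR` is visible to `spark-submit`.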
### What type of PR is it?
[Feature]
### Todos
* [ ] - Task
### What is the Jira issue?
https://github.com/zjffdu/zeppelin/tree/ZEPPELIN-2898
### How should this be tested?
A system test is added in `SparkInterpreterIT`.
### Questions:
* Do the license files need an update? No
* Are there breaking changes for older versions? No
* Does this need documentation? No
Author: Jeff Zhang <zjffdu@apache.org>
Closes #2577 from zjffdu/ZEPPELIN-2898 and squashes the following commits:
| File |
|---|
| alluxio.md |
| beam.md |
| bigquery.md |
| cassandra.md |
| elasticsearch.md |
| flink.md |
| geode.md |
| groovy.md |
| hbase.md |
| hdfs.md |
| hive.md |
| ignite.md |
| jdbc.md |
| kylin.md |
| lens.md |
| livy.md |
| mahout.md |
| markdown.md |
| pig.md |
| postgresql.md |
| python.md |
| r.md |
| scalding.md |
| scio.md |
| shell.md |
| spark.md |