mirror of
https://github.com/apache/zeppelin
synced 2026-05-24 09:38:26 +00:00
### What type of PR is it?

This PR adds the ability to run Zeppelin on Kubernetes. It aims for:

- Zero configuration to start Zeppelin on Kubernetes (and Spark on Kubernetes).
- Running everything on Kubernetes: Zeppelin, interpreters, and Spark.
- High customizability, to accommodate various user configurations and extensions.

Key features:

- Provides a `zeppelin-server.yaml` file for `kubectl` to run the Zeppelin server.
- All interpreters automatically run as Pods.
- The Spark interpreter is automatically configured to use [Spark on Kubernetes](https://spark.apache.org/docs/latest/running-on-kubernetes.html).
- A reverse proxy is configured to access the Spark UI.

To do:

- [x] Document how the reverse proxy for the Spark UI works and how to configure a custom domain.
- [x] Document how to customize the zeppelin-server and interpreter yaml.
- [x] Document new configurations
- [x] Document how to mount a volume for notebooks and configurations

### How it works

#### Run Zeppelin Server on Kubernetes

`k8s/zeppelin-server.yaml` is provided to run Zeppelin Server with a few sidecars and configurations. The file is easy to publish (users can fetch it with `curl`) and highly customizable, while including everything necessary.

#### K8s Interpreter launcher

This PR adds a new module, `launcher-k8s-standard`, under the `zeppelin/zeppelin-plugins/launcher/k8s-standard/` directory. This launcher is [automatically selected](https://github.com/apache/zeppelin/pull/3240/files#diff-82fddd2ffb77aaffc4b9cf7b5b1eaa79) when Zeppelin is running on Kubernetes. The launcher handles both the Spark interpreter and all other interpreters. It launches each interpreter as a Pod using the template [k8s/interpreter/100-interpreter-pod.yaml](https://github.com/apache/zeppelin/pull/3240/files#diff-d9ce62e2c992d32f0184d7edb862f3c4). The filename has the `100-` prefix because the launcher consumes all files in the directory in alphabetical order on interpreter start/stop.
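The effect of the numeric prefix can be illustrated with a small shell sketch. It is not the launcher's own code; it only shows how lexical ordering places files around `100-interpreter-pod.yaml`. The temporary directory and the two extra file names are hypothetical.

```shell
#!/bin/sh
# Sketch only: the directory and the two extra files are hypothetical stand-ins
# for a template directory that holds 100-interpreter-pod.yaml.
set -eu
template_dir=$(mktemp -d)
touch "$template_dir/100-interpreter-pod.yaml"
touch "$template_dir/200-extra-configmap.yaml"   # hypothetical user-added file
touch "$template_dir/050-namespace-setup.yaml"   # hypothetical user-added file

# Shell globs expand in sorted order, so the numeric prefix decides the sequence.
order=""
for f in "$template_dir"/*.yaml; do
  order="$order $(basename "$f")"
done
echo "order:$order"
rm -rf "$template_dir"
```

A file prefixed `050-` sorts before the Pod template and one prefixed `200-` sorts after it, which is how a filename can control ordering.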
Users can drop more files into that directory to extend or customize the interpreter, and the filename can be used to control the order. The template is rendered by [jinjava](https://github.com/HubSpot/jinjava).

#### Spark interpreter

When the interpreter group is `spark`, K8sRemoteInterpreterProcess [sets the necessary Spark configuration](https://github.com/apache/zeppelin/pull/3240/files#diff-6d1d3084f55bdd519e39ede4a619e73dR297) automatically to use [Spark on Kubernetes](https://spark.apache.org/docs/latest/running-on-kubernetes.html). The user doesn't have to configure anything. It uses client mode.

#### Spark UI

We could make users manually configure port forwarding or take some other step to access the Spark UI, but that's not optimal. It is best when the Spark UI is automatically accessible to anyone who has access to the Zeppelin UI, without any extra configuration.

To enable this, the Zeppelin server Pod has a reverse proxy as a sidecar, which splits traffic between the Zeppelin server and the Spark UI running in the other Pod. It assumes both `service.domain.com` and `*.service.domain.com` point to the nginx proxy address. `service.domain.com` is directed to ZeppelinServer; `*.service.domain.com` is directed to the interpreter Pod.

`<port>-<interpreter pod svc name>.service.domain.com` is the convention for accessing any application running in the interpreter Pod. If the Spark interpreter Pod is running with the name `spark-axefeg` and the Spark UI is running on port 4040,

```
4040-spark-axefeg.service.domain.com
```

is the address to access the Spark UI.

The default service domain is [local.zeppelin-project.org:8080](https://github.com/apache/zeppelin/pull/3240/files#diff-56ccb2e2c2617b27dbaae866d9431e51R22). Both `local.zeppelin-project.org` and `*.local.zeppelin-project.org` point to `127.0.0.1`, so it works with `kubectl port-forward`.

### What is the Jira issue?

https://issues.apache.org/jira/browse/ZEPPELIN-3840

### How should this be tested?

Prepare a Kubernetes cluster with enough resources (cpus > 5, mem > 6g).
If you're using [minikube](https://github.com/kubernetes/minikube), check your capacity using the `kubectl describe node` command before starting.

You'll need to build the Zeppelin docker image and the Spark docker image to test. Please follow the guide in docs/quickstart/kubernetes.md. To quickly try it without building docker images, I have uploaded pre-built images to Docker Hub: `moon/zeppelin:0.9.0-SNAPSHOT` and `moon/spark:2.4.0`. Try the following commands:

```
# The variable holds the download command itself; expanding it unquoted runs curl (relies on sh/bash word splitting).
ZEPPELIN_SERVER_YAML="curl -s https://raw.githubusercontent.com/Leemoonsoo/zeppelin/kubernetes/k8s/zeppelin-server.yaml"
# Swap in the pre-built images, then apply the manifest.
$ZEPPELIN_SERVER_YAML | sed 's/apache\/zeppelin:0.9.0-SNAPSHOT/moon\/zeppelin:0.9.0-SNAPSHOT/' | sed 's/spark:2.4.0/moon\/spark:2.4.0/' | kubectl apply -f -
```

Then port forward:

```
kubectl port-forward zeppelin-server 8080:80
```

And browse http://localhost:8080

To clean up:

```
$ZEPPELIN_SERVER_YAML | sed 's/apache\/zeppelin:0.9.0-SNAPSHOT/moon\/zeppelin:0.9.0-SNAPSHOT/' | sed 's/spark:2.4.0/moon\/spark:2.4.0/' | kubectl delete -f -
```

### Screenshots (if appropriate)

See this video: https://youtu.be/7E4ZGn4pnTo

### Future work

- Per-interpreter docker image
- Blocking communication between interpreter Pods.
- The Spark interpreter Pod has a Role with CRUD permissions for any pod/service in the same namespace, which should be restricted to Spark executor Pods only.
- Per-note interpreter mode by default when Zeppelin is running on Kubernetes

### Questions:

* Do the license files need an update? no
* Are there breaking changes for older versions? no
* Does this need documentation? yes

Author: Lee moon soo <leemoonsoo@gmail.com>
Author: Lee moon soo <moon@apache.org>

Closes #3240 from Leemoonsoo/kubernetes and squashes the following commits:
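As a small illustration of the `<port>-<interpreter pod svc name>.<service domain>` convention described under "Spark UI", the following sketch composes the hostname from its parts. The function name `interpreter_app_host` is hypothetical and is not part of Zeppelin; the second call assumes the default service domain is used verbatim, including its port.

```shell
#!/bin/sh
# Hypothetical helper (not part of Zeppelin): build the address of an
# application running in an interpreter Pod.
# Arguments: port, interpreter Pod service name, service domain.
interpreter_app_host() {
  printf '%s-%s.%s\n' "$1" "$2" "$3"
}

# Example from the description above.
interpreter_app_host 4040 spark-axefeg service.domain.com

# Assumption: with the default service domain, the port stays part of the domain value.
interpreter_app_host 4040 spark-axefeg local.zeppelin-project.org:8080
```

The first call prints `4040-spark-axefeg.service.domain.com`, matching the example address given for the Spark UI.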