mirror of
https://github.com/apache/zeppelin
synced 2026-05-24 09:38:26 +00:00
update how it works on docs, add some comments on yaml files
parent 423412a938
commit 0100a36f2f
3 changed files with 97 additions and 14 deletions

@@ -32,13 +32,14 @@ Key benefits are

## Prerequisites

- Zeppelin >= 0.9.0
- Spark >= 2.4.0
- Zeppelin >= 0.9.0 docker image
- Spark >= 2.4.0 docker image (if you use the Spark Interpreter)
- A running Kubernetes cluster, with access to it configured using [kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/)
- [Kubernetes DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) configured in your cluster
- Enough CPU and memory in your Kubernetes cluster. We recommend 4 CPUs and 6 GB of memory to be able to start the Spark Interpreter with a few executors.

- If you're using [minikube](https://kubernetes.io/docs/setup/minikube/), check your cluster capacity (`kubectl describe node`) and increase it if necessary

```
$ minikube delete # otherwise configuration won't apply
$ minikube config set cpus <number>
@@ -46,17 +47,17 @@ Key benefits are
$ minikube start
$ minikube config view
```

## Quickstart

Get `zeppelin-server.yaml` from the GitHub repository or find it in the Zeppelin distribution package.

```
# download it from github
$ curl -s -O https://raw.githubusercontent.com/apache/zeppelin/master/k8s/zeppelin-server.yaml

# or get it from Zeppelin distribution package.
# Get it from Zeppelin distribution package.
$ ls <zeppelin-distribution>/k8s/zeppelin-server.yaml

# or download it from github
$ curl -s -O https://raw.githubusercontent.com/apache/zeppelin/master/k8s/zeppelin-server.yaml
```

Start Zeppelin on the Kubernetes cluster,

@@ -72,7 +73,7 @@ kubectl port-forward zeppelin-server 8080:80
```

and browse [localhost:8080](http://localhost:8080).

Try running some paragraphs and check that each interpreter runs as a Pod (using `kubectl get pods`) instead of as a local process.

To shut down,

@@ -80,7 +81,6 @@ To shutdown,
kubectl delete -f zeppelin-server.yaml
```

## Spark Interpreter

Build a Spark docker image to use the Spark Interpreter.

@@ -175,3 +175,79 @@ Currently, single docker image is being used in both Zeppelin server and Interpr
| Spark executors | m | Spark docker image | Spark Interpreter creates/deletes |

Currently, the Zeppelin docker image is quite large. The Zeppelin project plans to provide lightweight images for each individual interpreter in the future.


## How it works

### Zeppelin on Kubernetes

`k8s/zeppelin-server.yaml` is provided to run Zeppelin Server with a few sidecars and configurations.
Once Zeppelin Server is started inside Kubernetes, it automatically configures itself to use `K8sStandardInterpreterLauncher`.

The launcher creates each interpreter in a Pod using the templates located under the `k8s/interpreter/` directory.
Templates in the directory are applied in alphabetical order. Templates are rendered by [jinjava](https://github.com/HubSpot/jinjava),
and all interpreter properties are accessible inside the templates.

### Spark on Kubernetes

When the interpreter group is `spark`, Zeppelin automatically sets the Spark configuration necessary to use Spark on Kubernetes.
It uses client mode, so the Spark interpreter Pod works as the Spark driver, and Spark executors are launched in separate Pods.
This auto configuration can be overridden by manually setting the `master` property of the Spark interpreter.


### Accessing Spark UI (or Service running in interpreter Pod)

The Zeppelin server Pod has a reverse proxy as a sidecar, which splits traffic between the Zeppelin server and the Spark UI running in other Pods.
It assumes that both `<your service domain>` and `*.<your service domain>` point to the nginx proxy address.
`<your service domain>` is directed to ZeppelinServer, and `*.<your service domain>` is directed to interpreter Pods.

`<port>-<interpreter pod svc name>.<your service domain>` is the convention for accessing any application running in an interpreter Pod.


For example, when your service domain name is `local.zeppelin-project.org`, the Spark interpreter Pod is running with the name `spark-axefeg`, and the Spark UI is running on port 4040,

```
4040-spark-axefeg.local.zeppelin-project.org
```

is the address to access the Spark UI.

The default service domain is `local.zeppelin-project.org:8080`. `local.zeppelin-project.org` and `*.local.zeppelin-project.org` are configured to resolve to `127.0.0.1`.
This allows access to Zeppelin and the Spark UI with `kubectl port-forward zeppelin-server 8080:80`.
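
The naming convention above can be sketched as a small shell helper. This is only an illustration: `service_host` is a hypothetical name, not something Zeppelin provides, and the example assumes the default service domain (which includes the forwarded port `8080`).

```shell
#!/bin/sh
# Build the host name of an application running in an interpreter Pod,
# following the <port>-<interpreter pod svc name>.<your service domain> convention.
# service_host is a hypothetical helper, not part of Zeppelin.
service_host() {
  port="$1"       # port the application listens on, e.g. 4040 for Spark UI
  svc_name="$2"   # interpreter Pod service name, e.g. spark-axefeg
  domain="$3"     # service domain, e.g. local.zeppelin-project.org:8080
  printf '%s-%s.%s\n' "$port" "$svc_name" "$domain"
}

# Example using the values from the text and the default service domain.
service_host 4040 spark-axefeg local.zeppelin-project.org:8080
```

With the port-forward from the Quickstart in place, the printed host should be the address to open in a browser for that Pod's Spark UI.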


If you would like to use your own domain:

1. Configure [Ingress](https://kubernetes.io/docs/concepts/services-networking/ingress/) in your Kubernetes cluster for the `http` port of the service `zeppelin-server` defined in `k8s/zeppelin-server.yaml`.
2. Configure DNS records so that your service domain and its wildcard subdomain point to the IP addresses of your Ingress.
3. Modify `serviceDomain` in the `zeppelin-server-conf` ConfigMap in the `k8s/zeppelin-server.yaml` file.
4. Apply the changes (e.g. `kubectl apply -f k8s/zeppelin-server.yaml`)
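
As a rough sketch of step 1, the script below prints an Ingress manifest that sends both the service domain and its wildcard subdomain to the `http` port of the `zeppelin-server` service. The `networking.k8s.io/v1` API version, the Ingress name, the `ingress_manifest` helper, and the `zeppelin.example.com` domain are assumptions made for illustration; adapt them to your cluster and Ingress controller.

```shell
#!/bin/sh
# Print an Ingress manifest covering <domain> and *.<domain>.
# Assumptions (not from the Zeppelin docs): networking.k8s.io/v1 API,
# the Ingress name, and the example domain used below.
ingress_manifest() {
  domain="$1"
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: zeppelin-server
spec:
  rules:
EOF
  # One rule for the service domain itself, one for its wildcard subdomain.
  for host in "$domain" "*.$domain"; do
    cat <<EOF
  - host: "${host}"
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: zeppelin-server
            port:
              name: http
EOF
  done
}

# Hypothetical domain; replace with your own service domain.
ingress_manifest zeppelin.example.com
```

The output could be saved to a file and applied with `kubectl apply -f <file>`. The wildcard rule matters because interpreter Pods are reached through `*.<your service domain>`.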


## Persist /notebook and /conf directory

Notebooks and configurations are not persisted by default. If necessary, configure a volume and update `k8s/zeppelin-server.yaml`
to use the volume to persist the /notebook and /conf directories.
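
One possible way to provide such volumes is a PersistentVolumeClaim per directory. The sketch below only prints the claim manifests. The `pvc_manifest` helper, the claim names, the `1Gi` default size and the `ReadWriteOnce` access mode are placeholders chosen for illustration, not values taken from Zeppelin.

```shell
#!/bin/sh
# Print a PersistentVolumeClaim manifest for one Zeppelin directory.
# Assumptions: the claim names, the 1Gi default size and the ReadWriteOnce
# access mode are illustrative placeholders.
pvc_manifest() {
  name="$1"
  size="${2:-1Gi}"
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ${name}
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: ${size}
EOF
}

# One claim for notebooks and one for configuration.
pvc_manifest zeppelin-notebook
echo "---"
pvc_manifest zeppelin-conf
```

The claims still need to be referenced from the Pod's `volumes` and mounted at `/zeppelin/notebook` and `/zeppelin/conf` in `k8s/zeppelin-server.yaml`.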


## Customization

### Zeppelin Server Pod
Edit `k8s/zeppelin-server.yaml` and apply.

### Interpreter Pod
Since the Interpreter Pod is created/deleted by ZeppelinServer using the templates under the `k8s/interpreter` directory,
to customize it:

1. Prepare a `k8s/interpreter` directory with your customizations (edit or create new yaml files) in a Kubernetes volume.
2. Modify `k8s/zeppelin-server.yaml` and mount the prepared volume directory `k8s/interpreter` at `/zeppelin/k8s/interpreter/`.
3. Apply the modified `k8s/zeppelin-server.yaml`.
4. Run a paragraph; this creates an interpreter using the modified yaml files.


## Future work

- Smaller interpreter docker image.
- Blocking communication between interpreter Pods.
- The Spark Interpreter Pod has a Role with CRUD permissions for any pod/service in the same namespace, which should be restricted to Spark executor Pods only.
- Per note interpreter mode by default when Zeppelin is running on Kubernetes

@@ -48,7 +48,7 @@ spec:
      preStop:
        exec:
          # SIGTERM triggers a quick exit; gracefully terminate instead
          command: ["sh", "-c", "ps -ef | grep org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer | awk '{print $2}' | xargs kill"]
          command: ["sh", "-c", "ps -ef | grep org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer | grep -v grep | awk '{print $2}' | xargs kill"]
    env:
    {% for key, value in zeppelin.k8s.envs.items() %}
    - name: {{key}}

@@ -90,7 +90,7 @@ spec:
      preStop:
        exec:
          # SIGTERM triggers a quick exit; gracefully terminate instead
          command: ["sh", "-c", "ps -ef | grep org.apache.zeppelin.server.ZeppelinServer | awk '{print $2}' | xargs kill"]
          command: ["sh", "-c", "ps -ef | grep org.apache.zeppelin.server.ZeppelinServer | grep -v grep | awk '{print $2}' | xargs kill"]
    env:
    - name: ZEPPELIN_K8S_CONTAINER_IMAGE
      value: apache/zeppelin:0.9.0-SNAPSHOT
@@ -120,9 +120,16 @@ spec:
          key: serviceDomain
    - name: MASTER # default value of master property for spark interpreter.
      value: k8s://https://kubernetes.default.svc
    # volumeMounts:
    #   - name: zeppelin-server-notebook-volume # configure this to persist notebook
    #     mountPath: /zeppelin/notebook
    #   - name: zeppelin-server-conf # configure this to persist Zeppelin configuration
    #     mountPath: /zeppelin/conf
    #   - name: zeppelin-server-custom-k8s # configure this to mount customized Kubernetes spec for interpreter
    #     mountPath: /zeppelin/k8s
  - name: zeppelin-server-gateway
    image: nginx:1.14.0
    command: [ "/bin/sh", "-c" ]
    command: ["/bin/sh", "-c"]
    args:
      - cp -f /tmp/conf/nginx.conf /etc/nginx/nginx.conf;
        sed -i -e "s/SERVICE_DOMAIN/$(cat /tmp/conf/serviceDomain)/g" /etc/nginx/nginx.conf;
@@ -136,8 +143,8 @@ spec:
      preStop:
        exec:
          # SIGTERM triggers a quick exit; gracefully terminate instead
          command: ["/usr/sbin/nginx","-s","quit"]
  - name: dnsmasq
          command: ["/usr/sbin/nginx", "-s", "quit"]
  - name: dnsmasq # nginx requires dns resolver for dynamic dns resolution
    image: "janeczku/go-dnsmasq:release-1.0.5"
    args:
      - --listen