update how it works on docs, add some comments on yaml files

Lee moon soo 2018-12-07 10:45:35 -08:00
parent 423412a938
commit 0100a36f2f
3 changed files with 97 additions and 14 deletions

View file

@ -32,13 +32,14 @@ Key benefits are
## Prerequisites
- Zeppelin >= 0.9.0
- Spark >= 2.4.0
- Zeppelin >= 0.9.0 docker image
- Spark >= 2.4.0 docker image (in case of using Spark Interpreter)
- A running Kubernetes cluster with access configured to it using [kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/)
- [Kubernetes DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) configured in your cluster
- Enough CPU and memory in your Kubernetes cluster. We recommend 4 CPUs and 6 GB of memory to be able to start the Spark Interpreter with a few executors.
- If you're using [minikube](https://kubernetes.io/docs/setup/minikube/), check your cluster capacity (`kubectl describe node`) and increase it if necessary
```
$ minikube delete # otherwise configuration won't apply
$ minikube config set cpus <number>
@ -46,17 +47,17 @@ Key benefits are
$ minikube start
$ minikube config view
```
## Quickstart
Get `zeppelin-server.yaml` from the GitHub repository or find it in the Zeppelin distribution package.
```
# download it from github
$ curl -s -O https://raw.githubusercontent.com/apache/zeppelin/master/k8s/zeppelin-server.yaml
# or get it from Zeppelin distribution package.
# Get it from Zeppelin distribution package.
$ ls <zeppelin-distribution>/k8s/zeppelin-server.yaml
# or download it from github
$ curl -s -O https://raw.githubusercontent.com/apache/zeppelin/master/k8s/zeppelin-server.yaml
```
Start Zeppelin on the Kubernetes cluster,
@ -72,7 +73,7 @@ kubectl port-forward zeppelin-server 8080:80
```
and browse [localhost:8080](http://localhost:8080).
Try running some paragraphs and check that each interpreter runs as a Pod (using `kubectl get pods`) instead of a local process.
To shut down,
@ -80,7 +81,6 @@ To shutdown,
kubectl delete -f zeppelin-server.yaml
```
## Spark Interpreter
Build a Spark Docker image to use the Spark Interpreter.
@ -175,3 +175,79 @@ Currently, single docker image is being used in both Zeppelin server and Interpr
| Spark executors | m | Spark docker image | Spark Interpreter creates/deletes |
Currently, the Zeppelin docker image is quite large. The Zeppelin project plans to provide lightweight images for each individual interpreter in the future.
## How it works
### Zeppelin on Kubernetes
`k8s/zeppelin-server.yaml` is provided to run Zeppelin Server with a few sidecars and configurations.
Once Zeppelin Server is started inside Kubernetes, it automatically configures itself to use `K8sStandardInterpreterLauncher`.
The launcher creates each interpreter in a Pod using templates located under the `k8s/interpreter/` directory.
Templates in the directory are applied in alphabetical order. Templates are rendered by [jinjava](https://github.com/HubSpot/jinjava),
and all interpreter properties are accessible inside the templates.
### Spark on Kubernetes
When the interpreter group is `spark`, Zeppelin automatically sets the Spark configuration needed to use Spark on Kubernetes.
It uses client mode, so the Spark interpreter Pod works as the Spark driver and Spark executors are launched in separate Pods.
This auto-configuration can be overridden by manually setting the `master` property of the Spark interpreter.
### Accessing Spark UI (or Service running in interpreter Pod)
The Zeppelin server Pod has a reverse proxy as a sidecar, which splits traffic between Zeppelin server and the Spark UI running in other Pods.
It assumes that both `<your service domain>` and `*.<your service domain>` point to the nginx proxy address.
`<your service domain>` is directed to ZeppelinServer, and `*.<your service domain>` is directed to interpreter Pods.
`<port>-<interpreter pod svc name>.<your service domain>` is the convention for accessing any application running in an interpreter Pod.
For example, when your service domain name is `local.zeppelin-project.org`, the Spark interpreter Pod is running with the name `spark-axefeg`, and the Spark UI is running on port 4040,
```
4040-spark-axefeg.local.zeppelin-project.org
```
is the address to access Spark UI.
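The naming convention above can be sketched as a small shell helper. The function name `interpreter_service_address` is hypothetical and not part of Zeppelin; it only concatenates the three parts in the documented order.

```shell
#!/bin/sh
# Hypothetical helper (not provided by Zeppelin): build the address for an
# application running in an interpreter Pod, following the convention
# <port>-<interpreter pod svc name>.<your service domain>
interpreter_service_address() {
  port="$1"            # port the application listens on, e.g. 4040 for Spark UI
  svc_name="$2"        # interpreter Pod service name, e.g. spark-axefeg
  service_domain="$3"  # service domain, e.g. local.zeppelin-project.org
  printf '%s-%s.%s\n' "$port" "$svc_name" "$service_domain"
}

# Example with the values used in the text
interpreter_service_address 4040 spark-axefeg local.zeppelin-project.org
```

If the service domain includes a port, as the default `local.zeppelin-project.org:8080` does, pass it as the third argument unchanged so the port ends up at the end of the address.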
The default service domain is `local.zeppelin-project.org:8080`. `local.zeppelin-project.org` and `*.local.zeppelin-project.org` are configured to resolve to `127.0.0.1`.
This allows access to Zeppelin and the Spark UI with `kubectl port-forward zeppelin-server 8080:80`.
If you would like to use a custom domain:
1. Configure an [Ingress](https://kubernetes.io/docs/concepts/services-networking/ingress/) in the Kubernetes cluster for the `http` port of the service `zeppelin-server` defined in `k8s/zeppelin-server.yaml`.
2. Configure DNS records so that your service domain and its wildcard subdomain point to the IP addresses of your Ingress.
3. Modify `serviceDomain` of the `zeppelin-server-conf` ConfigMap in the `k8s/zeppelin-server.yaml` file.
4. Apply the changes (e.g. `kubectl apply -f k8s/zeppelin-server.yaml`)
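As a rough sketch of step 3, the `serviceDomain` value could be rewritten with `sed` rather than edited by hand. This assumes the ConfigMap stores the value on a single line of the form `serviceDomain: <value>`, which is not confirmed here; the function name `render_service_domain` and the domain `zeppelin.example.com` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the given YAML file with the value of every
# "serviceDomain: ..." line replaced by the new domain.
# Assumption: the value sits on one line as "serviceDomain: <value>".
# Lines such as "key: serviceDomain" are left untouched because the
# pattern only matches lines that start with the serviceDomain key.
render_service_domain() {
  yaml_file="$1"
  new_domain="$2"
  sed -E "s|^([[:space:]]*serviceDomain:[[:space:]]*).*$|\1${new_domain}|" "$yaml_file"
}

# Usage (writes a modified copy, leaving the original file as it is):
#   render_service_domain k8s/zeppelin-server.yaml zeppelin.example.com > zeppelin-server.custom.yaml
#   kubectl apply -f zeppelin-server.custom.yaml
```

Writing to a separate file keeps the original manifest intact, which makes it easy to compare the two before applying.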
## Persist /notebook and /conf directory
Notebooks and configurations are not persisted by default. If necessary, configure a volume and update `k8s/zeppelin-server.yaml`
to use that volume to persist the /notebook and /conf directories.
## Customization
### Zeppelin Server Pod
Edit `k8s/zeppelin-server.yaml` and apply.
### Interpreter Pod
Since the Interpreter Pod is created and deleted by ZeppelinServer using templates under the `k8s/interpreter` directory,
to customize it:
1. Prepare a `k8s/interpreter` directory with your customizations (edit or create new yaml files) in a Kubernetes volume.
2. Modify `k8s/zeppelin-server.yaml` and mount the prepared volume directory `k8s/interpreter` to `/zeppelin/k8s/interpreter/`.
3. Apply the modified `k8s/zeppelin-server.yaml`.
4. Run a paragraph; this creates an interpreter using the modified yaml files.
## Future work
- Smaller interpreter docker image.
- Block communication between interpreter Pods.
- The Spark Interpreter Pod has a Role with CRUD permissions on any Pod/Service in the same namespace. This should be restricted to Spark executor Pods only.
- Per-note interpreter mode by default when Zeppelin is running on Kubernetes.

View file

@ -48,7 +48,7 @@ spec:
preStop:
exec:
# SIGTERM triggers a quick exit; gracefully terminate instead
command: ["sh", "-c", "ps -ef | grep org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer | awk '{print $2}' | xargs kill"]
command: ["sh", "-c", "ps -ef | grep org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer | grep -v grep | awk '{print $2}' | xargs kill"]
env:
{% for key, value in zeppelin.k8s.envs.items() %}
- name: {{key}}

View file

@ -90,7 +90,7 @@ spec:
preStop:
exec:
# SIGTERM triggers a quick exit; gracefully terminate instead
command: ["sh", "-c", "ps -ef | grep org.apache.zeppelin.server.ZeppelinServer | awk '{print $2}' | xargs kill"]
command: ["sh", "-c", "ps -ef | grep org.apache.zeppelin.server.ZeppelinServer | grep -v grep | awk '{print $2}' | xargs kill"]
env:
- name: ZEPPELIN_K8S_CONTAINER_IMAGE
value: apache/zeppelin:0.9.0-SNAPSHOT
@ -120,9 +120,16 @@ spec:
key: serviceDomain
- name: MASTER # default value of master property for spark interpreter.
value: k8s://https://kubernetes.default.svc
# volumeMounts:
# - name: zeppelin-server-notebook-volume # configure this to persist notebook
# mountPath: /zeppelin/notebook
# - name: zeppelin-server-conf # configure this to persist Zeppelin configuration
# mountPath: /zeppelin/conf
# - name: zeppelin-server-custom-k8s # configure this to mount customized Kubernetes spec for interpreter
# mountPath: /zeppelin/k8s
- name: zeppelin-server-gateway
image: nginx:1.14.0
command: [ "/bin/sh", "-c" ]
command: ["/bin/sh", "-c"]
args:
- cp -f /tmp/conf/nginx.conf /etc/nginx/nginx.conf;
sed -i -e "s/SERVICE_DOMAIN/$(cat /tmp/conf/serviceDomain)/g" /etc/nginx/nginx.conf;
@ -136,8 +143,8 @@ spec:
preStop:
exec:
# SIGTERM triggers a quick exit; gracefully terminate instead
command: ["/usr/sbin/nginx","-s","quit"]
- name: dnsmasq
command: ["/usr/sbin/nginx", "-s", "quit"]
- name: dnsmasq # nginx requires dns resolver for dynamic dns resolution
image: "janeczku/go-dnsmasq:release-1.0.5"
args:
- --listen