mirror of
https://github.com/apache/zeppelin
synced 2026-05-24 09:38:26 +00:00
### What type of PR is it?

This PR adds the ability to run Zeppelin on Kubernetes. It aims for:

- Zero configuration to start Zeppelin on Kubernetes (and Spark on Kubernetes).
- Running everything on Kubernetes: Zeppelin, interpreters, Spark.
- High customizability, to accommodate various user configurations and extensions.

Key features are:

- Provides a zeppelin-server.yaml file for `kubectl` to run the Zeppelin server.
- All interpreters automatically run as Pods.
- The Spark interpreter is automatically configured to use [Spark on Kubernetes](https://spark.apache.org/docs/latest/running-on-kubernetes.html).
- A reverse proxy is configured to access the Spark UI.

To do

- [x] Document how the reverse proxy for the Spark UI works and how to configure a custom domain.
- [x] Document how to customize the zeppelin-server and interpreter yaml.
- [x] Document new configurations
- [x] Document how to mount volumes for notebooks and configurations

### How it works

#### Run Zeppelin Server on Kubernetes

`k8s/zeppelin-server.yaml` is provided to run Zeppelin Server with a few sidecars and configurations. This file is easy to publish (users can easily consume it using `curl`) and highly customizable, while it includes everything necessary.

#### K8s Interpreter launcher

This PR adds a new module, `launcher-k8s-standard`, under the `zeppelin/zeppelin-plugins/launcher/k8s-standard/` directory. This launcher is [automatically selected](https://github.com/apache/zeppelin/pull/3240/files#diff-82fddd2ffb77aaffc4b9cf7b5b1eaa79) when Zeppelin is running on Kubernetes. The launcher handles both the Spark interpreter and all other interpreters.

The launcher launches each interpreter as a Pod using the template [k8s/interpreter/100-interpreter-pod.yaml](https://github.com/apache/zeppelin/pull/3240/files#diff-d9ce62e2c992d32f0184d7edb862f3c4). The filename has the `100-` prefix because all files in the directory are consumed in alphabetical order by the launcher on interpreter start/stop. Users can drop more files here to extend or customize the interpreter, and the filename can be used to control the order. The template is rendered by [jinjava](https://github.com/HubSpot/jinjava).

#### Spark interpreter

When the interpreter group is `spark`, K8sRemoteInterpreterProcess automatically [sets the necessary Spark configuration](https://github.com/apache/zeppelin/pull/3240/files#diff-6d1d3084f55bdd519e39ede4a619e73dR297) to use [Spark on Kubernetes](https://spark.apache.org/docs/latest/running-on-kubernetes.html). The user doesn't have to configure anything. It uses client mode.

#### Spark UI

We could make the user manually configure port forwarding or take some other step to access the Spark UI, but that's not optimal. It is best if the Spark UI is automatically accessible whenever the user has access to the Zeppelin UI, without any extra configuration.

To enable this, the Zeppelin server Pod has a reverse proxy as a sidecar, which splits traffic between the Zeppelin server and the Spark UI running in the other Pod. It assumes both `service.domain.com` and `*.service.domain.com` point to the nginx proxy address. `service.domain.com` is directed to ZeppelinServer, and `*.service.domain.com` is directed to the interpreter Pod.

`<port>-<interpreter pod svc name>.service.domain.com` is the convention for accessing any application running in the interpreter Pod. If the Spark interpreter Pod is running with the name `spark-axefeg` and the Spark UI is running on port 4040,

```
4040-spark-axefeg.service.domain.com
```

is the address to access the Spark UI.

The default service domain is [local.zeppelin-project.org:8080](https://github.com/apache/zeppelin/pull/3240/files#diff-56ccb2e2c2617b27dbaae866d9431e51R22). Both `local.zeppelin-project.org` and `*.local.zeppelin-project.org` point to `127.0.0.1`, so it works with `kubectl port-forward`.

### What is the Jira issue?

https://issues.apache.org/jira/browse/ZEPPELIN-3840

### How should this be tested?

Prepare a Kubernetes cluster with enough resources (cpus > 5, mem > 6g). If you're using [minikube](https://github.com/kubernetes/minikube), check your capacity using the `kubectl describe node` command before starting.

You'll need to build the Zeppelin docker image and the Spark docker image to test. Please follow the guide in docs/quickstart/kubernetes.md.

To quickly try it without building docker images, I have uploaded pre-built images to docker hub: `moon/zeppelin:0.9.0-SNAPSHOT`, `moon/spark:2.4.0`. Try the following command

```
ZEPPELIN_SERVER_YAML="curl -s https://raw.githubusercontent.com/Leemoonsoo/zeppelin/kubernetes/k8s/zeppelin-server.yaml"
$ZEPPELIN_SERVER_YAML | sed 's/apache\/zeppelin:0.9.0-SNAPSHOT/moon\/zeppelin:0.9.0-SNAPSHOT/' | sed 's/spark:2.4.0/moon\/spark:2.4.0/' | kubectl apply -f -
```

And port forward

```
kubectl port-forward zeppelin-server 8080:80
```

And browse http://localhost:8080

To clean up

```
$ZEPPELIN_SERVER_YAML | sed 's/apache\/zeppelin:0.9.0-SNAPSHOT/moon\/zeppelin:0.9.0-SNAPSHOT/' | sed 's/spark:2.4.0/moon\/spark:2.4.0/' | kubectl delete -f -
```

### Screenshots (if appropriate)

See this video https://youtu.be/7E4ZGn4pnTo

### Future work

- Per-interpreter docker image
- Blocking communication between interpreter Pods.
- The Spark interpreter Pod has a Role with CRUD permissions for any pod/service in the same namespace. This should be restricted to Spark executor Pods only.
- Per-note interpreter mode by default when Zeppelin is running on Kubernetes

### Questions:

* Do the license files need an update? no
* Are there breaking changes for older versions? no
* Does this need documentation? yes

Author: Lee moon soo <leemoonsoo@gmail.com>
Author: Lee moon soo <moon@apache.org>

Closes #3240 from Leemoonsoo/kubernetes and squashes the following commits:

0100a36f2 [Lee moon soo] update how it works on docs, add some comments on yaml files
423412a93 [Lee moon soo] zeppelin.k8s.mode -> zeppelin.run.mode
4e7d8170d [Lee moon soo] localtest.me -> local.zeppelin-project.org
993a0e44e [Lee moon soo] document configurations
9ab6fc420 [Lee moon soo] address code review
22e090f61 [Lee moon soo] logger -> LOGGER
11960dd59 [Lee moon soo] update corresponding test as well
3b652a48e [Lee moon soo] Make spark executor set ownerreference correctly
1a3a07098 [Lee moon soo] Set ownerreference to Role and Rolebinding of interpreter
e2dc88a19 [Lee moon soo] suppress error log when wait target is already removed
fa36c18e3 [Lee moon soo] Make spark master configurable
b4f58a9a1 [Lee moon soo] sig term for quick termination
64a56b5c9 [Lee moon soo] Add docs
e9ce64fe7 [Lee moon soo] update dockerfile
ec09b8b88 [Lee moon soo] add test
3078bac55 [Lee moon soo] spark ui support
9341fcbfe [Lee moon soo] install kubectl and configure log4j in docker image
0f7c0d4e8 [Lee moon soo] add license
f30561189 [Lee moon soo] rename file
2b579ff12 [Lee moon soo] let user override namespace
f4166ad04 [Lee moon soo] make spark container image configurable
0d472ea52 [Lee moon soo] load properties and environment variables
b0e2c36c6 [Lee moon soo] Rbac role, rolebinding
2960dcb87 [Lee moon soo] configure namespace
a4072e6b9 [Lee moon soo] add signal handler
7a8736756 [Lee moon soo] configure spark on kubernetes
263d859d4 [Lee moon soo] use headless service for interpreter pod
7fe9823b1 [Lee moon soo] interpreter pod cascade delete on zeppelin-server delete
86e876435 [Lee moon soo] add services on RBAC
18b8f68cb [Lee moon soo] print spec file contents on debug log
0dea3836b [Lee moon soo] create and connect interpreter pod
9f1b7a169 [Lee moon soo] run kubernetes launcher
2fd2ac8c3 [Lee moon soo] kubernetes mode configuration
58f9f1909 [Lee moon soo] add rbac
36cf391a4 [Lee moon soo] correct plugin name
52bb6c7e1 [Lee moon soo] add k8s dir in package
5f602a65e [Lee moon soo] K8sRemoteInterpreterProcess
07489f76d [Lee moon soo] kubectl with exec
d2f3d5b7e [Lee moon soo] add k8s-standard launcher module
128 lines
5.3 KiB
XML
<?xml version="1.0" encoding="UTF-8"?>
<!--
  ~ Licensed to the Apache Software Foundation (ASF) under one or more
  ~ contributor license agreements. See the NOTICE file distributed with
  ~ this work for additional information regarding copyright ownership.
  ~ The ASF licenses this file to You under the Apache License, Version 2.0
  ~ (the "License"); you may not use this file except in compliance with
  ~ the License. You may obtain a copy of the License at
  ~
  ~    http://www.apache.org/licenses/LICENSE-2.0
  ~
  ~ Unless required by applicable law or agreed to in writing, software
  ~ distributed under the License is distributed on an "AS IS" BASIS,
  ~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  ~ See the License for the specific language governing permissions and
  ~ limitations under the License.
  -->

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">

  <modelVersion>4.0.0</modelVersion>

  <parent>
    <artifactId>zeppelin</artifactId>
    <groupId>org.apache.zeppelin</groupId>
    <version>0.9.0-SNAPSHOT</version>
    <relativePath>..</relativePath>
  </parent>

  <groupId>org.apache.zeppelin</groupId>
  <artifactId>zengine-plugins-parent</artifactId>
  <packaging>pom</packaging>
  <version>0.9.0-SNAPSHOT</version>
  <name>Zeppelin: Plugins Parent</name>
  <description>Zeppelin Plugins Parent</description>

  <modules>
    <module>notebookrepo/s3</module>
    <module>notebookrepo/vfs</module>
    <module>notebookrepo/git</module>
    <module>notebookrepo/github</module>
    <module>notebookrepo/azure</module>
    <module>notebookrepo/gcs</module>
    <module>notebookrepo/zeppelin-hub</module>
    <module>notebookrepo/filesystem</module>

    <module>launcher/standard</module>
    <module>launcher/k8s-standard</module>
    <module>launcher/spark</module>
  </modules>

  <dependencies>
    <dependency>
      <groupId>${project.groupId}</groupId>
      <artifactId>zeppelin-zengine</artifactId>
      <version>${project.version}</version>
      <scope>provided</scope>
      <exclusions>
        <exclusion>
          <groupId>com.fasterxml.jackson.core</groupId>
          <artifactId>jackson-core</artifactId>
        </exclusion>
      </exclusions>
    </dependency>

    <!-- Test libraries -->
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <scope>test</scope>
    </dependency>

    <dependency>
      <groupId>org.mockito</groupId>
      <artifactId>mockito-all</artifactId>
      <scope>test</scope>
    </dependency>
  </dependencies>

  <build>
    <pluginManagement>
      <plugins>
        <plugin>
          <artifactId>maven-dependency-plugin</artifactId>
          <executions>
            <execution>
              <id>copy-plugin-dependencies</id>
              <phase>package</phase>
              <goals>
                <goal>copy-dependencies</goal>
              </goals>
              <configuration>
                <outputDirectory>${project.build.directory}/../../../../plugins/${plugin.name}</outputDirectory>
                <overWriteReleases>false</overWriteReleases>
                <overWriteSnapshots>false</overWriteSnapshots>
                <overWriteIfNewer>true</overWriteIfNewer>
                <includeScope>runtime</includeScope>
              </configuration>
            </execution>
            <execution>
              <id>copy-plugin-artifact</id>
              <phase>package</phase>
              <goals>
                <goal>copy</goal>
              </goals>
              <configuration>
                <outputDirectory>${project.build.directory}/../../../../plugins/${plugin.name}</outputDirectory>
                <overWriteReleases>false</overWriteReleases>
                <overWriteSnapshots>false</overWriteSnapshots>
                <overWriteIfNewer>true</overWriteIfNewer>
                <artifactItems>
                  <artifactItem>
                    <groupId>${project.groupId}</groupId>
                    <artifactId>${project.artifactId}</artifactId>
                    <version>${project.version}</version>
                    <type>${project.packaging}</type>
                  </artifactItem>
                </artifactItems>
              </configuration>
            </execution>
          </executions>
        </plugin>
      </plugins>
    </pluginManagement>
  </build>
</project>
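
The parent pom above only declares the `maven-dependency-plugin` executions under `pluginManagement`, so they take effect only in a child module that lists the plugin in its own `<build><plugins>` and defines the `plugin.name` property used in `outputDirectory`. A minimal sketch of such a child pom follows, not the actual module file: the `launcher-k8s-standard` artifactId is taken from the commit message, while the `relativePath` and the `plugin.name` value are assumptions chosen for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">

  <modelVersion>4.0.0</modelVersion>

  <parent>
    <groupId>org.apache.zeppelin</groupId>
    <artifactId>zengine-plugins-parent</artifactId>
    <version>0.9.0-SNAPSHOT</version>
    <!-- Assumption: the module lives in launcher/k8s-standard, two levels below the parent pom -->
    <relativePath>../..</relativePath>
  </parent>

  <artifactId>launcher-k8s-standard</artifactId>
  <packaging>jar</packaging>

  <properties>
    <!-- Hypothetical value: resolves ${plugin.name} in the inherited outputDirectory -->
    <plugin.name>Launcher/ExampleLauncher</plugin.name>
  </properties>

  <build>
    <plugins>
      <!-- Listing the plugin here activates the executions inherited from pluginManagement -->
      <plugin>
        <artifactId>maven-dependency-plugin</artifactId>
      </plugin>
    </plugins>
  </build>
</project>
```

Under these assumptions, `mvn package` in the child module would copy its runtime-scoped dependencies and its own jar into `plugins/${plugin.name}`, resolved four directories above the module's `target` directory.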