OpenMetadata/docker/development/helm
Pere Miquel Brull 852b42cc5d Fix k8s operator exit handler pod loop and TTL cleanup, add tolerations (#26971)
* Fix k8s operator exit handler pod loop and TTL cleanup, add tolerations support (#26772)

Fix two bugs in the OMJob operator:
- Exit handler pods were recreated indefinitely because findExitHandlerPod()
  lacked the name-based fallback that findMainPod() already had, causing
  label propagation delays to trigger repeated pod creation events
- Terminal phase handler never rescheduled for TTL-based cleanup, so pods
  were never cleaned up after ttlSecondsAfterFinished expired

Add tolerations support for ingestion pod scheduling across the full stack:
- Operator: OMJobPodSpec field, PodManager.buildPod(), CRD schema
- Server: OMJob model, K8sPipelineClientConfig parsing, K8sPipelineClient
  builder, K8sJobUtils serialization

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add K8S_TOLERATIONS env var mapping in openmetadata.yaml

Adds the tolerations config binding so the server picks up the
K8S_TOLERATIONS env var set by the Helm chart secret.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add tolerations to k8s test values for local validation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix cleanup

* Address PR review: remove redundant pod lookup and guard null items

- Remove redundant server-created pod selector fallback in findMainPod()
  since buildPodSelector() now matches all pods by omjob-name alone
- Add null guard for getItems() in deletePods() to prevent NPE
- Update local test values for namespace and image config

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
(cherry picked from commit cfd71e8bd3)
2026-04-14 07:44:17 +00:00
..
docker-compose-deps.yml [Search] Upgrade Clients (#25719) 2026-02-07 18:54:13 +05:30
README.md Finish K8sPipelineClient Implementation (#25172) 2026-01-15 08:17:55 +01:00
values-airflow-test.yaml Finish K8sPipelineClient Implementation (#25172) 2026-01-15 08:17:55 +01:00
values-k8s-test.yaml Fix k8s operator exit handler pod loop and TTL cleanup, add tolerations (#26971) 2026-04-14 07:44:17 +00:00

OpenMetadata Helm Chart Local Testing

Helper to test changes from https://github.com/open-metadata/openmetadata-helm-charts with local images while developing.

Prerequisites

Required Tools

# Install required tools
brew install helm kubectl minikube docker-compose

# Or on Ubuntu/Debian:
# sudo snap install helm kubectl minikube
# sudo apt-get install docker-compose

Required Resources

  • CPU: 4+ cores recommended
  • Memory: 8GB+ RAM recommended
  • Storage: 20GB+ free space

Quick Start

1. Start Local Dependencies

First, start the required database and search services:

cd docker/development/helm

# Start PostgreSQL and OpenSearch
docker-compose -f docker-compose-deps.yml up -d

# Wait for services to be ready (2-3 minutes)
docker-compose -f docker-compose-deps.yml logs -f

# Verify services are running
curl http://localhost:9200/_cluster/health
docker exec openmetadata_postgres_test psql -U openmetadata_user -d openmetadata_db -c "SELECT 1"

2. Start Kubernetes Cluster

# Start minikube with sufficient resources
minikube start --cpus 4 --memory 8192 --driver docker

# Enable ingress addon (optional)
minikube addons enable ingress

# Verify cluster is ready
kubectl cluster-info
kubectl get nodes

3. Create Required Secrets

# Create secrets that the chart expects
kubectl create secret generic postgres-secrets \
  --from-literal=openmetadata-postgres-password=openmetadata_password

kubectl create secret generic airflow-secrets \
  --from-literal=openmetadata-airflow-password=admin

Test Scenarios

Scenario A: Test Kubernetes Native Pipeline Client (NEW)

Test the new K8s native pipeline execution without Airflow dependencies.

# Use local chart directly, e.g.,
CHART_PATH="/Users/pmbrull/github/openmetadata-helm-charts/charts/openmetadata"

# Install with K8s native configuration
helm install openmetadata-k8s-test $CHART_PATH --values values-k8s-test.yaml

# Check deployment status
kubectl get pods -A
kubectl get jobs -n openmetadata-pipelines-test
kubectl get serviceaccounts,roles,rolebindings -n openmetadata-pipelines-test

# Check logs
kubectl logs -l app.kubernetes.io/name=openmetadata -f

# Test access
minikube tunnel  # In separate terminal
curl http://localhost:8585/api/v1/system/health

Scenario B: Test Migrated Airflow Configuration

Test that the new nested Airflow configuration still works.

# Install with migrated Airflow configuration  
helm install openmetadata-airflow-test $CHART_PATH \
  --values values-airflow-test.yaml \
  --timeout 10m \
  --wait

# Check deployment
kubectl get pods -A
kubectl logs -l app.kubernetes.io/name=openmetadata -f

Validation Checklist

Basic Functionality

  1. Pod Health
# Check all pods are running
kubectl get pods -A

# Check OpenMetadata pod logs
kubectl logs deployment/openmetadata-k8s-test -f

# Check database connectivity
kubectl exec deployment/openmetadata-k8s-test -- curl -f http://postgres:5432 || echo "DB connection test"
  1. API Health
# Access health endpoint
curl http://localhost:8585/api/v1/system/health

# Check API response
curl http://localhost:8585/api/v1/system/config

K8s Native Pipeline Features

  1. RBAC Resources
# Check namespace creation
kubectl get namespace openmetadata-pipelines-test

# Check service account
kubectl get serviceaccount -n openmetadata-pipelines-test openmetadata-ingestion-test

# Check permissions
kubectl auth can-i create jobs \
  --as=system:serviceaccount:openmetadata-pipelines-test:openmetadata-ingestion-test \
  -n openmetadata-pipelines-test
  1. Configuration Validation
# Check environment variables in pod
kubectl exec deployment/openmetadata-k8s-test -- env | grep K8S_

# Check secrets
kubectl get secret openmetadata-k8s-test-pipeline-secret -o yaml
kubectl get secret openmetadata-k8s-test-pipeline-secret -o jsonpath='{.data}' | base64 -d
  1. Pipeline Job Testing
# Create a test pipeline job (manual)
cat <<EOF | kubectl apply -f -
apiVersion: batch/v1
kind: Job
metadata:
  name: test-ingestion-job
  namespace: openmetadata-pipelines-test
spec:
  template:
    spec:
      serviceAccountName: openmetadata-ingestion-test
      containers:
      - name: test
        image: docker.getcollate.io/openmetadata/ingestion:latest
        command: ["echo", "Test pipeline job works"]
      restartPolicy: Never
EOF

# Check job execution
kubectl get jobs -n openmetadata-pipelines-test
kubectl logs job/test-ingestion-job -n openmetadata-pipelines-test

Failure Diagnostics Testing

  1. Create Failing Job
# Create a job that will fail
cat <<EOF | kubectl apply -f -
apiVersion: batch/v1
kind: Job  
metadata:
  name: test-failing-job
  namespace: openmetadata-pipelines-test
  labels:
    app.kubernetes.io/pipeline: test-pipeline
    app.kubernetes.io/run-id: test-123
spec:
  template:
    spec:
      serviceAccountName: openmetadata-ingestion-test
      containers:
      - name: main
        image: docker.getcollate.io/openmetadata/ingestion:latest
        command: ["sh", "-c", "echo 'Starting ingestion...'; sleep 10; echo 'Something went wrong!'; exit 1"]
      restartPolicy: Never
EOF

# Watch for diagnostic job creation
kubectl get jobs -n openmetadata-pipelines-test -w

# Check diagnostic job logs
kubectl logs -n openmetadata-pipelines-test -l app.kubernetes.io/component=diagnostics

Configuration Migration Testing

  1. Test Both Configurations
# Test that both airflow and k8s configs are valid
helm template test-airflow $CHART_PATH --values values-airflow-test.yaml > /tmp/airflow-manifest.yaml
helm template test-k8s $CHART_PATH --values values-k8s-test.yaml > /tmp/k8s-manifest.yaml

# Validate manifests
kubectl apply --dry-run=client -f /tmp/airflow-manifest.yaml
kubectl apply --dry-run=client -f /tmp/k8s-manifest.yaml
  1. Test Breaking Changes
# Test old configuration format (should fail validation or show warnings)
cat > /tmp/old-values.yaml << EOF
openmetadata:
  config:
    pipelineServiceClientConfig:
      enabled: true
      className: "org.openmetadata.service.clients.pipeline.airflow.AirflowRESTClient"
      apiEndpoint: http://test
EOF

helm template test-old $CHART_PATH --values /tmp/old-values.yaml

Debug Commands

# Get all resources
kubectl get all -A

# Describe OpenMetadata deployment
kubectl describe deployment openmetadata-k8s-test

# Get events
kubectl get events --sort-by='.lastTimestamp' -A

# Check helm release
helm list
helm status openmetadata-k8s-test
helm get values openmetadata-k8s-test

Cleanup

Remove Test Deployments

# Remove helm releases
helm uninstall openmetadata-k8s-test
helm uninstall openmetadata-airflow-test

# Clean up namespaces
kubectl delete namespace openmetadata-pipelines-test

# Clean up secrets
kubectl delete secret postgres-secrets airflow-secrets

Expected Results

Successful Deployment Indicators

  1. All pods running: kubectl get pods -A shows all pods in Running state
  2. Health check passing: curl http://localhost:8585/api/v1/system/health returns 200
  3. RBAC created: K8s resources created in openmetadata-pipelines-test namespace
  4. Configuration loaded: Environment variables properly set in pods
  5. No errors in logs: kubectl logs deployment/openmetadata-k8s-test shows successful startup

Performance Benchmarks

  • Startup time: < 5 minutes for full deployment
  • Memory usage: < 4GB total (including dependencies)
  • CPU usage: < 2 cores under normal load
  • Job creation: < 30 seconds for pipeline job creation and execution

This comprehensive testing suite validates both the new K8s native functionality and ensures backward compatibility with existing Airflow configurations.