datahaven/.github/workflows/task-docker-ci.yml
Steve Degosserie 9a5404de82
refactor: Consolidate and optimize Docker image architecture (#233)
## Overview

This PR consolidates and optimizes the Docker build system, reducing
redundancy and improving CI/CD performance. The changes eliminate
duplicate Dockerfiles, introduce a flexible build template, and optimize
release builds to reuse CI artifacts.

## Changes Summary

### 🐳 Docker Images Restructured

**Before:** 5 Dockerfiles with significant overlap
**After:** 4 focused images + 1 utility

#### Final Structure:

1. **`operator/Dockerfile`**  Updated
   - **Standard operator image** for CI and release builds
   - Minimal node image (accepts pre-built binaries)
   - GHCR: `ghcr.io/datahaven-xyz/datahaven/datahaven` (CI)
   - DockerHub: `datahavenxyz/datahaven` (releases)

2. **`docker/datahaven-build.Dockerfile`** (moved from
`operator/Dockerfile`)
   - Full source-to-binary build for manual releases
   - DockerHub: `datahavenxyz/datahaven:{label}`
   - Supports custom RUSTFLAGS and fast-runtime feature
   - Only used for manual workflow_dispatch builds

3. **`docker/datahaven-production.Dockerfile`** (kept)
   - Binary builder for CPU-specific releases
   - Used by build-prod-binary workflow template
   - Supports custom target-cpu flags

4. **`docker/datahaven-dev.Dockerfile`**  NEW (local dev only)
   - **FOR LOCAL DEVELOPMENT/TROUBLESHOOTING ONLY**
   - Includes debug tools: gdb, strace, vim, sudo
   - Extra dependencies: librocksdb-dev, curl
   - RUST_BACKTRACE enabled by default
   - **DO NOT USE for CI or production builds**

5. **`test/docker/crossbuild-mac-libpq.dockerfile`** (kept)
   - Utility for macOS → Linux cross-compilation

#### Removed (Redundant):
-  `docker/datahaven.Dockerfile` → replaced by operator/Dockerfile
-  `test/docker/datahaven-node-local.dockerfile` → replaced by
datahaven-dev.Dockerfile

---

### 🔄 Workflow Improvements

#### Enhanced `publish-docker` Template
- Supports both GHCR and DockerHub registries
- Flexible inputs: dockerfile, context, build-args, cache scope
- Auto-generates OCI-compliant labels
- Reduces code duplication (~70 lines → ~15 per workflow)

#### Refactored CI Pipeline
- **`docker-build-ci`**: Builds `operator/Dockerfile` → GHCR for CI/E2E
testing
- **`docker-build-release`**: Builds `operator/Dockerfile` → DockerHub
(main branch only)
- Both CI and release workflows now use the same minimal operator image
- Release builds **reuse CI binaries** instead of rebuilding from source

#### Optimized Release Workflow
The `task-docker-release` workflow now has dual modes:

**Mode 1: `workflow_call` (CI - main pushes)**
-  Reuses binary from CI's build-operator task
-  Uses lightweight `operator/Dockerfile`
-  Tags: `latest`, `sha-{short}`
-  **Fast**: ~5 minutes (vs ~30 min previously)

**Mode 2: `workflow_dispatch` (Manual)**
-  Full source build with `datahaven-build.Dockerfile`
-  Custom branch and label selection
-  Optional fast-runtime feature
-  Tags: `PROD-{label}` or user-defined

---

### 🔧 Additional Optimizations

- Copy libpq5 from builder stage instead of reinstalling (smaller,
faster)
- Remove redundant protobuf-compiler package (use protoc v21.12
directly)
- Standardize user UID to 1000 across all runtime images
- Consistent OCI labeling and metadata

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-15 01:33:20 +02:00

117 lines
3.6 KiB
YAML

name: Docker Build & Publish (CI)
on:
workflow_dispatch:
inputs:
binary-hash:
description: "The hash of the operator binary"
required: false
type: string
workflow_call:
inputs:
binary-hash:
description: "The hash of the operator binary"
required: true
type: string
outputs:
image-tag:
description: "The tag portion of the docker image (without registry)"
value: "${{ jobs.build-test-push.outputs.image-tag }}"
permissions:
contents: read
packages: write
concurrency:
group: docker-build-ci-${{ github.ref }}
cancel-in-progress: true
jobs:
build-test-push:
runs-on: ubuntu-latest
outputs:
image-tag: ${{ steps.extract_tag.outputs.image-tag }}
steps:
- name: Checkout repository
uses: actions/checkout@v4
- name: Download binary artifact
uses: actions/download-artifact@v4
with:
name: datahaven-node-${{ inputs.binary-hash }}
path: ./build/
- name: Prepare binary
run: |
chmod +x ./build/datahaven-node
ls -la ./build/
- name: Docker meta
id: meta
uses: docker/metadata-action@v5
with:
images: ghcr.io/datahaven-xyz/datahaven/datahaven
flavor: |
latest=auto
tags: |
type=raw,value=ci-${{ github.run_id }}
type=sha,format=short,prefix=sha-
type=ref,event=tag
type=ref,event=branch
type=ref,event=pr
- name: Extract tag for job output
id: extract_tag
run: |
FULL_TAG=$(echo '${{ steps.meta.outputs.json }}' | jq -r '.tags[-1]')
TAG_ONLY=$(echo "$FULL_TAG" | sed 's|.*:||')
echo "image-tag=$TAG_ONLY" >> $GITHUB_OUTPUT
echo "image-name=ghcr.io/datahaven-xyz/datahaven/datahaven:$TAG_ONLY" >> $GITHUB_OUTPUT
- name: Build and push Docker image
uses: ./.github/workflow-templates/publish-docker
with:
dockerfile: ./operator/Dockerfile
context: .
registry: ghcr.io
registry_username: ${{ github.actor }}
registry_password: ${{ secrets.GITHUB_TOKEN }}
image_tags: ${{ steps.meta.outputs.tags }}
image_title: "DataHaven Node - CI"
image_description: "CI build of DataHaven operator node"
cache_scope: datahaven-ci-build
# --- Smoke tests ---
- name: Pull and test node --help
run: |
docker pull ${{ steps.extract_tag.outputs.image-name }}
docker run --rm ${{ steps.extract_tag.outputs.image-name }} --help
- name: Integration test (dev chain starts)
run: |
docker run --rm -d -p 9944:9944 --name local-dh-node \
${{ steps.extract_tag.outputs.image-name }} --dev --unsafe-rpc-external
- name: Wait for node to be healthy and test
run: |
echo "Waiting for node to start..."
for i in {1..30}; do # Retry for 30 * 5s = 150 seconds
if curl --fail --location 'http://127.0.0.1:9944' \
--header 'Content-Type: application/json' \
--data '{"jsonrpc":"2.0","id":1,"method":"system_chain","params":[]}' ; then
echo "Node is healthy!"
docker logs local-dh-node --tail 100
exit 0
fi
echo "Attempt $i: Node not ready yet, sleeping 5s..."
sleep 5
done
echo "Node failed to start or respond in time."
docker logs local-dh-node --tail 100
exit 1
- name: Cleanup integration test container
if: always()
run: docker rm -f local-dh-node