hyperdx/.github/workflows/main.yml
Warren Lee 0a4fb15df2
[HDX-4029] Add commonly-used core and contrib components to OTel Collector builder-config (#2121)
## Summary

Update `packages/otel-collector/builder-config.yaml` to include commonly-used components from the upstream [opentelemetry-collector](https://github.com/open-telemetry/opentelemetry-collector) core and [opentelemetry-collector-contrib](https://github.com/open-telemetry/opentelemetry-collector-contrib) distributions. This gives users more flexibility in their custom OTel configs without pulling in the entire contrib distribution (which causes very long compile times).

Also adds Go module and build cache mounts to the OCB Docker build stage for faster rebuilds, and bumps CI timeouts for integration and smoke test jobs to account for the larger binary.

### Core extensions added (2)

- `memorylimiterextension` — memory-based limiting at the extension level
- `zpagesextension` — zPages debugging endpoints

### Contrib receivers added (4)

- `dockerstatsreceiver` — container metrics from Docker
- `filelogreceiver` — tail log files
- `k8sclusterreceiver` — Kubernetes cluster-level metrics
- `kubeletstatsreceiver` — node/pod/container metrics from kubelet

### Contrib processors added (12)

- `attributesprocessor` — insert/update/delete/hash attributes
- `cumulativetodeltaprocessor` — convert cumulative metrics to delta
- `filterprocessor` — drop unwanted telemetry
- `groupbyattrsprocessor` — reassign resource attributes
- `k8sattributesprocessor` — enrich telemetry with k8s metadata
- `logdedupprocessor` — deduplicate repeated log entries
- `metricstransformprocessor` — rename/aggregate/transform metrics
- `probabilisticsamplerprocessor` — percentage-based sampling
- `redactionprocessor` — mask/remove sensitive data
- `resourceprocessor` — modify resource attributes
- `spanprocessor` — rename spans, extract attributes
- `tailsamplingprocessor` — sample traces based on policies

### Contrib extensions added (1)

- `filestorage` — persistent file-based storage (used by the ClickHouse exporter's sending queue in the EE OpAMP controller)
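
In an OCB builder config, each component is declared as a Go module pinned to a version. A minimal sketch of the shape (module paths are the upstream ones; the version pin shown is illustrative, not necessarily the one used in this PR):

```yaml
# Illustrative builder-config.yaml fragment (version pins are placeholders).
receivers:
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/filelogreceiver v0.120.0
processors:
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/processor/tailsamplingprocessor v0.120.0
extensions:
  - gomod: go.opentelemetry.io/collector/extension/zpagesextension v0.120.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/extension/storage/filestorage v0.120.0
```

OCB generates a `main.go` that registers exactly these factories, which is why adding components here grows the binary but avoids compiling the entire contrib distribution.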

### Other changes

- **Docker cache mounts**: Added `--mount=type=cache` for Go module and build caches in the OCB builder stage of both `docker/otel-collector/Dockerfile` and `docker/hyperdx/Dockerfile`
- **CI timeouts**: Bumped `integration` and `otel-smoke-test` jobs from 8 to 16 minutes in `.github/workflows/main.yml`
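
The cache mounts take roughly this shape (the stage name and Go base image here are hypothetical; the cache targets are the standard Go module and build cache paths):

```dockerfile
# Illustrative OCB builder stage with BuildKit cache mounts.
# Stage and image names are placeholders, not the repo's exact Dockerfile contents.
FROM golang:1.23 AS ocb-builder
WORKDIR /build
COPY packages/otel-collector/builder-config.yaml .
# Persist the module download cache and compile cache across builds
RUN --mount=type=cache,target=/go/pkg/mod \
    --mount=type=cache,target=/root/.cache/go-build \
    builder --config builder-config.yaml
```

With BuildKit, these mounts survive between builds on the same builder, so re-running OCB after a config change recompiles only what changed instead of the full dependency graph.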

All existing HyperDX-specific components are preserved unchanged.

### How to test locally or on Vercel

1. Build the OTel Collector Docker image and verify that OCB resolves all listed modules
2. Provide a custom OTel config that uses one of the newly-added components and verify it loads
3. Verify existing HyperDX OTel pipeline still functions
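
For step 2, a minimal custom config exercising one of the newly added components might look like this (paths and pipeline wiring are placeholders, not a recommended production setup):

```yaml
# Illustrative custom config using the newly added filelogreceiver.
receivers:
  filelog:
    include: [/var/log/app/*.log]
exporters:
  debug: {}
service:
  pipelines:
    logs:
      receivers: [filelog]
      exporters: [debug]
```

If the component was compiled into the binary, the collector starts cleanly; otherwise it fails fast with an "unknown type" error at config validation.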

### References

- Linear Issue: https://linear.app/clickhouse/issue/HDX-4029
- Upstream core builder-config: https://github.com/open-telemetry/opentelemetry-collector/blob/main/cmd/otelcorecol/builder-config.yaml
- Upstream contrib builder-config: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/cmd/otelcontribcol/builder-config.yaml
2026-04-15 15:57:44 +00:00


name: Main
on:
  push:
    branches: [main, v1]
  pull_request:
    branches: [main, v1]
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
jobs:
  lint:
    timeout-minutes: 8
    runs-on: ubuntu-24.04
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Setup node
        uses: actions/setup-node@v4
        with:
          node-version-file: '.nvmrc'
          cache-dependency-path: 'yarn.lock'
          cache: 'yarn'
      - name: Install root dependencies
        run: yarn install
      - name: Build dependencies
        run: make ci-build
      - name: Install core libs
        run: sudo apt-get install --yes curl bc
      - name: Run lint + type check
        run: make ci-lint
  unit:
    timeout-minutes: 8
    runs-on: ubuntu-24.04
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Setup node
        uses: actions/setup-node@v4
        with:
          node-version-file: '.nvmrc'
          cache-dependency-path: 'yarn.lock'
          cache: 'yarn'
      - name: Install root dependencies
        run: yarn install
      - name: Build dependencies
        run: make ci-build
      - name: Run unit tests
        run: make ci-unit
  integration:
    timeout-minutes: 16
    runs-on: ubuntu-24.04
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Setup node
        uses: actions/setup-node@v4
        with:
          node-version-file: '.nvmrc'
          cache-dependency-path: 'yarn.lock'
          cache: 'yarn'
      - name: Install root dependencies
        run: yarn install
      - name: Expose GitHub Runtime
        uses: crazy-max/ghaction-github-runtime@v2
      - name: Spin up docker services
        run: |
          docker buildx create --use --driver=docker-container
          docker buildx bake -f ./docker-compose.ci.yml --set *.cache-to="type=gha" --set *.cache-from="type=gha" --load
      - name: Build dependencies
        run: make ci-build
      - name: Run integration tests
        run: make ci-int
  otel-unit-test:
    timeout-minutes: 8
    runs-on: ubuntu-24.04
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Get changed files
        id: changed-files
        uses: tj-actions/changed-files@v46
        with:
          files: |
            packages/otel-collector/**
      - name: Setup Go
        if: steps.changed-files.outputs.any_changed == 'true'
        uses: actions/setup-go@v5
        with:
          go-version-file: packages/otel-collector/go.mod
          cache-dependency-path: packages/otel-collector/go.sum
      - name: Run unit tests
        if: steps.changed-files.outputs.any_changed == 'true'
        working-directory: ./packages/otel-collector
        run: go test ./...
  otel-smoke-test:
    timeout-minutes: 16
    runs-on: ubuntu-24.04
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Get changed OTEL collector files
        id: changed-files
        uses: tj-actions/changed-files@v46
        with:
          files: |
            docker/otel-collector/**
            smoke-tests/otel-collector/**
      - name: Install required tooling
        if: steps.changed-files.outputs.any_changed == 'true'
        env:
          DEBIAN_FRONTEND: noninteractive
        run: |
          sudo apt-get install -y apt-transport-https ca-certificates curl gnupg
          curl -fsSL 'https://packages.clickhouse.com/rpm/lts/repodata/repomd.xml.key' | sudo gpg --dearmor -o /usr/share/keyrings/clickhouse-keyring.gpg
          ARCH=$(dpkg --print-architecture)
          echo "deb [signed-by=/usr/share/keyrings/clickhouse-keyring.gpg arch=${ARCH}] https://packages.clickhouse.com/deb stable main" | sudo tee /etc/apt/sources.list.d/clickhouse.list
          sudo apt-get update
          sudo apt-get install --yes curl bats clickhouse-client
      - name: Run Smoke Tests
        if: steps.changed-files.outputs.any_changed == 'true'
        working-directory: ./smoke-tests/otel-collector
        run: bats .
  e2e-tests:
    uses: ./.github/workflows/e2e-tests.yml
    permissions:
      contents: read
  e2e-report:
    name: End-to-End Tests
    if: always()
    needs: e2e-tests
    runs-on: ubuntu-24.04
    permissions:
      contents: read
      pull-requests: write
    steps:
      - name: Download all test results
        uses: actions/download-artifact@v4
        with:
          pattern: test-results-*
          path: all-test-results
      - name: Aggregate test results
        id: test-results
        if: github.event_name == 'pull_request'
        uses: actions/github-script@v7
        with:
          result-encoding: string
          script: |
            const fs = require('fs');
            const path = require('path');
            let totalPassed = 0;
            let totalFailed = 0;
            let totalFlaky = 0;
            let totalSkipped = 0;
            let totalDuration = 0;
            let foundResults = false;
            try {
              const resultsDir = 'all-test-results';
              const shards = fs.readdirSync(resultsDir);
              for (const shard of shards) {
                const resultsPath = path.join(resultsDir, shard, 'results.json');
                if (fs.existsSync(resultsPath)) {
                  foundResults = true;
                  const results = JSON.parse(fs.readFileSync(resultsPath, 'utf8'));
                  const { stats } = results;
                  totalPassed += stats.expected || 0;
                  totalFailed += stats.unexpected || 0;
                  totalFlaky += stats.flaky || 0;
                  totalSkipped += stats.skipped || 0;
                  totalDuration += stats.duration || 0;
                }
              }
              if (foundResults) {
                const duration = Math.round(totalDuration / 1000);
                const summary = totalFailed > 0
                  ? `❌ **${totalFailed} test${totalFailed > 1 ? 's' : ''} failed**`
                  : `✅ **All tests passed**`;
                return `## E2E Test Results
            ${summary} • ${totalPassed} passed • ${totalSkipped} skipped • ${duration}s
            | Status | Count |
            |--------|-------|
            | ✅ Passed | ${totalPassed} |
            | ❌ Failed | ${totalFailed} |
            | ⚠️ Flaky | ${totalFlaky} |
            | ⏭️ Skipped | ${totalSkipped} |
            Tests ran across ${shards.length} shards in parallel.
            [View full report →](https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }})`;
              } else {
                return `## E2E Test Results
            ❌ **Test results file not found**
            [View full report →](https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }})`;
              }
            } catch (error) {
              console.log('Could not parse test results:', error.message);
              return `## E2E Test Results
            ❌ **Error reading test results**: ${error.message}
            [View full report →](https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }})`;
            }
      - name: Comment PR with test results
        uses: mshick/add-pr-comment@v2
        # Skip for fork PRs: GITHUB_TOKEN cannot write to the base repo
        if:
          always() && github.event_name == 'pull_request' &&
          github.event.pull_request.head.repo.fork != true
        with:
          message: ${{ steps.test-results.outputs.result }}
          message-id: e2e-test-results
      - name: Check test results
        id: check-results
        run: |
          total_failed=0
          for dir in all-test-results/*/; do
            if [ -f "${dir}results.json" ]; then
              unexpected=$(jq -r '.stats.unexpected // 0' "${dir}results.json")
              total_failed=$((total_failed + unexpected))
            fi
          done
          if [ "$total_failed" -gt 0 ]; then
            echo "::error::$total_failed E2E test(s) failed"
            exit 1
          fi
          # Fail when any shard failed even if we couldn't read failure count from artifacts
          if [ "${{ needs.e2e-tests.result }}" = "failure" ]; then
            echo "::error::One or more E2E test shards failed"
            exit 1
          fi
  clickhouse-static-build:
    name: ClickHouse Bundle Build
    runs-on: ubuntu-24.04
    timeout-minutes: 10
    permissions:
      contents: read
    strategy:
      fail-fast: false
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version-file: '.nvmrc'
          cache-dependency-path: 'yarn.lock'
          cache: 'yarn'
      - name: Install dependencies
        run: yarn install
      - name: Build App
        run: yarn build:clickhouse
      - name: Verify output directory exists and has size
        run: |
          if [ ! -d "packages/app/out" ]; then
            echo "::error::Output directory 'packages/app/out' does not exist"
            exit 1
          fi
          echo "✓ Output directory exists"
          # Measure size in KB with du and convert to MB
          size_kb=$(du -sk packages/app/out | cut -f1)
          size_mb=$((size_kb / 1024))
          echo "Output directory size: ${size_mb} MB (${size_kb} KB)"
          if [ "$size_mb" -lt 10 ]; then
            echo "::error::Output directory is only ${size_mb} MB, expected at least 10 MB"
            exit 1
          fi
          echo "✓ Output directory size check passed (${size_mb} MB)"