Commit graph

98 commits

Author SHA1 Message Date
undercover-cactus
a77a55a501 incorrect build-args passed 2026-02-25 10:36:38 +01:00
undercover-cactus
292aa43930 fix path for action 2026-02-12 09:31:13 +01:00
undercover-cactus
4631a9947a move publish action from workflow template to a local action 2026-02-12 09:06:21 +01:00
undercover-cactus
b258113547 use ARG to build Dockerfile; use local chain everywhere; 2026-02-10 17:36:07 +01:00
Ahmad Kaouk
da2847bbbf
test: Add storage layout checks for upgradeable contracts (#420)
## Summary

Implements storage layout testing for the upgradeable
`DataHavenServiceManager` contract to prevent state
   corruption during proxy upgrades.

  ## Changes

  ### New Files
- **`contracts/storage-snapshots/DataHavenServiceManager.storage.json`** - Baseline storage layout snapshot
- **`contracts/storage-snapshots/README.md`** - Documentation for updating snapshots and known limitations
- **`contracts/scripts/check-storage-layout.sh`** - CI script that compares current layout against snapshot
- **`contracts/test/storage/StorageLayout.t.sol`** - Upgrade simulation tests verifying state preservation
- **`.github/workflows/task-storage-layout.yml`** - CI workflow for storage layout checks

  ### Modified Files
- **`.github/workflows/CI.yml`** - Added `storage-layout` job to run in
parallel with other checks

  ## How It Works

  **Two-pronged approach:**

1. **Snapshot Diff** - Compares current storage layout against committed
   snapshot using `forge inspect`. Catches unintended variable reordering,
   type changes, or gap modifications.

2. **Upgrade Simulation** - Foundry tests that populate state, perform a
   proxy upgrade, and verify all values survive:
   - `test_upgradePreservesState` - Verifies core state variables
   - `test_upgradePreservesValidatorMappings` - Verifies `validatorEthAddressToSolochainAddress` mapping
   - `test_upgradePreservesMultipleValidators` - Verifies `validatorsAllowlist` with multiple entries
   - `test_functionalityAfterUpgrade` - Verifies contract remains functional post-upgrade

  ## Normalization

  The snapshot comparison normalizes JSON to avoid false positives:
  - Removes `astId` (changes with compiler runs)
  - Removes `contract` (contains full file path)
- Removes `.types` section (contains unstable AST IDs embedded in type
keys)
  - Sorts by slot number
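
A minimal sketch of the normalization steps listed above, assuming `jq` is available and that the layout JSON has a top-level `storage` array and a `types` object. The helper names `normalize_layout` and `layouts_match` are hypothetical; the actual `check-storage-layout.sh` may be implemented differently.

```bash
# Normalize `forge inspect <Contract> storage --json` output read from stdin:
# drop the .types section, drop astId/contract from each storage entry,
# and sort the entries by slot number. (Hypothetical helper.)
normalize_layout() {
  jq -S 'del(.types)
         | .storage |= (map(del(.astId, .contract)) | sort_by(.slot | tonumber))'
}

# Exit 0 when two layout files are equal after normalization,
# 1 when they differ, 2 when either file cannot be normalized.
layouts_match() (
  a=$(normalize_layout < "$1") || exit 2
  b=$(normalize_layout < "$2") || exit 2
  [ "$a" = "$b" ]
)
```

With these helpers, `layouts_match snapshot.json current.json` would succeed when only the unstable fields differ and fail when a label, slot, offset, or type changes.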

  ## Usage

  ```bash
  # Check storage layout against snapshot
  ./scripts/check-storage-layout.sh

  # Run upgrade simulation tests
  forge test --match-contract StorageLayoutTest -vvv

  # Update snapshot (when intentionally changing storage)
  forge inspect DataHavenServiceManager storage --json > \
    storage-snapshots/DataHavenServiceManager.storage.json
```
  ## Test Plan

  - ./scripts/check-storage-layout.sh passes
  - forge test --match-contract StorageLayoutTest -vvv passes (4 tests)
  - CI workflow runs successfully
2026-02-05 11:08:35 +00:00
Ahmad Kaouk
a8df6aae95
ci: Remove unused Foundry cache steps (#431)
Summary
- drop the Foundry library and build artifact cache restores from the
e2e workflow
- also remove the Foundry build cache from the dedicated Foundry tests
workflow since it wasn’t providing value

Testing
- Not run (not requested)
2026-02-05 11:21:02 +01:00
Steve Degosserie
abd8366d87
chore: ♻ Use latest Kurtosis release v1.15.2 (#415)
Use the latest v1.15.2 release of Kurtosis, which includes improved
compatibility with rootless Podman (wrt. socket detection and bind
mounting) following the merge of
https://github.com/kurtosis-tech/kurtosis/pull/2803.
Up to now, the e2e CI job was using a custom (patched) version of
Kurtosis CLI, Engine & Core images.
2026-01-27 13:12:55 +01:00
Steve Degosserie
9de44b84fe
revert: ♻ Revert Rust toolchain to 1.88.0 (revert PR #362) (#392)
Revert #362, back to Rust toolchain v1.88.0, as the newer version causes
an issue in the runtime release publishing flow.
2026-01-14 08:37:27 +01:00
Steve Degosserie
f3f53dc9ee
fix: 🔨 Add missing 'packages: write' permission for docker-build-release job (#387) 2026-01-09 16:34:44 +01:00
Steve Degosserie
2557a192c2
ci: Disable redundant CI on main branch merges (#386)
## Summary

- Split CI workflow to stop re-running validation when PRs are merged to
main
- Create dedicated `release.yml` workflow for Docker Hub releases on
main branch
- Keep full CI validation for PRs and `perm-*` branches

## Motivation

Since the repository is configured to:
1. Require PRs to be up-to-date with main before merging
2. Require all CI checks to pass

Re-running the full CI suite (~12 jobs) on main after merge is redundant
and wastes CI runner time that could be used for other tasks.

## Changes

| Workflow | Before | After |
|----------|--------|-------|
| `CI.yml` | Triggers on push to `main`, `perm-*`, and PRs to `main` | Triggers on push to `perm-*` and PRs to `main` only |
| `release.yml` | N/A (new) | Triggers on push to `main`, runs only `docker-build-release` |

## Impact

| Event | Before | After | Savings |
|-------|--------|-------|---------|
| PR to main | 13 jobs | 12 jobs | 1 job |
| Merge to main | 13 jobs | 1 job | 12 jobs |
| Push to perm-* | 13 jobs | 12 jobs | 1 job |

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-09 16:29:59 +01:00
Steve Degosserie
e04023ef11
ci: Add sccache warm-up job for better cache hit rates (#375)
## Summary

Improve CI performance through better caching and simplified workflows:

- Add a new `warm-sccache` job that runs before all Rust CI jobs to
pre-populate the sccache cache
- Cache locally installed tools (`~/.local/bin`, `~/.local/lib`,
`~/.local/include`) with a version-based hash key
- Simplify Rust tests by removing matrix partitioning and switching from
cargo-nextest to cargo test

## Problem

**Poor sccache hit rates:**
- `rust-lint`, `unit-tests`, and `build-operator` ran in parallel with
cold caches
- Each job compiled dependencies independently
- Cache was only saved at job completion (too late for parallel jobs to
benefit)

**Redundant tool downloads:**
- Mold, LLVM/Clang, protoc, and libpq (~500MB+) were downloaded fresh on
each job
- No caching of locally installed tools

**Overcomplicated test setup:**
- 2-partition matrix for tests added complexity without significant
benefit
- cargo-nextest required installation step (~30s overhead)
- Separate result-checker job wasn't necessary

## Solution

### 1. sccache warm-up job (`task-warm-sccache.yml`)
- Runs first (Tier 0) before all Rust jobs
- Compiles with release mode + all features (`fast-runtime`,
`try-runtime`, `runtime-benchmarks`)
- Compiles with debug mode to cover test builds
- Uses `SKIP_WASM_BUILD=1` to minimize warm-up time

### 2. Local tools caching (`setup-env/action.yml`)
- Define tool versions as env vars (`MOLD_VERSION`, `LLVM_VERSION`,
`PROTOC_VERSION`, `LIBPQ_VERSION`)
- Generate SHA256 hash from versions for cache key
- Cache `~/.local/bin`, `~/.local/lib`, `~/.local/include` (not all of
`~/.local` to avoid container storage)
- Set up PATH and env vars immediately after cache restore
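
As a rough illustration of the version-based cache key described above, the sketch below hashes the four version variables with `sha256sum`. The version values are placeholders, not the ones pinned in the repository, and `tools_cache_key` is a hypothetical helper name.

```bash
# Placeholder versions (assumptions for illustration only).
MOLD_VERSION="${MOLD_VERSION:-0.0.0}"
LLVM_VERSION="${LLVM_VERSION:-0.0.0}"
PROTOC_VERSION="${PROTOC_VERSION:-0.0.0}"
LIBPQ_VERSION="${LIBPQ_VERSION:-0.0.0}"

# Print a SHA256 hex digest derived from the tool versions.
# Changing any single version yields a different key, which
# invalidates the cached ~/.local/{bin,lib,include} directories.
tools_cache_key() {
  printf '%s\n' "$MOLD_VERSION" "$LLVM_VERSION" "$PROTOC_VERSION" "$LIBPQ_VERSION" \
    | sha256sum | cut -d' ' -f1
}
```

Hashing the versions rather than the installed files keeps the key cheap to compute before anything is downloaded.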

### 3. Simplified Rust tests (`task-rust-tests.yml`)
- Remove 2-partition matrix strategy
- Replace cargo-nextest with `cargo test --locked`
- Remove separate tests-result-checker job

## CI Flow

```
                      ┌─ build-operator (warm sccache + cached tools)
                      │
 CI Start → warm-sccache ─┼─ rust-lint (warm sccache + cached tools)
                      │
                      └─ unit-tests (warm sccache + cached tools)
```

## Test plan

- [x] CI workflow runs successfully
- [x] Warm-sccache job completes and shows cache stats
- [x] Local tools cache restores correctly (no permission errors)
- [x] Downstream Rust jobs show improved cache hit rates
- [x] Rust tests pass with simplified single-job setup

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Ahmad Kaouk <56095276+ahmadkaouk@users.noreply.github.com>
Co-authored-by: undercover-cactus <lola@moonsonglabs.com>
2026-01-08 13:44:45 +01:00
Ahmad Kaouk
aee282613f
ci: add Docker Hub authentication to E2E workflow (#380)
## Summary
- Fix Docker Hub rate limit errors in E2E CI job by adding
authentication
- Pass existing `DOCKERHUB_USERNAME` and `DOCKERHUB_TOKEN` secrets to
the E2E workflow

  ## Problem
The E2E CI job pulls `datahavenxyz/snowbridge-relay:latest` from Docker
Hub without authentication, causing rate limit errors (10 pulls/hour for
unauthenticated requests):
```
Error: initializing source docker://datahavenxyz/snowbridge-relay:latest: reading manifest latest in docker.io/datahavenxyz/snowbridge-relay: toomanyrequests: You have reached your unauthenticated pull rate limit. 
```
  ## Solution
Reuse the Docker Hub secrets already configured for
`docker-build-release` by:
  1. Passing secrets from `CI.yml` to the E2E workflow
  2. Adding optional secrets declaration in `task-e2e.yml`
  3. Adding Docker Hub login step before pulling `snowbridge-relay`

---------

Co-authored-by: Steve Degosserie <723552+stiiifff@users.noreply.github.com>
2026-01-07 13:06:33 +01:00
undercover-cactus
a8d811fde8
feat: add feature to build binary with postgres bundled (#346)
## Summary

Re-add the static build feature option to bundle the postgres dependency
into the binary. This simplifies installation: operators no longer need
postgres dependencies installed on their system to run the node.

## What changed?

* Added a `static` feature that can be activated to add the extra
dependencies during the build.
* Added a task that runs every time a dependency is modified, to make sure
the build with the feature still works. (We assume that simple code
changes won't affect it, because postgres is used through diesel, which
is not a direct dependency of datahaven.)
2026-01-06 13:13:25 +00:00
undercover-cactus
42ec577f15
test : improve contract injection (#326)
## Summary

This PR improves the state generation workflow. It also checks for an
outdated state-diff.json and adds a script to easily generate a
new one.

The way we generate state has also been changed to make it work on
macOS M1 systems. We no longer run the tool in a container; it runs
directly on the machine.

## What changes

* A check-generated-state.js script was added to quickly detect an
outdated state-diff.json
* The check was added to the CI
* A generate-contracts.ts script was added to easily generate the new
state, with new instructions for running on macOS

---------

Co-authored-by: Gonza Montiel <gon.montiel@gmail.com>
Co-authored-by: Ahmad Kaouk <56095276+ahmadkaouk@users.noreply.github.com>
Co-authored-by: Steve Degosserie <723552+stiiifff@users.noreply.github.com>
Co-authored-by: Gonza Montiel <gonzamontiel@users.noreply.github.com>
2026-01-06 11:27:50 +00:00
Ahmad Kaouk
41788d56bb
test: refactor e2e tests (#365)
This PR significantly refactors and improves the end-to-end testing
framework and infrastructure. The primary focus was on simplifying the
test suites, improving reliability through better resource management,
and hardening the relayer infrastructure.

All E2E tests are now passing on the CI and demonstrate consistent
reliability when run locally.

### Key Changes

#### 1. E2E Test Suite Refactor & Cleanup
* **Simplified Test Logic**: Heavily refactored the core test suites
(`native-token-transfer.test.ts`, `rewards-message.test.ts`, and
`validator-set-update.test.ts`). The new implementation is much cleaner,
utilizing shared helpers to reduce boilerplate.
* **Utility Consolidation**: Removed redundant utility files
(`storage.ts`, `rewards-helpers.ts`) and simplified `events.ts`. Event
waiting now uses `rxjs` for Substrate and native `viem` watchers for
Ethereum, which is more robust and easier to maintain.
* **Better Connector Management**: Unified the creation and cleanup of
test clients in `ConnectorFactory`. It now handles the lifecycle of
WebSocket connections more gracefully, including clearing the
`socketClientCache` to prevent reconnection noise during teardown.

#### 2. Infrastructure & Stability
* **Relayer Relaunch Policy**: Added a restart policy for Snowbridge
relayer containers. They are now configured with `--restart
on-failure:5`, ensuring that relayers automatically relaunch if they
crash during the sensitive initialization phase.
* **WebSocket Integration**:
    * Updated the `ConnectorFactory` to prefer **WebSockets** for the
      Ethereum public client, which is essential for efficient, event-heavy
      E2E testing.
    * Enhanced `launchKurtosisNetwork` to correctly identify and register
      the Execution Layer's WebSocket endpoint from Kurtosis.
* **Disabled Contract Injection**: This PR temporarily disables the
automatic injection of contracts into the genesis state by default.
    * *Reason*: I encountered issues generating a valid `state-diff.json`
      for the latest contract versions. Even after applying several
      workarounds, the injected state remained unstable. As a result, I've
      reverted to manual contract deployment during the launch sequence for
      better reliability for now.

#### 3. Documentation & Maintenance
* Removed obsolete documentation (`event-utilities-guide.md`) that no
longer reflects the simplified event-handling API.
* Cleaned up `test/launcher/validators.ts` and moved logic into more
appropriate helpers.

---------

Co-authored-by: Steve Degosserie <723552+stiiifff@users.noreply.github.com>
2025-12-24 13:31:40 +01:00
Steve Degosserie
58296f5e87
build: ⬆️ Bump Rust version to 1.90.0 (#362)
## Summary
- Bump Rust toolchain from 1.88.0 to 1.90.0 in
`operator/rust-toolchain.toml`
- Update hardcoded Rust version in
`.github/workflows/task-check-licenses.yml` to match

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Ahmad Kaouk <56095276+ahmadkaouk@users.noreply.github.com>
2025-12-19 08:43:01 +01:00
Steve Degosserie
9fff64020f
fix: Grant required permissions to reusable workflows in CI.yml (#355)
## Summary

Fixes the CI failure introduced by #349 where reusable workflows
couldn't use the permissions they declared.

## Root Cause

When using `workflow_call` (reusable workflows), the **called workflow's
permissions are constrained by the caller**. A called workflow cannot
request more permissions than the calling workflow grants.

PR #349 added explicit permissions to individual workflows (e.g.,
`actions: write` in task-build-operator.yml), but removed them from
CI.yml. This caused failures because:

```
CI.yml (contents: read only)
    └── task-build-operator.yml (requests actions: write)
        └── FAILS: caller doesn't grant actions: write
```

## Fix

Grant the necessary permissions in CI.yml so called workflows can use
them:

```yaml
permissions:
  contents: read
  actions: write    # For artifact upload/download
  packages: write   # For ghcr.io push
```

## Why the individual workflow permissions still matter

The explicit permissions in called workflows are still valuable for:
1. **Documentation** - Makes the intent clear
2. **Direct invocation** - Works when called via `workflow_dispatch`
3. **Defense in depth** - If CI.yml grants more than needed, called
workflows still request only what they need

Co-authored-by: Claude <noreply@anthropic.com>
2025-12-12 11:33:00 +00:00
Steve Degosserie
746fce9328
security: 🛡️ Harden GitHub Actions workflows (#349)
## Summary

This PR addresses several security vulnerabilities and applies hardening
measures to the GitHub Actions workflows:

- **Replace `secrets: inherit` with explicit secret passing** - Prevents
unnecessary exposure of all repository secrets to called workflows
- **Add SHA256 checksum verification for downloaded binaries** -
Protects against supply chain attacks via compromised upstream releases
- **Add GitHub Environment protections for release workflows** -
Requires approval before publishing to Docker Hub or creating releases
- **Add explicit minimal permissions to all workflows** - Follows
principle of least privilege, removes unnecessary `packages: write` from
CI.yml

## Changes by Category

### 1. Explicit Secret Passing
| Workflow | Before | After |
|----------|--------|-------|
| CI.yml → docker-build-ci | `secrets: inherit` | No secrets (GITHUB_TOKEN is automatic) |
| CI.yml → docker-build-release | `secrets: inherit` | Explicit `DOCKERHUB_USERNAME`, `DOCKERHUB_TOKEN` |
| CI.yml → e2e-tests | `secrets: inherit` | No secrets (GITHUB_TOKEN is automatic) |

### 2. Binary Checksum Verification
| Workflow | Binary | SHA256 |
|----------|--------|--------|
| task-rust-lint.yml | taplo 0.8.1 | `c62baa73c9d7c1572...` |
| task-e2e.yml | kurtosis 1.11.99 | `5e88e98c1b255362...` |
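
The checksum verification can be sketched as follows, assuming `sha256sum` is available on the runner. `verify_sha256` is a hypothetical helper name, and the expected digest must be the full 64-character value (the digests in the table above are truncated and are not reproduced here).

```bash
# Compare a file's SHA256 digest against an expected hex value.
# Exits 0 on match, 1 on mismatch, 2 if the file cannot be hashed.
verify_sha256() (
  file=$1
  expected=$2
  actual=$(sha256sum "$file" | cut -d' ' -f1) || exit 2
  [ -n "$actual" ] || exit 2
  if [ "$actual" != "$expected" ]; then
    echo "checksum mismatch for $file: expected $expected, got $actual" >&2
    exit 1
  fi
)
```

A workflow step would call this right after the download and before the binary is made executable, so that a tampered release fails the job instead of running.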

### 3. Environment Protections
| Workflow | Job | Environment |
|----------|-----|-------------|
| task-docker-release.yml | build-test-push | `production` |
| task-publish-binary.yml | publish-draft-release | `releases` |
| task-publish-binary.yml | docker-release-candidate | `production` |
| task-publish-runtime.yml | publish-draft-release | `releases` |

### 4. Explicit Permissions
All 14 workflow files now have explicit `permissions:` blocks with
minimal required access.

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Ahmad Kaouk <56095276+ahmadkaouk@users.noreply.github.com>
2025-12-12 09:52:50 +00:00
Steve Degosserie
51ffcae5f0
Revert "feat: statically build binary (#292)" (#330)
This reverts commit f84b6debb7.
2025-12-02 15:42:43 +01:00
Steve Degosserie
e38843455b
fix: 🔨 Disable static binary build for now (#328) 2025-12-02 14:53:05 +01:00
undercover-cactus
f84b6debb7
feat: statically build binary (#292)
Co-authored-by: Gonza Montiel <gonzamontiel@users.noreply.github.com>
2025-11-28 13:38:05 +00:00
Steve Degosserie
71b5e5185f
fix: consolidate session timing and simplify docker release workflow (#321)
## Summary

- Consolidates `SessionsPerEra` definition in common runtime (removes
duplicate definitions)
- Simplifies docker release workflow to always use full Docker builds
- Removes binary reuse path from release workflow

## Changes

### Runtime Configuration
- Remove duplicate `SessionsPerEra` definitions from individual runtimes
- Import `SessionsPerEra` from `datahaven_runtime_common::time` instead
- This fixes inconsistency where individual runtimes had
`prod_or_fast!(6, 1)` while common had `prod_or_fast!(6, 3)`

### Docker Release Workflow
- Remove binary reuse path - now always does full Docker build
- Remove `binary-hash` input from `workflow_call`
- Consolidate to single build step using `datahaven-build.Dockerfile`
- `docker-build-release` now runs in parallel on main branch (no
dependency on `build-operator`)

## Timing Configuration

### Production Runtime
| Parameter        | Value       | Duration   |
|------------------|-------------|------------|
| Session          | 600 blocks  | 1 hour     |
| Sessions per era | 6           | -          |
| Era              | 6 sessions  | 6 hours    |
| Bonding duration | 28 eras     | 7 days     |

### Fast Runtime (for testing)
| Parameter        | Value       | Duration   |
|------------------|-------------|------------|
| Session          | 10 blocks   | 1 minute   |
| Sessions per era | 1           | -          |
| Era              | 1 session   | 1 minute   |
| Bonding duration | 3 eras      | 3 minutes  |
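
The durations in both tables can be checked with a little arithmetic. The sketch below assumes a 6-second block time, which is inferred from "600 blocks = 1 hour" rather than stated in this commit; `bonding_secs` is a hypothetical helper.

```bash
# Block time in seconds (assumption inferred from 600 blocks = 1 hour).
BLOCK_TIME_SECS=6

# Print the bonding duration in seconds.
# Args: blocks per session, sessions per era, bonding duration in eras.
bonding_secs() {
  echo $(( $1 * BLOCK_TIME_SECS * $2 * $3 ))
}

bonding_secs 600 6 28   # production: prints 604800 (7 days)
bonding_secs 10 1 3     # fast runtime: prints 180 (3 minutes)
```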

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-11-26 10:25:24 +01:00
undercover-cactus
53d209bbae
test: only inject contracts in e2e tests if INJECT_CONTRACTS env is 'true' (#315)
This PR adds an environment variable, `INJECT_CONTRACTS`, that specifies
whether the contracts should be injected in the e2e tests. It defaults to
false and is set to `true` in the CI job that runs the e2e tests.

We use an environment variable because `bun test` doesn't allow
passing extra arguments.

A note about the new variable has been added to the documentation to
describe the new behavior.
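
A minimal shell sketch of the opt-in behavior, assuming the flag is treated as enabled only when it is exactly `true`. The real check lives in the TypeScript test code; `should_inject_contracts` is a hypothetical helper used here for illustration.

```bash
# Succeeds only when INJECT_CONTRACTS is exactly 'true'.
# Unset, empty, or any other value means "do not inject".
should_inject_contracts() {
  [ "${INJECT_CONTRACTS:-false}" = "true" ]
}
```

Locally, injection would then be enabled for a single run with `INJECT_CONTRACTS=true bun test`.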

---------

Co-authored-by: Gonza Montiel <gonzamontiel@users.noreply.github.com>
2025-11-24 12:07:36 +01:00
Steve Degosserie
ba1cc63cb0
fix: 🔨 Run publish binary task on ephemeral runner (#307)
Co-authored-by: Ahmad Kaouk <56095276+ahmadkaouk@users.noreply.github.com>
2025-11-22 15:54:49 +01:00
Steve Degosserie
37a4ba990f
fix: 🪳 Quick CI build fix (#300) 2025-11-15 12:25:15 +01:00
Ahmad Kaouk
dd7b72ca29
chore: pin Bun version and migrate to bun.lock (#290)
## Summary

Pins Bun version to 1.3.2 and migrates workflows to use text-based
`bun.lock` instead of binary `bun.lockb`. This fixes CI failures caused
by Bun version mismatches between local development and GitHub Actions.

## Changes

- Created `test/.bun-version` to pin Bun to v1.3.2
- Updated all workflows to use `bun-version-file: test/.bun-version`
- Migrated workflow cache keys from `bun.lockb` to `bun.lock`
- Removed deprecated `test/bun.lockb` binary lockfile

## Why?

**Version Consistency:**
- Local environments and CI were using different Bun versions
- Different versions generate different lockfile formats → CI failures

**Lockfile Migration:**
- Bun v1.2+ uses text-based `bun.lock` as default
- Binary `bun.lockb` is still supported but deprioritized
- Text format provides better git diffs and merge conflict resolution

## Affected Workflows

- `.github/workflows/task-check-metadata.yml`
- `.github/workflows/task-e2e.yml`
- `.github/workflows/task-moonwall-tests.yml`
- `.github/workflows/task-ts-build.yml`
- `.github/workflows/task-ts-lint.yml`

## After Merge

Developers should upgrade their local Bun:
```bash
bun upgrade --stable  # Should install v1.3.2
bun --version         # Verify version
bun install           # Regenerate lockfile if needed
```
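
To catch a mismatch early, a local check along these lines could compare the installed version with the pinned one. This is a sketch: it assumes `test/.bun-version` contains a bare version string such as `1.3.2`, and `check_bun_version` is a hypothetical helper.

```bash
# Compare the version pinned in a file with an actual version string.
# Exits 0 on match, 1 on mismatch, 2 if the file cannot be read.
check_bun_version() (
  pinned=$(tr -d '[:space:]' < "$1") || exit 2
  actual=$2
  if [ "$pinned" != "$actual" ]; then
    echo "bun version mismatch: pinned $pinned, installed $actual" >&2
    exit 1
  fi
)

# Example (run from the repository root):
#   check_bun_version test/.bun-version "$(bun --version)"
```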

---------

Co-authored-by: Steve Degosserie <723552+stiiifff@users.noreply.github.com>
2025-11-10 22:37:39 +01:00
Steve Degosserie
8fa8c18dfd
CI: ♻️ Trigger CI actions on perm-* branches (#284) 2025-11-07 13:25:53 +01:00
Ahmad Kaouk
470f5fc916
feat: update eigenlayer contracts to v1.8.0 (#270)
## Summary
- sync `contracts/lib/eigenlayer-contracts` to tag
`v1.8.0-testnet-final` and refresh `EIGENLAYER.md` with the new commit
reference
- update local/test deployment flows to deploy the upstream
`EigenStrategy`, feed it into `AllocationManager`/`StrategyManager`, and
adopt the revised `EigenPod` constructor
- drop the obsolete `AllocationManagerMock` stub and replace its usage
with targeted `vm.mockCall` stubs that return `slashOperator` share data
- adjust slasher unit tests to match the new ABI so DataHaven stays
aligned with EigenLayer 1.8 semantics

## Testing
- forge build
- forge test
2025-11-04 16:30:18 +01:00
Steve Degosserie
10a7805648
feat: Add CI license check (#269)
## Summary

- Adds automated license compliance checking via GitHub Actions CI
workflow
- Implements a license verification script that validates all Rust
dependencies against approved licenses, authors, and packages
- Standardizes author metadata across Cargo manifests to "Moonsong Labs"

## Changes

**CI Workflow** (`.github/workflows/task-check-licenses.yml`)
- Triggers on pull requests and manual dispatch
- Installs Rust 1.88.0 toolchain and `cargo-license` tool
- Executes license verification script to enforce compliance

**License Verification Script** (`operator/scripts/verify-licenses.sh`)
- Uses `cargo-license` to extract dependency license information
- Maintains three allowlists:
- **Licenses**: Apache-2.0, MIT, BSD variants, GPL-3.0, MPL-2.0, and
compatible combinations
- **Authors**: PureStake, Parity Technologies, Moonsong Labs, Frontier
developers, StorageHub Team
  - **Package Names**: Known safe packages like ring
- Fails the build if any dependency has unapproved license/author/name
combination
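
The allowlist idea can be illustrated with a small shell function. This sketch covers only the four licenses named explicitly above; the BSD variants, compatible combinations, and the author and package allowlists are omitted, and `is_allowed_license` is a hypothetical name rather than a function from `verify-licenses.sh`.

```bash
# Succeeds when the given SPDX license identifier is on the allowlist.
# Only the identifiers named in the commit message are listed here.
is_allowed_license() {
  case "$1" in
    Apache-2.0|MIT|GPL-3.0|MPL-2.0) return 0 ;;
    *) return 1 ;;
  esac
}
```

A verification script would run each dependency's license through a check like this and fail the build on the first entry that matches none of the three allowlists.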

**Cargo Manifest Updates**
- `operator/Cargo.toml`: Standardized workspace author to "Moonsong
Labs"
- `operator/precompiles/precompile-registry/Cargo.toml`: Uses workspace
author field
- `operator/runtime/common/Cargo.toml`: Added workspace author field

## Benefits

- **Legal Compliance**: Ensures all dependencies use OSI-approved or
compatible licenses
- **Supply Chain Security**: Validates dependencies come from trusted
sources
- **Automated Enforcement**: Catches licensing issues during PR review
rather than at release time
- **Transparency**: Provides clear audit trail of approved licenses and
authors
2025-11-02 23:32:59 +02:00
Steve Degosserie
62a4a1fb60
fix: 🔧 Fix e2e test workflow (#260) 2025-10-28 17:43:29 +01:00
Steve Degosserie
b5bc2de11e
fix: 🔧 Fix incorrect args in release Docker image publishing workflow (#256) 2025-10-27 15:13:14 +02:00
Ahmad Kaouk
48f8add3c4
ci: fix fetch submodule (#248)
This PR fixes the E2E checkout failure by fetching full history instead
of a depth-1 clone so the Snowbridge forge-std submodule can resolve its
pinned commit.
2025-10-24 13:31:17 +03:00
undercover-cactus
4eca467514
ci: pin forge version when installing it (#243)
In this PR, we pin the forge version used in the linter task. When forge
publishes a new release, it breaks the linter task.

In the future we can update the forge version explicitly.

---------

Co-authored-by: Ahmad Kaouk <56095276+ahmadkaouk@users.noreply.github.com>
2025-10-23 14:33:27 +00:00
undercover-cactus
908a2a7ed5
ci: remove unused volumes after e2e tests and remove logs collection (#246)
In this PR, we fix the CI error indicating that we reached the maximum
number of volumes allowed, by automatically deleting the volumes after
the tests.

We also remove the log collection step, because the containers we would
want to debug are removed entirely, so the node logs were not being
collected by this step anyway.
2025-10-23 14:02:36 +00:00
undercover-cactus
eced179b09
misc: simplify Dockerfile to speed up build (#216)
This PR removes the `cargo chef` step used to build the Docker image used
in deployment. We noticed that `cargo chef` was adding time to the
build and that removing it saves about 40 minutes.

This PR also removes the Parity base image, which was very heavy and
filled the remaining disk space, breaking the build. After some
investigation, it doesn't seem to add much to the build. It has been
replaced with the official Rust image as the base for building our
node.

The image used to run the node has been replaced with
`debian:trixie-slim`.

In the end, these changes **should not** break any current
behavior, and they save a bit of CI time.
2025-10-22 13:36:30 +02:00
Steve Degosserie
72cac823af
fix: 🔧 Fix invalid condition on workflow_call in Docker release task (#238) 2025-10-15 21:33:54 +02:00
Steve Degosserie
d202869438
fix: 🔧 Fix Docker release extract tag logic (#237) 2025-10-15 19:50:59 +02:00
Steve Degosserie
ff694b0055
fix: 🔧 Fix Docker release extract tag logic (#236) 2025-10-15 18:52:27 +02:00
Steve Degosserie
8be3c0f979
fix: 🔧 Fix Docker image tags in Docker release task (#235)
Co-authored-by: Claude <noreply@anthropic.com>
2025-10-15 18:30:37 +02:00
Steve Degosserie
dda9111ee6
fix: 🔧 Copy all shared libraries required by the DataHaven node in build & production images (#234) 2025-10-15 14:46:07 +02:00
Steve Degosserie
9a5404de82
refactor: Consolidate and optimize Docker image architecture (#233)
## Overview

This PR consolidates and optimizes the Docker build system, reducing
redundancy and improving CI/CD performance. The changes eliminate
duplicate Dockerfiles, introduce a flexible build template, and optimize
release builds to reuse CI artifacts.

## Changes Summary

### 🐳 Docker Images Restructured

**Before:** 5 Dockerfiles with significant overlap
**After:** 4 focused images + 1 utility

#### Final Structure:

1. **`operator/Dockerfile`** (updated)
   - **Standard operator image** for CI and release builds
   - Minimal node image (accepts pre-built binaries)
   - GHCR: `ghcr.io/datahaven-xyz/datahaven/datahaven` (CI)
   - DockerHub: `datahavenxyz/datahaven` (releases)

2. **`docker/datahaven-build.Dockerfile`** (moved from
`operator/Dockerfile`)
   - Full source-to-binary build for manual releases
   - DockerHub: `datahavenxyz/datahaven:{label}`
   - Supports custom RUSTFLAGS and fast-runtime feature
   - Only used for manual workflow_dispatch builds

3. **`docker/datahaven-production.Dockerfile`** (kept)
   - Binary builder for CPU-specific releases
   - Used by build-prod-binary workflow template
   - Supports custom target-cpu flags

4. **`docker/datahaven-dev.Dockerfile`** (new, local dev only)
   - **FOR LOCAL DEVELOPMENT/TROUBLESHOOTING ONLY**
   - Includes debug tools: gdb, strace, vim, sudo
   - Extra dependencies: librocksdb-dev, curl
   - RUST_BACKTRACE enabled by default
   - **DO NOT USE for CI or production builds**

5. **`test/docker/crossbuild-mac-libpq.dockerfile`** (kept)
   - Utility for macOS → Linux cross-compilation

#### Removed (Redundant):
-  `docker/datahaven.Dockerfile` → replaced by operator/Dockerfile
-  `test/docker/datahaven-node-local.dockerfile` → replaced by
datahaven-dev.Dockerfile

---

### 🔄 Workflow Improvements

#### Enhanced `publish-docker` Template
- Supports both GHCR and DockerHub registries
- Flexible inputs: dockerfile, context, build-args, cache scope
- Auto-generates OCI-compliant labels
- Reduces code duplication (~70 lines → ~15 per workflow)

#### Refactored CI Pipeline
- **`docker-build-ci`**: Builds `operator/Dockerfile` → GHCR for CI/E2E
testing
- **`docker-build-release`**: Builds `operator/Dockerfile` → DockerHub
(main branch only)
- Both CI and release workflows now use the same minimal operator image
- Release builds **reuse CI binaries** instead of rebuilding from source

#### Optimized Release Workflow
The `task-docker-release` workflow now has dual modes:

**Mode 1: `workflow_call` (CI - main pushes)**
-  Reuses binary from CI's build-operator task
-  Uses lightweight `operator/Dockerfile`
-  Tags: `latest`, `sha-{short}`
-  **Fast**: ~5 minutes (vs ~30 min previously)

**Mode 2: `workflow_dispatch` (Manual)**
-  Full source build with `datahaven-build.Dockerfile`
-  Custom branch and label selection
-  Optional fast-runtime feature
-  Tags: `PROD-{label}` or user-defined

---

### 🔧 Additional Optimizations

- Copy libpq5 from builder stage instead of reinstalling (smaller,
faster)
- Remove redundant protobuf-compiler package (use protoc v21.12
directly)
- Standardize user UID to 1000 across all runtime images
- Consistent OCI labeling and metadata

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-15 01:33:20 +02:00
Steve Degosserie
750e8f391c
fix: 🔧 Fix Docker production image (#230) 2025-10-13 17:53:34 +02:00
Steve Degosserie
678a8fb161
fix: 🔧 Use standard Github runners for the publish runtime task (#225) 2025-10-11 10:37:19 +02:00
Steve Degosserie
8c950af4a4
fix: 🔧 Add Podman support to srtool runtime build script (#222)
## Summary

- Adds support for both Docker and Podman container engines in
`build-runtime-srtool.sh` via `IS_PODMAN` environment variable
- Uses `--userns=keep-id` for Podman (proper user namespace handling)
and `--user $(id -u):$(id -g)` for Docker
- Sets `IS_PODMAN=true` in `task-publish-runtime.yml` workflow to enable
Podman by default

## Changes

**`operator/scripts/build-runtime-srtool.sh`:**
- Added conditional logic to detect `IS_PODMAN` env var
- Dynamically selects between `podman` and `docker` as container engine
- Sets appropriate user/namespace flags based on container engine

**`.github/workflows/task-publish-runtime.yml`:**
- Added `IS_PODMAN: true` environment variable to the srtool build step
- Updated comment to use generic "container user" instead of "docker
user"
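
The engine selection described above can be sketched like this. The helper names `container_engine` and `container_user_flags` are hypothetical; the flags themselves are the ones named in this commit message, and the actual `build-runtime-srtool.sh` may structure the logic differently.

```bash
# Print the container engine to use, based on the IS_PODMAN env var.
container_engine() {
  if [ "${IS_PODMAN:-false}" = "true" ]; then
    echo podman
  else
    echo docker
  fi
}

# Print the user/namespace flags that match the selected engine:
# Podman keeps the host user ID via the user namespace, while
# Docker is told explicitly which UID:GID to run as.
container_user_flags() {
  if [ "${IS_PODMAN:-false}" = "true" ]; then
    echo "--userns=keep-id"
  else
    echo "--user $(id -u):$(id -g)"
  fi
}
```

Either way, files written by the build end up owned by the invoking user rather than by root.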

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-10 22:57:34 +02:00
undercover-cactus
514a16ac1f
ci: remove sccache from image build for prod (#200)
In this PR, we remove the caching of the sccache folder because it is
too big (~3GB) and fills our cache too fast.

What to expect?
* The build will be a bit slower, which is acceptable because it only
runs on `main`. We are preparing another PR that will speed up the
build of the prod image. We are also not sure the cache is actually
being used (`gha` cache is in beta).
* It will free some cache space and stop evicting our other caches, which
makes other jobs run faster.

Co-authored-by: Ahmad Kaouk <56095276+ahmadkaouk@users.noreply.github.com>
Co-authored-by: Gonza Montiel <gonzamontiel@users.noreply.github.com>
2025-10-09 12:33:35 +00:00
Steve Degosserie
72bfe9bb62
fix: 🔧 Add revision number to rust toolchain channel version to match with srtool image version (#209) 2025-10-07 11:34:21 +02:00
Steve Degosserie
0110a94978
fix: 🔧 Fix invalid runs-on label in Publish runtime task (#207) 2025-10-06 15:38:29 +02:00
Ahmad Kaouk
17c706dc64
test: Integrate moonwall (#185)
### Description

This PR introduces the **Moonwall** end-to-end (E2E) testing framework.
The primary motivation for this is to enable the porting of existing
Moonbeam tests into the `DataHaven` repository.

### Key Changes

*   **Node Manual Sealing:**
    * Introduced a `--sealing=manual` flag for the `datahaven-node`. When
      enabled, blocks are only produced on demand via an RPC call. This is the
      core mechanism that allows for deterministic tests.

*   **Moonwall Framework Integration:**
    * Added `@moonwall/cli` and `@moonwall/util` dependencies to the
      `test/package.json`.
    * A new `test/moonwall.config.json` file configures the test
      environment, defining how Moonwall should launch the `datahaven-node`
      with the manual sealing flag.
    * Added a `moonwall:test` script to `package.json` for running the
      tests.

*   **CI Workflow:**
    * A new reusable workflow, `.github/workflows/task-moonwall-tests.yml`,
      has been created to handle the setup, execution, and reporting of
      Moonwall tests.
    * The main `CI.yml` now includes a `moonwall-tests` job that runs after
      the `build-operator` job, ensuring it always tests the correct,
      freshly-built binary.

*   **Example Test Suite:**
    * A new test suite, `test/datahaven/suites/dev/test-block.ts`, has been
      copied from Moonbeam.

### How to Run Locally

1.  Navigate to the `test` directory.
2.  Install dependencies: `bun install`
3.  Run the tests: `bun run moonwall:test`

---------

Co-authored-by: undercover-cactus <lola@moonsonglabs.com>
2025-09-30 14:47:39 +00:00
Steve Degosserie
066a416349
feat: Publish runtime GitHub action (#198) 2025-09-30 15:24:35 +02:00
Steve Degosserie
a62319961c
feat: Publish runtime GitHub action (#197) 2025-09-30 15:11:54 +02:00