## Summary
- drop the Foundry library and build artifact cache restores from the
e2e workflow
- also remove the Foundry build cache from the dedicated Foundry tests
workflow since it wasn’t providing value
## Testing
- Not run (not requested)
Use the latest v0.15.2 release of Kurtosis, which includes improved
compatibility with rootless Podman (socket detection and bind
mounting) following the merge of
https://github.com/kurtosis-tech/kurtosis/pull/2803.
Until now, the e2e CI job used custom (patched) versions of the
Kurtosis CLI, Engine, and Core images.
## Summary
- Split CI workflow to stop re-running validation when PRs are merged to
main
- Create dedicated `release.yml` workflow for Docker Hub releases on
main branch
- Keep full CI validation for PRs and `perm-*` branches
## Motivation
Since the repository is configured to:
1. Require PRs to be up-to-date with main before merging
2. Require all CI checks to pass
Re-running the full CI suite (~12 jobs) on main after merge is redundant
and wastes CI runner time that could be used for other tasks.
## Changes
| Workflow | Before | After |
|----------|--------|-------|
| `CI.yml` | Triggers on push to `main`, `perm-*`, and PRs to `main` | Triggers on push to `perm-*` and PRs to `main` only |
| `release.yml` | N/A (new) | Triggers on push to `main`, runs only `docker-build-release` |
## Impact
| Event | Before | After | Savings |
|-------|--------|-------|---------|
| PR to main | 13 jobs | 12 jobs | 1 job |
| Merge to main | 13 jobs | 1 job | 12 jobs |
| Push to perm-* | 13 jobs | 12 jobs | 1 job |
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
## Summary
Improve CI performance through better caching and simplified workflows:
- Add a new `warm-sccache` job that runs before all Rust CI jobs to
pre-populate the sccache cache
- Cache locally installed tools (`~/.local/bin`, `~/.local/lib`,
`~/.local/include`) with a version-based hash key
- Simplify Rust tests by removing matrix partitioning and switching from
cargo-nextest to cargo test
## Problem
**Poor sccache hit rates:**
- `rust-lint`, `unit-tests`, and `build-operator` ran in parallel with
cold caches
- Each job compiled dependencies independently
- Cache was only saved at job completion (too late for parallel jobs to
benefit)
**Redundant tool downloads:**
- Mold, LLVM/Clang, protoc, and libpq (~500MB+) were downloaded fresh on
each job
- No caching of locally installed tools
**Overcomplicated test setup:**
- 2-partition matrix for tests added complexity without significant
benefit
- cargo-nextest required installation step (~30s overhead)
- Separate result-checker job wasn't necessary
## Solution
### 1. sccache warm-up job (`task-warm-sccache.yml`)
- Runs first (Tier 0) before all Rust jobs
- Compiles with release mode + all features (`fast-runtime`,
`try-runtime`, `runtime-benchmarks`)
- Compiles with debug mode to cover test builds
- Uses `SKIP_WASM_BUILD=1` to minimize warm-up time
### 2. Local tools caching (`setup-env/action.yml`)
- Define tool versions as env vars (`MOLD_VERSION`, `LLVM_VERSION`,
`PROTOC_VERSION`, `LIBPQ_VERSION`)
- Generate SHA256 hash from versions for cache key
- Cache `~/.local/bin`, `~/.local/lib`, `~/.local/include` (not all of
`~/.local` to avoid container storage)
- Set up PATH and env vars immediately after cache restore
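The version-based cache key described above can be sketched in shell as follows. This is a minimal sketch under stated assumptions, not the contents of `setup-env/action.yml`: the `tools_cache_key` and `sha256_hex` helpers, the `local-tools-` prefix, and the placeholder version values are all hypothetical.

```bash
# Sketch: derive a cache key from pinned tool versions.
# The default values are placeholders, not the versions used in CI.
MOLD_VERSION="${MOLD_VERSION:-0.0.0}"
LLVM_VERSION="${LLVM_VERSION:-0.0.0}"
PROTOC_VERSION="${PROTOC_VERSION:-0.0.0}"
LIBPQ_VERSION="${LIBPQ_VERSION:-0.0.0}"

# Hash stdin with SHA-256 using whichever tool is available.
sha256_hex() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum | cut -d' ' -f1
  else
    shasum -a 256 | cut -d' ' -f1
  fi
}

# Hypothetical helper: bumping any version changes the key,
# which invalidates the cached ~/.local directories.
tools_cache_key() {
  printf '%s\n' \
    "mold=${MOLD_VERSION}" \
    "llvm=${LLVM_VERSION}" \
    "protoc=${PROTOC_VERSION}" \
    "libpq=${LIBPQ_VERSION}" | sha256_hex
}

echo "local-tools-$(tools_cache_key)"
```

Hashing the versions rather than concatenating them keeps the key a fixed length regardless of how many tools are pinned.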
### 3. Simplified Rust tests (`task-rust-tests.yml`)
- Remove 2-partition matrix strategy
- Replace cargo-nextest with `cargo test --locked`
- Remove separate tests-result-checker job
## CI Flow
```
                         ┌─ build-operator (warm sccache + cached tools)
                         │
CI Start → warm-sccache ─┼─ rust-lint (warm sccache + cached tools)
                         │
                         └─ unit-tests (warm sccache + cached tools)
```
## Test plan
- [x] CI workflow runs successfully
- [x] Warm-sccache job completes and shows cache stats
- [x] Local tools cache restores correctly (no permission errors)
- [x] Downstream Rust jobs show improved cache hit rates
- [x] Rust tests pass with simplified single-job setup
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Ahmad Kaouk <56095276+ahmadkaouk@users.noreply.github.com>
Co-authored-by: undercover-cactus <lola@moonsonglabs.com>
## Summary
- Fix Docker Hub rate limit errors in E2E CI job by adding
authentication
- Pass existing `DOCKERHUB_USERNAME` and `DOCKERHUB_TOKEN` secrets to
the E2E workflow
## Problem
The E2E CI job pulls `datahavenxyz/snowbridge-relay:latest` from Docker
Hub without authentication, causing rate limit errors (10 pulls/hour for
unauthenticated requests):
```
Error: initializing source docker://datahavenxyz/snowbridge-relay:latest: reading manifest latest in docker.io/datahavenxyz/snowbridge-relay: toomanyrequests: You have reached your unauthenticated pull rate limit.
```
## Solution
Reuse the Docker Hub secrets already configured for
`docker-build-release` by:
1. Passing secrets from `CI.yml` to the E2E workflow
2. Adding optional secrets declaration in `task-e2e.yml`
3. Adding Docker Hub login step before pulling `snowbridge-relay`
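A login step that stays optional when the secrets are absent might look like the sketch below. It assumes the Docker CLI is the client in use and that the secrets are exposed to the step as the environment variables `DOCKERHUB_USERNAME` and `DOCKERHUB_TOKEN`; the `dockerhub_login` function name is hypothetical, and the actual step in `task-e2e.yml` may differ.

```bash
# Sketch: log in to Docker Hub only when both credentials are provided,
# so callers that do not pass the secrets still run (unauthenticated).
dockerhub_login() {
  if [ -z "${DOCKERHUB_USERNAME:-}" ] || [ -z "${DOCKERHUB_TOKEN:-}" ]; then
    echo "skipped: Docker Hub credentials not set"
    return 0
  fi
  # --password-stdin avoids exposing the token as a command-line argument.
  printf '%s' "$DOCKERHUB_TOKEN" \
    | docker login --username "$DOCKERHUB_USERNAME" --password-stdin
}

dockerhub_login
```

If the job pulls with Podman instead, `podman login docker.io` accepts the same `--username` and `--password-stdin` options.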
---------
Co-authored-by: Steve Degosserie <723552+stiiifff@users.noreply.github.com>
## Summary
Re-add the `static` build feature, which bundles the PostgreSQL
dependency into the binary. This simplifies installation: operators no
longer need PostgreSQL dependencies installed on their system to run the
node.
## What changed?
* Added a `static` feature that can be enabled to pull in the extra
dependencies during the build.
* Added a task that runs whenever a dependency is modified, so we can
make sure the build with the feature enabled still works. (We assume
that simple code changes won't affect it, because PostgreSQL is used
through diesel, which is not a direct dependency of DataHaven.)
## Summary
This PR improves the state generation workflow. It also checks for an
outdated `state-diff.json` and adds a script to easily generate a new
one.
The way we generate state has also changed so that it works on Apple
Silicon (M1) macOS systems. We no longer run the tool in a container;
it runs directly on the machine instead.
## What changed
* Added a `check-generated-state.js` script to quickly detect outdated
generated state
* Added the check to CI
* Added a `generate-contracts.ts` script to easily generate the new
state, with new instructions for running on macOS
---------
Co-authored-by: Gonza Montiel <gon.montiel@gmail.com>
Co-authored-by: Ahmad Kaouk <56095276+ahmadkaouk@users.noreply.github.com>
Co-authored-by: Steve Degosserie <723552+stiiifff@users.noreply.github.com>
Co-authored-by: Gonza Montiel <gonzamontiel@users.noreply.github.com>
This PR significantly refactors and improves the end-to-end testing
framework and infrastructure. The primary focus was on simplifying the
test suites, improving reliability through better resource management,
and hardening the relayer infrastructure.
All E2E tests are now passing on the CI and demonstrate consistent
reliability when run locally.
### Key Changes
#### 1. E2E Test Suite Refactor & Cleanup
* **Simplified Test Logic**: Heavily refactored the core test suites
(`native-token-transfer.test.ts`, `rewards-message.test.ts`, and
`validator-set-update.test.ts`). The new implementation is much cleaner,
utilizing shared helpers to reduce boilerplate.
* **Utility Consolidation**: Removed redundant utility files
(`storage.ts`, `rewards-helpers.ts`) and simplified `events.ts`. Event
waiting now uses `rxjs` for Substrate and native `viem` watchers for
Ethereum, which is more robust and easier to maintain.
* **Better Connector Management**: Unified the creation and cleanup of
test clients in `ConnectorFactory`. It now handles the lifecycle of
WebSocket connections more gracefully, including clearing the
`socketClientCache` to prevent reconnection noise during teardown.
#### 2. Infrastructure & Stability
* **Relayer Relaunch Policy**: Added a restart policy for Snowbridge
relayer containers. They are now configured with `--restart
on-failure:5`, ensuring that relayers automatically relaunch if they
crash during the sensitive initialization phase.
* **WebSocket Integration**:
* Updated the `ConnectorFactory` to prefer **WebSockets** for the
Ethereum public client, which is essential for efficient, event-heavy
E2E testing.
* Enhanced `launchKurtosisNetwork` to correctly identify and register
the Execution Layer's WebSocket endpoint from Kurtosis.
* **Disabled Contract Injection**: This PR temporarily disables the
automatic injection of contracts into the genesis state by default.
* *Reason*: I encountered issues generating a valid `state-diff.json`
for the latest contract versions. Even after applying several
workarounds, the injected state remained unstable. As a result, I've
reverted to manual contract deployment during the launch sequence for
better reliability for now.
#### 3. Documentation & Maintenance
* Removed obsolete documentation (`event-utilities-guide.md`) that no
longer reflects the simplified event-handling API.
* Cleaned up `test/launcher/validators.ts` and moved logic into more
appropriate helpers.
---------
Co-authored-by: Steve Degosserie <723552+stiiifff@users.noreply.github.com>
## Summary
- Bump Rust toolchain from 1.88.0 to 1.90.0 in
`operator/rust-toolchain.toml`
- Update hardcoded Rust version in
`.github/workflows/task-check-licenses.yml` to match
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Ahmad Kaouk <56095276+ahmadkaouk@users.noreply.github.com>
## Summary
Fixes the CI failure introduced by #349 where reusable workflows
couldn't use the permissions they declared.
## Root Cause
When using `workflow_call` (reusable workflows), the **called workflow's
permissions are constrained by the caller**. A called workflow cannot
request more permissions than the calling workflow grants.
PR #349 added explicit permissions to individual workflows (e.g.,
`actions: write` in task-build-operator.yml), but removed them from
CI.yml. This caused failures because:
```
CI.yml (contents: read only)
└── task-build-operator.yml (requests actions: write)
    └── FAILS: caller doesn't grant actions: write
```
## Fix
Grant the necessary permissions in CI.yml so called workflows can use
them:
```yaml
permissions:
  contents: read
  actions: write   # For artifact upload/download
  packages: write  # For ghcr.io push
```
## Why the individual workflow permissions still matter
The explicit permissions in called workflows are still valuable for:
1. **Documentation** - Makes the intent clear
2. **Direct invocation** - Works when called via `workflow_dispatch`
3. **Defense in depth** - If CI.yml grants more than needed, called
workflows still request only what they need
Co-authored-by: Claude <noreply@anthropic.com>
This PR adds an `INJECT_CONTRACTS` environment variable, which
specifies whether contracts should be injected in the e2e tests. It
defaults to false and is set to `true` in the CI job that runs the e2e
tests.
We use an environment variable because `bun test` doesn't allow
passing extra arguments.
A note about the new variable has been added to the documentation to
describe the new behavior.
---------
Co-authored-by: Gonza Montiel <gonzamontiel@users.noreply.github.com>
## Summary
Pins Bun version to 1.3.2 and migrates workflows to use text-based
`bun.lock` instead of binary `bun.lockb`. This fixes CI failures caused
by Bun version mismatches between local development and GitHub Actions.
## Changes
- Created `test/.bun-version` to pin Bun to v1.3.2
- Updated all workflows to use `bun-version-file: test/.bun-version`
- Migrated workflow cache keys from `bun.lockb` to `bun.lock`
- Removed deprecated `test/bun.lockb` binary lockfile
## Why?
**Version Consistency:**
- Local environments and CI were using different Bun versions
- Different versions generate different lockfile formats → CI failures
**Lockfile Migration:**
- Bun v1.2+ uses text-based `bun.lock` as default
- Binary `bun.lockb` is still supported but deprioritized
- Text format provides better git diffs and merge conflict resolution
## Affected Workflows
- `.github/workflows/task-check-metadata.yml`
- `.github/workflows/task-e2e.yml`
- `.github/workflows/task-moonwall-tests.yml`
- `.github/workflows/task-ts-build.yml`
- `.github/workflows/task-ts-lint.yml`
## After Merge
Developers should upgrade their local Bun:
```bash
bun upgrade --stable # Installs the latest stable release (expected to be v1.3.2 as of this PR)
bun --version # Verify version
bun install # Regenerate lockfile if needed
```
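To confirm that a local install matches the pin, a small check along these lines could be used. This is a sketch, not part of the PR: `check_bun_version` is a hypothetical helper, and it assumes `test/.bun-version` contains a single version string, with or without a leading `v`.

```bash
# Sketch: compare the installed Bun version with the pinned one.
# Prints "ok" on a match; prints a hint and returns 1 otherwise.
check_bun_version() {
  expected="${1#v}"  # tolerate a leading "v" in the pin file
  actual="${2#v}"
  if [ "$expected" = "$actual" ]; then
    echo "ok"
  else
    echo "mismatch: expected $expected, found $actual"
    return 1
  fi
}

# Example usage from the repository root (requires bun on PATH):
#   check_bun_version "$(tr -d '[:space:]' < test/.bun-version)" "$(bun --version)"
```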
---------
Co-authored-by: Steve Degosserie <723552+stiiifff@users.noreply.github.com>
## Summary
- sync `contracts/lib/eigenlayer-contracts` to tag
`v1.8.0-testnet-final` and refresh `EIGENLAYER.md` with the new commit
reference
- update local/test deployment flows to deploy the upstream
`EigenStrategy`, feed it into `AllocationManager`/`StrategyManager`, and
adopt the revised `EigenPod` constructor
- drop the obsolete `AllocationManagerMock` stub and replace its usage
with targeted `vm.mockCall` stubs that return `slashOperator` share data
- adjust slasher unit tests to match the new ABI so DataHaven stays
aligned with EigenLayer 1.8 semantics
## Testing
- forge build
- forge test
## Summary
- Adds automated license compliance checking via GitHub Actions CI
workflow
- Implements a license verification script that validates all Rust
dependencies against approved licenses, authors, and packages
- Standardizes author metadata across Cargo manifests to "Moonsong Labs"
## Changes
**CI Workflow** (`.github/workflows/task-check-licenses.yml`)
- Triggers on pull requests and manual dispatch
- Installs Rust 1.88.0 toolchain and `cargo-license` tool
- Executes license verification script to enforce compliance
**License Verification Script** (`operator/scripts/verify-licenses.sh`)
- Uses `cargo-license` to extract dependency license information
- Maintains three allowlists:
- **Licenses**: Apache-2.0, MIT, BSD variants, GPL-3.0, MPL-2.0, and
compatible combinations
- **Authors**: PureStake, Parity Technologies, Moonsong Labs, Frontier
developers, StorageHub Team
- **Package Names**: Known safe packages like ring
- Fails the build if any dependency has unapproved license/author/name
combination
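The allowlist idea can be illustrated with a short shell sketch. It is not the contents of `verify-licenses.sh`: the `ALLOWED_LICENSES` list is abbreviated, and the `is_allowed_license` and `check_license` helpers are hypothetical names that only cover the license check, not the author or package-name allowlists.

```bash
# Hypothetical, abbreviated allowlist (one SPDX expression per line).
ALLOWED_LICENSES="Apache-2.0
MIT
GPL-3.0
MPL-2.0
Apache-2.0 OR MIT"

# Succeeds only if the license expression matches an allowlist line exactly.
is_allowed_license() {
  # -x matches whole lines, -F treats the pattern as a fixed string.
  printf '%s\n' "$ALLOWED_LICENSES" | grep -qxF -- "$1"
}

# Reports one dependency; a non-zero status would fail the build.
check_license() {
  if is_allowed_license "$2"; then
    echo "ok: $1 ($2)"
  else
    echo "unapproved: $1 ($2)"
    return 1
  fi
}
```

Exact whole-line matching means every accepted license combination has to be listed explicitly, which keeps the set of approved expressions auditable.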
**Cargo Manifest Updates**
- `operator/Cargo.toml`: Standardized workspace author to "Moonsong
Labs"
- `operator/precompiles/precompile-registry/Cargo.toml`: Uses workspace
author field
- `operator/runtime/common/Cargo.toml`: Added workspace author field
## Benefits
- **Legal Compliance**: Ensures all dependencies use OSI-approved or
compatible licenses
- **Supply Chain Security**: Validates dependencies come from trusted
sources
- **Automated Enforcement**: Catches licensing issues during PR review
rather than at release time
- **Transparency**: Provides clear audit trail of approved licenses and
authors
This PR fixes the E2E checkout failure by fetching full history instead
of a depth-1 clone so the Snowbridge forge-std submodule can resolve its
pinned commit.
In this PR, we pin the forge version used in the linter task. When forge
makes a new release, it can break the linter task.
In the future we can update the forge version explicitly.
---------
Co-authored-by: Ahmad Kaouk <56095276+ahmadkaouk@users.noreply.github.com>
In this PR, we fix the CI error caused by reaching the maximum number
of volumes allowed, by automatically deleting the volumes after the tests.
We also remove the log collection step, because the containers we care
about for debugging are removed entirely, so the node logs were not
being collected in this step anyway.
This PR removes the `cargo chef` step used to build the Docker image used
in deployment. We noticed that `cargo chef` was adding time to the
build and that removing it saves us about 40 minutes.
This PR also removes the Parity base image, which was very heavy and
was filling up the remaining disk space, breaking the build. After some
investigation, it doesn't seem to add much to the build. It has been
replaced with the official Rust image as the base for building our
node.
The image used to run the node has been replaced with
`debian:trixie-slim`.
These changes **should not** break any current behavior, and they save
a bit of CI time.
## Summary
- Adds support for both Docker and Podman container engines in
`build-runtime-srtool.sh` via `IS_PODMAN` environment variable
- Uses `--userns=keep-id` for Podman (proper user namespace handling)
and `--user $(id -u):$(id -g)` for Docker
- Sets `IS_PODMAN=true` in `task-publish-runtime.yml` workflow to enable
Podman by default
## Changes
**`operator/scripts/build-runtime-srtool.sh`:**
- Added conditional logic to detect `IS_PODMAN` env var
- Dynamically selects between `podman` and `docker` as container engine
- Sets appropriate user/namespace flags based on container engine
**`.github/workflows/task-publish-runtime.yml`:**
- Added `IS_PODMAN: true` environment variable to the srtool build step
- Updated comment to use generic "container user" instead of "docker
user"
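The engine selection described above can be sketched as follows. This is an illustrative sketch rather than the code in `build-runtime-srtool.sh`; the function names `container_engine` and `container_user_flags` are hypothetical.

```bash
# Sketch: pick the container engine from the IS_PODMAN env var.
container_engine() {
  if [ "${IS_PODMAN:-false}" = "true" ]; then
    echo "podman"
  else
    echo "docker"
  fi
}

# Sketch: pick the matching user/namespace flags.
container_user_flags() {
  if [ "${IS_PODMAN:-false}" = "true" ]; then
    # Rootless Podman: keep the host UID/GID inside the user namespace.
    echo "--userns=keep-id"
  else
    # Docker: run as the invoking user so build outputs are not root-owned.
    echo "--user $(id -u):$(id -g)"
  fi
}

echo "engine=$(container_engine) flags=$(container_user_flags)"
```

Defaulting to Docker when `IS_PODMAN` is unset keeps local runs unchanged for developers who do not set the variable.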
Co-authored-by: Claude <noreply@anthropic.com>
In this PR, we remove the caching of the sccache folder because it is
too big (~3GB) and fills our cache too fast.
What to expect?
* The build will be a bit slower, but that is fine because it only
runs on `main`. We are preparing another PR that will speed up the
build of the prod image. We are also not sure the cache is actually
being used (the `gha` cache is in beta).
* It will free up cache space and stop evicting our other cache
entries, which makes other jobs run faster.
Co-authored-by: Ahmad Kaouk <56095276+ahmadkaouk@users.noreply.github.com>
Co-authored-by: Gonza Montiel <gonzamontiel@users.noreply.github.com>
### Description
This PR introduces the **Moonwall** end-to-end (E2E) testing framework.
The primary motivation for this is to enable the porting of existing
Moonbeam tests into the `DataHaven` repository.
### Key Changes
* **Node Manual Sealing:**
* Introduced a `--sealing=manual` flag for the `datahaven-node`. When
enabled, blocks are only produced on demand via an RPC call. This is the
core mechanism that allows for deterministic tests.
* **Moonwall Framework Integration:**
* Added `@moonwall/cli` and `@moonwall/util` dependencies to the
`test/package.json`.
* A new `test/moonwall.config.json` file configures the test
environment, defining how Moonwall should launch the `datahaven-node`
with the manual sealing flag.
* Added a `moonwall:test` script to `package.json` for running the
tests.
* **CI Workflow:**
* A new reusable workflow, `.github/workflows/task-moonwall-tests.yml`,
has been created to handle the setup, execution, and reporting of
Moonwall tests.
* The main `CI.yml` now includes a `moonwall-tests` job that runs after
the `build-operator` job, ensuring it always tests the correct,
freshly-built binary.
* **Example Test Suite:**
* A new test suite, `test/datahaven/suites/dev/test-block.ts`, has been
copied from Moonbeam.
### How to Run Locally
1. Navigate to the `test` directory.
2. Install dependencies: `bun install`
3. Run the tests: `bun run moonwall:test`
---------
Co-authored-by: undercover-cactus <lola@moonsonglabs.com>