Summary
- drop the Foundry library and build artifact cache restores from the
e2e workflow
- also remove the Foundry build cache from the dedicated Foundry tests
workflow since it wasn’t providing value
Testing
- Not run (not requested)
Use the latest v0.15.2 release of Kurtosis, that includes improved
compatibility with rootless Podman (wrt. socket detection and bind
mounting) following the merge of
https://github.com/kurtosis-tech/kurtosis/pull/2803.
Up to now, the e2e CI job was using a custom (patched) version of
Kurtosis CLI, Engine & Core images.
## Summary
- Fix Docker Hub rate limit errors in E2E CI job by adding
authentication
- Pass existing `DOCKERHUB_USERNAME` and `DOCKERHUB_TOKEN` secrets to
the E2E workflow
## Problem
The E2E CI job pulls `datahavenxyz/snowbridge-relay:latest` from Docker
Hub without authentication, causing rate limit errors (10 pulls/hour for
unauthenticated requests):
```
Error: initializing source docker://datahavenxyz/snowbridge-relay:latest: reading manifest latest in docker.io/datahavenxyz/snowbridge-relay: toomanyrequests: You have reached your unauthenticated pull rate limit.
```
## Solution
Reuse the Docker Hub secrets already configured for
`docker-build-release` by:
1. Passing secrets from `CI.yml` to the E2E workflow
2. Adding optional secrets declaration in `task-e2e.yml`
3. Adding Docker Hub login step before pulling `snowbridge-relay`
---------
Co-authored-by: Steve Degosserie <723552+stiiifff@users.noreply.github.com>
## Summary
This PR improve the generating state workflow. It will also check for
outdated state-diff.json and add a practical script to easily generate a
new one.
The way we generate state has also been changed to make it work with
macOS M1 system. We don't run the tool in the container anymore but
instead directly on the machine.
## What changes
* A check-generated-state.js script was added to quickly look for
outdated test
* The check was added in the CI
* A generate-contracts.ts script was added to easily generate the new
state with the new instructions to run on MacOS
---------
Co-authored-by: Gonza Montiel <gon.montiel@gmail.com>
Co-authored-by: Ahmad Kaouk <56095276+ahmadkaouk@users.noreply.github.com>
Co-authored-by: Steve Degosserie <723552+stiiifff@users.noreply.github.com>
Co-authored-by: Gonza Montiel <gonzamontiel@users.noreply.github.com>
This PR significantly refactors and improves the end-to-end testing
framework and infrastructure. The primary focus was on simplifying the
test suites, improving reliability through better resource management,
and hardening the relayer infrastructure.
All E2E tests are now passing on the CI and demonstrate consistent
reliability when run locally.
### Key Changes
#### 1. E2E Test Suite Refactor & Cleanup
* **Simplified Test Logic**: Heavily refactored the core test suites
(`native-token-transfer.test.ts`, `rewards-message.test.ts`, and
`validator-set-update.test.ts`). The new implementation is much cleaner,
utilizing shared helpers to reduce boilerplate.
* **Utility Consolidation**: Removed redundant utility files
(`storage.ts`, `rewards-helpers.ts`) and simplified `events.ts`. Event
waiting now uses `rxjs` for Substrate and native `viem` watchers for
Ethereum, which is more robust and easier to maintain.
* **Better Connector Management**: Unified the creation and cleanup of
test clients in `ConnectorFactory`. It now handles the lifecycle of
WebSocket connections more gracefully, including clearing the
`socketClientCache` to prevent reconnection noise during teardown.
#### 2. Infrastructure & Stability
* **Relayer Relaunch Policy**: Added a restart policy for Snowbridge
relayer containers. They are now configured with `--restart
on-failure:5`, ensuring that relayers automatically relaunch if they
crash during the sensitive initialization phase.
* **WebSocket Integration**:
* Updated the `ConnectorFactory` to prefer **WebSockets** for the
Ethereum public client, which is essential for efficient, event-heavy
E2E testing.
* Enhanced `launchKurtosisNetwork` to correctly identify and register
the Execution Layer's WebSocket endpoint from Kurtosis.
* **Disabled Contract Injection**: This PR temporarily disables the
automatic injection of contracts into the genesis state by default.
* *Reason*: I encountered issues generating a valid `state-diff.json`
for the latest contract versions. Even after applying several
workarounds, the injected state remained unstable. As a result, I've
reverted to manual contract deployment during the launch sequence for
better reliability for now.
#### 3. Documentation & Maintenance
* Removed obsolete documentation (`event-utilities-guide.md`) that no
longer reflects the simplified event-handling API.
* Cleaned up `test/launcher/validators.ts` and moved logic into more
appropriate helpers.
---------
Co-authored-by: Steve Degosserie <723552+stiiifff@users.noreply.github.com>
In this PR we add an environment variable `INJECT_CONTRACTS`. This
environment variable specify if the contracts should be injected in the
e2e tests. By default it is false. The environment variable is set to
`true` in the CI job that run the e2e tests.
We are using a environment variable because `bun test` doesn't allow for
passing extra arguments.
A note about the new variable has been added in the documentation to
inform about the new behavior.
---------
Co-authored-by: Gonza Montiel <gonzamontiel@users.noreply.github.com>
## Summary
Pins Bun version to 1.3.2 and migrates workflows to use text-based
`bun.lock` instead of binary `bun.lockb`. This fixes CI failures caused
by Bun version mismatches between local development and GitHub Actions.
## Changes
- Created `test/.bun-version` to pin Bun to v1.3.2
- Updated all workflows to use `bun-version-file: test/.bun-version`
- Migrated workflow cache keys from `bun.lockb` to `bun.lock`
- Removed deprecated `test/bun.lockb` binary lockfile
## Why?
**Version Consistency:**
- Local environments and CI were using different Bun versions
- Different versions generate different lockfile formats → CI failures
**Lockfile Migration:**
- Bun v1.2+ uses text-based `bun.lock` as default
- Binary `bun.lockb` is still supported but deprioritized
- Text format provides better git diffs and merge conflict resolution
## Affected Workflows
- `.github/workflows/task-check-metadata.yml`
- `.github/workflows/task-e2e.yml`
- `.github/workflows/task-moonwall-tests.yml`
- `.github/workflows/task-ts-build.yml`
- `.github/workflows/task-ts-lint.yml`
## After Merge
Developers should upgrade their local Bun:
```bash
bun upgrade --stable # Should install v1.3.2
bun --version # Verify version
bun install # Regenerate lockfile if needed
```
---------
Co-authored-by: Steve Degosserie <723552+stiiifff@users.noreply.github.com>
This PR fixes the E2E checkout failure by fetching full history instead
of a depth-1 clone so the Snowbridge forge-std submodule can resolve its
pinned commit.
In this PR, we fix the ci error indicating we reached the number of
volumes allow by deleting automatically after tests the volumes.
We also remove the step that collect logs because the container that are
interesting to us to debug are being removed entirely. Therefore the
logs from the nodes are not being collected in this step.
## Summary
Each E2E run pulls `datahavenxyz/snowbridge-relay:latest` via `docker
create`. Because the image declares an anonymous `VOLUME`, Docker
allocates a new named volume for every run. We were removing the temp
container without `-v`, so the volume stayed on disk. After ~2,048 runs,
Docker refused to allocate new locks and aborted with exit code 125:
<img width="1425" height="86" alt="2025-09-16_18-01-06"
src="https://github.com/user-attachments/assets/ca05ac54-512d-4fa9-871c-e0b259071019"
/>
## Fix
- update `.github/workflows/task-e2e.yml` to use `docker rm -fv temp`,
ensuring the anonymous volume is removed when the temp container is
deleted
---------
Co-authored-by: Steve Degosserie <723552+stiiifff@users.noreply.github.com>
## Summary
This PR resolves all CI failures following the migration to self-hosted
GitHub runners (`DH-Testing` group) by eliminating sudo dependencies and
fixing Docker connectivity issues.
## Key Changes
### 🔧 **Eliminated sudo requirements across all workflows**
- **Setup Environment**: Installed mold linker and system dependencies
in userspace without sudo
- **Tool Installation**: Replaced apt/system package installations with
direct binary downloads:
- Kurtosis: Direct binary download from GitHub releases (v1.10.3)
- Taplo: Direct binary installation for Cargo.toml formatting
- cargo-nextest: Using `cargo install` instead of GitHub action
(v0.9.100)
- **Runner Cleanup**: Skipped cleanup-runner action entirely on
self-hosted runners (bare-metal manages disk space externally)
### 🐳 **Fixed Docker connectivity for E2E tests**
- **Enhanced dockerode configuration** with robust fallback logic for
different socket locations
- **Added DOCKER_HOST environment variable** to E2E workflow for
consistent Docker daemon access
- **Implemented connection testing** with detailed error diagnostics for
troubleshooting
- **Resolves FailedToOpenSocket errors** by supporting multiple socket
paths and connection methods
### 🏷️ **Workflow optimizations**
- **Label-based targeting**: All heavy workloads (Rust builds, E2E
tests) now run on `DH-Testing` runners
- **Dependency management**: Used `install-deps: false` flag instead of
hardcoded runner detection
- **Permission fixes**: Corrected Docker build permissions and GHCR
organization names
---------
Co-authored-by: Claude <noreply@anthropic.com>
## Summary
This PR resolves all CI failures following the migration to the new
DataHaven Github & Docker Hub organizations, and correctly leverage
self-hosted GitHub runners (`DH-runners` group) by eliminating sudo
dependencies.
## Key Changes
### 🔧 **Eliminated sudo requirements across all workflows**
- **Setup Environment**: Installed mold linker and system dependencies
in userspace without sudo
- **Tool Installation**: Replaced apt/system package installations with
direct binary downloads:
- Kurtosis: Direct binary download from GitHub releases (v1.10.3)
- Taplo: Direct binary installation for Cargo.toml formatting
- cargo-nextest: Using `cargo install` instead of GitHub action
(v0.9.100)
- **Runner Cleanup**: Skipped cleanup-runner action entirely on
self-hosted runners (bare-metal manages disk space externally)
### 🏷️ **Workflow optimizations**
- **Group-based targeting**: All heavy workloads (Rust builds, E2E
tests) now run on `DH-runners` runners
- **Dependency management**: Used `install-deps: false` flag instead of
hardcoded runner detection
Co-authored-by: Claude <noreply@anthropic.com>
## Implement E2E Testing Framework with Isolated Networks
### Summary
Refactors the existing E2E testing infrastructure to provide isolated
test environments with parallel execution support. Each test suite now
runs in its own network namespace, preventing resource conflicts.
### Key Changes
- **New Testing Framework** (`test/framework/`): Base classes for test
lifecycle management with automatic setup/teardown
- **Launcher Module** (`test/launcher/`): Extracted network
orchestration logic from CLI handlers for reusability
- **Parallel Execution**: Added `test-parallel.ts` script with
concurrency limits to prevent resource exhaustion
- **Test Isolation**: Each suite gets unique network IDs (format:
`suiteName-timestamp`) and Docker networks
- **Improved Test Organization**: Migrated tests to new framework,
deprecated old test structure
### Test Improvements
- Added 4 new test suites demonstrating framework usage. :
- `contracts.test.ts` - Smart contract deployment/interaction
- `datahaven-substrate.test.ts` - Substrate API operations
- `cross-chain.test.ts` - Snowbridge cross-chain messaging
- `ethereum-basic.test.ts` - Ethereum network operations
> [!WARNING]
The test suites themselves are bad and shouldn't be consider examples of
good tests. They were AI generated just to test the concurrency of test
runners
### Documentation
- Added comprehensive framework overview (`E2E_FRAMEWORK_OVERVIEW.md`)
- Updated README with parallel testing commands
- Added test patterns and best practices
### Breaking Changes
- Old test suites moved to `e2e - DEPRECATED/` directory
- Test execution now requires extending `BaseTestSuite` class
### Testing
Run tests with: `bun test:e2e` or `bun test:e2e:parallel` (with
concurrency limits)
### TODO
- [ ] Implement good test examples.
- [ ] Implement useful test utils (like waiting for an event to show up
in DataHaven or Ethereum).
- [ ] Enforce tests with CI (currently cannot be done due to
intermittent error when sending a transaction with PAPI).
---------
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: undercover-cactus <lola@moonsonglabs.com>
This PR renames all usage of `moonsonglabs/snowbridge-relayer` to
`moonsonglabs/snowbridge-relay` to adapt to the refactor done in [PR
#18](https://github.com/Moonsong-Labs/snowbridge/pull/18) of our
Snowbridge fork. The aforementioned PR should be merged before this one,
although caution is not really needed since the CI would fail otherwise.
> [!NOTE]
> This is `Part 3` of the ongoing _Docker Series._
## New Additions:
- Launching Datahaven network will spin up containers, as opposed to
native binaries
- `stop:docker` script to kill all dh containers
- `e2e` test suite for datahaven solochain network
- Contains reference test file that uses papi for storage queries,
submitting exts, runtime calls (good job on that facu and tobi)
- Added new utils:
- `waitForLog()` to wait for log lines in docker container logs
- `createPapiConnectors()` helper for test cases to build and connect to
dh network
- `getPapiSigner()` helper to return a papi compatible signer using our
prefunded accounts (alith by default)
- `sendTxn()` helper to submit txn and wait for block inclusion, instead
of finalization, which std library provides
## Changes:
> [!CAUTION]
> Launching native binaries for datahaven no longer supported.
- Datahaven binary location cli option changed to `-i,
--datahaven-image-tag`
- To locally run this you'll need a datahaven docker image handy, you'll
need to either:
- Point to remote dockerhub e.g. `moonsonglabs/datahaven:main` (must be
logged in and have permission)
- Build this locally with `bun build:docker:operator`
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
- Added end-to-end tests for the Datahaven solochain, including runtime
API queries, storage lookups, extrinsic submissions, and event
listening.
- Introduced CLI option to specify the Datahaven Docker image tag, with
a default value.
- Added CLI option to disable the Relayer.
- Provided new scripts to stop Docker containers associated with
Datahaven.
- Added utility functions for Docker log monitoring and container
startup checks.
- Introduced utilities for interacting with the Datahaven Polkadot API.
- **Improvements**
- Switched Datahaven network launch from local binaries to Docker
containers.
- Enhanced cache accuracy in build workflows by including Rust source
files in cache keys.
- Improved build performance with TypeScript incremental build options.
- Increased timeout for end-to-end tests for better reliability.
- Updated CLI version to 0.2.0.
- Modified Dockerfile build to enable the `fast-runtime` feature.
- Extended network launch summary to include relayer and container
details.
- **Bug Fixes**
- Fixed cleanup logic by tracking and preparing for forced removal of
Docker containers after tests.
- **Chores**
- Updated workflow steps for Docker image handling and network checks.
- Adjusted scripts and workflow logic for improved Docker and test
management.
- Removed top-level disk usage summaries from cleanup workflow for
streamlined reporting.
- Enhanced shell command utility to support asynchronous wait during
execution.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Co-authored-by: Facundo Farall <37149322+ffarall@users.noreply.github.com>
This PR:
1. Generally improves the logging of the testing CLI, making the logs
more concise and easier to follow, with clearer sections and
separations.
2. Launches DataHaven solochain nodes at the beginning not the end.
3. Prompts the user if they want to launch DataHaven nodes and
Snowbridge Relayers.
---------
Co-authored-by: Tim B <79199034+timbrinded@users.noreply.github.com>
Eventually our CI will be required to run two private blockchains
locally plus associated relayers.
This PR is to prepare for this fate by improving run times and
refactoring our existing CIs so they are a bit easier to reason about.
### Refactors
- **_We now run ALL CIs on every PR!_** This is so that we decomplexify
the logic around conditional builds and fetching built binaries from
another source. This reduces the surface area of code we have to
maintain at the cost of execution time
- This penalty is ameliorated by a layered caching system. At best, it
will be less than a minute to complete a build since everything will be
cached. On GH runners this is about 6 minutes sadly.
- We will no longer be at risk of important CIs being skipped
erroneously which hide true failures.
- Caching is a low-risk approach because at worst it has to build from
scratch. A bad cache hit will never imply the wrong thing gets build
since cargo is smart enough to just throw away any inappropriate build
artefacts.
- `setup-rust` action created so we have a unified way of setting up
runner and unifying our approach to caching
- Use a unique caching key for different activities and it will fallback
to shared cache if no matches
- we are using `mainnet` kurtosis config so that it works with relayer
assumptions
### Additions
- We can specify the ethereum block time via a new cli arg `--slot-time
<seconds>`
- We can specify arbitrary network_param args which get passed into the
generated yaml
- e.g. giving `bun cli --kurtosis-network-args="pet=cat food=fish" will
add:
```yml
network_params:
# existing params...
pet: cat
food: fish
```
- We now have the ability to programmatically modify the yaml
- This means we are back down to a single `minimal.yml` kurtosis config
so we dont have to maintain changes between them
- Flow is: `add new cli arg` -> `add if() block which mutates yaml` ->
`profit`
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Facundo Farall <37149322+ffarall@users.noreply.github.com>