## Summary
This PR resolves all CI failures following the migration to self-hosted
GitHub runners (`DH-Testing` group) by eliminating sudo dependencies and
fixing Docker connectivity issues.
## Key Changes
### 🔧 **Eliminated sudo requirements across all workflows**
- **Setup Environment**: Installed mold linker and system dependencies
in userspace without sudo
- **Tool Installation**: Replaced apt/system package installations with
direct binary downloads:
- Kurtosis: Direct binary download from GitHub releases (v1.10.3)
- Taplo: Direct binary installation for Cargo.toml formatting
- cargo-nextest: Using `cargo install` instead of GitHub action
(v0.9.100)
- **Runner Cleanup**: Skipped cleanup-runner action entirely on
self-hosted runners (bare-metal manages disk space externally)
### 🐳 **Fixed Docker connectivity for E2E tests**
- **Enhanced dockerode configuration** with robust fallback logic for
different socket locations
- **Added DOCKER_HOST environment variable** to E2E workflow for
consistent Docker daemon access
- **Implemented connection testing** with detailed error diagnostics for
troubleshooting
- **Resolves FailedToOpenSocket errors** by supporting multiple socket
paths and connection methods
### 🏷️ **Workflow optimizations**
- **Label-based targeting**: All heavy workloads (Rust builds, E2E
tests) now run on `DH-Testing` runners
- **Dependency management**: Used `install-deps: false` flag instead of
hardcoded runner detection
- **Permission fixes**: Corrected Docker build permissions and GHCR
organization names
---------
Co-authored-by: Claude <noreply@anthropic.com>
## Summary
This PR resolves all CI failures following the migration to the new
DataHaven Github & Docker Hub organizations, and correctly leverage
self-hosted GitHub runners (`DH-runners` group) by eliminating sudo
dependencies.
## Key Changes
### 🔧 **Eliminated sudo requirements across all workflows**
- **Setup Environment**: Installed mold linker and system dependencies
in userspace without sudo
- **Tool Installation**: Replaced apt/system package installations with
direct binary downloads:
- Kurtosis: Direct binary download from GitHub releases (v1.10.3)
- Taplo: Direct binary installation for Cargo.toml formatting
- cargo-nextest: Using `cargo install` instead of GitHub action
(v0.9.100)
- **Runner Cleanup**: Skipped cleanup-runner action entirely on
self-hosted runners (bare-metal manages disk space externally)
### 🏷️ **Workflow optimizations**
- **Group-based targeting**: All heavy workloads (Rust builds, E2E
tests) now run on `DH-runners` runners
- **Dependency management**: Used `install-deps: false` flag instead of
hardcoded runner detection
Co-authored-by: Claude <noreply@anthropic.com>
~~Improve CI runs by adding a caching action. Between each run crates
should be cached to speed up building datahaven binary.~~
## Summary
In the end the real issue was not that we were missing a caching action
but that we were caching too much. We went way over the 10GB limit
imposed by github (we were using 60GB of cache). So the most recent
cache were deleted right away or not cache at all.
The over use of the cache was happening because we were caching twice
sccache folder. Once with the `mozilla-actions/sccache-action` and an
other time with `Swatinem/rust-cache`.
The solution was to keep caching sccache on github with
`mozilla-actions/sccache-action`.
In this PR we also exchange the `Swatinem/rust-cache` action to use the
more standard `actions/cache@v4`. It would avoid in the future unwanted
breaking changes and security issues.
---------
Co-authored-by: Ahmad Kaouk <56095276+ahmadkaouk@users.noreply.github.com>
Add CI check for Polkadot-API metadata freshness
This PR adds a new CI workflow that ensures the Polkadot-API metadata
file (`test/.papi/metadata/datahaven.scale`)
is kept up-to-date when runtime changes are made.
Changes:
- Added task-check-metadata.yml workflow that:
- Reuses the WASM artifact from the build-operator job (no duplicate
compilation)
- Runs `bun x papi add` to regenerate metadata
- Fails if the metadata file has uncommitted changes
- Integrated the check into `CI.yml` as a second-tier job alongside
`docker-build`
Why:
- Prevents TypeScript type definitions from becoming out of sync with
the runtime
- Reminds developers to run `bun generate:types:fast` when making
runtime changes
- Ensures consistent type safety across the codebase
The check provides clear error messages with instructions when metadata
is outdated.
## Changes
- New CI file for making Docker Prod images
- Changed E2E tests use an image built from a local dockerfile
- Some cargo build options to make it quicker
- Fix the cache hit rate
- added `tsgo` preview to the project 😎
- Can be invoked with `bun tsgo` to typecheck
- Install in IDE
[VSCode](https://code.visualstudio.com/docs/configure/extensions/extension-marketplace)
& [Zed](https://github.com/zed-extensions/tsgo) for super-fast inline
typechecking (as you type basically)
## Context
This PR attempts to make the frankly unacceptable CI times better. This
achieves that aim by making a crappy image for day-to-day usage and let
the prod issue take ages since that will be infrequently used. The
reason why the original design didn't work for us is because: 1) we are
using the free GH runners 2) when we goto baremetal runners we'll lose
our rapid caching abilities which make using docker cheap.
Also, we add `tsgo` support to improve devex. The improvement is
astounding.
```sh
hyperfine -n tsc "bun tsc --incremental false --extendedDiagnostics" -n tsgo "bun tsgo --incremental false --extendedDiagnostics"
Benchmark 1: tsc
Time (mean ± σ): 5.500 s ± 0.221 s [User: 8.939 s, System: 0.400 s]
Range (min … max): 5.196 s … 5.845 s 10 runs
Benchmark 2: tsgo
Time (mean ± σ): 99.1 ms ± 8.4 ms [User: 392.8 ms, System: 54.1 ms]
Range (min … max): 88.3 ms … 116.0 ms 29 runs
Summary
tsgo ran
55.48 ± 5.22 times faster than tsc
```
Eventually our CI will be required to run two private blockchains
locally plus associated relayers.
This PR is to prepare for this fate by improving run times and
refactoring our existing CIs so they are a bit easier to reason about.
### Refactors
- **_We now run ALL CIs on every PR!_** This is so that we decomplexify
the logic around conditional builds and fetching built binaries from
another source. This reduces the surface area of code we have to
maintain at the cost of execution time
- This penalty is ameliorated by a layered caching system. At best, it
will be less than a minute to complete a build since everything will be
cached. On GH runners this is about 6 minutes sadly.
- We will no longer be at risk of important CIs being skipped
erroneously which hide true failures.
- Caching is a low-risk approach because at worst it has to build from
scratch. A bad cache hit will never imply the wrong thing gets build
since cargo is smart enough to just throw away any inappropriate build
artefacts.
- `setup-rust` action created so we have a unified way of setting up
runner and unifying our approach to caching
- Use a unique caching key for different activities and it will fallback
to shared cache if no matches
- we are using `mainnet` kurtosis config so that it works with relayer
assumptions
### Additions
- We can specify the ethereum block time via a new cli arg `--slot-time
<seconds>`
- We can specify arbitrary network_param args which get passed into the
generated yaml
- e.g. giving `bun cli --kurtosis-network-args="pet=cat food=fish" will
add:
```yml
network_params:
# existing params...
pet: cat
food: fish
```
- We now have the ability to programmatically modify the yaml
- This means we are back down to a single `minimal.yml` kurtosis config
so we dont have to maintain changes between them
- Flow is: `add new cli arg` -> `add if() block which mutates yaml` ->
`profit`
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Facundo Farall <37149322+ffarall@users.noreply.github.com>