## Summary
Improve CI performance through better caching and simplified workflows:
- Add a new `warm-sccache` job that runs before all Rust CI jobs to
pre-populate the sccache cache
- Cache locally installed tools (`~/.local/bin`, `~/.local/lib`,
`~/.local/include`) with a version-based hash key
- Simplify Rust tests by removing matrix partitioning and switching from
cargo-nextest to cargo test
## Problem
**Poor sccache hit rates:**
- `rust-lint`, `unit-tests`, and `build-operator` ran in parallel with
cold caches
- Each job compiled dependencies independently
- Cache was only saved at job completion (too late for parallel jobs to
benefit)
**Redundant tool downloads:**
- Mold, LLVM/Clang, protoc, and libpq (~500MB+) were downloaded fresh on
each job
- No caching of locally installed tools
**Overcomplicated test setup:**
- 2-partition matrix for tests added complexity without significant
benefit
- cargo-nextest required installation step (~30s overhead)
- Separate result-checker job wasn't necessary
## Solution
### 1. sccache warm-up job (`task-warm-sccache.yml`)
- Runs first (Tier 0) before all Rust jobs
- Compiles with release mode + all features (`fast-runtime`,
`try-runtime`, `runtime-benchmarks`)
- Compiles with debug mode to cover test builds
- Uses `SKIP_WASM_BUILD=1` to minimize warm-up time
### 2. Local tools caching (`setup-env/action.yml`)
- Define tool versions as env vars (`MOLD_VERSION`, `LLVM_VERSION`,
`PROTOC_VERSION`, `LIBPQ_VERSION`)
- Generate SHA256 hash from versions for cache key
- Cache `~/.local/bin`, `~/.local/lib`, `~/.local/include` (not all of
`~/.local` to avoid container storage)
- Set up PATH and env vars immediately after cache restore
### 3. Simplified Rust tests (`task-rust-tests.yml`)
- Remove 2-partition matrix strategy
- Replace cargo-nextest with `cargo test --locked`
- Remove separate tests-result-checker job
## CI Flow
```
┌─ build-operator (warm sccache + cached tools)
│
CI Start → warm-sccache ─┼─ rust-lint (warm sccache + cached tools)
│
└─ unit-tests (warm sccache + cached tools)
```
## Test plan
- [x] CI workflow runs successfully
- [x] Warm-sccache job completes and shows cache stats
- [x] Local tools cache restores correctly (no permission errors)
- [x] Downstream Rust jobs show improved cache hit rates
- [x] Rust tests pass with simplified single-job setup
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Ahmad Kaouk <56095276+ahmadkaouk@users.noreply.github.com>
Co-authored-by: undercover-cactus <lola@moonsonglabs.com>
## Summary
This PR resolves all CI failures following the migration to the new
DataHaven Github & Docker Hub organizations, and correctly leverage
self-hosted GitHub runners (`DH-runners` group) by eliminating sudo
dependencies.
## Key Changes
### 🔧 **Eliminated sudo requirements across all workflows**
- **Setup Environment**: Installed mold linker and system dependencies
in userspace without sudo
- **Tool Installation**: Replaced apt/system package installations with
direct binary downloads:
- Kurtosis: Direct binary download from GitHub releases (v1.10.3)
- Taplo: Direct binary installation for Cargo.toml formatting
- cargo-nextest: Using `cargo install` instead of GitHub action
(v0.9.100)
- **Runner Cleanup**: Skipped cleanup-runner action entirely on
self-hosted runners (bare-metal manages disk space externally)
### 🏷️ **Workflow optimizations**
- **Group-based targeting**: All heavy workloads (Rust builds, E2E
tests) now run on `DH-runners` runners
- **Dependency management**: Used `install-deps: false` flag instead of
hardcoded runner detection
Co-authored-by: Claude <noreply@anthropic.com>
~~Improve CI runs by adding a caching action. Between each run crates
should be cached to speed up building datahaven binary.~~
## Summary
In the end the real issue was not that we were missing a caching action
but that we were caching too much. We went way over the 10GB limit
imposed by github (we were using 60GB of cache). So the most recent
cache were deleted right away or not cache at all.
The over use of the cache was happening because we were caching twice
sccache folder. Once with the `mozilla-actions/sccache-action` and an
other time with `Swatinem/rust-cache`.
The solution was to keep caching sccache on github with
`mozilla-actions/sccache-action`.
In this PR we also exchange the `Swatinem/rust-cache` action to use the
more standard `actions/cache@v4`. It would avoid in the future unwanted
breaking changes and security issues.
---------
Co-authored-by: Ahmad Kaouk <56095276+ahmadkaouk@users.noreply.github.com>
## Changes
- New CI file for making Docker Prod images
- Changed E2E tests use an image built from a local dockerfile
- Some cargo build options to make it quicker
- Fix the cache hit rate
- added `tsgo` preview to the project 😎
- Can be invoked with `bun tsgo` to typecheck
- Install in IDE
[VSCode](https://code.visualstudio.com/docs/configure/extensions/extension-marketplace)
& [Zed](https://github.com/zed-extensions/tsgo) for super-fast inline
typechecking (as you type basically)
## Context
This PR attempts to make the frankly unacceptable CI times better. This
achieves that aim by making a crappy image for day-to-day usage and let
the prod issue take ages since that will be infrequently used. The
reason why the original design didn't work for us is because: 1) we are
using the free GH runners 2) when we goto baremetal runners we'll lose
our rapid caching abilities which make using docker cheap.
Also, we add `tsgo` support to improve devex. The improvement is
astounding.
```sh
hyperfine -n tsc "bun tsc --incremental false --extendedDiagnostics" -n tsgo "bun tsgo --incremental false --extendedDiagnostics"
Benchmark 1: tsc
Time (mean ± σ): 5.500 s ± 0.221 s [User: 8.939 s, System: 0.400 s]
Range (min … max): 5.196 s … 5.845 s 10 runs
Benchmark 2: tsgo
Time (mean ± σ): 99.1 ms ± 8.4 ms [User: 392.8 ms, System: 54.1 ms]
Range (min … max): 88.3 ms … 116.0 ms 29 runs
Summary
tsgo ran
55.48 ± 5.22 times faster than tsc
```
## Changes
- New option: `--kurtosis-enclave-name` to allow you to specify a new
ethereum network with a different enclave name. Neccesary step for
setting up local testing with multiple enclaves running at once
- Refactor: all single dash options (e.g. `-i`) have been renamed to
have double dashes `--` for consistency others
- sad times: CLI now must be invoked `bun cli launch` since we have
multiple fns now
- Package Update
- New Function: `stop` which allows you to kill all components (eth, dh,
relayers) or only a single part
- Gonza's mac target build fix
- Misc fixes to ci like caching and rate limits
## Additional Comment
The CLI needs multiple commands and this PR adds the first new one
`stop`.
This syntax is faily extensible so @ffarall 's k8 command can follow
suite.
Originally we were going to have an `exec` command to expose internal
fns to be callable by command line, i.e. generate beacon checkpoint -
however the need for it has since been superceded by the deploy to k8
command that's going to be added. I've left the code in place though so
when another usecase comes up we can just plug that in.
---------
Co-authored-by: Facundo Farall <37149322+ffarall@users.noreply.github.com>
> [!NOTE]
> This is `Part 3` of the ongoing _Docker Series._
## New Additions:
- Launching Datahaven network will spin up containers, as opposed to
native binaries
- `stop:docker` script to kill all dh containers
- `e2e` test suite for datahaven solochain network
- Contains reference test file that uses papi for storage queries,
submitting exts, runtime calls (good job on that facu and tobi)
- Added new utils:
- `waitForLog()` to wait for log lines in docker container logs
- `createPapiConnectors()` helper for test cases to build and connect to
dh network
- `getPapiSigner()` helper to return a papi compatible signer using our
prefunded accounts (alith by default)
- `sendTxn()` helper to submit txn and wait for block inclusion, instead
of finalization, which std library provides
## Changes:
> [!CAUTION]
> Launching native binaries for datahaven no longer supported.
- Datahaven binary location cli option changed to `-i,
--datahaven-image-tag`
- To locally run this you'll need a datahaven docker image handy, you'll
need to either:
- Point to remote dockerhub e.g. `moonsonglabs/datahaven:main` (must be
logged in and have permission)
- Build this locally with `bun build:docker:operator`
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
- Added end-to-end tests for the Datahaven solochain, including runtime
API queries, storage lookups, extrinsic submissions, and event
listening.
- Introduced CLI option to specify the Datahaven Docker image tag, with
a default value.
- Added CLI option to disable the Relayer.
- Provided new scripts to stop Docker containers associated with
Datahaven.
- Added utility functions for Docker log monitoring and container
startup checks.
- Introduced utilities for interacting with the Datahaven Polkadot API.
- **Improvements**
- Switched Datahaven network launch from local binaries to Docker
containers.
- Enhanced cache accuracy in build workflows by including Rust source
files in cache keys.
- Improved build performance with TypeScript incremental build options.
- Increased timeout for end-to-end tests for better reliability.
- Updated CLI version to 0.2.0.
- Modified Dockerfile build to enable the `fast-runtime` feature.
- Extended network launch summary to include relayer and container
details.
- **Bug Fixes**
- Fixed cleanup logic by tracking and preparing for forced removal of
Docker containers after tests.
- **Chores**
- Updated workflow steps for Docker image handling and network checks.
- Adjusted scripts and workflow logic for improved Docker and test
management.
- Removed top-level disk usage summaries from cleanup workflow for
streamlined reporting.
- Enhanced shell command utility to support asynchronous wait during
execution.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Co-authored-by: Facundo Farall <37149322+ffarall@users.noreply.github.com>
## Summary
This PR attempts to improve caching, and thus speeds, for Docker image
generation.
This is to dramatically reduce the times of building images by using:
- cargo chef
- cache mounts
- sccache
- cache dance
## Context
As a result this means thats changes that Do not Affect the code, should
(in theory) not trigger a new build to be run.
Changes that do change the rust code should also in theory be shorter as
the dependencies are unlikely to have changed and so that too can be
reused. In fact some part of the process should always be able to be
re-used unless we do something like drastic like change rust-toolchain,
but even then should only be a one time thing to regenerate that part of
the cache.
---------
Co-authored-by: Facundo Farall <37149322+ffarall@users.noreply.github.com>
Eventually our CI will be required to run two private blockchains
locally plus associated relayers.
This PR is to prepare for this fate by improving run times and
refactoring our existing CIs so they are a bit easier to reason about.
### Refactors
- **_We now run ALL CIs on every PR!_** This is so that we decomplexify
the logic around conditional builds and fetching built binaries from
another source. This reduces the surface area of code we have to
maintain at the cost of execution time
- This penalty is ameliorated by a layered caching system. At best, it
will be less than a minute to complete a build since everything will be
cached. On GH runners this is about 6 minutes sadly.
- We will no longer be at risk of important CIs being skipped
erroneously which hide true failures.
- Caching is a low-risk approach because at worst it has to build from
scratch. A bad cache hit will never imply the wrong thing gets build
since cargo is smart enough to just throw away any inappropriate build
artefacts.
- `setup-rust` action created so we have a unified way of setting up
runner and unifying our approach to caching
- Use a unique caching key for different activities and it will fallback
to shared cache if no matches
- we are using `mainnet` kurtosis config so that it works with relayer
assumptions
### Additions
- We can specify the ethereum block time via a new cli arg `--slot-time
<seconds>`
- We can specify arbitrary network_param args which get passed into the
generated yaml
- e.g. giving `bun cli --kurtosis-network-args="pet=cat food=fish" will
add:
```yml
network_params:
# existing params...
pet: cat
food: fish
```
- We now have the ability to programmatically modify the yaml
- This means we are back down to a single `minimal.yml` kurtosis config
so we dont have to maintain changes between them
- Flow is: `add new cli arg` -> `add if() block which mutates yaml` ->
`profit`
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Facundo Farall <37149322+ffarall@users.noreply.github.com>