Commit graph

17 commits

Author SHA1 Message Date
undercover-cactus
b258113547 use ARG to build Dockerfile; use local chain everywhere; 2026-02-10 17:36:07 +01:00
undercover-cactus
aace180cf1 the /data folder is actually required for the key to inserted in the node 2026-02-09 13:57:26 +01:00
undercover-cactus
98383f0e9f unify dev Dockerfile and operator Dockerfile; fix the folders; 2026-02-09 13:57:25 +01:00
undercover-cactus
853e56ae71 fix the dockerfile used in the CI for storagehub use; 2026-02-09 13:56:16 +01:00
Steve Degosserie
a5522659bf
feat: Add Docker Compose setup for local DataHaven network (#314)
## 🎯 Overview

This PR introduces a comprehensive Docker Compose configuration for
running a complete local DataHaven network, making it significantly
easier for developers to spin up and test the entire stack locally.

## 🏗️ Architecture

```mermaid
graph TB
    subgraph "DataHaven Network (Docker)"
        subgraph "Consensus Layer"
            Alice["🔷 Alice (Validator)<br/>:9944 RPC<br/>4 Keys: GRAN, BABE, IMON, BEEF"]
            Bob["🔷 Bob (Validator)<br/>:9945 RPC<br/>4 Keys: GRAN, BABE, IMON, BEEF"]
            Alice <-->|P2P/mDNS| Bob
        end

        subgraph "Storage Provider Layer"
            MSP["💾 MSP (Charlie)<br/>:9946 RPC<br/>1 GiB Storage<br/>1 Key: BCSV"]
            BSP01["💾 BSP01 (Dave)<br/>:9947 RPC<br/>1 GiB Storage<br/>1 Key: BCSV"]
            BSP02["💾 BSP02 (Eve)<br/>:9948 RPC<br/>1 GiB Storage<br/>1 Key: BCSV"]
        end

        subgraph "Monitoring Layer"
            Indexer["📊 Indexer<br/>:9949 RPC<br/>Full Mode<br/>No Keys"]
            Fisherman["🎣 Fisherman (Gustavo)<br/>:9950 RPC<br/>Storage Monitor<br/>1 Key: BCSV"]
            DB["🗄️ PostgreSQL<br/>:5432<br/>indexer/datahaven"]
        end

        Alice -.->|Syncs| MSP
        Bob -.->|Syncs| MSP
        Alice -.->|Syncs| BSP01
        Bob -.->|Syncs| BSP02
        Alice -.->|Syncs| Indexer
        Bob -.->|Syncs| Fisherman
        
        MSP -.->|Monitors| Fisherman
        BSP01 -.->|Monitors| Fisherman
        BSP02 -.->|Monitors| Fisherman
        
        Indexer -->|Writes| DB
        Fisherman -->|Writes| DB
    end

    style Alice fill:#4a90e2,stroke:#2e5c8a,color:#fff
    style Bob fill:#4a90e2,stroke:#2e5c8a,color:#fff
    style MSP fill:#50c878,stroke:#2d7a4a,color:#fff
    style BSP01 fill:#50c878,stroke:#2d7a4a,color:#fff
    style BSP02 fill:#50c878,stroke:#2d7a4a,color:#fff
    style Indexer fill:#f5a623,stroke:#b87818,color:#fff
    style Fisherman fill:#f5a623,stroke:#b87818,color:#fff
    style DB fill:#9b59b6,stroke:#6c3a82,color:#fff
```

**Legend:**
- 🔷 Validators - Consensus and block production
- 💾 Storage Providers - File storage and retrieval
- 📊 Indexer - Full blockchain indexing
- 🎣 Fisherman - Storage provider monitoring
- 🗄️ PostgreSQL - Database for indexer/fisherman
- `<-->` P2P Communication | `-.->` Network Sync | `-->` Database
Connection

##  What's Included

### Core Network (8 Services)
- **2 Validator Nodes** (Alice & Bob) - Consensus and block production
- **1 Main Storage Provider (MSP)** - Charlie with 1 GiB storage
capacity
- **2 Backup Storage Providers (BSPs)** - Dave and Eve with 1 GiB each
- **1 StorageHub Indexer** - Full blockchain indexer with PostgreSQL
- **1 Fisherman Node** - Gustavo monitoring storage provider behavior
- **1 PostgreSQL Database** - Shared database for indexer and fisherman

### Key Features
 **Automated Key Injection** - All validator and storage provider keys
automatically injected on startup
 **Health Checks** - Validators have health checks that verify RPC port
is listening before dependent services start
 **Orchestrated Startup** - Services start in correct order with
health-based dependencies
 **Persistent Storage** - Chain data, keystores, and database all
persisted in Docker volumes
 **Unified Entrypoint Script** - Single script handles all node types
(validator, MSP, BSP, fisherman)
 **Proper User Permissions** - Root for setup, switches to datahaven
user for node execution
 **mDNS Peer Discovery** - Nodes automatically discover each other on
the Docker network
 **Comprehensive Documentation** - Full setup guide, troubleshooting,
and verification steps

### 🔄 Startup Sequence

The network starts in a carefully orchestrated sequence to ensure
stability:

1. **Alice (Validator)** starts first
   - Injects 4 keys (GRAN, BABE, IMON, BEEF)
   - Health check waits for RPC port 9944 to be listening
   - Uses `/proc/net/tcp` for minimal-dependency port checking

2. **Bob (Validator)** waits for Alice to be healthy
   - Ensures at least one validator is fully operational
   - Enables block production to start immediately

3. **Storage Providers & Monitoring** wait for both validators to be
healthy
   - MSP, BSP01, BSP02 start after validators are ready
   - Indexer and Fisherman wait for validators before syncing
   - PostgreSQL starts independently with its own health check

This dependency chain prevents race conditions and ensures reliable
network formation.

## 🚀 Quick Start

```bash
cd operator

# Build the binary (development mode with fast blocks)
./scripts/docker-prepare.sh --fast

# Start the entire network
docker-compose up -d

# View logs
docker-compose logs -f

# Check service health
docker-compose ps

# Stop the network
docker-compose down -v
```

## 📋 Port Mappings

| Service | RPC/WebSocket | Prometheus | P2P | Database/API |
|---------|---------------|------------|-----|--------------|
| Alice | 9944 | 9615 | 30333 | - |
| Bob | 9945 | 9616 | 30334 | - |
| MSP | 9946 | 9617 | 30335 | - |
| BSP01 | 9947 | 9618 | 30336 | - |
| BSP02 | 9948 | 9619 | 30337 | - |
| Indexer | 9949 | 9620 | 30338 | - |
| Fisherman | 9950 | 9621 | 30339 | - |
| PostgreSQL | - | - | - | 5432 |

## 🔑 Key Injection

All cryptographic keys are automatically injected on startup:

**Validators (Alice & Bob)** - 4 keys each:
- GRANDPA (ed25519) - Finality
- BABE (sr25519) - Block authoring
- ImOnline (sr25519) - Heartbeat
- BEEFY (ecdsa) - Bridge consensus

**Storage Providers (MSP, BSPs, Fisherman)** - 1 key each:
- BCSV (ecdsa) - Storage provider identity

**Indexer** - No keys required (non-validating node)

## 🩺 Health Checks

Validator nodes (Alice & Bob) implement health checks to ensure proper
startup sequencing:

**Health Check Method:**
- Reads `/proc/net/tcp` directly to check if RPC port is listening
- Zero dependencies - works in minimal debian:stable-slim containers
- Converts port to hex and searches for LISTEN state (0A)

**Configuration:**
- Start period: 30s (allows node initialization)
- Interval: 10s (check every 10 seconds)
- Timeout: 5s (per health check)
- Retries: 5 (must pass 5 consecutive checks)

**Benefits:**
- Prevents dependent services from starting before validators are ready
- Eliminates race conditions during network formation
- No additional packages required in Docker image

## 🍎 macOS Requirement

**Important:** On Docker Desktop for macOS, you must use the
**experimental DockerVMM** virtualization framework:

1. Open Docker Desktop settings
2. Go to "General" tab
3. Enable "Use experimental virtualization framework (DockerVMM)"
4. Restart Docker Desktop

The default Apple Virtualization Framework causes networking issues with
P2P connections.

## 📁 Files Changed

- `operator/docker-compose.yml` - Main orchestration configuration
- `operator/scripts/docker-entrypoint.sh` - Unified key injection script
- `operator/scripts/docker-prepare.sh` - Binary build helper
- `operator/scripts/docker-healthcheck.sh` - Health check script for
validators
- `operator/DOCKER-COMPOSE.md` - Comprehensive documentation

## 🔍 Testing

The configuration has been tested on:
-  Docker Desktop for macOS (with DockerVMM)
-  Docker on Linux/Ubuntu

All nodes successfully:
- Inject required keys
- Pass health checks
- Discover peers via mDNS
- Sync blocks and finalize
- Connect to PostgreSQL database (indexer/fisherman)

## 📝 Notes

- All settings are configured for **local development only**
- Uses well-known test seed phrase (⚠️ never use in production!)
- RPC exposed without authentication
- Unsafe flags enabled for convenience

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-11-22 14:07:46 +01:00
Steve Degosserie
cffdad2358
fix: 🔧 Don't remove /usr/sbin utils in operator Docker image (#240) 2025-10-15 23:54:29 +02:00
Steve Degosserie
931a225f09
fix: 🔧 Fix binary location in Docker img (#239) 2025-10-15 22:54:44 +02:00
Steve Degosserie
9a5404de82
refactor: Consolidate and optimize Docker image architecture (#233)
## Overview

This PR consolidates and optimizes the Docker build system, reducing
redundancy and improving CI/CD performance. The changes eliminate
duplicate Dockerfiles, introduce a flexible build template, and optimize
release builds to reuse CI artifacts.

## Changes Summary

### 🐳 Docker Images Restructured

**Before:** 5 Dockerfiles with significant overlap
**After:** 4 focused images + 1 utility

#### Final Structure:

1. **`operator/Dockerfile`**  Updated
   - **Standard operator image** for CI and release builds
   - Minimal node image (accepts pre-built binaries)
   - GHCR: `ghcr.io/datahaven-xyz/datahaven/datahaven` (CI)
   - DockerHub: `datahavenxyz/datahaven` (releases)

2. **`docker/datahaven-build.Dockerfile`** (moved from
`operator/Dockerfile`)
   - Full source-to-binary build for manual releases
   - DockerHub: `datahavenxyz/datahaven:{label}`
   - Supports custom RUSTFLAGS and fast-runtime feature
   - Only used for manual workflow_dispatch builds

3. **`docker/datahaven-production.Dockerfile`** (kept)
   - Binary builder for CPU-specific releases
   - Used by build-prod-binary workflow template
   - Supports custom target-cpu flags

4. **`docker/datahaven-dev.Dockerfile`**  NEW (local dev only)
   - **FOR LOCAL DEVELOPMENT/TROUBLESHOOTING ONLY**
   - Includes debug tools: gdb, strace, vim, sudo
   - Extra dependencies: librocksdb-dev, curl
   - RUST_BACKTRACE enabled by default
   - **DO NOT USE for CI or production builds**

5. **`test/docker/crossbuild-mac-libpq.dockerfile`** (kept)
   - Utility for macOS → Linux cross-compilation

#### Removed (Redundant):
-  `docker/datahaven.Dockerfile` → replaced by operator/Dockerfile
-  `test/docker/datahaven-node-local.dockerfile` → replaced by
datahaven-dev.Dockerfile

---

### 🔄 Workflow Improvements

#### Enhanced `publish-docker` Template
- Supports both GHCR and DockerHub registries
- Flexible inputs: dockerfile, context, build-args, cache scope
- Auto-generates OCI-compliant labels
- Reduces code duplication (~70 lines → ~15 per workflow)

#### Refactored CI Pipeline
- **`docker-build-ci`**: Builds `operator/Dockerfile` → GHCR for CI/E2E
testing
- **`docker-build-release`**: Builds `operator/Dockerfile` → DockerHub
(main branch only)
- Both CI and release workflows now use the same minimal operator image
- Release builds **reuse CI binaries** instead of rebuilding from source

#### Optimized Release Workflow
The `task-docker-release` workflow now has dual modes:

**Mode 1: `workflow_call` (CI - main pushes)**
-  Reuses binary from CI's build-operator task
-  Uses lightweight `operator/Dockerfile`
-  Tags: `latest`, `sha-{short}`
-  **Fast**: ~5 minutes (vs ~30 min previously)

**Mode 2: `workflow_dispatch` (Manual)**
-  Full source build with `datahaven-build.Dockerfile`
-  Custom branch and label selection
-  Optional fast-runtime feature
-  Tags: `PROD-{label}` or user-defined

---

### 🔧 Additional Optimizations

- Copy libpq5 from builder stage instead of reinstalling (smaller,
faster)
- Remove redundant protobuf-compiler package (use protoc v21.12
directly)
- Standardize user UID to 1000 across all runtime images
- Consistent OCI labeling and metadata

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-15 01:33:20 +02:00
Steve Degosserie
8874a99100
fix: 🔧 Add missing libpq5 lib to DH operator Docker image (#232) 2025-10-14 12:37:31 +02:00
undercover-cactus
514a16ac1f
ci: remove sccache from image build for prod (#200)
In this PR, we remove the caching of the sccache folder because it is
too big (~3GB) and fill our cache too fast.

What to expect ?  
* It will make the build a bit slower but it is fine because it only
build on `main`. We are preparing another PR that will speed up the
build of the prod image. Also we are not sure the cache is actually
being used (`gha` cache is in beta).
* Will free some space for caching and stop deleting our cache which
make other jobs work faster.

Co-authored-by: Ahmad Kaouk <56095276+ahmadkaouk@users.noreply.github.com>
Co-authored-by: Gonza Montiel <gonzamontiel@users.noreply.github.com>
2025-10-09 12:33:35 +00:00
Steve Degosserie
6fe4f34c4c
fix: Update Rust toolchain and Docker build dependencies (#156)
## Summary
- Upgraded Rust toolchain from 1.87 to 1.88 for latest compiler
improvements
- Updated Docker build environment with latest paritytech base image and
build tools
- Fixed node branding and support URL to reflect DataHaven project

## Changes

### Rust Toolchain Update (`rust-toolchain.toml`)
- Upgraded from Rust 1.87 to 1.88

### Docker Build Environment (`Dockerfile`)
- Updated base image from `paritytech/ci-unified:bullseye-1.85.0` to
`bullseye-1.88.0`
- Upgraded mold linker from v2.39.0 to v2.40.4 for faster builds
- Added protoc v21.12 for protobuf compilation support
- Added libpq-dev dependency for PostgreSQL integration
- Updated cargo-chef from 0.1.71 to 0.1.72

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-09-11 09:56:14 +02:00
Tim B
1997c298a1
refactor: 🐳 Improve docker caching (again) (#86)
## Changes

- New CI file for making Docker Prod images
- Changed E2E tests use an image built from a local dockerfile
- Some cargo build options to make it quicker
- Fix the cache hit rate
- added `tsgo` preview to the project 😎
  - Can be invoked with `bun tsgo` to typecheck
- Install in IDE
[VSCode](https://code.visualstudio.com/docs/configure/extensions/extension-marketplace)
& [Zed](https://github.com/zed-extensions/tsgo) for super-fast inline
typechecking (as you type basically)

## Context

This PR attempts to make the frankly unacceptable CI times better. This
achieves that aim by making a crappy image for day-to-day usage and let
the prod issue take ages since that will be infrequently used. The
reason why the original design didn't work for us is because: 1) we are
using the free GH runners 2) when we goto baremetal runners we'll lose
our rapid caching abilities which make using docker cheap.

Also, we add `tsgo` support to improve devex. The improvement is
astounding.

```sh
hyperfine -n tsc "bun tsc --incremental false --extendedDiagnostics" -n tsgo "bun tsgo --incremental false --extendedDiagnostics"
Benchmark 1: tsc
  Time (mean ± σ):      5.500 s ±  0.221 s    [User: 8.939 s, System: 0.400 s]
  Range (min … max):    5.196 s …  5.845 s    10 runs
 
Benchmark 2: tsgo
  Time (mean ± σ):      99.1 ms ±   8.4 ms    [User: 392.8 ms, System: 54.1 ms]
  Range (min … max):    88.3 ms … 116.0 ms    29 runs
 
Summary
  tsgo ran
   55.48 ± 5.22 times faster than tsc
```
2025-05-27 16:14:15 +00:00
Steve Degosserie
ae1798a2a4
fix: 🚑 Don't remove the shell from the DH node Docker image (#76) 2025-05-16 15:01:14 -03:00
Tim B
82145b882b
test: 🐳 Add docker support for datahaven nodes (#71)
> [!NOTE]  
> This is  `Part 3` of the ongoing _Docker Series._


## New Additions:
- Launching Datahaven network will spin up containers, as opposed to
native binaries
- `stop:docker` script to kill all dh containers
- `e2e` test suite for datahaven solochain network
- Contains reference test file that uses papi for storage queries,
submitting exts, runtime calls (good job on that facu and tobi)
- Added new utils:
  - `waitForLog()` to wait for log lines in docker container logs
- `createPapiConnectors()` helper for test cases to build and connect to
dh network
- `getPapiSigner()` helper to return a papi compatible signer using our
prefunded accounts (alith by default)
- `sendTxn()` helper to submit txn and wait for block inclusion, instead
of finalization, which std library provides

## Changes:

> [!CAUTION] 
> Launching native binaries for datahaven no longer supported.


- Datahaven binary location cli option changed to `-i,
--datahaven-image-tag`
- To locally run this you'll need a datahaven docker image handy, you'll
need to either:
- Point to remote dockerhub e.g. `moonsonglabs/datahaven:main` (must be
logged in and have permission)
  - Build this locally with `bun build:docker:operator`

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Added end-to-end tests for the Datahaven solochain, including runtime
API queries, storage lookups, extrinsic submissions, and event
listening.
- Introduced CLI option to specify the Datahaven Docker image tag, with
a default value.
  - Added CLI option to disable the Relayer.
- Provided new scripts to stop Docker containers associated with
Datahaven.
- Added utility functions for Docker log monitoring and container
startup checks.
- Introduced utilities for interacting with the Datahaven Polkadot API.

- **Improvements**
- Switched Datahaven network launch from local binaries to Docker
containers.
- Enhanced cache accuracy in build workflows by including Rust source
files in cache keys.
- Improved build performance with TypeScript incremental build options.
  - Increased timeout for end-to-end tests for better reliability.
  - Updated CLI version to 0.2.0.
  - Modified Dockerfile build to enable the `fast-runtime` feature.
- Extended network launch summary to include relayer and container
details.

- **Bug Fixes**
- Fixed cleanup logic by tracking and preparing for forced removal of
Docker containers after tests.

- **Chores**
  - Updated workflow steps for Docker image handling and network checks.
- Adjusted scripts and workflow logic for improved Docker and test
management.
- Removed top-level disk usage summaries from cleanup workflow for
streamlined reporting.
- Enhanced shell command utility to support asynchronous wait during
execution.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Facundo Farall <37149322+ffarall@users.noreply.github.com>
2025-05-16 15:17:05 +01:00
Tim B
ce59dd9625
fix: 🐳 Improve Docker Caching (#66)
## Summary

This PR attempts to improve caching, and thus speeds, for Docker image
generation.

This is to dramatically reduce the times of building images by using:

- cargo chef 
- cache mounts
- sccache
- cache dance

## Context

As a result this means thats changes that Do not Affect the code, should
(in theory) not trigger a new build to be run.

Changes that do change the rust code should also in theory be shorter as
the dependencies are unlikely to have changed and so that too can be
reused. In fact some part of the process should always be able to be
re-used unless we do something like drastic like change rust-toolchain,
but even then should only be a one time thing to regenerate that part of
the cache.

---------

Co-authored-by: Facundo Farall <37149322+ffarall@users.noreply.github.com>
2025-05-13 09:12:32 +00:00
Tim B
6aeece550b
ci: 🐳 Start Publishing Docker Images (#64) 2025-05-08 20:32:55 -03:00
Gonza Montiel
7a4d441fd9
build: 🏗️ DataHaven operator setup (#6)
Adds the `Substrate` node and runtime, as well as configuration and test
files, from https://github.com/Moonsong-Labs/flamingo to the `operator`
folder in DataHaven
2025-03-17 17:57:14 +01:00