## Summary
Introduces typed offence classification, a linear Perbill-to-WAD
conversion for EigenLayer slashing, historical offence filtering, and a
new E2E test proving end-to-end liveness detection through
`pallet_im_online`.
---
### OffenceKind enum
New `OffenceKind` enum classifies consensus offences:
- `LivenessOffence` — missed heartbeats (ImOnline)
- `BabeEquivocation` — double block production
- `GrandpaEquivocation` — double finality votes
- `BeefyEquivocation` — double BEEFY votes / fork voting / future block
voting
- `Custom(BoundedVec<u8, 256>)` — manual / governance slashes
Each variant carries a human-readable description string through the
Snowbridge message to EigenLayer's
`DatahavenServiceManager.slashValidatorsOperator()`.
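As a rough illustration of the shape described above, the following Rust sketch models the enum with plain `std` types. It is not the pallet's definition: `Custom` holds a `Vec<u8>` instead of `BoundedVec<u8, 256>`, and the `custom` constructor, `MAX_CUSTOM_LEN`, and the description strings are all assumptions made for the example.

```rust
/// Stand-in for the PR's `OffenceKind`. `Custom` uses `Vec<u8>` here
/// (assumption) so the sketch compiles without Substrate crates.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum OffenceKind {
    LivenessOffence,
    BabeEquivocation,
    GrandpaEquivocation,
    BeefyEquivocation,
    Custom(Vec<u8>),
}

/// Hypothetical constant mirroring the 256-byte bound on `Custom`.
pub const MAX_CUSTOM_LEN: usize = 256;

impl OffenceKind {
    /// Hypothetical constructor: rejects payloads longer than the bound.
    pub fn custom(bytes: &[u8]) -> Option<Self> {
        if bytes.len() <= MAX_CUSTOM_LEN {
            Some(Self::Custom(bytes.to_vec()))
        } else {
            None
        }
    }

    /// Human-readable description. The wording is illustrative only;
    /// the real strings are defined by the pallet.
    pub fn description(&self) -> String {
        match self {
            Self::LivenessOffence => "liveness offence".to_string(),
            Self::BabeEquivocation => "BABE equivocation".to_string(),
            Self::GrandpaEquivocation => "GRANDPA equivocation".to_string(),
            Self::BeefyEquivocation => "BEEFY equivocation".to_string(),
            Self::Custom(bytes) => String::from_utf8_lossy(bytes).into_owned(),
        }
    }
}
```

Bounding the custom payload keeps the description that travels in the cross-chain message at a predictable maximum size.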
### EquivocationReportWrapper
Generic wrapper around `ReportOffence` wired for BABE, GRANDPA, BEEFY,
and ImOnline in all three runtimes:
1. **Filters historical offences** — discards reports whose session
predates the bonding period, using `BondedEras` storage (analogous to
`FilterHistoricalOffences` in `pallet_staking`, but adapted to this
pallet's own era tracking).
2. **Tags offence kind** — stores the `OffenceKind` in
`PendingOffenceKind` double-map `(SessionIndex, ValidatorId)` before
delegating to `pallet_offences`. The `on_offence` handler reads it via
`take()` in the same block.
3. **Cleans up on failure** — removes stale `PendingOffenceKind` entries
if the inner reporter returns an error (e.g. duplicate report),
preventing them from leaking into unrelated future offences.
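The three steps can be sketched with a `HashMap` standing in for the `PendingOffenceKind` double-map. This is a simplified model, not the pallet code: the `Wrapper` struct, the `Kind` and `ReportError` types, the `oldest_bonded_session` field, and the choice to return `Ok(())` for a discarded historical report are all assumptions made for illustration.

```rust
use std::collections::HashMap;

pub type SessionIndex = u32;
pub type ValidatorId = u64;
/// Stand-in for the `PendingOffenceKind` double-map.
pub type PendingKinds = HashMap<(SessionIndex, ValidatorId), Kind>;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    Liveness,
    Babe,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ReportError {
    Duplicate,
}

pub struct Wrapper {
    pub pending: PendingKinds,
    /// Hypothetical: first session still inside the bonding period.
    pub oldest_bonded_session: SessionIndex,
}

impl Wrapper {
    /// `inner` models the wrapped reporter; it receives the pending map so a
    /// mock `on_offence` handler can take (remove) entries in the same call.
    pub fn report<F>(
        &mut self,
        session: SessionIndex,
        offenders: &[ValidatorId],
        kind: Kind,
        inner: F,
    ) -> Result<(), ReportError>
    where
        F: FnOnce(&mut PendingKinds) -> Result<(), ReportError>,
    {
        // 1. Filter historical offences: discard without calling `inner`.
        if session < self.oldest_bonded_session {
            return Ok(());
        }
        // 2. Tag the offence kind before delegating.
        for v in offenders {
            self.pending.insert((session, *v), kind);
        }
        // 3. Delegate, and clean up the tags if the inner reporter fails.
        let result = inner(&mut self.pending);
        if result.is_err() {
            for v in offenders {
                self.pending.remove(&(session, *v));
            }
        }
        result
    }
}
```

Keying the tag by `(session, validator)` is what keeps a leftover entry from being picked up by an unrelated offence in a later session.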
### Perbill to WAD conversion and MaxSlashWad
#### How Substrate computes slash fractions
Each offence type in Substrate defines its own
`slash_fraction(offenders_count)` returning a `Perbill`:
| Offence | Formula | Typical range |
|---|---|---|
| **BABE equivocation** | `min((3k/n)^2, 1)` | 1 offender / 100 validators: ~0.09%; 1/2: capped to 100% |
| **GRANDPA equivocation** | `min((3k/n)^2, 1)` | Same as BABE |
| **BEEFY double-vote** | `min((3k/n)^2, 1)` | Same as BABE/GRANDPA |
| **BEEFY fork/future voting** | Fixed `50%` | Always 50% |
| **ImOnline liveness** | `min(3*(k - floor(n/10) - 1)/n, 1) * 7%` | 10% or fewer offline: **0%**; ~33% offline: ~5%; ~43% offline: 7% (max) |
Where `k` = number of concurrent offenders, `n` = validator set size.
**Key behavior for small validator sets (E2E):** With n=2, the ImOnline
threshold is `floor(2/10) + 1 = 1`. A single offender (`k=1`) gives
`k - threshold = 0`, so the slash fraction is `Perbill(0)`. No `Slashes`
storage entry is created (`compute_slash` returns `None` when the new
fraction doesn't exceed the prior slash), but the `SlashReported` event
is still emitted, which shows that the full detection pipeline works.
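The formulas in the table can be checked with integer arithmetic in parts per billion. The sketch below only mirrors the formulas as written above; the function names are hypothetical, results are rounded down, and Substrate's own `Perbill` rounding may differ in the last digit.

```rust
/// One billion parts: the denominator used by `Perbill`.
pub const BILLION: u64 = 1_000_000_000;

/// min(num/den, 1) in parts per billion, rounded down.
/// A zero denominator is treated as 1 (assumption, to avoid dividing by zero).
fn ratio_ppb(num: u64, den: u64) -> u64 {
    (num * BILLION / den.max(1)).min(BILLION)
}

/// BABE / GRANDPA / BEEFY double-vote: min((3k/n)^2, 1).
pub fn equivocation_ppb(k: u32, n: u32) -> u64 {
    let x = ratio_ppb(3 * u64::from(k), u64::from(n));
    x * x / BILLION
}

/// BEEFY fork / future-block voting: fixed 50%.
pub const BEEFY_FORK_VOTING_PPB: u64 = BILLION / 2;

/// ImOnline liveness: min(3*(k - floor(n/10) - 1)/n, 1) * 7%.
/// At or below the threshold the fraction is zero.
pub fn liveness_ppb(k: u32, n: u32) -> u64 {
    match k.checked_sub(n / 10 + 1) {
        Some(over) => ratio_ppb(3 * u64::from(over), u64::from(n)) * 70_000_000 / BILLION,
        None => 0,
    }
}
```

With these definitions, `equivocation_ppb(1, 100)` is 900_000 (0.09%), `equivocation_ppb(1, 2)` is capped at `BILLION`, and `liveness_ppb(1, 2)` is 0, which is the E2E case described above.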
#### Linear conversion to EigenLayer WAD
The Substrate `Perbill` is linearly mapped to a WAD value capped by
`MaxSlashWad`:
```
WAD = perbill.deconstruct() * MaxSlashWad / 1_000_000_000
```
- `MaxSlashWad` default: **5e16** (= 5% in WAD format, where 1e18 =
100%)
- Governance-changeable dynamic runtime parameter (codec index 46)
- `Perbill(100%)` maps to exactly `MaxSlashWad` (the cap)
- `Perbill(0%)` maps to 0 (no slash sent to EigenLayer)
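A minimal sketch of this linear mapping, assuming the `Perbill` is passed as its raw parts-per-billion value (what `deconstruct()` returns). The function and constant names are hypothetical, and the clamp and saturating multiply are defensive additions rather than something the PR description states.

```rust
/// Denominator of `Perbill`: one billion parts equals 100%.
pub const PERBILL_DENOMINATOR: u128 = 1_000_000_000;

/// Default cap from the description above: 5e16, i.e. 5% where 1e18 = 100%.
pub const DEFAULT_MAX_SLASH_WAD: u128 = 50_000_000_000_000_000;

/// WAD = perbill_parts * max_slash_wad / 1_000_000_000, rounded down.
pub fn perbill_to_wad(perbill_parts: u32, max_slash_wad: u128) -> u128 {
    // Clamp to 100% in case a caller passes more than one billion parts.
    let parts = u128::from(perbill_parts).min(PERBILL_DENOMINATOR);
    parts.saturating_mul(max_slash_wad) / PERBILL_DENOMINATOR
}
```

Multiplying before dividing keeps precision for small fractions: 900_000 parts (0.09%) with the default cap yields 45_000_000_000_000 (4.5e13) rather than rounding to zero.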
#### Concrete examples (with default MaxSlashWad = 5%)
| Scenario | Substrate Perbill | WAD sent to EigenLayer | EigenLayer % |
|---|---|---|---|
| BABE equivocation (1 of 100 validators) | ~0.09% | ~4.5e13 | ~0.0045% |
| BABE equivocation (1 of 2 validators) | 100% (capped) | 5e16 | 5% (max) |
| BEEFY fork voting | 50% | 2.5e16 | 2.5% |
| ImOnline liveness (1 of 2 offline) | 0% | 0 (no slash) | 0% |
| ImOnline liveness (~33% of large set offline) | ~5% | ~2.5e15 | ~0.25% |
| Manual `force_inject_slash` at 20% | 20% | 1e16 | 1% |
| Manual `force_inject_slash` at 100% | 100% | 5e16 | 5% (max) |
The same WAD value is applied uniformly to all configured strategies via
the `SlashingRequest` struct sent through Snowbridge to
`DatahavenServiceManager.slashValidatorsOperator()`.
### E2E liveness slashing test
New test scenario (`should detect and slash an unresponsive validator`)
validates the full liveness detection pipeline:
1. Pauses bob's Docker container (preserving GRANDPA state via `docker
pause`)
2. Waits 200s (>= 2 full sessions) for `pallet_im_online` to detect
missed heartbeats
3. Unpauses bob to restore GRANDPA finality (2/2 validators needed)
4. Polls for `SlashReported` event (not `Slashes` storage — see slash
fraction note above)
5. Verifies the event confirms the full pipeline: `pallet_im_online ->
EquivocationReportWrapper -> pallet_offences -> on_offence`
The test uses `try/finally` to always unpause bob, `{ at: "best" }`
queries for non-finalized chain state during the pause, and drains prior
`SlashReported` events before starting.
### Tests
- **10 new unit tests**: `PendingOffenceKind` double-map semantics,
session isolation, wrapper historical filtering, error cleanup, WAD
conversion (100%, 50%, 0%), offence kind description propagation
- **New mock infrastructure**: `MockInnerReporter`, `MockOffence`,
`MockOkOutboundQueue` with slash data capture
- **E2E**: Updated `force_inject_slash` test to use `offence_kind` enum,
new liveness detection test
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Gonza Montiel <gonzamontiel@users.noreply.github.com>
Co-authored-by: undercover-cactus <lola@moonsonglabs.com>
DataHaven 🫎
AI-First Decentralized Storage secured by EigenLayer — a verifiable storage network for AI training data, machine learning models, and Web3 applications.
Overview
DataHaven is a decentralized storage and retrieval network designed for applications that need verifiable, production-scale data storage. Built on StorageHub and secured by EigenLayer's restaking protocol, DataHaven separates storage from verification: providers store data off-chain while cryptographic commitments are anchored on-chain for tamper-evident verification.
Core Capabilities:
- Verifiable Storage: Files are chunked, hashed into Merkle trees, and committed on-chain — enabling cryptographic proof that data hasn't been tampered with
- Provider Network: Main Storage Providers (MSPs) serve data with competitive offerings, while Backup Storage Providers (BSPs) ensure redundancy through decentralized replication with on-chain slashing for failed proof challenges
- EigenLayer Security: Validator set secured by Ethereum restaking — DataHaven validators register as EigenLayer operators with slashing for misbehavior
- EVM Compatibility: Full Ethereum support via Frontier pallets for smart contracts and familiar Web3 tooling
- Cross-chain Bridge: Native, trustless bridging with Ethereum via Snowbridge for tokens and messages
Architecture
DataHaven combines EigenLayer's shared security with StorageHub's decentralized storage infrastructure:
┌─────────────────────────────────────────────────────────────────────────────┐
│ Ethereum (L1) │
│ ┌───────────────────────────────────────────────────────────────────────┐ │
│ │ EigenLayer AVS Contracts │ │
│ │ • DataHavenServiceManager (validator lifecycle & slashing) │ │
│ │ • RewardsRegistry (validator performance & rewards) │ │
│ └───────────────────────────────────────────────────────────────────────┘ │
│ ↕ │
│ Snowbridge Protocol │
│ (trustless cross-chain messaging) │
└─────────────────────────────────────────────────────────────────────────────┘
↕
┌─────────────────────────────────────────────────────────────────────────────┐
│ DataHaven (Substrate) │
│ ┌───────────────────────────────────────────────────────────────────────┐ │
│ │ StorageHub Pallets DataHaven Pallets │ │
│ │ • file-system (file operations) • External Validators │ │
│ │ • providers (MSP/BSP registry) • Native Transfer │ │
│ │ • proofs-dealer (challenge/verify) • Rewards │ │
│ │ • payment-streams (storage payments) • Frontier (EVM) │ │
│ │ • bucket-nfts (bucket ownership) │ │
│ └───────────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
↕
┌─────────────────────────────────────────────────────────────────────────────┐
│ Storage Provider Network │
│ ┌─────────────────────────────┐ ┌─────────────────────────────┐ │
│ │ Main Storage Providers │ │ Backup Storage Providers │ │
│ │ (MSP) │ │ (BSP) │ │
│ │ • User-selected │ │ • Network-assigned │ │
│ │ • Serve read requests │ │ • Replicate data │ │
│ │ • Anchor bucket roots │ │ • Proof challenges │ │
│ │ • MSP Backend service │ │ • On-chain slashing │ │
│ └─────────────────────────────┘ └─────────────────────────────┘ │
│ ┌─────────────────────────────┐ ┌─────────────────────────────┐ │
│ │ Indexer │ │ Fisherman │ │
│ │ • Index on-chain events │ │ • Audit storage proofs │ │
│ │ • Query storage metadata │ │ • Trigger challenges │ │
│ │ • PostgreSQL backend │ │ • Detect misbehavior │ │
│ └─────────────────────────────┘ └─────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
How Storage Works
- Upload: User selects an MSP, creates a bucket, and uploads files. Files are chunked (8KB default), hashed into Merkle trees, and the root is anchored on-chain.
- Replication: The MSP coordinates with BSPs to replicate data across the network based on the bucket's replication policy.
- Retrieval: MSP returns files with Merkle proofs that users verify against on-chain commitments.
- Verification: BSPs face periodic proof challenges — failure to prove data custody results in on-chain slashing via StorageHub pallets.
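To make the chunk, hash, and verify flow concrete, here is a toy binary Merkle tree over 8 KB chunks. It is only a sketch under loud assumptions: `DefaultHasher` is a non-cryptographic stand-in for a real hash function, the odd-node duplication rule and all function names are invented for the example, and StorageHub's actual trie layout and proof format are not reproduced here.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Default chunk size mentioned above (8 KB).
pub const CHUNK_SIZE: usize = 8 * 1024;

// Toy hash: NOT cryptographic, used only so the sketch needs no crates.
fn hash_bytes(data: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    h.write(data);
    h.finish()
}

fn hash_pair(left: u64, right: u64) -> u64 {
    let mut h = DefaultHasher::new();
    h.write_u64(left);
    h.write_u64(right);
    h.finish()
}

/// Split a file into fixed-size chunks and hash each one into a leaf.
pub fn leaf_hashes(file: &[u8]) -> Vec<u64> {
    file.chunks(CHUNK_SIZE).map(hash_bytes).collect()
}

// Combine nodes pairwise; an unpaired last node is hashed with itself.
fn next_level(level: &[u64]) -> Vec<u64> {
    level
        .chunks(2)
        .map(|p| hash_pair(p[0], *p.get(1).unwrap_or(&p[0])))
        .collect()
}

/// Root of the tree, or `None` for an empty file.
pub fn merkle_root(leaves: &[u64]) -> Option<u64> {
    if leaves.is_empty() {
        return None;
    }
    let mut level = leaves.to_vec();
    while level.len() > 1 {
        level = next_level(&level);
    }
    Some(level[0])
}

/// Sibling hashes from leaf `index` up to the root.
pub fn merkle_proof(leaves: &[u64], mut index: usize) -> Option<Vec<u64>> {
    if index >= leaves.len() {
        return None;
    }
    let mut level = leaves.to_vec();
    let mut proof = Vec::new();
    while level.len() > 1 {
        // A missing sibling means this node was paired with itself.
        proof.push(*level.get(index ^ 1).unwrap_or(&level[index]));
        level = next_level(&level);
        index /= 2;
    }
    Some(proof)
}

/// Recompute the root from one leaf and its proof, then compare.
pub fn verify(leaf: u64, mut index: usize, proof: &[u64], root: u64) -> bool {
    let mut acc = leaf;
    for sibling in proof {
        acc = if index % 2 == 0 { hash_pair(acc, *sibling) } else { hash_pair(*sibling, acc) };
        index /= 2;
    }
    acc == root
}
```

A reader holding only the on-chain root can check any single chunk with a proof whose length grows logarithmically with the number of chunks, which is what makes retrieval verifiable without trusting the provider.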
Repository Structure
```
datahaven/
├── contracts/            # EigenLayer AVS smart contracts
│   ├── src/              # Service Manager, Rewards Registry, Slasher
│   ├── script/           # Deployment scripts
│   └── test/             # Foundry test suites
├── operator/             # Substrate-based DataHaven node
│   ├── node/             # Node implementation & chain spec
│   ├── pallets/          # Custom pallets (validators, rewards, transfers)
│   └── runtime/          # Runtime configurations (mainnet/stagenet/testnet)
├── test/                 # E2E testing framework
│   ├── suites/           # Integration test scenarios
│   ├── framework/        # Test utilities and helpers
│   └── launcher/         # Network deployment automation
├── deploy/               # Kubernetes deployment charts
│   ├── charts/           # Helm charts for nodes and relayers
│   └── environments/     # Environment-specific configurations
├── tools/                # GitHub automation and release scripts
└── .github/              # CI/CD workflows
```
Each directory contains its own README with detailed information. See:
- contracts/README.md - Smart contract development
- operator/README.md - Node building and runtime development
- test/README.md - E2E testing and network deployment
- deploy/README.md - Kubernetes deployment
- tools/README.md - Development tools
Quick Start
Prerequisites
- Kurtosis - Network orchestration
- Bun v1.3.2+ - TypeScript runtime
- Docker - Container management
- Foundry - Solidity toolkit
- Rust - For building the operator
- Helm - Kubernetes deployments (optional)
- Zig - For macOS cross-compilation (macOS only)
Launch Local Network
The fastest way to get started is with the interactive CLI:
```bash
cd test
bun i               # Install dependencies
bun cli launch      # Interactive launcher with prompts
```
This deploys a complete environment including:
- Ethereum network: 2x EL clients (reth), 2x CL clients (lodestar)
- Block explorers: Blockscout (optional), Dora consensus explorer
- DataHaven node: Single validator with fast block times
- Storage providers: MSP and BSP nodes for decentralized storage
- AVS contracts: Deployed and configured on Ethereum
- Snowbridge relayers: Bidirectional message passing
For more options and detailed instructions, see the test README.
Run Tests
```bash
cd test
bun test:e2e            # Run all integration tests
bun test:e2e:parallel   # Run with limited concurrency
```
Note: Setting the environment variable `INJECT_CONTRACTS=true` injects the contracts when the tests start, which speeds up setup.
Development Workflows
Smart Contract Development:
```bash
cd contracts
forge build     # Compile contracts
forge test      # Run contract tests
```
Node Development:
```bash
cd operator
cargo build --release --features fast-runtime
cargo test
./scripts/run-benchmarks.sh
```
After Making Changes:
```bash
cd test
bun generate:wagmi   # Regenerate contract bindings
bun generate:types   # Regenerate runtime types
```
Key Features
Verifiable Decentralized Storage
Production-scale storage with cryptographic guarantees:
- Buckets: User-created containers managed by an MSP, summarized by a Merkle-Patricia trie root on-chain
- Files: Deterministically chunked, hashed into Merkle trees, with roots serving as immutable fingerprints
- Proofs: Merkle proofs enable verification of data integrity without trusting intermediaries
- Audits: BSPs prove ongoing data custody via randomized proof challenges
Storage Provider Network
Two-tier provider model balancing performance and reliability:
- MSPs: User-selected providers offering data retrieval with competitive service offerings
- BSPs: Network-assigned backup providers ensuring data redundancy and availability, with on-chain slashing for failed proof challenges
- Fisherman: Auditing service that monitors proofs and triggers challenges for misbehavior
- Indexer: Indexes on-chain storage events for efficient querying
EigenLayer Security
DataHaven validators secured through Ethereum restaking:
- Validators register as operators via the `DataHavenServiceManager` contract
- Economic security through ETH restaking
- Slashing for validator misbehavior (separate from BSP slashing, which is on-chain)
- Performance-based validator rewards through `RewardsRegistry`
EVM Compatibility
Full Ethereum Virtual Machine support via Frontier pallets:
- Deploy Solidity smart contracts
- Use existing Ethereum tooling (MetaMask, Hardhat, etc.)
- Compatible with ERC-20, ERC-721, and other standards
Cross-chain Communication
Trustless bridging via Snowbridge:
- Native token transfers between Ethereum ↔ DataHaven
- Cross-chain message passing
- Finality proofs via BEEFY consensus
- Three specialized relayers (beacon, BEEFY, execution)
Use Cases
DataHaven is designed for applications requiring verifiable, tamper-proof data storage:
- AI & Machine Learning: Store training datasets, model weights, and agent configurations with cryptographic proofs of integrity — enabling federated learning and verifiable AI pipelines
- DePIN (Decentralized Physical Infrastructure): Persistent storage for IoT sensor data, device configurations, and operational logs with provable data lineage
- Real World Assets (RWAs): Immutable storage for asset documentation, ownership records, and compliance data with on-chain verification
Docker Images
Production images published to DockerHub.
Build optimizations:
- sccache - Rust compilation caching
- cargo-chef - Dependency layer caching
- BuildKit cache mounts - External cache restoration
Build locally:
```bash
cd test
bun build:docker:operator   # Creates datahavenxyz/datahaven:local
```
Development Environment
VS Code Configuration
IDE configurations are excluded from version control so that each developer can personalize them, but the settings below are recommended. Add them to your `.vscode/settings.json`:
Rust Analyzer:
```json
{
  "rust-analyzer.linkedProjects": ["./operator/Cargo.toml"],
  "rust-analyzer.cargo.allTargets": true,
  "rust-analyzer.procMacro.enable": false,
  "rust-analyzer.server.extraEnv": {
    "CARGO_TARGET_DIR": "target/.rust-analyzer",
    "SKIP_WASM_BUILD": 1
  },
  "rust-analyzer.diagnostics.disabled": ["unresolved-macro-call"],
  "rust-analyzer.cargo.buildScripts.enable": false
}
```
Optimizations:
- Links the `operator/` directory as the primary Rust project
- Disables proc macros and build scripts for faster analysis (Substrate macros are slow)
- Uses a dedicated target directory to avoid conflicts
- Skips WASM builds during development
Solidity (Juan Blanco's extension):
```json
{
  "solidity.formatter": "forge",
  "solidity.compileUsingRemoteVersion": "v0.8.28+commit.7893614a",
  "[solidity]": {
    "editor.defaultFormatter": "JuanBlanco.solidity"
  }
}
```
Note: Solidity version must match foundry.toml
TypeScript (Biome):
```json
{
  "biome.lsp.bin": "test/node_modules/.bin/biome",
  "[typescript]": {
    "editor.defaultFormatter": "biomejs.biome",
    "editor.codeActionsOnSave": {
      "source.organizeImports.biome": "always"
    }
  }
}
```
CI/CD
Local CI Testing
Run GitHub Actions workflows locally using act:
```bash
# Run E2E workflow
act -W .github/workflows/e2e.yml -s GITHUB_TOKEN="$(gh auth token)"

# Run specific job
act -W .github/workflows/e2e.yml -j test-job-name
```
Automated Workflows
The repository includes GitHub Actions for:
- E2E Testing: Full integration tests on PR and main branch
- Contract Testing: Foundry test suites for smart contracts
- Rust Testing: Unit and integration tests for operator
- Docker Builds: Multi-platform image builds with caching
- Release Automation: Version tagging and changelog generation
See .github/workflows/ for workflow definitions.
Contributing
Development Cycle
- Make Changes: Edit contracts, runtime, or tests
- Run Tests: Component-specific tests (`forge test`, `cargo test`)
- Regenerate Types: Update bindings if contracts/runtime changed
- Integration Test: Run E2E tests to verify cross-component behavior
- Code Quality: Format and lint (`cargo fmt`, `forge fmt`, `bun fmt:fix`)
Common Pitfalls
- Type mismatches: Regenerate with `bun generate:types` after runtime changes
- Contract changes not reflected: Run `bun generate:wagmi` after modifications
- Kurtosis issues: Ensure Docker is running and Kurtosis engine is started
- Slow development: Use `--features fast-runtime` for shorter epochs/eras (block time stays 6s)
- Network launch hangs: Check Blockscout - forge output can appear frozen
See CLAUDE.md for detailed development guidance.
License
GPL-3.0 - See LICENSE file for details