# DataHaven 🫎 AI-First Decentralized Storage secured by EigenLayer — a verifiable storage network for AI training data, machine learning models, and Web3 applications. ## Overview DataHaven is a decentralized storage and retrieval network designed for applications that need verifiable, production-scale data storage. Built on [StorageHub](https://github.com/Moonsong-Labs/storage-hub) and secured by EigenLayer's restaking protocol, DataHaven separates storage from verification: providers store data off-chain while cryptographic commitments are anchored on-chain for tamper-evident verification. **Core Capabilities:** - **Verifiable Storage**: Files are chunked, hashed into Merkle trees, and committed on-chain — enabling cryptographic proof that data hasn't been tampered with - **Provider Network**: Main Storage Providers (MSPs) serve data with competitive offerings, while Backup Storage Providers (BSPs) ensure redundancy through decentralized replication with on-chain slashing for failed proof challenges - **EigenLayer Security**: Validator set secured by Ethereum restaking — DataHaven validators register as EigenLayer operators with slashing for misbehavior - **EVM Compatibility**: Full Ethereum support via Frontier pallets for smart contracts and familiar Web3 tooling - **Cross-chain Bridge**: Native, trustless bridging with Ethereum via Snowbridge for tokens and messages ## Architecture DataHaven combines EigenLayer's shared security with StorageHub's decentralized storage infrastructure: ``` ┌─────────────────────────────────────────────────────────────────────────────┐ │ Ethereum (L1) │ │ ┌───────────────────────────────────────────────────────────────────────┐ │ │ │ EigenLayer AVS Contracts │ │ │ │ • DataHavenServiceManager (validator lifecycle & slashing) │ │ │ │ • RewardsRegistry (validator performance & rewards) │ │ │ └───────────────────────────────────────────────────────────────────────┘ │ │ ↕ │ │ Snowbridge Protocol │ │ (trustless cross-chain messaging) │ └─────────────────────────────────────────────────────────────────────────────┘ ↕ ┌─────────────────────────────────────────────────────────────────────────────┐ │ DataHaven (Substrate) │ │ ┌───────────────────────────────────────────────────────────────────────┐ │ │ │ StorageHub Pallets DataHaven Pallets │ │ │ │ • file-system (file operations) • External Validators │ │ │ │ • providers (MSP/BSP registry) • Native Transfer │ │ │ │ • proofs-dealer (challenge/verify) • Rewards │ │ │ │ • payment-streams (storage payments) • Frontier (EVM) │ │ │ │ • bucket-nfts (bucket ownership) │ │ │ └───────────────────────────────────────────────────────────────────────┘ │ └─────────────────────────────────────────────────────────────────────────────┘ ↕ ┌─────────────────────────────────────────────────────────────────────────────┐ │ Storage Provider Network │ │ ┌─────────────────────────────┐ ┌─────────────────────────────┐ │ │ │ Main Storage Providers │ │ Backup Storage Providers │ │ │ │ (MSP) │ │ (BSP) │ │ │ │ • User-selected │ │ • Network-assigned │ │ │ │ • Serve read requests │ │ • Replicate data │ │ │ │ • Anchor bucket roots │ │ • Proof challenges │ │ │ │ • MSP Backend service │ │ • On-chain slashing │ │ │ └─────────────────────────────┘ └─────────────────────────────┘ │ │ ┌─────────────────────────────┐ ┌─────────────────────────────┐ │ │ │ Indexer │ │ Fisherman │ │ │ │ • Index on-chain events │ │ • Audit storage proofs │ │ │ │ • Query storage metadata │ │ • Trigger challenges │ │ │ │ • PostgreSQL backend │ │ • Detect misbehavior │ │ │ └─────────────────────────────┘ └─────────────────────────────┘ │ └─────────────────────────────────────────────────────────────────────────────┘ ``` ### How Storage Works 1. **Upload**: User selects an MSP, creates a bucket, and uploads files. Files are chunked (8KB default), hashed into Merkle trees, and the root is anchored on-chain. 2. **Replication**: The MSP coordinates with BSPs to replicate data across the network based on the bucket's replication policy. 3. **Retrieval**: MSP returns files with Merkle proofs that users verify against on-chain commitments. 4. **Verification**: BSPs face periodic proof challenges — failure to prove data custody results in on-chain slashing via StorageHub pallets. ## Repository Structure ``` datahaven/ ├── contracts/ # EigenLayer AVS smart contracts │ ├── src/ # Service Manager, Rewards Registry, Slasher │ ├── script/ # Deployment scripts │ └── test/ # Foundry test suites ├── operator/ # Substrate-based DataHaven node │ ├── node/ # Node implementation & chain spec │ ├── pallets/ # Custom pallets (validators, rewards, transfers) │ └── runtime/ # Runtime configurations (mainnet/stagenet/testnet) ├── test/ # E2E testing framework │ ├── suites/ # Integration test scenarios │ ├── framework/ # Test utilities and helpers │ └── launcher/ # Network deployment automation ├── deploy/ # Kubernetes deployment charts │ ├── charts/ # Helm charts for nodes and relayers │ └── environments/ # Environment-specific configurations ├── tools/ # GitHub automation and release scripts └── .github/ # CI/CD workflows ``` Each directory contains its own README with detailed information. See: - [contracts/README.md](contracts/README.md) - Smart contract development - [operator/README.md](operator/README.md) - Node building and runtime development - [test/README.md](test/README.md) - E2E testing and network deployment - [deploy/README.md](deploy/README.md) - Kubernetes deployment - [tools/README.md](tools/README.md) - Development tools ## Quick Start ### Prerequisites - [Kurtosis](https://docs.kurtosis.com/install) - Network orchestration - [Bun](https://bun.sh/) v1.3.2+ - TypeScript runtime - [Docker](https://www.docker.com/) - Container management - [Foundry](https://getfoundry.sh/) - Solidity toolkit - [Rust](https://www.rust-lang.org/tools/install) - For building the operator - [Helm](https://helm.sh/) - Kubernetes deployments (optional) - [Zig](https://ziglang.org/) - For macOS cross-compilation (macOS only) ### Launch Local Network The fastest way to get started is with the interactive CLI: ```bash cd test bun i # Install dependencies bun cli launch # Interactive launcher with prompts ``` This deploys a complete environment including: - **Ethereum network**: 2x EL clients (reth), 2x CL clients (lodestar) - **Block explorers**: Blockscout (optional), Dora consensus explorer - **DataHaven node**: Single validator with fast block times - **Storage providers**: MSP and BSP nodes for decentralized storage - **AVS contracts**: Deployed and configured on Ethereum - **Snowbridge relayers**: Bidirectional message passing For more options and detailed instructions, see the [test README](./test/README.md). ### Run Tests ```bash cd test bun test:e2e # Run all integration tests bun test:e2e:parallel # Run with limited concurrency ``` NOTES: Adding the environment variable `INJECT_CONTRACTS=true` will inject the contracts when starting the tests to speed up setup. ### Development Workflows **Smart Contract Development**: ```bash cd contracts forge build # Compile contracts forge test # Run contract tests ``` **Node Development**: ```bash cd operator cargo build --release --features fast-runtime cargo test ./scripts/run-benchmarks.sh ``` **After Making Changes**: ```bash cd test bun generate:wagmi # Regenerate contract bindings bun generate:types # Regenerate runtime types ``` ## Key Features ### Verifiable Decentralized Storage Production-scale storage with cryptographic guarantees: - **Buckets**: User-created containers managed by an MSP, summarized by a Merkle-Patricia trie root on-chain - **Files**: Deterministically chunked, hashed into Merkle trees, with roots serving as immutable fingerprints - **Proofs**: Merkle proofs enable verification of data integrity without trusting intermediaries - **Audits**: BSPs prove ongoing data custody via randomized proof challenges ### Storage Provider Network Two-tier provider model balancing performance and reliability: - **MSPs**: User-selected providers offering data retrieval with competitive service offerings - **BSPs**: Network-assigned backup providers ensuring data redundancy and availability, with on-chain slashing for failed proof challenges - **Fisherman**: Auditing service that monitors proofs and triggers challenges for misbehavior - **Indexer**: Indexes on-chain storage events for efficient querying ### EigenLayer Security DataHaven validators secured through Ethereum restaking: - Validators register as operators via `DataHavenServiceManager` contract - Economic security through ETH restaking - Slashing for validator misbehavior (separate from BSP slashing which is on-chain) - Performance-based validator rewards through `RewardsRegistry` ### EVM Compatibility Full Ethereum Virtual Machine support via Frontier pallets: - Deploy Solidity smart contracts - Use existing Ethereum tooling (MetaMask, Hardhat, etc.) - Compatible with ERC-20, ERC-721, and other standards ### Cross-chain Communication Trustless bridging via Snowbridge: - Native token transfers between Ethereum ↔ DataHaven - Cross-chain message passing - Finality proofs via BEEFY consensus - Three specialized relayers (beacon, BEEFY, execution) ## Use Cases DataHaven is designed for applications requiring verifiable, tamper-proof data storage: - **AI & Machine Learning**: Store training datasets, model weights, and agent configurations with cryptographic proofs of integrity — enabling federated learning and verifiable AI pipelines - **DePIN (Decentralized Physical Infrastructure)**: Persistent storage for IoT sensor data, device configurations, and operational logs with provable data lineage - **Real World Assets (RWAs)**: Immutable storage for asset documentation, ownership records, and compliance data with on-chain verification ## Docker Images Production images published to [DockerHub](https://hub.docker.com/r/datahavenxyz/datahaven). **Build optimizations**: - [sccache](https://github.com/mozilla/sccache) - Rust compilation caching - [cargo-chef](https://lpalmieri.com/posts/fast-rust-docker-builds/) - Dependency layer caching - [BuildKit cache mounts](https://docs.docker.com/build/cache/optimize/#use-cache-mounts) - External cache restoration **Build locally**: ```bash cd test bun build:docker:operator # Creates datahavenxyz/datahaven:local ``` ## Development Environment ### VS Code Configuration IDE configurations are excluded from version control for personalization, but these settings are recommended for optimal developer experience. Add to your `.vscode/settings.json`: **Rust Analyzer**: ```json { "rust-analyzer.linkedProjects": ["./operator/Cargo.toml"], "rust-analyzer.cargo.allTargets": true, "rust-analyzer.procMacro.enable": false, "rust-analyzer.server.extraEnv": { "CARGO_TARGET_DIR": "target/.rust-analyzer", "SKIP_WASM_BUILD": 1 }, "rust-analyzer.diagnostics.disabled": ["unresolved-macro-call"], "rust-analyzer.cargo.buildScripts.enable": false } ``` Optimizations: - Links `operator/` directory as the primary Rust project - Disables proc macros and build scripts for faster analysis (Substrate macros are slow) - Uses dedicated target directory to avoid conflicts - Skips WASM builds during development **Solidity** ([Juan Blanco's extension](https://marketplace.visualstudio.com/items?itemName=JuanBlanco.solidity)): ```json { "solidity.formatter": "forge", "solidity.compileUsingRemoteVersion": "v0.8.28+commit.7893614a", "[solidity]": { "editor.defaultFormatter": "JuanBlanco.solidity" } } ``` Note: Solidity version must match [foundry.toml](./contracts/foundry.toml) **TypeScript** ([Biome](https://github.com/biomejs/biome)): ```json { "biome.lsp.bin": "test/node_modules/.bin/biome", "[typescript]": { "editor.defaultFormatter": "biomejs.biome", "editor.codeActionsOnSave": { "source.organizeImports.biome": "always" } } } ``` ## CI/CD ### Local CI Testing Run GitHub Actions workflows locally using [act](https://github.com/nektos/act): ```bash # Run E2E workflow act -W .github/workflows/e2e.yml -s GITHUB_TOKEN="$(gh auth token)" # Run specific job act -W .github/workflows/e2e.yml -j test-job-name ``` ### Automated Workflows The repository includes GitHub Actions for: - **E2E Testing**: Full integration tests on PR and main branch - **Contract Testing**: Foundry test suites for smart contracts - **Rust Testing**: Unit and integration tests for operator - **Docker Builds**: Multi-platform image builds with caching - **Release Automation**: Version tagging and changelog generation See `.github/workflows/` for workflow definitions. ## Contributing ### Development Cycle 1. **Make Changes**: Edit contracts, runtime, or tests 2. **Run Tests**: Component-specific tests (`forge test`, `cargo test`) 3. **Regenerate Types**: Update bindings if contracts/runtime changed 4. **Integration Test**: Run E2E tests to verify cross-component behavior 5. **Code Quality**: Format and lint (`cargo fmt`, `forge fmt`, `bun fmt:fix`) ### Common Pitfalls - **Type mismatches**: Regenerate with `bun generate:types` after runtime changes - **Contract changes not reflected**: Run `bun generate:wagmi` after modifications - **Kurtosis issues**: Ensure Docker is running and Kurtosis engine is started - **Slow development**: Use `--features fast-runtime` for shorter epochs/eras (block time stays 6s) - **Network launch hangs**: Check Blockscout - forge output can appear frozen See [CLAUDE.md](./CLAUDE.md) for detailed development guidance. ## License GPL-3.0 - See LICENSE file for details ## Links - [DataHaven Website](https://datahaven.xyz/) - [DataHaven Documentation](https://docs.datahaven.xyz/) - [StorageHub Repository](https://github.com/Moonsong-Labs/storage-hub) - [EigenLayer Documentation](https://docs.eigenlayer.xyz/) - [Substrate Documentation](https://docs.substrate.io/) - [Snowbridge Documentation](https://docs.snowbridge.network/) - [Foundry Book](https://book.getfoundry.sh/) - [Polkadot-API Documentation](https://papi.how/)