diff --git a/test/tools/validator-set-submitter/README.md b/test/tools/validator-set-submitter/README.md index 1be7c45e..a398d4ba 100644 --- a/test/tools/validator-set-submitter/README.md +++ b/test/tools/validator-set-submitter/README.md @@ -1,6 +1,6 @@ # Validator Set Submitter -Long-running daemon that automatically submits validator-set updates from Ethereum to DataHaven each era via Snowbridge. +Daemon process that automatically submits validator-set updates from Ethereum to DataHaven each era via Snowbridge. ## How it works @@ -11,7 +11,14 @@ The submitter subscribes to finalized `Session.CurrentIndex` changes on DataHave 3. Is `ExternalIndex` already at or past `targetEra`? 4. Is the current session the last session of the era? -If all preconditions are met, it calls `sendNewValidatorSetForEra` on the ServiceManager contract. Each era gets a single submission attempt — if it fails, the era is missed and the submitter moves on to the next. +If all preconditions are met, it calls `sendNewValidatorSetForEra` on the ServiceManager contract. Submission attempt tracking is in-memory, so each era gets a single submission attempt per process run. If an attempt fails, that era is marked missed for this run and the submitter moves on to the next era. + +### Runtime and restart behavior + +- The submitter does not implement automatic reconnect/backoff for DataHaven session-subscription failures. +- On a subscription error, it logs the error, stops the watcher, and the process exits. +- Run it under a restart policy (for example `systemd` with `Restart=always` or Kubernetes with `restartPolicy: Always`). +- After a restart, a previously failed era may be attempted again if `ExternalIndex` has not advanced past that target era. ## Prerequisites @@ -44,6 +51,47 @@ relayer_fee: "0.2" # metrics_port: 8080 ``` +### Settings reference + +| Field | Type | Required | Default | Description | +|---|---|---|---|---| +| `ethereum_rpc_url` | string | Yes | — | Ethereum JSON-RPC endpoint | +| `datahaven_ws_url` | string | Yes | — | DataHaven WebSocket endpoint | +| `submitter_private_key` | hex string | No\* | — | Private key of the authorized submitter account (`0x` + 64 hex chars) | +| `network_id` | string | No | `"anvil"` | Network ID used to locate `contracts/deployments/{network_id}.json` | +| `service_manager_address` | hex address | No\*\* | — | ServiceManager contract address | +| `execution_fee` | string (ETH) | No | `"0.1"` | Snowbridge execution fee sent as `msg.value` | +| `relayer_fee` | string (ETH) | No | `"0.2"` | Snowbridge relayer fee sent as `msg.value` | +| `metrics_port` | integer | No | `8080` | Prometheus metrics server port (1–65535) | + +\* Required via one of: `--submitter-private-key` flag, `SUBMITTER_PRIVATE_KEY` env var, or `submitter_private_key` in config. +\*\* Required when running in Docker (deployment files are not included in the image). When omitted, the address is read from `contracts/deployments/{network_id}.json`. + +### Private key precedence + +The submitter private key is resolved in this order (first wins): + +1. `--submitter-private-key` CLI flag +2. `SUBMITTER_PRIVATE_KEY` environment variable +3. `submitter_private_key` in the config YAML file + +### Environment variables + +| Variable | Description | +|---|---| +| `SUBMITTER_PRIVATE_KEY` | Submitter private key (see precedence above) | +| `METRICS_PORT` | Override metrics port (takes precedence over config file, but CLI flag wins) | +| `LOG_LEVEL` | Log verbosity: `debug`, `info` (default), `warn`, `error` | + +### CLI flags + +| Flag | Description | +|---|---| +| `--config ` | Path to YAML config file (default: `./tools/validator-set-submitter/config.yml`) | +| `--submitter-private-key ` | Override submitter private key | +| `--metrics-port ` | Override metrics server port | +| `--dry-run` | Log what would be submitted without sending transactions | + ## Usage From the `test/` directory: @@ -65,54 +113,127 @@ bun tools/validator-set-submitter/main.ts run --submitter-private-key 0x... bun tools/validator-set-submitter/main.ts run --dry-run ``` -Private key precedence is: `--submitter-private-key` > `SUBMITTER_PRIVATE_KEY` > `submitter_private_key` in config file. - ## Observability -The submitter exposes a Prometheus metrics server on `metrics_port` (default `8080`): +The submitter exposes an HTTP server on `metrics_port` (default `8080`) with three endpoints: -- `GET /metrics` — Prometheus metrics -- `GET /healthz` — liveness -- `GET /readyz` — readiness (`200` once startup checks pass and watcher is running) +| Endpoint | Purpose | Codes | +|---|---|---| +| `GET /metrics` | Prometheus metrics scrape | `200` | +| `GET /healthz` | Liveness probe | `200` always | +| `GET /readyz` | Readiness probe | `200` when startup checks passed and watcher is running, `503` otherwise | -Key metrics: +### Metrics reference -- `validator_set_submitter_submissions_total{outcome="success|failed|dry_run"}` -- `validator_set_submitter_ticks_total{result="submitted_success|submitted_failed|skipped_*"}` -- `validator_set_submitter_errors_total{type="tick_error|subscription_error"}` -- `validator_set_submitter_missed_eras_total` -- `validator_set_submitter_consecutive_missed_eras` -- `validator_set_submitter_up` -- `validator_set_submitter_ready` +All metrics are prefixed with `validator_set_submitter_`. + +#### Counters + +| Metric | Labels | Description | +|---|---|---| +| `submissions_total` | `outcome`: `success`, `failed`, `dry_run` | Total submission attempts by result | +| `ticks_total` | `result`: `submitted_success`, `submitted_failed`, `skipped_no_active_era`, `skipped_already_submitted`, `skipped_already_confirmed`, `skipped_not_last_session` | Tick evaluation outcomes | +| `errors_total` | `type`: `tick_error`, `subscription_error` | Non-submission errors | +| `missed_eras_total` | — | Total eras where the submission attempt failed | + +#### Gauges + +| Metric | Description | +|---|---| +| `active_era` | Current active era on DataHaven | +| `target_era` | Target era for next submission (`active_era + 1`) | +| `external_index` | Latest confirmed era on-chain | +| `current_session` | Current session number | +| `last_submitted_era` | Last era successfully submitted | +| `consecutive_missed_eras` | Consecutive missed eras (resets to 0 on success) | +| `up` | `1` if watcher is running, `0` if stopped | +| `ready` | `1` if startup checks passed and watcher running, `0` otherwise | + +#### Histograms + +| Metric | Buckets | Description | +|---|---|---| +| `submission_duration_seconds` | 1, 5, 10, 30, 60, 120, 300 | Time from transaction send to receipt | +| `tick_duration_seconds` | 0.1, 0.5, 1, 2, 5, 10, 30 | Time to process one tick | + +### Alerting recommendations + +Example Prometheus alert rules for common failure modes: + +```yaml +groups: + - name: validator-set-submitter + rules: + - alert: SubmitterDown + expr: validator_set_submitter_up == 0 + for: 2m + labels: + severity: critical + annotations: + summary: "Validator set submitter is down" + + - alert: ConsecutiveMissedEras + expr: validator_set_submitter_consecutive_missed_eras > 0 + for: 0m + labels: + severity: critical + annotations: + summary: "Submitter has missed {{ $value }} consecutive era(s)" + + - alert: SubmissionErrorsIncreasing + expr: rate(validator_set_submitter_errors_total[5m]) > 0 + for: 5m + labels: + severity: warning + annotations: + summary: "Submitter errors increasing (type={{ $labels.type }})" + + - alert: SlowSubmissions + expr: histogram_quantile(0.95, rate(validator_set_submitter_submission_duration_seconds_bucket[15m])) > 120 + for: 5m + labels: + severity: warning + annotations: + summary: "95th percentile submission duration exceeds 120s" +``` ## Docker -Build the image from the repository root: +A pre-built image is published to Docker Hub on every push to `main`: -```bash -docker build -f test/tools/validator-set-submitter/Dockerfile \ - -t datahavenxyz/validator-set-submitter:local . +``` +datahavenxyz/validator-set-submitter:latest +datahavenxyz/validator-set-submitter:sha- ``` -Run the submitter with mounted config and env private key: +Run the submitter with a mounted config and private key: ```bash docker run --rm \ - -v "$(pwd)/test/tools/validator-set-submitter/config.yml:/config/config.yml:ro" \ + -v "$(pwd)/config.yml:/config/config.yml:ro" \ -e SUBMITTER_PRIVATE_KEY=0x... \ - datahavenxyz/validator-set-submitter:local + datahavenxyz/validator-set-submitter:latest ``` Dry run: ```bash docker run --rm \ - -v "$(pwd)/test/tools/validator-set-submitter/config.yml:/config/config.yml:ro" \ + -v "$(pwd)/config.yml:/config/config.yml:ro" \ -e SUBMITTER_PRIVATE_KEY=0x... \ - datahavenxyz/validator-set-submitter:local --dry-run + datahavenxyz/validator-set-submitter:latest --dry-run ``` -The Docker image does not include `contracts/deployments/*.json`. In containerized runs, set `service_manager_address` in your config. +The Docker image does not include `contracts/deployments/*.json`. Set `service_manager_address` explicitly in your config. + +### Building locally + +To build the image from the repository root: + +```bash +docker build -f test/tools/validator-set-submitter/Dockerfile \ + -t datahavenxyz/validator-set-submitter:local . +``` ## Startup checks @@ -127,3 +248,42 @@ If any check fails, the process exits immediately. ## Shutdown Send `SIGINT` (Ctrl+C) or `SIGTERM`. The submitter unsubscribes from session changes and tears down connections cleanly. + +## Troubleshooting + +### Startup exits immediately + +| Symptom | Cause | Fix | +|---|---|---| +| `Cannot connect to Ethereum RPC` | Ethereum endpoint unreachable | Verify `ethereum_rpc_url` is correct and the node is running | +| `Cannot connect to DataHaven WS` | DataHaven endpoint unreachable | Verify `datahaven_ws_url` is correct and the node accepts WebSocket connections | +| `Account 0x... is not the authorized submitter` | Private key does not match the on-chain submitter | Call `setValidatorSetSubmitter` on the ServiceManager with the correct address, or fix the private key | +| `Missing submitter private key` | No key provided | Supply via `--submitter-private-key`, `SUBMITTER_PRIVATE_KEY` env var, or `submitter_private_key` in config | +| `Config file not found` | Wrong `--config` path | Check the path and ensure the file exists | + +### Missed eras + +When the submitter fails to submit for an era, `missed_eras_total` increments and `consecutive_missed_eras` increases. Common causes: + +- **Transaction reverted** — the submitter account may have insufficient ETH to cover `execution_fee + relayer_fee`. Fund the account. +- **RPC timeout** — the Ethereum RPC may be overloaded or unreachable. Check RPC health and consider a dedicated endpoint. +- **Snowbridge congestion** — if the bridge queue is full, submissions may fail. Check Snowbridge relayer status. +- **Already confirmed** — if another process submitted the era, the submitter skips it (this is normal, not an error). + +Check `LOG_LEVEL=debug` output for detailed tick-by-tick reasoning. + +### Process exits after running for a while + +| Symptom | Cause | Fix | +|---|---|---| +| `Session subscription error: ...` followed by process exit | DataHaven WebSocket subscription dropped and the submitter has no built-in reconnect loop | Ensure WebSocket stability and run the submitter with automatic restarts (`systemd`/Kubernetes) | + +### Enabling debug logs + +Set the `LOG_LEVEL` environment variable to `debug` for verbose output: + +```bash +LOG_LEVEL=debug bun tools/validator-set-submitter/main.ts run +``` + +Or in Docker/Kubernetes, add `LOG_LEVEL: "debug"` to the environment. Debug logs include per-tick skip reasons and detailed transaction information.