mirror of
https://github.com/open-metadata/OpenMetadata
synced 2026-05-24 09:39:11 +00:00
|
* docs(ingestion): design for runtime diagnostics subsystem

Proposal for an always-available, opt-in (loggerLevel=DEBUG) diagnostics layer inside the ingestion framework so connector runs that hang, OOM, or slow down produce enough live evidence to identify the root cause in `kubectl logs`, without `py-spy`, `kubectl debug`, or ptrace.

Grounded in three concrete production cases:

- The Snowflake "hang" that was actually a logging recursion bug in StreamableLogHandler (fixed by PR #28160) but took ~6 hours and one wrong-theory fix to identify.
- Recurring OOMKills with no last-state evidence and no way to attribute growth to a specific object type or stage.
- "Is it stuck or just slow?" with no way to answer from outside the pod.

The design is gated entirely on the existing `workflowConfig.loggerLevel` (no new env vars, no new config fields). When off, the module is dead code. When on (~250 KB / <0.01% CPU), it provides:

- An operation registry of "what each thread is doing right now"
- SIGUSR1 / SIGUSR2 handlers for on-demand dumps to stderr
- A watchdog thread that auto-logs hangs at 60s and auto-dumps at 300s
- A heartbeat thread emitting one structured progress line every 30s
- A memory tracker (RSS / cgroup / GC top-types on dump)
- Stage-backpressure visibility (queue depths between source/processor/sink)
- HTTP introspection of OMetaClient and DB cursor execute()
|
||
|---|---|---|
| .. | ||
| design | ||
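
The commit message above describes an operation registry, a SIGUSR1 dump, and a watchdog with 60s/300s thresholds, all gated on `loggerLevel=DEBUG`. The following is a minimal sketch of how those three pieces could fit together, using only the Python standard library. It is not the implementation from the design document: every name here (`track_operation`, `snapshot`, `find_hangs`, `dump_to_stderr`, `watchdog_loop`, `enable_if_debug`) is hypothetical, and the heartbeat, memory tracker, and backpressure features are left out.

```python
import faulthandler
import logging
import signal
import sys
import threading
import time
from contextlib import contextmanager

logger = logging.getLogger("diagnostics_sketch")  # hypothetical logger name

# thread id -> (operation description, monotonic start time)
_operations = {}
# RLock so a signal handler running in the main thread can re-enter safely.
_lock = threading.RLock()


@contextmanager
def track_operation(description):
    """Record what the current thread is doing for the duration of the block."""
    ident = threading.get_ident()
    with _lock:
        previous = _operations.get(ident)
        _operations[ident] = (description, time.monotonic())
    try:
        yield
    finally:
        with _lock:
            if previous is None:
                _operations.pop(ident, None)
            else:
                _operations[ident] = previous  # restore the enclosing operation


def snapshot(now=None):
    """Return (thread_id, description, elapsed_seconds) per tracked operation."""
    now = time.monotonic() if now is None else now
    with _lock:
        return [(tid, desc, now - start) for tid, (desc, start) in _operations.items()]


def find_hangs(threshold_seconds, now=None):
    """Return the tracked operations that have run for at least the threshold."""
    return [entry for entry in snapshot(now) if entry[2] >= threshold_seconds]


def dump_to_stderr(signum=None, frame=None):
    """On-demand dump: registry contents plus all thread stacks.

    Writes with print() rather than logging, to stay clear of log handlers.
    """
    for tid, desc, elapsed in snapshot():
        print(f"[diag] thread={tid} op={desc!r} elapsed={elapsed:.1f}s", file=sys.stderr)
    faulthandler.dump_traceback(file=sys.stderr, all_threads=True)


def watchdog_loop(stop_event, warn_after=60.0, dump_after=300.0, interval=5.0):
    """Log long-running operations and dump once any passes dump_after.

    Simplification: this repeats the warning/dump on every interval.
    """
    while not stop_event.wait(interval):
        for tid, desc, elapsed in find_hangs(warn_after):
            logger.warning("thread %s in %r for %.0fs", tid, desc, elapsed)
        if find_hangs(dump_after):
            dump_to_stderr()


def enable_if_debug(logger_level):
    """Start diagnostics only for DEBUG; otherwise do nothing and return None.

    Must be called from the main thread, since signal.signal() requires it.
    """
    if str(logger_level).upper() != "DEBUG":
        return None
    if hasattr(signal, "SIGUSR1"):  # not available on every platform
        signal.signal(signal.SIGUSR1, dump_to_stderr)
    stop_event = threading.Event()
    threading.Thread(
        target=watchdog_loop, args=(stop_event,), daemon=True, name="diag-watchdog"
    ).start()
    return stop_event
```

Under these assumptions, a connector would wrap slow steps as `with track_operation("snowflake: fetch tables"):`, and running `kill -USR1 <pid>` against the ingestion process would print the registry and thread stacks to stderr, where `kubectl logs` can pick them up.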