mirror of
https://github.com/open-metadata/OpenMetadata
synced 2026-05-24 09:39:11 +00:00
* Add Prometheus metrics for reindexing pipeline via Micrometer Bridge the existing reindexing atomic counters to Prometheus so operators can alert on failures, latency spikes, and backpressure without relying solely on database-flushed stats.
- Add ReindexingMetrics singleton (initialize/getInstance pattern matching
CacheMetrics) with job lifecycle counters, stage success/failed/warnings
counters, bulk request timers with SLA buckets, payload size distribution,
backpressure and promotion counters, and active/pending gauges
- Register in MicrometerBundle after StreamableLogsMetrics
- Instrument ReindexingOrchestrator.run() with job started/completed/failed/stopped
- Bridge StageStatsTracker.flush() deltas to Prometheus per stage and entity type
- Add bulk request latency timer and payload size recording in OpenSearchBulkSink
- Record backpressure events in SearchIndexExecutor.handleBackpressure()
- Record promotion success/failure in DefaultRecreateHandler
- Add ReindexingMetricsTest with 24 tests covering all metric types
* Add Improvements
* Auto Gene
* Use Auto Config in distributed
* Fix Partition Claim Spread
* Make partition use config
* Correct total count
* Fix Wait time to 5 mins
* Revert om yaml
* Fix Sink sync
* Add Failure Handling at different stages
* Update script to create entities
* Move to scripts
* Add usage and fix script
* Fix Script
* Update generated TypeScript types
* Fix Staging miss
* Fix Stats reconcilation issue
* Revert workflow handler
* Fix Partition worker early sync
* Update Logs
* Update logs EntityRepository
* Error failure test
* Review Comments fix
* Fix Non Distributed live feed
* Fix Non Distributed stats feed
* Fix Review comments
* Fix Time Series cutt off
* Update generated TypeScript types
* Md
* Benchmark addition
* Fix date time warning
* Update load test to do benchmark analysis
* Disagnostic and update perf test
* Move load test to bin
* Fix Review Comments
* Add numeric values
* Move to localhost by default
* Fix Perf test issues
* Review Comments
* Add Preflight Fixes
* Add Preflight fixes for stale entry
* Remove stale entry on ApplicationHandler
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
|
||
|---|---|---|
| .. | ||
| config | ||
| local | ||
| docker-compose.yml | ||
| README.md | ||
Distributed Search Indexing Test Environment
This directory contains scripts and configurations to test the distributed search indexing feature with multiple OpenMetadata servers sharing a common database.
Architecture
┌─────────────────────────────────────────────────────────────────┐
│ Test Environment │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ OM Server│ │ OM Server│ │ OM Server│ │
│ │ :8585 │ │ :8587 │ │ :8589 │ │
│ │ SERVER-1 │ │ SERVER-2 │ │ SERVER-3 │ │
│ └────┬─────┘ └────┬─────┘ └────┬─────┘ │
│ │ │ │ │
│ └─────────────┼─────────────┘ │
│ │ │
│ ┌──────┴──────┐ │
│ │ Polling │ (DB-based coordination) │
│ └──────┬──────┘ │
│ │ │
│ ┌─────────────┴─────────────┐ │
│ ▼ ▼ │
│ ┌─────────┐ ┌───────────┐ │
│ │ MySQL │ │ OpenSearch│ │
│ │ :3306 │ │ :9200 │ │
│ └─────────┘ └───────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
Quick Start (Docker Compose)
1. Start the Environment
cd docker/development/distributed-test
# Start all services (builds images on first run)
./scripts/start.sh
# Or force rebuild
./scripts/start.sh --build
2. Load Test Data
# Load 10,000 tables (default)
./scripts/perf-test.sh
# Or specify the number
./scripts/perf-test.sh --tables 50000 --databases 50
3. Trigger Reindexing
# Trigger reindex on server 1
./scripts/trigger-reindex.sh
# With index recreation
./scripts/trigger-reindex.sh --recreate
# Specific entities only
./scripts/trigger-reindex.sh --entities table,dashboard
4. Monitor Progress
# Follow logs from all servers
./scripts/logs.sh -f
# Filter by pattern
./scripts/logs.sh -f --grep "partition"
# Single server logs
./scripts/logs.sh -f --server 1
5. Stop the Environment
# Stop containers (preserve data)
./scripts/stop.sh
# Stop and clean up volumes
./scripts/stop.sh --clean
Local Development (IDE Debugging)
For debugging with breakpoints, run OM servers locally while using Docker for MySQL and OpenSearch.
1. Start Dependencies Only
cd docker/development/distributed-test
docker compose -f local/docker-compose-deps.yml up -d
2. Run Migrations (First Time)
cd /path/to/openmetadata
./bootstrap/openmetadata-ops.sh -d migrate --force
3. Option A: Run from Terminal
# Start all 3 servers in separate terminals
./local/run-local-servers.sh
# Or specific servers
./local/run-local-servers.sh 1 2
3. Option B: Run from IDE
Create run configurations in IntelliJ IDEA:
Server 1:
- Main class:
org.openmetadata.service.OpenMetadataApplication - Program arguments:
server docker/development/distributed-test/local/server1.yaml - VM options:
-Xmx1G -Xms512M - Working directory: Project root
Server 2:
- Same as above but with
local/server2.yaml
Server 3:
- Same as above but with
local/server3.yaml
Server Ports
| Server | API Port | Admin Port |
|---|---|---|
| Server 1 | 8585 | 8586 |
| Server 2 | 8587 | 8588 |
| Server 3 | 8589 | 8590 |
Configuration
Edit .env to customize:
# Number of tables for test data
TEST_DATA_TABLES=10000
# Log level
LOG_LEVEL=INFO
# Heap size per server
OPENMETADATA_HEAP_OPTS=-Xmx1G -Xms1G
Testing Distributed Indexing
Verify Partition Distribution
- Start all 3 servers
- Load test data:
./scripts/perf-test.sh --tables 10000 - Trigger reindex:
./scripts/trigger-reindex.sh --recreate - Watch logs:
./scripts/logs.sh -f --grep "partition"
You should see output like:
[SERVER-1] INFO Claimed partition: table_0-999 (1000 records)
[SERVER-2] INFO Claimed partition: table_1000-1999 (1000 records)
[SERVER-3] INFO Claimed partition: table_2000-2999 (1000 records)
[SERVER-1] INFO Completed partition: table_0-999
...
Test Server Failure Recovery
- Start reindexing with many partitions
- Stop one server mid-process:
docker stop distributed_test_om_server_2 - Watch remaining servers pick up orphaned partitions
- Verify job completes successfully
Check Job Status
# Via API
curl -s http://localhost:8585/api/v1/apps/name/SearchIndexingApplication/status | jq
# Check partition table directly
docker exec -it distributed_test_mysql mysql -uopenmetadata_user -popenmetadata_password openmetadata_db \
-e "SELECT status, COUNT(*) FROM search_index_partition GROUP BY status"
Troubleshooting
Servers Not Starting
Check if ports are in use:
lsof -i :8585
lsof -i :8587
lsof -i :8589
Database Connection Issues
Verify MySQL is accessible:
docker exec -it distributed_test_mysql mysql -uopenmetadata_user -popenmetadata_password -e "SELECT 1"
OpenSearch Not Ready
Check health:
curl http://localhost:9200/_cluster/health?pretty
View Full Logs
# All container logs
docker compose logs -f
# Specific container
docker logs -f distributed_test_om_server_1
File Structure
distributed-test/
├── docker-compose.yml # Full environment (3 OM servers + deps)
├── .env # Configuration variables
├── config/
│ └── mysql-init.sql # Database initialization
├── scripts/
│ ├── start.sh # Start full environment
│ ├── stop.sh # Stop environment
│ ├── logs.sh # View aggregated logs
│ ├── trigger-reindex.sh # Trigger reindexing
│ └── perf-test.sh # Load test data
├── local/
│ ├── docker-compose-deps.yml # Dependencies only (for IDE debugging)
│ ├── server1.yaml # Server 1 config (port 8585)
│ ├── server2.yaml # Server 2 config (port 8587)
│ ├── server3.yaml # Server 3 config (port 8589)
│ └── run-local-servers.sh # Start servers locally
└── README.md # This file