mirror of
https://github.com/open-metadata/OpenMetadata
synced 2026-05-24 09:39:11 +00:00
# Distributed Indexing Load Test Scripts

Scripts for generating test data and triggering reindexing to load-test the OpenMetadata search indexing pipeline.

## Quick Start

```bash
# 1. Start the environment
./scripts/start.sh

# 2. Load test data (~50K entities)
./scripts/perf-test.sh --scale small --server http://localhost:8585

# 3. Trigger reindex
./scripts/trigger-reindex.sh

# 4. Monitor logs
./scripts/logs.sh

# 5. Stop the environment
./scripts/stop.sh
```

## perf-test.sh

Generates entities across 30+ entity types, including time-series data, lineage edges, and data quality entities. Uses concurrent workers for high throughput.

### Scale Presets

Use `--scale` to pick a preset:

| Preset | Approximate Total | Use Case |
|--------|-------------------|----------|
| `small` | ~50K | Quick smoke tests, CI |
| `medium` | ~500K | Integration testing |
| `large` | ~2M | Performance validation |
| `xlarge` | ~5M | Full-scale load testing |

```bash
# Small smoke test
./perf-test.sh --scale small --server http://localhost:8585

# Full 5M load test
./perf-test.sh --scale xlarge --server http://localhost:8585

# Quick mode (~10K, fastest)
./perf-test.sh --quick --server http://localhost:8585
```

Default (no `--scale` or `--quick`) produces ~46K entities for backward compatibility.

### Overriding Individual Counts

Any per-entity count flag (for example `--tables NUM`) overrides the preset for that entity type:

```bash
# Small preset but with 100K tables
./perf-test.sh --scale small --tables 100000

# Override tables and dashboards (everything else stays at preset counts)
./perf-test.sh --scale small --tables 50000 --dashboards 10000
```

### All Flags

#### Entity counts

| Flag | Default | Description |
|------|---------|-------------|
| `--tables NUM` | 20000 | Database tables |
| `--topics NUM` | 3000 | Kafka/messaging topics |
| `--dashboards NUM` | 5000 | Looker dashboards |
| `--charts NUM` | 10000 | Dashboard charts |
| `--pipelines NUM` | 3000 | Airflow pipelines |
| `--stored-procedures NUM` | 0 | Stored procedures |
| `--containers NUM` | 2000 | S3 containers |
| `--search-indexes NUM` | 1000 | Elasticsearch indexes |
| `--mlmodels NUM` | 2000 | ML models |
| `--queries NUM` | 0 | SQL queries |
| `--data-models NUM` | 0 | Dashboard data models |
| `--test-suites NUM` | 0 | Test suites |
| `--test-cases NUM` | 0 | Test cases (linked to tables) |
| `--glossaries NUM` | 50 | Glossaries |
| `--glossary-terms NUM` | 5000 | Glossary terms |
| `--classifications NUM` | 20 | Tag classifications |
| `--tags NUM` | 1000 | Tags |
| `--users NUM` | 0 | Users |
| `--teams NUM` | 0 | Teams |
| `--domains NUM` | 0 | Domains |
| `--data-products NUM` | 0 | Data products (need domains) |
| `--api-collections NUM` | 0 | API collections |
| `--api-endpoints NUM` | 0 | API endpoints (need collections) |
| `--lineage-edges NUM` | 0 | Lineage edges between entities |

#### Time-series entity counts

| Flag | Default | Description |
|------|---------|-------------|
| `--test-case-results NUM` | 0 | Test case results (need test cases) |
| `--entity-report-data NUM` | 0 | Entity report data insights |
| `--web-analytic-views NUM` | 0 | Web analytic entity view reports |
| `--web-analytic-activity NUM` | 0 | Web analytic user activity reports |
| `--raw-cost-analysis NUM` | 0 | Raw cost analysis reports |
| `--aggregated-cost-analysis NUM` | 0 | Aggregated cost analysis reports |

#### Other options

| Flag | Default | Description |
|------|---------|-------------|
| `--server URL` | `http://localhost:8585` | Target OpenMetadata server |
| `--workers NUM` | 20 | Concurrent HTTP workers |
| `--quick` | - | Quick mode preset (~10K entities) |
| `--scale PRESET` | - | Scale preset (small/medium/large/xlarge) |
| `--skip-reads` | - | Skip read benchmarking phase (Phase 8) |
| `--only-reads` | - | Skip write phases; discover existing entities and run reads only |
| `--mixed` | - | Run mixed read/write workload (Phase 9) |
| `--mixed-duration SECS` | 60 | Duration of mixed workload in seconds |
| `--read-ratio PCT` | 80 | Read percentage in mixed workload (0-100) |
| `--realistic` | - | Run Phase 4 entity creation concurrently across entity types using a shared worker pool |

### Entity Creation Order

The script creates entities in dependency order across up to 9 phases:

```
Phase 1  Metadata         domains, classifications, tags, glossaries, terms, users, teams
Phase 2  Services         database, dashboard, pipeline, messaging, ML, storage, search, API
Phase 3  Infrastructure   databases, schemas, API collections
Phase 4  Core entities    tables, dashboards, charts, topics, pipelines, storedProcedures,
                          containers, searchIndexes, mlmodels, queries, dataModels,
                          apiEndpoints, dataProducts
Phase 5  Data Quality     testSuites, testCases
Phase 6  Lineage          table->table (60%), table->dashboard (25%), pipeline->table (15%)
Phase 7  Time-Series      testCaseResults, entityReportData, webAnalyticViews,
                          webAnalyticActivity, rawCostAnalysis, aggCostAnalysis
Phase 8  Read Benchmarks  entity fetch, paginated list, search queries, lineage traversal
Phase 9  Mixed Workload   concurrent reads + writes for configurable duration (--mixed)
```

Phases 8 and 9 are optional:
- Phase 8 runs automatically unless `--skip-reads` is passed
- Phase 9 only runs when `--mixed` is passed
- `--only-reads` skips phases 1-7, discovers existing entities, and runs Phase 8

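To make the Phase 6 ratios concrete, the sketch below splits a `--lineage-edges` budget into the three edge kinds. `split_lineage_edges` is a hypothetical helper written for illustration, not a function from `perf-test.sh`, and the choice to hand the rounding remainder to the pipeline->table bucket is an assumption.

```bash
# Sketch: split a lineage edge budget using the Phase 6 ratios (60/25/15).
# split_lineage_edges is a hypothetical helper, not part of perf-test.sh.
split_lineage_edges() {
  local total="$1"
  local table_to_table=$(( total * 60 / 100 ))
  local table_to_dashboard=$(( total * 25 / 100 ))
  # Assumption: the remainder goes to pipeline->table so the counts sum to total.
  local pipeline_to_table=$(( total - table_to_table - table_to_dashboard ))
  echo "$table_to_table $table_to_dashboard $pipeline_to_table"
}

split_lineage_edges 10000   # prints: 6000 2500 1500
```

For example, `--lineage-edges 10000` would correspond to roughly 6000 table->table, 2500 table->dashboard, and 1500 pipeline->table edges.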
### Entity Linking

- **Tables, dashboards, pipelines**: IDs collected during Phase 4 for use in lineage (Phase 6)
- **Test cases**: FQNs collected for testCaseResult creation (Phase 7)
- **Lineage edges**: Use collected UUIDs via `PUT /api/v1/lineage`
- Collections are capped at `max(lineage_edges * 2, test_case_results)` to bound memory

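The collection cap can be worked through with a few lines of shell arithmetic. This is a minimal sketch of the `max(lineage_edges * 2, test_case_results)` formula only; `collection_cap` is a hypothetical name, and how the script actually stores the collected IDs is not shown here.

```bash
# Sketch of the cap formula: max(lineage_edges * 2, test_case_results).
# collection_cap is a hypothetical helper, not part of perf-test.sh.
collection_cap() {
  local lineage_edges="$1"
  local test_case_results="$2"
  local from_lineage=$(( lineage_edges * 2 ))
  if [ "$from_lineage" -gt "$test_case_results" ]; then
    echo "$from_lineage"
  else
    echo "$test_case_results"
  fi
}

collection_cap 5000 2000    # prints: 10000
collection_cap 1000 50000   # prints: 50000
```

With both flags left at their default of 0, the formula evaluates to 0.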
### Auto-Scaling Infrastructure

Databases and schemas scale automatically with table count:
- `NUM_DATABASES = max(1, tables / 50000)`
- `SCHEMAS_PER_DB = min(20, tables / (databases * 5000))`
- This keeps ~5000 tables per schema at any scale

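The two formulas can be checked by hand with shell integer arithmetic. The sketch below assumes floor (integer) division, which the formulas do not state explicitly, and `plan_databases` is a hypothetical helper name.

```bash
# Sketch of the sizing formulas using shell integer arithmetic.
# plan_databases is a hypothetical helper; floor division is an assumption.
plan_databases() {
  local tables="$1"
  # NUM_DATABASES = max(1, tables / 50000)
  local databases=$(( tables / 50000 ))
  if [ "$databases" -lt 1 ]; then
    databases=1
  fi
  # SCHEMAS_PER_DB = min(20, tables / (databases * 5000))
  local schemas=$(( tables / (databases * 5000) ))
  if [ "$schemas" -gt 20 ]; then
    schemas=20
  fi
  echo "databases=$databases schemas_per_db=$schemas"
}

plan_databases 20000     # prints: databases=1 schemas_per_db=4
plan_databases 2000000   # prints: databases=40 schemas_per_db=10
```

Both examples work out to 5000 tables per schema (20000 / 4 and 2000000 / 400). Under floor division the second formula yields 0 for fewer than 5000 tables, so a real implementation would need a lower bound of 1 there; the sketch leaves the formula as written.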
### Retry Logic

HTTP requests retry up to 3 times with exponential backoff (1s, 2s, 4s) on:
- 5xx server errors
- Connection errors / timeouts

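The backoff schedule above can be sketched as a small wrapper. This is an illustration, not the script's own code: `with_retry` and the `RETRY_SLEEP` override are hypothetical names, and the sketch assumes one initial attempt followed by up to 3 retries. It retries on any non-zero exit status, so separating 5xx responses from other failures is left to the wrapped command.

```bash
# Sketch: retry a command up to 3 times with exponential backoff (1s, 2s, 4s).
# with_retry and RETRY_SLEEP are hypothetical names, not part of perf-test.sh.
# Set RETRY_SLEEP=: to skip the real sleeps, for example in tests.
with_retry() {
  local delay=1
  local retries_left=3
  while true; do
    # Run the wrapped command; stop as soon as it succeeds.
    "$@" && return 0
    # Give up once the initial attempt and all 3 retries have failed.
    if [ "$retries_left" -eq 0 ]; then
      return 1
    fi
    ${RETRY_SLEEP:-sleep} "$delay"
    delay=$(( delay * 2 ))
    retries_left=$(( retries_left - 1 ))
  done
}
```

A possible use is `with_retry curl --silent --fail --max-time 30 "$URL"`, where `$URL` is a placeholder. Note that `curl --fail` also exits non-zero on 4xx responses, which the list above does not treat as retryable.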
### Realistic Concurrent Workload (`--realistic`)

By default, `--workers N` creates N concurrent workers **per entity type**, but entity types run
sequentially (all tables first, then all dashboards, etc.). With `--realistic`, all Phase 4 entity
types are created concurrently through a **single shared worker pool**, simulating real-world traffic
where tables, dashboards, topics, and pipelines all hit the server at the same time.

This exposes contention patterns not visible in sequential mode:
- Cross-entity DB lock contention
- Shared thread pool pressure
- Connection pool exhaustion under mixed workloads

```bash
# Realistic mode: all entity types hit the server concurrently
./perf-test.sh --scale 10k --realistic --server http://localhost:8585

# Compare with sequential mode (default)
./perf-test.sh --scale 10k --server http://localhost:8585
```

The report includes a `realistic_combined` entry showing combined RPS and latency distribution
across all entity types, in addition to individual per-entity-type metrics.

### Performance Tips

- Use `--workers 30` or higher if the server can handle it
- Time-series and lineage phases use `min(10, workers)` to avoid overwhelming the server
- At `xlarge` scale, expect the script to run for several hours depending on server capacity
- Monitor server logs for 429/503 errors and reduce workers if needed

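The `min(10, workers)` rule means that raising `--workers` past 10 speeds up the entity creation phases but not the time-series and lineage phases. A minimal sketch of that cap, with `capped_workers` as a hypothetical helper name:

```bash
# Sketch of the worker cap for time-series and lineage phases: min(10, workers).
# capped_workers is a hypothetical helper, not part of perf-test.sh.
capped_workers() {
  local workers="$1"
  if [ "$workers" -gt 10 ]; then
    echo 10
  else
    echo "$workers"
  fi
}

capped_workers 30   # prints: 10
capped_workers 4    # prints: 4
```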
### Multi-Scale Benchmarking

Run benchmarks across multiple asset counts to compare performance at different scales:

```bash
# Benchmark at 10k, 50k, 100k, and 200k entities
for scale in 10k 50k 100k 200k; do
  ./scripts/perf-test.sh --scale "$scale" --server http://localhost:8585 \
    --output "/tmp/bench-${scale}.json" --workers 20 2>&1 | tee "/tmp/bench-${scale}.log"
done
```

With read and mixed workload benchmarks included:

```bash
for scale in 10k 50k 100k; do
  ./scripts/perf-test.sh --scale "$scale" --server http://localhost:8585 \
    --mixed --mixed-duration 30 \
    --output "/tmp/bench-${scale}.json" 2>&1 | tee "/tmp/bench-${scale}.log"
done
```

Compare results across scales:

```bash
for f in /tmp/bench-*.json; do
  echo "=== $(basename "$f") ==="
  # Pass the report path as an argument instead of splicing it into the Python source
  python3 - "$f" <<'PY'
import json
import sys

with open(sys.argv[1]) as fh:
    o = json.load(fh)["overall"]
print(f"  Entities: {o['total_entities_created']:,}  RPS: {o['overall_throughput_rps']:.1f}"
      f"  Errors: {o['overall_error_rate_pct']:.1f}%  Time: {o['total_wall_clock_s']:.0f}s")
PY
done
```

## Verification After Loading

```bash
# 1. Trigger reindex
./scripts/trigger-reindex.sh

# 2. Check partition table for all entity types
mysql -e "SELECT DISTINCT entityType FROM search_index_partition ORDER BY entityType;"

# 3. Verify counts in UI
#    - Data Assets: tables, topics, dashboards, pipelines, etc.
#    - Data Quality: test suites and test cases
#    - Lineage: visible edges between tables/dashboards/pipelines
#    - Data Insights: time-series charts for entity reports, web analytics, cost analysis
```