mirror of
https://github.com/open-metadata/OpenMetadata
synced 2026-05-24 09:39:11 +00:00
* fix: add concurrency control for OpenAI embedding HTTP requests (#26392)

  During ingestion, many virtual threads call OpenAIEmbeddingClient.embed() concurrently, overwhelming the HTTP/2 connection's stream limit and causing a "too many concurrent streams" IOException. Add a Semaphore with a limit of 10 concurrent requests to throttle outbound HTTP calls to the OpenAI API.

  Closes #26392

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: move concurrency control from OpenAIEmbeddingClient to EmbeddingClient base class

  Convert EmbeddingClient from an interface to an abstract class with a Semaphore-based template method: embed() acquires the permit, delegates to doEmbed(), and releases in a finally block. All implementations (OpenAI, Bedrock, DJL) now get uniform concurrency bounds without managing them individually.

  - Remove per-client semaphore/executor from OpenAIEmbeddingClient and BedrockEmbeddingClient
  - Rename embed() -> doEmbed() in all implementations
  - Update MockEmbeddingClient in tests to extend the abstract class

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add missing authenticator() override to HttpClient stub in test

  The CI JDK requires authenticator() to be implemented when subclassing HttpClient directly.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add missing connectTimeout() override to HttpClient stub in test

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: make maxConcurrentEmbeddingRequests configurable via NLS config

  Add maxConcurrentEmbeddingRequests to the NaturalLanguageSearchConfiguration JSON schema (default 10, minimum 1). The EmbeddingClient base class reads the value from config via a shared resolveMaxConcurrent() helper. All three clients (OpenAI, Bedrock, DJL) pass the config value to super() so the semaphore limit is tunable per deployment without code changes.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Update generated TypeScript types

* fix: add maxConcurrentEmbeddingRequests to openmetadata.yaml

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Potential fix for pull request finding

  Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* Address review: use dedicated executor in concurrency test, validate maxConcurrentRequests, add test coverage

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix package-private constructor to properly chain concurrency limit to super

  The 6-arg package-private constructor was implicitly calling super(), which hardcoded the semaphore to DEFAULT_MAX_CONCURRENT_REQUESTS regardless of configuration. Added a 7-arg constructor that accepts maxConcurrentRequests and calls super(maxConcurrentRequests), with the 6-arg version chaining to it using the default. Updated the concurrency test to use a custom limit (3) to verify configurability.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
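
The Semaphore-based template method described in the commit message can be pictured roughly as follows. This is only a minimal sketch, not the code from the repository: the class names `EmbeddingClientSketch` and `CountingEmbeddingClient`, the `float[] embed(String)` signature, the helper `peakUnderLoad`, and the interrupt handling are all assumptions made for illustration. Only the names `embed()`, `doEmbed()`, `DEFAULT_MAX_CONCURRENT_REQUESTS`, `maxConcurrentRequests`, the default of 10, and the minimum of 1 come from the commit message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the abstract EmbeddingClient base class.
abstract class EmbeddingClientSketch {
  static final int DEFAULT_MAX_CONCURRENT_REQUESTS = 10;

  private final Semaphore permits;

  protected EmbeddingClientSketch(int maxConcurrentRequests) {
    // The commit message states a minimum of 1; rejecting smaller values is an assumption.
    if (maxConcurrentRequests < 1) {
      throw new IllegalArgumentException("maxConcurrentRequests must be >= 1");
    }
    this.permits = new Semaphore(maxConcurrentRequests);
  }

  // Template method: acquire a permit, delegate, always release in finally.
  public final float[] embed(String text) {
    try {
      permits.acquire();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while waiting for a permit", e);
    }
    try {
      return doEmbed(text);
    } finally {
      permits.release();
    }
  }

  // Each concrete client supplies the actual embedding call here.
  protected abstract float[] doEmbed(String text);
}

// Hypothetical test double that records the peak number of in-flight calls.
class CountingEmbeddingClient extends EmbeddingClientSketch {
  private final AtomicInteger inFlight = new AtomicInteger();
  private final AtomicInteger peak = new AtomicInteger();

  CountingEmbeddingClient(int maxConcurrentRequests) {
    super(maxConcurrentRequests);
  }

  @Override
  protected float[] doEmbed(String text) {
    int now = inFlight.incrementAndGet();
    peak.accumulateAndGet(now, Math::max);
    try {
      Thread.sleep(20); // simulate a slow remote call
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    } finally {
      inFlight.decrementAndGet();
    }
    return new float[] {text.length()};
  }

  // Runs `callers` threads against one client and returns the observed peak concurrency.
  static int peakUnderLoad(int limit, int callers) {
    CountingEmbeddingClient client = new CountingEmbeddingClient(limit);
    List<Thread> threads = new ArrayList<>();
    for (int i = 0; i < callers; i++) {
      Thread t = new Thread(() -> client.embed("hello"));
      threads.add(t);
      t.start();
    }
    for (Thread t : threads) {
      try {
        t.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
    return client.peak.get();
  }
}
```

With a limit of 3 and 12 callers, `CountingEmbeddingClient.peakUnderLoad(3, 12)` should never report more than 3 calls in flight, which mirrors the custom-limit check the last commit mentions. Keeping the permit handling in a `final` method on the base class means a subclass cannot accidentally bypass the bound.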
| | | |
|---|---|---|
| openmetadata-env.sh | ||
| openmetadata-s3-logs.yaml | ||
| openmetadata.yaml | ||
| operations.yaml | ||
| private_key.der | ||
| public_key.der | ||