OpenMetadata/conf
Pere Miquel Brull 7e0ee80c28
feat(search): add Google Gemini embedding provider (#27974)
* Add design: Google Gemini embedding client

Adds a fourth embedding provider (google) alongside openai/bedrock/djl,
using the Generative Language API with a single API key.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Add implementation plan: Google Gemini embedding client

7 tasks covering schema change + regen, client implementation,
validation tests, error path tests, request shape tests, switch
wiring, and final verification.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(spec): add google embedding provider config block

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(search): add GoogleEmbeddingClient with happy-path test

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor(search): extract MODELS_PREFIX constant in GoogleEmbeddingClient

The string "models/" appeared in both DEFAULT_BASE_URL and the buildRequestBody
method. Extract it as a named constant per project standards.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(search): add constructor validation tests for GoogleEmbeddingClient

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(search): add blank model id test and clarify null-modelId workaround

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(search): add HTTP error and malformed response tests for GoogleEmbeddingClient

* test(search): tighten empty values array assertion to check message

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(search): verify Google embedding request URL, headers, and body shape

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(search): extract endpoint constant and harden extractBody helper

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(search): wire google embedding provider into SearchRepository switch

* test(search): cover null dimension and custom endpoint, drop redundant comment

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Update generated TypeScript types

* Remove internal planning docs from PR

These were workflow scaffolding (design spec + implementation plan)
generated by the superpowers brainstorming/planning flow; they belong
in the local development trail, not the PR.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Address PR review comments

- GoogleEmbeddingClient.buildRequest: handle endpoint with existing query
  string by switching the key separator from '?' to '&' as needed; document
  why the API key travels in the URL (Google Generative Language API
  requirement, not Bearer-header).
- GoogleEmbeddingClient.extractErrorMessage: replace empty catch block with
  a trace-level log to comply with the 'no empty catch' standard.
- elasticSearchConfiguration.json: clarify google.endpoint description so
  operators know it must be the full ':embedContent' URL, not a base URL.
- GoogleEmbeddingClientTest.extractBody: await onComplete via
  CompletableFuture.get(5s) instead of relying on synchronous publisher
  delivery; surface onError properly.
- New test: testEndpointWithExistingQueryStringUsesAmpersand verifies the
  '?' / '&' separator logic.
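
The '?' / '&' separator rule described above can be sketched roughly as
follows. This is an illustration only, not the code from the PR: the class
name `EndpointKeySketch`, the method `withApiKey`, and the example host are
hypothetical, and the `key` parameter name is assumed from the Generative
Language API's API-key convention.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; the real logic lives in GoogleEmbeddingClient.buildRequest.
final class EndpointKeySketch {
  private EndpointKeySketch() {}

  // Appends the API key as a query parameter. Uses '&' when the endpoint
  // already carries a query string and '?' otherwise.
  static String withApiKey(String endpoint, String apiKey) {
    char separator = endpoint.indexOf('?') >= 0 ? '&' : '?';
    // "key" is an assumed parameter name, not taken from the PR.
    return endpoint + separator + "key=" + URLEncoder.encode(apiKey, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    String base = "https://example.invalid/v1beta/models/m:embedContent";
    System.out.println(withApiKey(base, "abc"));
    System.out.println(withApiKey(base + "?alt=json", "abc"));
  }
}
```

Checking for an existing '?' keeps a custom endpoint with its own query
parameters usable without producing a URL that contains two '?' characters.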

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Update generated TypeScript types

* Wire google embedding provider into openmetadata.yaml defaults

- Add `google:` block under naturalLanguageSearch with env-var fallbacks
  (GOOGLE_API_KEY, GOOGLE_EMBEDDING_MODEL_ID, GOOGLE_EMBEDDING_DIMENSION,
  GOOGLE_API_ENDPOINT).
- Update embeddingProvider option list comment to include "google".

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Use gemini-embedding-001 default and pass outputDimensionality

The previous default (text-embedding-004) is rejected on some Google
projects with `404: not found for API version v1beta, or is not
supported for embedContent`. Switch to gemini-embedding-001 — the
current GA model, available at v1beta and broadly accessible.

- GoogleEmbeddingClient.buildRequestBody: include outputDimensionality
  from the configured embeddingDimension. Required for gemini-embedding-001
  (defaults to 3072 dims otherwise) and supported as a truncation hint
  by text-embedding-004.
- elasticSearchConfiguration.json + openmetadata.yaml: change default
  embeddingModelId to gemini-embedding-001 and document the
  outputDimensionality semantics on the embeddingDimension field.
- GoogleEmbeddingClientTest.testRequestBodyShape: assert
  outputDimensionality=768 in the captured body and use
  gemini-embedding-001 as the test fixture model.
- SystemRepository.getEmbeddingConfigurationMessage: add a `google` case
  so /api/v1/system/status surfaces the configured model/endpoint
  instead of "Unknown provider 'google'".
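
A request body carrying outputDimensionality might look roughly like the
sketch below. It is not the PR's buildRequestBody: the class name
`EmbedRequestBodySketch`, the hand-rolled JSON escaping, and the choice to
omit the field when the dimension is null are assumptions for illustration.
The `model` / `content.parts[].text` field names follow the public
embedContent request format.

```java
// Hypothetical sketch; the real method is GoogleEmbeddingClient.buildRequestBody.
final class EmbedRequestBodySketch {
  private static final String MODELS_PREFIX = "models/";

  private EmbedRequestBodySketch() {}

  // Builds the JSON body for an embedContent call. outputDimensionality is
  // only written when a dimension is configured (assumed behaviour).
  static String buildRequestBody(String modelId, String text, Integer dimension) {
    StringBuilder body = new StringBuilder();
    body.append("{\"model\":\"").append(MODELS_PREFIX).append(escape(modelId)).append("\",");
    body.append("\"content\":{\"parts\":[{\"text\":\"").append(escape(text)).append("\"}]}");
    if (dimension != null) {
      body.append(",\"outputDimensionality\":").append(dimension.intValue());
    }
    return body.append('}').toString();
  }

  // Minimal JSON string escaping so the sketch stays dependency-free.
  static String escape(String s) {
    StringBuilder out = new StringBuilder();
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      switch (c) {
        case '"': out.append("\\\""); break;
        case '\\': out.append("\\\\"); break;
        case '\n': out.append("\\n"); break;
        case '\r': out.append("\\r"); break;
        case '\t': out.append("\\t"); break;
        default:
          if (c < 0x20) {
            out.append(String.format("\\u%04x", (int) c));
          } else {
            out.append(c);
          }
      }
    }
    return out.toString();
  }

  public static void main(String[] args) {
    System.out.println(buildRequestBody("gemini-embedding-001", "hello", 768));
  }
}
```

A production client would more likely build the body with a JSON library;
the manual builder here only keeps the example self-contained.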

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Update generated TypeScript types

* Guard against missing google config in SystemRepository diagnostic

If `embeddingProvider=google` but the `google` config block is absent,
calling `nlpConfig.getGoogle().getEndpoint()` would NPE and produce
a misleading "Unable to determine embedding configuration" message.
Add an explicit null check that yields a clear diagnostic instead.
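
The guard described above could take a shape like the following sketch. The
class `EmbeddingDiagnosticSketch`, the `GoogleConfig` stand-in, and the
message wording are hypothetical; only the "Unknown provider" text and the
idea of an explicit null check come from this change.

```java
// Hypothetical sketch of the diagnostic guard; not the SystemRepository code.
final class EmbeddingDiagnosticSketch {
  private EmbeddingDiagnosticSketch() {}

  // Stand-in for the generated google config class (assumed shape).
  static final class GoogleConfig {
    final String embeddingModelId;
    final String endpoint;

    GoogleConfig(String embeddingModelId, String endpoint) {
      this.embeddingModelId = embeddingModelId;
      this.endpoint = endpoint;
    }
  }

  // Returns a status message, checking for a missing config block before
  // dereferencing it so the diagnostic itself cannot throw an NPE.
  static String describe(String provider, GoogleConfig google) {
    if (!"google".equals(provider)) {
      return "Unknown provider '" + provider + "'";
    }
    if (google == null) {
      return "Embedding provider is 'google' but the 'google' configuration block is missing";
    }
    return "Google embedding model '" + google.embeddingModelId + "' at " + google.endpoint;
  }

  public static void main(String[] args) {
    System.out.println(describe("google", null));
  }
}
```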

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Validate google.endpoint contains :embedContent at construction

A custom endpoint missing the `:embedContent` action used to silently
produce 404s at runtime. Fail fast at startup with a clear message
showing the expected URL form, so misconfiguration surfaces in logs
instead of in vector-search failures.

- Update testCustomEndpointConstruction to use a valid full URL.
- Add testCustomEndpointWithoutEmbedContentThrows.
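
A fail-fast check of this kind might be sketched as below. The class
`EndpointValidationSketch`, the method name, the exception type, and the
message text are assumptions for illustration; the PR's constructor may
differ in all of them.

```java
// Hypothetical sketch of construction-time endpoint validation.
final class EndpointValidationSketch {
  private static final String EMBED_ACTION = ":embedContent";
  private static final String EXPECTED_FORM =
      "https://generativelanguage.googleapis.com/v1beta/models/<model>:embedContent";

  private EndpointValidationSketch() {}

  // Returns the endpoint unchanged when it names the embedContent action,
  // otherwise throws so the misconfiguration surfaces at startup.
  static String requireEmbedContentEndpoint(String endpoint) {
    if (endpoint == null || endpoint.trim().isEmpty()) {
      throw new IllegalArgumentException("google.endpoint must not be blank");
    }
    if (!endpoint.contains(EMBED_ACTION)) {
      throw new IllegalArgumentException(
          "google.endpoint must be the full '" + EMBED_ACTION + "' URL, e.g. "
              + EXPECTED_FORM + " but was: " + endpoint);
    }
    return endpoint;
  }

  public static void main(String[] args) {
    System.out.println(requireEmbedContentEndpoint(
        "https://generativelanguage.googleapis.com/v1beta/models/gemini-embedding-001:embedContent"));
  }
}
```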

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(spec): add modelId chat field to google block

Adds a `modelId` property to the natural-language-search `google` block,
parallel to how the `openai` block exposes both `modelId` (chat) and
`embeddingModelId` (embedding). This enables Gemini-based NLQ filter
extraction (chat completions via :generateContent) on top of the existing
embedding support.

Default: gemini-2.5-flash.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Update generated TypeScript types

* trigger

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2026-05-10 16:37:53 +02:00
File                       Last commit                                                  Date
openmetadata-env.sh        License header update (#1498)                                2021-12-01 12:46:28 +05:30
openmetadata-s3-logs.yaml  Add logging endpoint into S3 (#22533)                        2025-09-15 07:22:25 -07:00
openmetadata.yaml          feat(search): add Google Gemini embedding provider (#27974)  2026-05-10 16:37:53 +02:00
operations.yaml            Update operations.yaml (#22231)                              2025-07-08 16:06:55 -07:00
private_key.der            feat(ui): login via email and password - Basic auth (#7558)  2022-09-23 16:05:54 +05:30
public_key.der             feat(ui): login via email and password - Basic auth (#7558)  2022-09-23 16:05:54 +05:30