Commit graph: 2 commits (each entry lists author, SHA1, message, and date)

Mohit Yadav
7693a5b04b
Update indexing schedule (#27204)
* Update schedule to weekly

* Migration
2026-04-10 19:15:08 +05:30
Ram Narayan Balaji
b9d8c08b5b
Refactor(certification): store asset certification in tag_usage table (#26448)
* refactor(certification): store asset certification in tag_usage table

Previously, asset certification was stored as a JSON blob directly on the
entity row. This created a split system where the tag FQN lived in the
entity JSON while tag metadata (name, description, style) had to be
re-fetched from the tag table on every read.

It also meant certification was invisible to the tag_usage propagation
pipeline, so renaming a certification tag's FQN left stale data on
certified entities.

Certification is now stored in tag_usage alongside all other tags, using
the metadata column to carry expiryDate (added to TagLabelMetadata schema).
The entity's certification field remains the input/output surface, but
tag_usage is now the source of truth.

Key changes:

Storage & retrieval
- applyCertification() writes the certification tag into tag_usage on store
- deleteCertificationTag() removes it from tag_usage on clear/replace
- getCertification() reads from tag_usage filtered by the configured
  certification classification instead of parsing entity JSON
- getTags() now strips certification-classification tags so they are
  surfaced exclusively through getCertification()
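To make the read-path split concrete, here is a minimal Java sketch. It is not the actual EntityRepository code. It assumes the configured certification classification is named `Certification` and that a tag belongs to it when its FQN starts with `Certification.`; the `TagLabel` record and `CertificationFilter` class are hypothetical simplifications.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical, simplified stand-in for a tag label read from tag_usage.
record TagLabel(String tagFQN) {}

final class CertificationFilter {
  private CertificationFilter() {}

  // A tag belongs to the certification classification when its FQN starts with
  // "<classification>." (assumed naming scheme, e.g. "Certification.Gold").
  static boolean isCertificationTag(TagLabel tag, String certClassification) {
    return tag.tagFQN().startsWith(certClassification + ".");
  }

  // getTags()-style view: every tag except certification tags.
  static List<TagLabel> regularTags(List<TagLabel> all, String certClassification) {
    return all.stream().filter(t -> !isCertificationTag(t, certClassification)).toList();
  }

  // getCertification()-style view: the certification tag, if any.
  static Optional<TagLabel> certification(List<TagLabel> all, String certClassification) {
    return all.stream().filter(t -> isCertificationTag(t, certClassification)).findFirst();
  }
}
```

With rows `PII.Sensitive`, `Certification.Gold`, and `Tier.Tier1`, the getTags()-style view keeps the first and third, and the getCertification()-style view returns `Certification.Gold`. Matching on the trailing dot keeps a classification such as `CertificationX` from being mistaken for the certification one.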

Performance improvements
- batchFetchCertification() rewritten to a single batch query on tag_usage
  by FQN hash instead of performing N individual tag lookups
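A rough sketch of the batch shape, assuming one query returns `(targetFQNHash, tagFQN)` pairs for every requested entity so that Java only has to group them. The `CertRow` record, the `BatchCertification` class, and the SQL in the comment are hypothetical; the real query and column set may differ.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical row shape returned by ONE batched query, for example:
//   SELECT targetFQNHash, tagFQN FROM tag_usage
//   WHERE targetFQNHash IN (...) AND tagFQN LIKE 'Certification.%'
record CertRow(String targetFQNHash, String tagFQN) {}

final class BatchCertification {
  private BatchCertification() {}

  // Groups the single batched result by entity instead of issuing one lookup per entity.
  // Entities without a certification are simply absent from the returned map.
  static Map<String, String> byTarget(List<String> requestedHashes, List<CertRow> rows) {
    Set<String> requested = new HashSet<>(requestedHashes);
    Map<String, String> result = new HashMap<>();
    for (CertRow row : rows) {
      if (requested.contains(row.targetFQNHash())) {
        // An entity carries at most one certification; keep the first row seen.
        result.putIfAbsent(row.targetFQNHash(), row.tagFQN());
      }
    }
    return result;
  }
}
```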

Tag update handling
- handleTagEntityUpdate() reads the allowed classification from settings
  (no longer hardcoded)
- correctly computes oldFQN on name change so Elasticsearch documents
  are found and updated using the correct key
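The oldFQN fix can be illustrated with a small helper: after a rename, search documents are still keyed by the pre-rename FQN, so the lookup key has to be rebuilt from the new FQN's parent plus the old name. This sketch assumes plain dot-separated FQNs with no quoted segments; `RenameFqn.oldFqn` is hypothetical, not a real OpenMetadata method.

```java
final class RenameFqn {
  private RenameFqn() {}

  // Rebuilds the pre-rename FQN by replacing the last segment of the new FQN
  // with the old name, e.g. ("Certification.Platinum", "Gold") -> "Certification.Gold".
  // Assumes plain dot-separated FQNs (no quoted segments containing '.').
  static String oldFqn(String newFqn, String oldName) {
    int lastDot = newFqn.lastIndexOf('.');
    return lastDot < 0 ? oldName : newFqn.substring(0, lastDot + 1) + oldName;
  }
}
```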

DAO & schema changes
- deleteTagsByPrefixAndTarget() added to CollectionDAO for targeted
  certification tag removal
- TagLabel mappers hardened against unknown metadata fields

Migrations
- v1123 migrations backfill existing entity JSON certifications
  into tag_usage so no data is lost during upgrade

Tests
- TagResourceIT updated to assert getCertification() instead of getTags(),
  since certification tags are intentionally excluded from the tags list

* Update generated TypeScript types

* chore: apply changes

Co-authored-by: yan-3005 <yan-3005@users.noreply.github.com>

* fix(certification): prevent updateTags() from clobbering cert tags written by updateCertification()

* fix(certification): compute tagFQNHash per-segment in Java during migration and make applyCertification idempotent
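A minimal sketch of per-segment hashing, assuming tagFQNHash is built by MD5-hashing each dot-separated FQN segment and rejoining the digests with dots (the commit message only says "per-segment"). `FqnHash` is a hypothetical name, and the naive split does not handle quoted segments that contain a dot. One consequence of this layout is that every tag in a classification shares the same hash prefix, which makes prefix matching on the hash column possible.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.HexFormat;
import java.util.stream.Collectors;

final class FqnHash {
  private FqnHash() {}

  // Lowercase hex MD5 of a single FQN segment.
  static String md5Hex(String segment) {
    try {
      MessageDigest md = MessageDigest.getInstance("MD5");
      return HexFormat.of().formatHex(md.digest(segment.getBytes(StandardCharsets.UTF_8)));
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("MD5 not available", e);
    }
  }

  // Hashes each '.'-separated segment independently and rejoins with '.',
  // so "Certification.Gold" -> md5("Certification") + "." + md5("Gold").
  // Simplification: quoted segments containing '.' are not handled.
  static String perSegmentHash(String fqn) {
    return Arrays.stream(fqn.split("\\."))
        .map(FqnHash::md5Hex)
        .collect(Collectors.joining("."));
  }
}
```

For example, `perSegmentHash("Certification.Gold")` starts with `md5Hex("Certification") + "."`, whereas an MD5 of the whole string shares no such prefix with its classification.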

* Update generated TypeScript types

* Fix: SQL-filtered cert batch fetch, remove double-delete, schema strict mode, ordinal bounds check, migration logging

* Update generated TypeScript types

* Fix Migration

* Fix Migration

* fix(certification): address Copilot review feedback on PR #26448

- Use exact field name comparison (FIELD_NAME.equals) instead of contains()
  in SearchRepository so that a displayName change no longer incorrectly
  triggers the FQN-rename branch

- Log the previously swallowed exception in
  getCertificationClassificationFromSettings() to improve observability of
  certification search propagation failures

- Fix v1124 migration by building selectedIds inside the insert loop and
  skipping rows with null tagFQN, so the UPDATE no longer strips
  certifications that have no corresponding tag_usage entry (which would
  otherwise be silently lost)

- Update the integration test to rename the tag's name (not its displayName)
  so it correctly validates the FQN-change regression from #26432 and asserts
  propagation to the entity certification field and the search index

* fix(migration): fix v1124 certification migration correctness issues

- Fix wrong version string in error messages: the MySQL and PostgreSQL
  Migration.java files both logged "v1123" instead of "v1124"
- Fix potential infinite loop: rows with a null tagFQN were excluded from
  the INSERT but still counted in the return value (rows.size()). Because
  those rows were never migrated, they were selected again on the next pass,
  so a full batch of 500 null-tagFQN rows kept the loop running forever.
  Fix by filtering null tagFQN at the SQL level (WHERE tagFQN IS NOT NULL)
  and returning selectedIds.size() so the loop count reflects rows that
  were actually migrated
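The termination argument can be checked with an in-memory simulation. This is a hypothetical sketch, not the Migration.java code: `EntityRow`, `CertMigration`, and the `migrated` flag (standing in for the UPDATE that strips certification from the entity JSON) are assumptions. Only non-null-tagFQN rows are selected, and every counted row is also marked migrated, so each pass shrinks the pending set and the loop stops once a pass migrates nothing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for an entity row that still carries certification JSON.
final class EntityRow {
  final String id;
  final String tagFQN; // null models a certification blob without a tagFQN
  boolean migrated;    // stands in for "certification already stripped from JSON"

  EntityRow(String id, String tagFQN) {
    this.id = id;
    this.tagFQN = tagFQN;
  }
}

final class CertMigration {
  static final int BATCH_SIZE = 500;

  private CertMigration() {}

  // One pass: select up to BATCH_SIZE pending rows WITH a tagFQN (the SQL-level filter),
  // "insert" them into tag_usage, mark them migrated, and return how many were migrated.
  static int migrateBatch(List<EntityRow> table, List<String> tagUsage) {
    List<String> selectedIds = new ArrayList<>();
    for (EntityRow row : table) {
      if (selectedIds.size() == BATCH_SIZE) {
        break;
      }
      if (!row.migrated && row.tagFQN != null) {
        tagUsage.add(row.tagFQN);
        selectedIds.add(row.id);
        row.migrated = true;
      }
    }
    return selectedIds.size();
  }

  // Repeats passes until one migrates nothing; terminates because counted rows never reappear.
  static int migrateAll(List<EntityRow> table, List<String> tagUsage) {
    int total = 0;
    int migrated;
    while ((migrated = migrateBatch(table, tagUsage)) > 0) {
      total += migrated;
    }
    return total;
  }
}
```

With 600 null-tagFQN rows followed by 1,200 certified rows, the passes migrate 500, 500, 200, and then 0 rows, and the null rows are left in place.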

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(certification): fix missing tables in migration and optimize getCertification query

- Add 6 missing entity tables to the v1124 certification migration:
  file_entity, directory_entity, spreadsheet_entity, worksheet_entity,
  llm_model_entity, ai_application_entity. All of them define the certification
  field in their JSON schema, so omitting them caused silent data loss on
  upgrade (certification stripped from JSON but never written to tag_usage)
- Replace the full-tag fetch in getCertification() with getCertTagsInternalBatch()
  so single-entity reads issue a targeted WHERE tagFQN LIKE query instead
  of fetching all tags and filtering in Java (consistent with the bulk path)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(certification): preserve appliedDate in migration and avoid appliedAt reset on unchanged cert

- v1124 migration now extracts certification.appliedDate from entity JSON
  and inserts it as tag_usage.appliedAt, preserving the original certification
  timestamp instead of defaulting to migration time
- applyCertification() now checks whether the existing certification tag
  matches the incoming one before doing delete+reinsert; if unchanged it
  returns early, preventing appliedAt from being reset on every entity write

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(certification): also compare expiryDate in applyCertification idempotency check

The previous fix skipped delete+reinsert when tagFQN was unchanged, but
this incorrectly swallowed expiryDate updates — re-certifying with the
same tag but a new validity period would return early and never write the
new expiryDate to tag_usage. Adding Objects.equals(expiryDate) to the
guard ensures metadata-only changes are still persisted.
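The resulting guard can be sketched as follows. This is a hypothetical in-memory model (`Certification`, `CertificationWriter`, and an epoch-millisecond `expiryDate` are assumptions): a write is skipped only when both the tag FQN and the expiry match what is stored, so a no-op save keeps the original appliedAt, while re-certifying with a new validity period still deletes and reinserts.

```java
import java.util.Objects;

// Hypothetical, simplified certification value: tag FQN plus optional expiry (epoch millis assumed).
record Certification(String tagFQN, Long expiryDate) {}

final class CertificationWriter {
  int deletes = 0;
  int inserts = 0;
  Certification stored; // what tag_usage currently holds for one entity

  // Skip delete+reinsert only when BOTH the tag and the expiry are unchanged,
  // so appliedAt survives no-op writes but a new expiryDate is still persisted.
  void applyCertification(Certification incoming) {
    Objects.requireNonNull(incoming, "incoming");
    if (stored != null
        && Objects.equals(stored.tagFQN(), incoming.tagFQN())
        && Objects.equals(stored.expiryDate(), incoming.expiryDate())) {
      return;
    }
    if (stored != null) {
      deletes++;
    }
    inserts++;
    stored = incoming;
  }
}
```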

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(certification): replace fixed sleeps with Awaitility polling in rename test

Fixed sleeps are flaky under CI load and waste time whenever indexing
finishes early. Remove both TimeUnit.SECONDS.sleep(2) calls and wrap all
subsequent search/entity assertions in Awaitility.await().untilAsserted()
blocks (30s timeout, 1s poll interval), so the test waits only as long as
needed, up to the timeout.
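Awaitility itself is a test dependency; as a dependency-free illustration of what `untilAsserted` does, the hypothetical `Poll.untilAsserted` helper below re-runs an assertion until it passes or a timeout expires, sleeping for a fixed poll interval between attempts. It is a stand-in for explanation only, not Awaitility's implementation.

```java
import java.time.Duration;

final class Poll {
  private Poll() {}

  // Re-runs `assertion` until it stops throwing AssertionError or `timeout` elapses,
  // sleeping `pollInterval` between attempts. On timeout the last AssertionError is rethrown.
  static void untilAsserted(Runnable assertion, Duration timeout, Duration pollInterval) {
    long deadline = System.nanoTime() + timeout.toNanos();
    while (true) {
      try {
        assertion.run();
        return;
      } catch (AssertionError e) {
        if (System.nanoTime() - deadline >= 0) {
          throw e;
        }
        try {
          Thread.sleep(pollInterval.toMillis());
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          throw new IllegalStateException("Interrupted while polling", ie);
        }
      }
    }
  }
}
```

In the real test, the same limits would be expressed on Awaitility's fluent builder, for example `Awaitility.await().atMost(Duration.ofSeconds(30)).pollInterval(Duration.ofSeconds(1)).untilAsserted(...)`.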

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(migration): include exception in certification migration warning log

Pass the exception object to LOG.warn so the stack trace is available
for diagnosing production migration failures.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* perf: cache getCertificationClassification() via SettingsCache

Replace direct SystemRepository DB call with SettingsCache.getSettingOrDefault()
(Guava LoadingCache, 3-min TTL) to eliminate repeated DB hits on every
certification-related call in EntityRepository.
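The effect of the 3-minute TTL can be sketched without Guava. The hypothetical `TtlValue` class below caches a single loaded value and reloads it only once the TTL has elapsed since the last load (expire-after-write semantics); the clock is injectable so the behavior can be tested without waiting. It illustrates the caching idea only and is not SettingsCache.

```java
import java.time.Duration;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical single-value cache with expire-after-write semantics.
// `clock` returns nanoseconds (System::nanoTime in production, a fake clock in tests).
final class TtlValue<T> {
  private final Supplier<T> loader;
  private final long ttlNanos;
  private final LongSupplier clock;
  private T value;
  private long loadedAt;
  private boolean loaded;

  TtlValue(Supplier<T> loader, Duration ttl, LongSupplier clock) {
    this.loader = loader;
    this.ttlNanos = ttl.toNanos();
    this.clock = clock;
  }

  // Returns the cached value, calling the loader (the "DB hit") only on first use
  // or once the TTL has expired since the last load.
  synchronized T get() {
    long now = clock.getAsLong();
    if (!loaded || now - loadedAt >= ttlNanos) {
      value = loader.get();
      loadedAt = now;
      loaded = true;
    }
    return value;
  }
}
```

In Guava, the analogous configuration is `CacheBuilder.newBuilder().expireAfterWrite(3, TimeUnit.MINUTES)`, assuming SettingsCache expires entries after write rather than after access.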

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* skip the test

* Added new column for certification and tier

* nit

* Add test for tier and certification

* fix unit test

* Fix Unit tests

* Move Migrations to 1.12.5 and unit tests

* Fix NPE, batch certification writes, and improve test coverage

- Guard against null tagLabel in applyCertification to prevent NPE on
  malformed input
- Replace the per-entity applyCertification loop in storeRelationshipsInternal
  with applyCertificationBatch, reducing 3N DB calls to 2 (one batch
  DELETE + one batch INSERT via the existing applyTagsBatchMultiTarget)
- Add deleteTagsByPrefixAndTargets to TagUsageDAO as the batch variant
  of deleteTagsByPrefixAndTarget
- Add tests for applyCertificationBatch paths, getTags cert filtering,
  and TagLabelWithFQNHash.toTagLabel to meet the 90% new-code coverage threshold
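A hypothetical round-trip count makes the 3N-to-2 claim concrete. `CountingDao` and `CertificationWrites` below only count calls, their signatures are simplified, and the per-entity breakdown (read existing, delete, insert) is an assumption about where the three calls came from; the batched path issues one DELETE covering all targets and one multi-row INSERT.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical DAO that only counts database round trips; signatures are simplified.
final class CountingDao {
  int roundTrips = 0;

  void readExisting(String target) { roundTrips++; }
  void deleteByPrefixAndTarget(String prefix, String target) { roundTrips++; }
  void insertOne(String target, String tagFQN) { roundTrips++; }
  void deleteByPrefixAndTargets(String prefix, List<String> targets) { roundTrips++; }
  void insertMany(Map<String, String> targetToTag) { roundTrips++; }
}

final class CertificationWrites {
  private CertificationWrites() {}

  // Per-entity path: three round trips per entity (assumed: read, delete, insert).
  static void perEntity(CountingDao dao, String prefix, Map<String, String> certs) {
    for (Map.Entry<String, String> e : certs.entrySet()) {
      dao.readExisting(e.getKey());
      dao.deleteByPrefixAndTarget(prefix, e.getKey());
      dao.insertOne(e.getKey(), e.getValue());
    }
  }

  // Batched path: one DELETE for every target, then one multi-row INSERT.
  static void batched(CountingDao dao, String prefix, Map<String, String> certs) {
    if (certs.isEmpty()) {
      return;
    }
    dao.deleteByPrefixAndTargets(prefix, new ArrayList<>(certs.keySet()));
    dao.insertMany(certs);
  }
}
```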

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Add coverage tests for RowMappers, batchFetchCertification, and toTagLabel fallbacks

- Add TagLabelMapper and TagLabelWithFQNHashMapper tests using mock ResultSet
  to cover the new metadata-parsing code paths in CollectionDAO
- Add toTagLabel fallback tests for out-of-bounds enum ordinals covering
  the defensive conversion logic in TagLabelWithFQNHash
- Add storeRelationshipsInternal single-entity overload test covering line 2322
- Add fetchAndSetFields tests to cover batchFetchCertification happy path
  and exception fallback path
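The defensive ordinal conversion that these tests exercise can be sketched generically. `EnumOrdinals.fromOrdinal` and the `TagSource` enum are hypothetical: an ordinal read from the database is mapped back to an enum constant only when it is in range; otherwise a caller-supplied fallback is returned instead of throwing ArrayIndexOutOfBoundsException.

```java
// Hypothetical enum used only for illustration.
enum TagSource { CLASSIFICATION, GLOSSARY }

final class EnumOrdinals {
  private EnumOrdinals() {}

  // Maps a stored ordinal back to an enum constant, returning `fallback` when the
  // ordinal is out of range instead of throwing ArrayIndexOutOfBoundsException.
  static <E extends Enum<E>> E fromOrdinal(Class<E> type, int ordinal, E fallback) {
    E[] values = type.getEnumConstants();
    return ordinal >= 0 && ordinal < values.length ? values[ordinal] : fallback;
  }
}
```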

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* resolved the linting issue

* nit

* fix lint issue

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Gitar <noreply@gitar.ai>
Co-authored-by: yan-3005 <yan-3005@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Anujkumar Yadav <anujf0510@gmail.com>
2026-03-28 07:28:03 +00:00