OpenMetadata/openspec/changes/incident-ttl/proposal.md
Sriharsha Chintalapani 51ecf4502f
Task redesign (#25894)
2026-04-23 15:52:30 +02:00

Incident TTL (Auto-Close Stale Incidents)

Slice 3 of 3: Incident Manager → Governance Workflows Migration
Depends on: incident-lifecycle-workflow (Slice 1)
Enables: Nothing (terminal slice)
ADR: adr-incident-manager-governance-workflows.md


What Ships

Incidents that stay open longer than a configurable deadline (e.g., 30 days) are automatically resolved with the reason Expired. The deadline is configured per workflow, and omitting the ttl field disables the feature.

User-visible changes:

  • Stale incidents auto-close after deadline
  • Resolution reason: Expired (distinct from AutoResolved and from manual resolution)
  • TTL configurable per workflow (ISO 8601 duration: P30D, P7D, etc.)
  • Default workflow ships with ttl: "P30D"
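A minimal Python sketch of the ttl rule above: an ISO 8601 duration enables the timer, and a missing field disables it. `parse_ttl` and the accepted subset (weeks, days, hours, minutes, seconds) are assumptions for illustration. OpenMetadata's server and Flowable are Java, and Flowable presumably consumes the duration string as-is, so this only shows the kind of validation one might apply before compiling the BPMN.

```python
import re
from datetime import timedelta
from typing import Optional

# Hypothetical parser for a simplified subset of ISO 8601 durations:
# PnW, PnD and PTnHnMnS (and combinations of them). Years and months
# (P1Y, P1M) are rejected because they have no fixed length.
_DURATION = re.compile(
    r"P(?:(?P<weeks>\d+)W)?(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?"
)


def parse_ttl(config: dict) -> Optional[timedelta]:
    """Return the HIT TTL as a timedelta, or None when the ttl field is omitted."""
    raw = config.get("ttl")
    if raw is None:
        return None  # feature disabled: no boundary timer gets compiled
    match = _DURATION.fullmatch(raw) if isinstance(raw, str) else None
    # Reject non-strings, non-matching strings, a bare "P", and a dangling "T".
    if match is None or raw.endswith("T") or not any(match.groupdict().values()):
        raise ValueError(f"unsupported ttl duration: {raw!r}")
    parts = {name: int(value) for name, value in match.groupdict().items() if value}
    ttl = timedelta(**parts)
    if ttl <= timedelta(0):
        raise ValueError(f"ttl must be positive: {raw!r}")
    return ttl
```

With the default workflow config, `parse_ttl` returns `timedelta(days=30)`; for a config without `ttl` it returns `None`.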

What We Build

TTL Boundary Timer on HumanInterventionTask

Add an interrupting boundary timer to the HumanInterventionTask (HIT) SubProcess built in Slice 1:

Existing HIT (from Slice 1):
  [StartEvent] → [SetupPhase] → [Gateway] → [IntermediateCatchEvent: wait] → [End]

New addition (conditional on ttl config):
  + [BoundaryTimer: TTL deadline, interrupting]
      → [ServiceTask: AutoResolveExpiredImpl]
          - Create Resolved status (reason: "Expired") via repository
          - Close Thread task via repository
      → [EndEvent]

The timer is compiled into the BPMN only when ttl is set in the HIT config. Without a TTL there is no timer and therefore no overhead.
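The conditional compilation described above can be sketched with Python's standard `xml.etree.ElementTree`, emitting standard BPMN 2.0 elements (`boundaryEvent`, `timerEventDefinition`, `timeDuration`, `serviceTask`, `sequenceFlow`). The function name `attach_ttl_timer` and every element id are hypothetical; this is not the project's BPMN builder, which the proposal does not show.

```python
import xml.etree.ElementTree as ET
from typing import Optional

BPMN_NS = "http://www.omg.org/spec/BPMN/20100524/MODEL"


def q(tag: str) -> str:
    """Qualify a tag with the BPMN 2.0 model namespace (ElementTree Clark notation)."""
    return f"{{{BPMN_NS}}}{tag}"


def attach_ttl_timer(process: ET.Element, hit_id: str, ttl: Optional[str]) -> None:
    """Add the TTL path next to the HIT subprocess; add nothing when ttl is None."""
    if ttl is None:
        return  # no ttl: no timer element, so the engine never schedules a job
    timer_id = f"{hit_id}_ttlTimer"
    task_id = f"{hit_id}_autoResolveExpired"
    end_id = f"{hit_id}_ttlEnd"
    # A boundary event is a sibling of the activity it guards, linked by attachedToRef.
    # cancelActivity="true" makes it interrupting: firing it terminates the subprocess.
    boundary = ET.SubElement(
        process,
        q("boundaryEvent"),
        {"id": timer_id, "attachedToRef": hit_id, "cancelActivity": "true"},
    )
    timer = ET.SubElement(boundary, q("timerEventDefinition"))
    ET.SubElement(timer, q("timeDuration")).text = ttl  # ISO 8601, e.g. "P30D"
    # Service task for AutoResolveExpiredImpl; the engine-specific delegate
    # binding (for example a flowable:class attribute) is left out of this sketch.
    ET.SubElement(process, q("serviceTask"), {"id": task_id, "name": "AutoResolveExpiredImpl"})
    ET.SubElement(process, q("endEvent"), {"id": end_id})
    for source, target in ((timer_id, task_id), (task_id, end_id)):
        ET.SubElement(
            process,
            q("sequenceFlow"),
            {"id": f"{source}_to_{target}", "sourceRef": source, "targetRef": target},
        )


if __name__ == "__main__":
    process = ET.Element(q("process"), {"id": "incidentLifecycle"})
    ET.SubElement(process, q("subProcess"), {"id": "humanInterventionTask"})
    attach_ttl_timer(process, "humanInterventionTask", "P30D")
    print(ET.tostring(process, encoding="unicode"))
```

Calling it with `ttl=None` leaves the process element untouched, which is the "no TTL, no timer" property in concrete form.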

AutoResolveExpiredImpl:

  1. Get test case FQN from process business key
  2. Create Resolved record (reason: Expired) via repository
  3. Close Thread task
  4. Process ends (interrupting timer terminates the subprocess)
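A sketch of the four AutoResolveExpiredImpl steps, with the repositories replaced by a hypothetical in-memory stand-in (`InMemoryRepos`, `auto_resolve_expired`). It assumes the process business key is exactly the test case FQN and that the open task can be found by that FQN; both are simplifications, since the proposal does not specify the key format or the repository API.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ResolutionRecord:
    test_case_fqn: str
    status: str  # "Resolved"
    reason: str  # "Expired", the new TestCaseFailureReasonType value


@dataclass
class InMemoryRepos:
    """Hypothetical stand-in for the resolution-status and task repositories."""
    resolutions: List[ResolutionRecord] = field(default_factory=list)
    open_tasks: Set[str] = field(default_factory=set)  # keyed by FQN (simplification)

    def create_resolution(self, record: ResolutionRecord) -> None:
        self.resolutions.append(record)

    def close_task(self, test_case_fqn: str) -> None:
        self.open_tasks.discard(test_case_fqn)


def auto_resolve_expired(business_key: str, repos: InMemoryRepos) -> ResolutionRecord:
    # 1. Assumption: the business key is the test case FQN itself.
    fqn = business_key
    # 2. Record the resolution with the dedicated Expired reason.
    record = ResolutionRecord(test_case_fqn=fqn, status="Resolved", reason="Expired")
    repos.create_resolution(record)
    # 3. Close the task tied to the incident.
    repos.close_task(fqn)
    # 4. Nothing else to do here: the interrupting timer already cancelled the
    #    subprocess, and the flow continues to the EndEvent.
    return record
```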

Schema Changes

  • resolved.json: Add Expired to TestCaseFailureReasonType enum
  • humanInterventionTask.json: Document ttl field (ISO 8601 duration)

Updated Default Workflow

Update the incident-lifecycle workflow to include a TTL:

  {
    "config": {
      "template": "incident",
      "responsibles": { "source": "tableOwner" },
      "ttl": "P30D"
    }
  }

Out of Scope

Feature                     Deferred to   Why
SLA escalation timers       Future        Same boundary timer infrastructure, different business logic
Per-severity TTL            Future        Requires conditional timer duration
TTL warning notification    Future        Non-interrupting timer before deadline

Design Notes

Interrupting, not non-interrupting. When the TTL fires, the incident has expired and there is nothing left to wait for, so the subprocess terminates.
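The difference between the two boundary-event modes can be shown with a toy model (plain Python, not Flowable code; `HitInstance` is hypothetical). In BPMN an interrupting boundary event cancels the activity it is attached to, while a non-interrupting one starts its outgoing path and leaves the activity waiting. That is why the deferred TTL warning would use the non-interrupting form.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HitInstance:
    """Toy model of one HIT subprocess instance waiting at its catch event."""
    active: bool = True
    log: List[str] = field(default_factory=list)

    def fire_boundary_timer(self, interrupting: bool) -> None:
        if not self.active:
            return  # a finished subprocess has no pending boundary timers
        if interrupting:
            self.active = False  # cancelActivity="true": the subprocess is terminated
            self.log.append("subprocess cancelled")
        self.log.append("timer path started")  # the boundary flow runs in both modes

    def deliver_resolution(self) -> bool:
        """A manual resolution only lands while the subprocess is still waiting."""
        if not self.active:
            return False
        self.active = False
        self.log.append("resolved manually")
        return True
```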

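Boundary timer, not polling. Flowable fires each timer once, at its deadline, per process instance. There are no scans over incident data and no cron jobs; the engine's job executor only picks up timers that are due. Each open incident holds a single pending timer job, so at 75K incidents with a 30-day TTL the overhead is negligible.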
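A back-of-envelope comparison of the two approaches. The hourly cron and the assumption that every incident stays open for its full TTL (which makes the polling figure an upper bound) are hypothetical; only the 75K incidents and the 30-day TTL come from this proposal.

```python
def polling_checks(open_incidents: int, ttl_days: int, poll_interval_hours: int) -> int:
    """Upper bound on incident checks if a cron job re-checked every open incident
    on each run for a full TTL window (hypothetical alternative design)."""
    runs_per_ttl = ttl_days * 24 // poll_interval_hours
    return open_incidents * runs_per_ttl


def timer_firings(open_incidents: int) -> int:
    """One boundary timer per process instance, fired once at its deadline."""
    return open_incidents


if __name__ == "__main__":
    print(polling_checks(75_000, 30, 1))  # 54000000 checks with an hourly cron
    print(timer_firings(75_000))          # 75000 timer firings
```

Even a daily cron would perform 2,250,000 checks over the same window, 30 times the number of timer firings.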

Expired vs AutoResolved. The two reasons carry different operational signals: AutoResolved means the issue was fixed, while Expired (the TTL path) means nobody looked at the incident. Keeping them separate enables distinct reporting and alerting.
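Because the two reasons are stored separately, a report or alert can tell them apart. The sketch computes the share of Expired closures among resolved incidents and flags it above a threshold; `expired_share`, `should_alert`, and the 25% threshold are hypothetical examples, not part of this proposal.

```python
from collections import Counter
from typing import Iterable


def expired_share(reasons: Iterable[str]) -> float:
    """Fraction of resolved incidents that closed because nobody acted on them."""
    counts = Counter(reasons)
    total = sum(counts.values())
    return counts["Expired"] / total if total else 0.0


def should_alert(reasons: Iterable[str], threshold: float = 0.25) -> bool:
    # Hypothetical rule: many Expired closures suggest ownership gaps, whereas
    # AutoResolved closures mean the underlying check recovered on its own.
    return expired_share(reasons) > threshold
```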