OpenMetadata/openspec/changes/incident-lifecycle-workflow/proposal.md
Sriharsha Chintalapani 51ecf4502f
Task redesign (#25894)
2026-04-23 15:52:30 +02:00


Incident Lifecycle Workflow

Slice 1 of 3: Incident Manager → Governance Workflows Migration
Depends on: Nothing (first slice)
Enables: incident-auto-close, incident-ttl
ADR: adr-incident-manager-governance-workflows.md


What Ships

When a test case's incident status changes, a governance workflow reacts to the event — creating the Thread/task on new incidents and closing it on resolution. The workflow is a single branching definition that users can see and extend in the governance workflows UI.

User-visible changes:

  • Incident Thread/task created immediately on test failure (no longer deferred to Ack)
  • Auto-assign to table owner configurable (default: unassigned, matching current behavior)
  • Incident lifecycle visible in governance workflows UI
  • Users can customize by adding steps to workflow branches (e.g., notifications, Jira)
  • Re-open from Resolved to any non-Resolved status creates a new incident lifecycle

Behavior preserved:

  • REST API surface unchanged
  • Ack and Assigned transitions unchanged in repository (assignee patching)
  • TCRS record creation unchanged (synchronous, for incidentId linking)
  • Severity inference unchanged (in repository)

What We Build

Generic Task Nodes: openTask and closeTask

Two new generic governance workflow nodes, reusable beyond incident management:

openTask (nodeType: automatedTask, nodeSubType: openTask):

  • Idempotently creates a Thread with configurable TaskType and TaskStatus.Open
  • If a Thread/task already exists for the entity, it's a no-op
  • Optional auto-assign via responsibles config (default: unassigned)
  • Configurable via template (e.g., "incident", future: "review")

closeTask (nodeType: automatedTask, nodeSubType: closeTask):

  • Closes an open Thread/task for the entity
  • If no open Thread/task exists, it's a no-op

Both follow the three-layer pattern: Task (BPMN) → Delegate (JavaDelegate) → Impl (pure logic).
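
The idempotency rules and the Delegate → Impl split can be sketched without any engine dependency. This is a minimal sketch, not the proposed implementation: `TaskNodeImpl`, `OpenTaskDelegateSketch`, the in-memory map, and the `entityFqn` variable name are all hypothetical. It also assumes "already exists" means "an open task exists", so that a re-open after resolution creates a new task. In the real nodes the delegate would implement Flowable's `JavaDelegate`, and the Impl would go through a repository.

```java
import java.util.HashMap;
import java.util.Map;

/** Impl layer: pure logic, no workflow-engine types. All names are illustrative. */
final class TaskNodeImpl {
  // Entity FQN -> id of its currently open task (in-memory stand-in for the task store).
  private final Map<String, Integer> openTaskIdByEntity = new HashMap<>();
  private int nextTaskId = 1;

  /** Creates a task unless one is already open; returns true only when a task was created. */
  boolean openTask(String entityFqn) {
    if (openTaskIdByEntity.containsKey(entityFqn)) {
      return false; // a task is already open for this entity: no-op
    }
    openTaskIdByEntity.put(entityFqn, nextTaskId++);
    return true;
  }

  /** Closes the open task if there is one; returns true only when a task was closed. */
  boolean closeTask(String entityFqn) {
    return openTaskIdByEntity.remove(entityFqn) != null;
  }

  boolean hasOpenTask(String entityFqn) {
    return openTaskIdByEntity.containsKey(entityFqn);
  }
}

/** Delegate layer: unpacks process variables and calls the Impl (stand-in for a JavaDelegate). */
final class OpenTaskDelegateSketch {
  private final TaskNodeImpl impl;

  OpenTaskDelegateSketch(TaskNodeImpl impl) {
    this.impl = impl;
  }

  void execute(Map<String, Object> processVariables) {
    // "entityFqn" is a hypothetical variable name.
    impl.openTask((String) processVariables.get("entityFqn"));
  }
}

class TaskNodeDemo {
  public static void main(String[] args) {
    String fqn = "svc.db.schema.orders.row_count_check";
    TaskNodeImpl impl = new TaskNodeImpl();
    OpenTaskDelegateSketch open = new OpenTaskDelegateSketch(impl);

    open.execute(Map.of("entityFqn", fqn));
    open.execute(Map.of("entityFqn", fqn)); // second event for the same entity: no-op

    System.out.println(impl.hasOpenTask(fqn)); // true
    System.out.println(impl.closeTask(fqn));   // true: the open task was closed
    System.out.println(impl.closeTask(fqn));   // false: nothing left to close
  }
}
```

Keeping the idempotency check inside the Impl means the same logic can be unit-tested without starting a process engine.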

TCRS Event Broadcasting

Extend EntityLifecycleEventDispatcher to broadcast TestCaseResolutionStatus events to registered handlers. TCRS is a time-series entity that does NOT emit ChangeEvents, so this is a new event pipeline.

Flow: storeInternal() → EntityLifecycleEventDispatcher → WorkflowHandler → Flowable signal
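
One way to picture the broadcast step is plain observer registration, which is only one of the options still under discussion. The sketch below is illustrative: `TcrsEvent`, `TcrsEventHandler`, and `TcrsEventBroadcaster` are hypothetical names, not existing OpenMetadata classes, and isolating handler failures is an assumption about the desired behavior rather than something the proposal specifies.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Minimal event payload; field names are illustrative. */
record TcrsEvent(String testCaseFqn, String status) {}

/** Hypothetical handler contract that a workflow handler could implement. */
interface TcrsEventHandler {
  void onTcrsEvent(TcrsEvent event);
}

/** Observer-style broadcaster standing in for the dispatcher extension. */
final class TcrsEventBroadcaster {
  private final List<TcrsEventHandler> handlers = new CopyOnWriteArrayList<>();

  void register(TcrsEventHandler handler) {
    handlers.add(handler);
  }

  /** Delivers the event to every registered handler; returns how many handlers failed. */
  int broadcast(TcrsEvent event) {
    int failures = 0;
    for (TcrsEventHandler handler : handlers) {
      try {
        handler.onTcrsEvent(event);
      } catch (RuntimeException e) {
        failures++; // assumption: one failing handler must not block the others
      }
    }
    return failures;
  }
}

class TcrsBroadcastDemo {
  public static void main(String[] args) {
    TcrsEventBroadcaster broadcaster = new TcrsEventBroadcaster();
    broadcaster.register(
        e -> System.out.println("would signal for " + e.testCaseFqn() + " status=" + e.status()));
    broadcaster.broadcast(new TcrsEvent("svc.db.schema.orders.row_count_check", "New"));
  }
}
```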

Signal-Driven Workflow Triggering

Every TCRS event broadcasts a Flowable signal "tcrs_{fqn}" with the TCRS status as a process variable. The signal starts a new short-lived process instance in every matching workflow. No Flowable queries needed for routing.
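
As a small illustration of the signal contract, the name and payload could be assembled as below. This is a sketch under assumptions: `TcrsSignal` and the `tcrsStatus` variable key are hypothetical, and only the `tcrs_` prefix and the fact that the status travels as a process variable come from the proposal. In Flowable, such a pair would plausibly be handed to `RuntimeService.signalEventReceived(signalName, processVariables)`.

```java
import java.util.Map;

/** Signal name plus the variables handed to each started process instance. */
record TcrsSignal(String name, Map<String, Object> variables) {
  static final String PREFIX = "tcrs_";
  // Hypothetical key; the proposal only says the status travels as a process variable.
  static final String STATUS_VARIABLE = "tcrsStatus";

  /** Builds the signal for one TCRS event. Both arguments must be non-null. */
  static TcrsSignal of(String fqn, String status) {
    return new TcrsSignal(PREFIX + fqn, Map.of(STATUS_VARIABLE, status));
  }
}

class TcrsSignalDemo {
  public static void main(String[] args) {
    TcrsSignal signal = TcrsSignal.of("svc.db.schema.orders.row_count_check", "Resolved");
    System.out.println(signal.name());
    System.out.println(signal.variables());
  }
}
```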

Default Incident Lifecycle Workflow

Single branching workflow, ships enabled:

Trigger: Signal "tcrs_{fqn}" (from TCRS event broadcast)

[Signal Start] → [Gateway: status?]
    ├─ NOT Resolved → [OpenTask] → [End]
    ├─ Resolved     → [CloseTask] → [End]
    └─ Otherwise    → [End]

All process instances are short-lived in Slice 1 (no timers). OpenTask and CloseTask are idempotent — safe to fire on every event. The gateway routes purely on TCRS status; nodes handle their own edge cases.
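
The gateway decision reduces to a pure function of the status. The sketch below assumes the non-Resolved statuses are New, Ack, InProgress, and Assigned, and treats a missing or unrecognized status as the "Otherwise" branch. `IncidentBranch` and `IncidentStatusGateway` are hypothetical names; in the workflow itself this routing would be expressed as gateway conditions rather than Java code.

```java
import java.util.Set;

enum IncidentBranch { OPEN_TASK, CLOSE_TASK, END }

final class IncidentStatusGateway {
  // Assumed set of non-Resolved statuses; adjust to the actual TCRS status enum.
  private static final Set<String> NON_RESOLVED = Set.of("New", "Ack", "InProgress", "Assigned");

  private IncidentStatusGateway() {}

  /** Routes purely on the TCRS status; edge cases are left to the idempotent nodes. */
  static IncidentBranch route(String status) {
    if ("Resolved".equals(status)) {
      return IncidentBranch.CLOSE_TASK;
    }
    if (status != null && NON_RESOLVED.contains(status)) {
      return IncidentBranch.OPEN_TASK;
    }
    return IncidentBranch.END; // missing or unrecognized status
  }
}

class IncidentGatewayDemo {
  public static void main(String[] args) {
    for (String status : new String[] {"New", "Ack", "Resolved", "Unknown"}) {
      System.out.println(status + " -> " + IncidentStatusGateway.route(status));
    }
  }
}
```

Because both task nodes are no-ops when there is nothing to do, the gateway does not need to know whether a task already exists.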


Out of Scope

| Feature | Deferred to | Why |
|---|---|---|
| Auto-close on test pass | Slice 2 | Independent feature, separate workflow |
| TTL / stale incident expiration | Slice 3 | Timer subprocess with signal boundary |
| Timer subprocess + signal interruption | Slice 3 | Architecture supports it, no timers yet |
| tcrs_closed_{fqn} termination signal | Slice 3 | Only needed for timer subprocess interruption |
| Cleanup timer for orphaned processes | Follow-up | Batch sweep; idempotent openTask handles most |
| Custom lifecycle states | Future | Template-driven state machine evolution |

Open Questions

  • EntityLifecycleEventDispatcher extension: What's the cleanest way to add TCRS event broadcasting? Observer registration, interface, or direct handler call?
  • Thread/task creation: Does openTask create the Thread directly via FeedRepository.create(), or reuse TestCaseResolutionStatusRepository.createTask()?
  • Signal payload: Can signal variables carry enough context (status, FQN, stateId) to avoid a DB read in the gateway, or should each node read from DB independently?