DataDesigner/docs
Nabin Mulepati a9af365e8e
feat: add skip.when conditional column generation (#502)
* plan: add skip_when for conditional column generation (#479)

Adds implementation plan for a `skip_when` field on `SingleColumnConfig`
that enables conditional column generation. When the Jinja2 expression
evaluates truthy, the cell is set to None and the generator is skipped.
Skips auto-propagate through the DAG to downstream columns.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* plan: remove HopChain example from skip_when plan

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* plan: replace HopChain example with generic product review example

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* plan: add open questions on skip sentinel value and row filtering

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* plan: major revision — SkipConfig model, sync engine support, decouple propagation

- Introduce SkipConfig(when, value) as nested model on SingleColumnConfig
- Move propagate_skip to SingleColumnConfig as independent field, fixing
  bug where columns with no SkipConfig couldn't participate in propagation
- Add full sync engine implementation (Steps 4a-4d) covering both
  _fan_out_with_threads and _run_full_column_generator dispatch paths
- Add serialization boundary stripping for both DatasetBatchManager (sync)
  and RowGroupBufferManager (async)
- Simplify architecture diagrams for readability
- Update all references, design decisions, verification plan

Made-with: Cursor

* updates

* plan: document get_required_columns for skip propagation

- Explain why propagation must not use get_upstream_columns() once
  skip.when adds DAG edges; add _required_columns and
  get_required_columns() to the execution graph plan
- Point async _run_cell at get_required_columns for parity with sync
- Clarify DropSkippedRowsProcessorConfig vs stripping __skipped__ for
  DataFrames; tighten resolved-questions wording
- Extend DAG/graph verification with gating_col regression case

Refs #479

Made-with: Cursor

* plan: centralize __skipped__ handling in skip_provenance

- Document new skip_provenance.py (key constant, read/write/strip API)
- Point sync builder, async scheduler, and batch buffers at shared helpers
- Strip metadata before every DataFrame from buffer dicts, including
  FULL_COLUMN active subsets
- Split §3 into skip_evaluator vs skip_provenance; extend verification

Refs #479

Made-with: Cursor

* plan: align doc title with SkipConfig / skip.when

Drop legacy skip_when naming in headings and #362 cross-reference.

Refs #479

Made-with: Cursor

* plan: address review — delimiter validation, centralized error handling, caller-owns-deserialization

- SkipConfig._validate_when_syntax now checks find_undeclared_variables
  is non-empty, rejecting expressions without {{ }} delimiters that
  would silently skip every row
- evaluate_skip_when centralizes try/except so both sync and async
  engines get identical fail-safe behavior on eval errors
- evaluate_skip_when takes a single pre-deserialized record; caller
  runs deserialize_json_values once and passes to both skip eval and
  generator (no double deserialization, no redundant parameter)
- Update _should_skip_cell, async _run_cell, Files Modified table,
  and verification section accordingly

Refs #479

Made-with: Cursor

* plan: add get_side_effect_columns accessor to execution graph spec

Document _side_effects_by_producer inverse map and
get_side_effect_columns() accessor on ExecutionGraph, needed by
_write_skip_to_record / apply_skip_to_record to clear __trace,
__reasoning_content, etc. on skip. Added to both Step 2b metadata
section and Files Modified table.

The __skipped__ leak into active_df (greptile's other P1) was already
fixed in 70463789 via strip_skip_metadata_from_records.

Refs #479

Made-with: Cursor

* add skip.when conditional column generation

Introduce SkipConfig on SingleColumnConfig to gate column generation
with a Jinja2 expression. Columns can be skipped by expression or by
upstream propagation (propagate_skip flag).

- SkipConfig: Pydantic model with config-time syntax/delimiter/variable
  validation and cached column extraction from the Jinja2 AST
- skip_evaluator: runtime expression evaluation via NativeSandboxedEnvironment
  with fail-safe error handling (skip on expected failures)
- skip_provenance: centralized __skipped__ record tracking shared by
  sync builder, async scheduler, and buffer managers
- DAG/ExecutionGraph: skip.columns wired as dependency edges in both
  topological sort and static execution graph
- Validation: validate_skip_references checks reference existence,
  sampler/seed scope, and allow_resize conflicts
- Sync builder: cell-by-cell and full-column skip with merge-back
- Async scheduler: cell and batch skip with live-buffer provenance

Made-with: Cursor

* fix review findings for skip.when implementation

- Add skip evaluation to _fan_out_with_async (was missing, causing
  skipped rows to still be sent to the LLM)
- Preserve __skipped__ provenance on non-skipped records after
  full-column generation so multi-hop propagation works
- Use single live-buffer reference in _run_batch skip loop for
  consistency with _run_cell
- Move Template import to TYPE_CHECKING and reorder import blocks
- Replace O(n²) sum() with itertools.chain in dag.py
- Add set_required_columns/set_propagate_skip/set_skip_config
  setters to ExecutionGraph for symmetry with existing API

Made-with: Cursor

* add conditional generation with skip recipe and refactor skip helpers

Add a new recipe demonstrating skip.when patterns (expression gate,
propagation, opt-out) with a customer support ticket pipeline.

Also extract _should_skip_record in async_scheduler, remove the
redundant propagate_skip param from should_skip_by_propagation, and
pass a precomputed all_side_effects set through the DAG sort.

Made-with: Cursor

* updates

* fixes

* remove recipe > inject conditional gen into existing tutorial

* regen colab notebooks

* fix: handle missing execution graph in _column_can_skip

Return False when the graph has not been initialized instead of raising,
since skip logic cannot apply before generators are set up.

Made-with: Cursor

* parametrize some tests

* public before private

* slight refactor for readability

* parametrize some tests

* minor fixes

* reanme internla skip tracker key name

* clarify intent in comment

* when skipped _run_cell should return skipped value even though the consumer doesn't currenlty care about it

* remove inline import

* minor refactor for clarity

* fix: preserve skip metadata across replace_buffer and exclude allow_resize from skip branch

Two bugs in the sequential engine's _run_full_column_generator:

1. replace_buffer(df.to_dict()) erased __internal_skipped_columns in
   three code paths (MultiColumnConfig, non-skip-aware, has_skipped=False
   fallthrough), breaking propagate_skip for downstream columns when an
   independent FULL_COLUMN generator ran between skip-setting and
   propagating columns.

2. _column_can_skip returned True for allow_resize=True columns via
   propagation, causing the skip-aware merge path to raise on the 1:1
   row-count check for 1:N generators.

- Add restore_skip_metadata helper to skip_tracker.py
- Guard _column_can_skip against allow_resize=True columns
- Refactor _run_full_column_generator into three focused methods
- Remove dead allow_resize / _log_resize_if_changed from skip path
- Remove redundant _require_graph() calls in skip helpers
- Add single_column_config_by_name cached property
- Add integration tests for both bugs and unit tests for the helper

Made-with: Cursor

* address review comments on skip.when PR (#502)

- Extract shared skip decision logic (_should_skip_cell / _should_skip_record)
  into should_skip_column_for_record() in skip_evaluator.py so both sync and
  async engines call the same function (andreatgretel review comment)
- Extend SkipConfig self-reference validation to cover side-effect columns
  (e.g. review__trace on the review column) — previously only checked
  self.name, now checks self.name | self.side_effect_columns
- Add async engine integration tests for skip paths: cell-by-cell with
  propagation and full-column batch skip (exercises _run_cell / _run_batch)
- Fix test_allow_resize_column_not_blocked_by_upstream_skip to use default
  propagate_skip=True so it actually exercises the allow_resize guard
- Move get_skipped_column_names from skip_tracker to skip_evaluator (sole
  production consumer)

Made-with: Cursor

* address cr feedback

* Fix issue with full column  generating messing up order of skipped rows

* add skip conditional generation edge case tests

- test_skip_evaluator: parametrized should_skip_column_for_record covering
  propagation, expression gates, short-circuiting, and disabled propagation
- test_execution_graph: skip metadata accessors (get_skip_config,
  should_propagate_skip, get_required_columns, get_side_effect_columns,
  resolve_side_effect, skip.when DAG edges)
- test_dataset_builder: chained transitive propagation (4 levels),
  two independent skip gates, custom skip.value, row count preservation

Made-with: Cursor

* fix: make expression jinja validator private

Rename assert_expression_valid_jinja to _assert_expression_valid_jinja
to match the private naming convention used by other model validators.

Made-with: Cursor

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 09:31:50 -06:00
..
assets docs: add text-to-sql dev note (#349) 2026-04-14 11:10:14 -07:00
code_reference feat: add Streamable HTTP transport support for remote MCP providers (#358) 2026-03-04 08:11:54 -07:00
colab_notebooks feat: add skip.when conditional column generation (#502) 2026-04-15 09:31:50 -06:00
concepts feat: add Pi Coding Agent rollout seed source (#513) (#514) 2026-04-09 12:07:47 -04:00
css add footer navigation (#108) 2025-12-09 13:42:06 -05:00
devnotes fix: text-to-sql devnote date, images, and publish-devnotes nav (#546) 2026-04-14 15:48:23 -03:00
images docs: Updated telemetry (#451) 2026-03-23 08:30:37 -07:00
js docs: add initial plugin documentation (#107) 2025-12-11 16:05:11 -05:00
notebook_source feat: add skip.when conditional column generation (#502) 2026-04-15 09:31:50 -06:00
overrides fix: typo on path to colab notebook (#129) 2025-12-12 15:37:36 -03:00
plugins docs: consolidated seed reader documentation for filesystem and agent rollout sources (#481) 2026-03-31 13:31:42 -04:00
recipes docs: add agent rollout ingestion docs entry point (#499) 2026-04-08 10:13:12 -04:00
scripts docs: add Open in Colab badges to tutorial notebooks (#391) 2026-03-13 17:47:10 -04:00
CONTRIBUTING.md docs: welcome and concepts/columns (#43) 2025-11-17 17:07:01 -05:00
index.md Updated url (#325) 2026-02-11 14:43:38 -08:00