
4 commits

Author SHA1 Message Date
Johnny Greco
03b3d6c726
chore: address Andre's feedback on --save-results and CLI preview (#335)
* fix: suppress stdout when saving report and sample records to file

Console(record=True) still prints to stdout by default. Use
file=io.StringIO() to redirect output so save-path calls only
write to disk.

* refactor: --save-results skips terminal display

When --save-results is used, records and the analysis report are no
longer printed to the terminal. Extracted save logic into a dedicated
_save_preview_results method and updated option help text accordingly.

* feat: wrap-around navigation in sample records browser

Prev/next buttons and arrow keys now cycle back to the beginning/end
instead of clamping at boundaries.
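Wrap-around navigation reduces to modular arithmetic on the record index. A minimal Python sketch, assuming a hypothetical `step_index` helper (the pager itself runs in the browser, so the real implementation is presumably JavaScript):

```python
def step_index(current: int, step: int, total: int) -> int:
    """Move `step` records from `current`, wrapping at both ends.

    Python's % returns a value in [0, total) for a positive total,
    so stepping back from index 0 lands on the last record.
    """
    if total <= 0:
        raise ValueError("there are no records to navigate")
    return (current + step) % total


# Five records, indices 0-4.
print(step_index(4, 1, 5))   # next from the last record wraps to 0
print(step_index(0, -1, 5))  # prev from the first record wraps to 4
```

In JavaScript, `%` keeps the sign of the dividend (`-1 % 5` is `-1`), so a browser-side version would need `((current + step) % total + total) % total` to wrap backwards correctly.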

* test: reuse record_series fixture in visualization tests

* feat: thread --theme through to sample records pager

The pager shell was hardcoded dark, so --theme light produced
light records inside a dark frame. Extract CSS variables into
dark/light constants and pass the theme from the controller.
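One way to keep the pager shell and the records in sync is to key both palettes by the same CSS custom-property names and render only the `:root` block from the selected theme. A sketch under that assumption; the variable names, colors, and `render_theme_css` helper are hypothetical, since the log does not show the actual constants:

```python
# Hypothetical variable names and colors; the real constants are not shown in the log.
DARK_THEME_CSS_VARS = {
    "--pager-bg": "#0d1117",
    "--pager-fg": "#e6edf3",
    "--pager-border": "#30363d",
}
LIGHT_THEME_CSS_VARS = {
    "--pager-bg": "#ffffff",
    "--pager-fg": "#1f2328",
    "--pager-border": "#d0d7de",
}
THEME_CSS_VARS = {"dark": DARK_THEME_CSS_VARS, "light": LIGHT_THEME_CSS_VARS}


def render_theme_css(theme: str) -> str:
    """Render the CSS custom properties for `theme` as a :root rule."""
    try:
        variables = THEME_CSS_VARS[theme]
    except KeyError:
        raise ValueError(
            f"unknown theme {theme!r}; expected one of {sorted(THEME_CSS_VARS)}"
        ) from None
    body = "\n".join(f"  {name}: {value};" for name, value in variables.items())
    return f":root {{\n{body}\n}}"


# The controller would pass the user's --theme value straight through.
print(render_theme_css("light"))
```

Because both palettes share variable names, the rest of the shell's stylesheet can reference `var(--pager-bg)` and friends once, and only the `:root` block changes with the theme.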

* fix: cap terminal display width at display_width

The module-level Console() had no width limit, so tables with
expand=True stretched to the full terminal width. Cap terminal
output at min(terminal_width, display_width) and thread the
display_width parameter through the controller's display methods.
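Capping the width is a `min()` over two numbers; the only subtlety is where the terminal width comes from. A sketch using the standard library's `shutil.get_terminal_size`, with a hypothetical `effective_display_width` helper. With Rich, the result could then be passed as `Console(width=...)`:

```python
import shutil
from typing import Optional


def effective_display_width(display_width: int, terminal_width: Optional[int] = None) -> int:
    """Return min(terminal width, display_width).

    `terminal_width` can be passed explicitly (handy in tests); otherwise it is
    read from the environment, falling back to 80 columns when the size
    cannot be determined (for example, when stdout is not a terminal).
    """
    if terminal_width is None:
        terminal_width = shutil.get_terminal_size(fallback=(80, 24)).columns
    return min(terminal_width, display_width)
```

On a narrow terminal the terminal wins; on a wide one, `display_width` keeps `expand=True` tables from stretching edge to edge.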

* docs: update --display-width and --theme help text

Remove "Only applies when --save-results is used" from
--display-width since it now also affects terminal output.

* fix: update generation controller tests to match display_width and save_results behavior
2026-02-18 20:17:03 -05:00
Johnny Greco
1439bbea7e
chore: Improve CLI startup with lazy heavy import cleanup (#330)
* perf: defer heavy imports to improve CLI startup time

Move expensive imports (engine, models, controllers) out of the module-level import path so that data-designer --help and other non-generation commands no longer pay the full startup cost.

Key changes:
- Defer controller imports to inside command functions
- Remove eager re-export chains from CLI package __init__ files
- Move default-settings bootstrap into load_config_builder() and DataDesigner.__init__() instead of running at import time
- Add lazy __getattr__ exports in interface/__init__.py
- Replace module-level tokenizer init with cached lazy getter
- Fix ModelProvider import to use config layer instead of engine
- Update test mock paths to match new import locations

Reduces CLI import-time from ~1.67s to ~0.46s.
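The "cached lazy getter" that replaces module-level tokenizer initialization can be sketched with `functools.lru_cache`: nothing is built at import time, the first call pays the cost, and later calls return the same object. `WhitespaceTokenizer` and `get_tokenizer` are hypothetical stand-ins; the log does not say which tokenizer the project loads.

```python
import functools


class WhitespaceTokenizer:
    """Stand-in for an expensive-to-construct tokenizer (hypothetical)."""

    instances_created = 0

    def __init__(self) -> None:
        # A real tokenizer might load vocabulary files or a heavy library here.
        WhitespaceTokenizer.instances_created += 1

    def encode(self, text: str) -> list[str]:
        return text.split()


@functools.lru_cache(maxsize=1)
def get_tokenizer() -> WhitespaceTokenizer:
    """Build the tokenizer on first use; later calls return the cached instance."""
    return WhitespaceTokenizer()
```

Callers write `get_tokenizer().encode(text)` instead of referencing a module-level instance, so merely importing the module no longer constructs anything. On Python 3.9+, `functools.cache` is an equivalent unbounded alternative.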

* perf: defer pandas/numpy in io_helpers and add config_list benchmark

- Replace eager `from lazy_heavy_imports import pd, np` in io_helpers
  with module-level __getattr__ (for backwards-compatible external
  access / test mocks) and function-level imports in the 3 functions
  that actually use them (read_parquet_dataset, smart_load_dataframe,
  _convert_to_serializable). Importing io_helpers no longer triggers
  pandas/numpy loading.
- Defer heavy imports in list and reset CLI commands into function
  bodies to avoid loading repositories, Rich, and prompt_toolkit at
  module import time.
- Add `config_list` (data-designer config list) measurement to the
  CLI startup benchmark with isolated cold measurement in a separate
  venv and a --skip-config-list-check flag.
- Update test mock paths to match new import locations.
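A cold measurement means timing a fresh interpreter, because once a module is in `sys.modules` a second import is nearly free. A minimal sketch of such a benchmark with a threshold check, assuming a plain subprocess per run rather than the separate virtual environment the commit describes; `cold_import_seconds` and `check_import_budget` are hypothetical names:

```python
import statistics
import subprocess
import sys
import time


def cold_import_seconds(module: str, runs: int = 3) -> float:
    """Mean wall-clock seconds to start a new interpreter and import `module`.

    Each run is a separate process, so nothing is already cached in
    sys.modules. Interpreter startup is included in every sample.
    """
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", f"import {module}"], check=True)
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples)


def check_import_budget(module: str, max_mean_seconds: float, runs: int = 3) -> float:
    """Raise if the mean cold import time exceeds the budget; return the mean otherwise."""
    mean = cold_import_seconds(module, runs)
    if mean > max_mean_seconds:
        raise AssertionError(
            f"importing {module} took {mean:.2f}s on average (budget {max_mean_seconds:.2f}s)"
        )
    return mean
```

For a per-module breakdown of where the time goes, CPython's `python -X importtime -c "import <module>"` prints self and cumulative import times to stderr.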

* Refine lazy import usage and TYPE_CHECKING cleanup

* Run license header updater on PR-touched files

* fix: update sqlfluff mock target for lazy imports in test_sql

* perf: cache globals() in lazy __getattr__ to avoid repeated lookups

Add globals() caching and explanatory comment to all three lazy
__getattr__ implementations (lazy_heavy_imports, config/__init__,
interface/__init__) so subsequent attribute accesses bypass __getattr__.

* perf: lazy CLI command loading and deferred heavy import evaluations

- Add LazyTyperGroup to defer command module loading until invocation, allowing module-level imports in all CLI command files

- Split DataFrameSeedSource into seed_source_dataframe.py to isolate pandas dependency from other seed source classes

- Move TypeVar/TypeAlias definitions (DataT, NumpyArray1dT, RadomStateT, EngineT) to TYPE_CHECKING blocks with runtime fallbacks

- Wrap module-level constants in lru_cache (phone_number parquet data, jsonschema validator) to defer I/O and heavy imports to first use

- Update test mock targets to patch at usage-site for module-level imports
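The idea behind LazyTyperGroup, registering commands by import path and importing a command's module only when it is actually needed, can be sketched without Typer or Click. In Click, a `Group` subclass would typically do this by overriding `list_commands(ctx)` and `get_command(ctx, cmd_name)`; the standalone registry below, with its `"module:attribute"` spec format, is a hypothetical illustration rather than the project's class:

```python
import importlib
from typing import Any, Callable


class LazyCommandRegistry:
    """Map command names to "module:attribute" strings and import on demand."""

    def __init__(self, specs: dict[str, str]) -> None:
        self._specs = dict(specs)
        self._loaded: dict[str, Callable[..., Any]] = {}

    def list_commands(self) -> list[str]:
        # Listing names needs no imports at all.
        return sorted(self._specs)

    def get_command(self, name: str) -> Callable[..., Any]:
        if name not in self._specs:
            raise KeyError(f"unknown command {name!r}")
        if name not in self._loaded:
            module_path, _, attr = self._specs[name].partition(":")
            module = importlib.import_module(module_path)  # deferred until first use
            self._loaded[name] = getattr(module, attr)
        return self._loaded[name]


# Standard-library functions stand in for command modules in this sketch.
registry = LazyCommandRegistry({"sqrt": "math:sqrt", "dumps": "json:dumps"})
```

Because command modules are no longer imported up front, each command file can keep ordinary module-level imports without slowing down unrelated commands.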

* refactor: use direct pandas import in seed_source_dataframe

Drop lazy-loading for pandas in DataFrameSeedSource; use direct import
for simplicity.

* update lazy import pattern

* update tests to use lazy import namespace

Switch test modules to import data_designer.lazy_heavy_imports as lazy
and reference heavy libraries through that namespace. This keeps heavy
imports deferred during module import and aligns tests with the new
lazy-import usage pattern.

* tighten import perf test thresholds

Document recent baseline timings and lower the allowed average
import time and timeout so regressions are detected sooner.

* document pandas import requirement

Clarify that Pydantic needs DataFrame resolved at module load and
that keeping the direct import preserves IDE typing support.

* increase timeout

* use lazy pandas imports in visualization tests

- replace direct pandas usage with lazy.pd in visualization tests to avoid eager imports
- add TYPE_CHECKING pandas import and keep CLI controller imports sorted

* fix lazy pandas runtime usage and preview mocks

Switch sample-record handling to lazy pandas types so runtime paths no longer
depend on TYPE_CHECKING imports. Align preview controller tests to patch the
module-local DataDesigner symbol, preventing real engine invocation in save
results scenarios.
2026-02-18 16:24:15 -05:00
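Patching the module-local `DataDesigner` symbol matters because `from engine import DataDesigner` copies a reference into the importing module's namespace, and replacing the attribute on the defining module afterwards does not touch that copy. A self-contained sketch with `unittest.mock`, where `demo_engine` and `demo_controller` are hypothetical stand-ins for the real engine and controller modules:

```python
import sys
import types
from unittest import mock

# Stand-in for the module that defines the heavy class.
engine = types.ModuleType("demo_engine")


class DataDesigner:
    def preview(self) -> str:
        return "real engine ran"


engine.DataDesigner = DataDesigner
sys.modules["demo_engine"] = engine

# Stand-in for a controller module that imports the class by name at module level.
controller = types.ModuleType("demo_controller")
exec(
    "from demo_engine import DataDesigner\n"
    "\n"
    "def run_preview():\n"
    "    return DataDesigner().preview()\n",
    controller.__dict__,
)

# Patching the defining module misses the controller's own reference...
with mock.patch.object(engine, "DataDesigner") as fake:
    fake.return_value.preview.return_value = "mocked"
    result_at_definition = controller.run_preview()

# ...while patching the name the controller actually looks up takes effect.
with mock.patch.object(controller, "DataDesigner") as fake:
    fake.return_value.preview.return_value = "mocked"
    result_at_usage = controller.run_preview()

print(result_at_definition)  # real engine ran
print(result_at_usage)       # mocked
```

In a test suite the same rule applies to the string form: patch `"<controller module>.DataDesigner"`, not the engine module where the class is defined.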
Johnny Greco
f2a1657870
feat: add --save-results option to preview command (#333)
* feat: add --save-report option to preview command

* feat: add save_path option to display_sample_record

Allow saving rendered sample records as HTML or SVG files via an
optional save_path parameter on both the standalone function and
the WithRecordSamplerMixin method.
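Choosing the output format from `save_path` could look like the sketch below. `resolve_save_format` is a hypothetical helper, and the lower-casing anticipates the case-insensitive extension check adopted later in this PR. If the records are rendered with Rich, the two formats would presumably map onto `Console.save_html()` and `Console.save_svg()`.

```python
from pathlib import Path
from typing import Union

SUPPORTED_FORMATS = {".html": "html", ".svg": "svg"}


def resolve_save_format(save_path: Union[str, Path]) -> str:
    """Return "html" or "svg" based on the file extension (case-insensitive)."""
    suffix = Path(save_path).suffix.lower()
    try:
        return SUPPORTED_FORMATS[suffix]
    except KeyError:
        raise ValueError(
            f"unsupported extension {suffix or '(none)'!r} for {save_path}; use .html or .svg"
        ) from None
```

Rejecting unknown extensions up front gives a clear error before any rendering work is done.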

* feat: replace --save-report with --save-results on preview command

Replace the single-file --save-report option with --save-results, which saves all preview artifacts (dataset parquet, analysis report HTML, and per-record sample HTMLs) into a timestamped directory under the artifact path. Add error handling around the save block, improve timestamp precision to microseconds, and expand test coverage for the new behavior.
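A microsecond-precision timestamped directory can be built with the `%f` directive of `datetime.strftime`, which keeps two runs started within the same second from colliding. The `preview_` prefix and the `make_results_dir` helper are hypothetical; the log only says the directory is timestamped and lives under the artifact path.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional, Union


def make_results_dir(artifact_path: Union[str, Path], now: Optional[datetime] = None) -> Path:
    """Create <artifact_path>/preview_<YYYYmmdd_HHMMSS_ffffff> and return it."""
    if now is None:
        now = datetime.now()
    results_dir = Path(artifact_path) / f"preview_{now.strftime('%Y%m%d_%H%M%S_%f')}"
    # exist_ok=False surfaces a (now unlikely) collision instead of mixing two runs.
    results_dir.mkdir(parents=True, exist_ok=False)
    return results_dir
```

The parquet dataset, analysis report HTML, and per-record sample HTMLs would then be written inside the returned directory. `FileExistsError` is a subclass of `OSError`, so an `except OSError` around the save block also covers a collision.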

* feat: add sample records pager with theme toggle, postMessage bridge, and UI polish

* feat: add dataset metadata subtitle to pager and clean up toolbar layout

* fix: address review findings for preview save-results feature

- Split try/except in generation_controller so report display errors
  don't produce misleading "failed to save" messages when not saving
- Add browser HTML path to save success output for discoverability
- Remove 5 unused CSS variables from pager theme constants
- Add "N of M" record counter to pager toolbar
- Add theme/display_width assertions to all preview_command tests
- Add dedicated test for custom theme and display_width passthrough
- Add tests for record counter and CSS variable cleanup

* fix: address code review findings and simplify pager

- Fix critical bug: analysis report now displays to console even when
  --save-results is active (was silently dropped via pass statement)
- Fix latent UnboundLocalError in display_sample_record when index is
  out of bounds (num_records computed before try block)
- Eliminate duplicated dark CSS between constant and theme listener script
- Simplify sample_records_pager: remove dual-theme system, postMessage
  bridge, and responsive media queries; restore GitHub link; reorder
  toolbar to put prev/next buttons on the far left
- Narrow except Exception to except OSError in save-results path
- Use case-insensitive extension check and lambda-based re.sub
- Collapse redundant preview command delegation tests into parametrize
- Add missing type annotations and remove tautological assertions

* style: move record counter to far right of pager toolbar

* refactor: remove dead theme-listener script and inline CSS constant

_THEME_LISTENER_SCRIPT and _SAMPLE_RECORD_DARK_CSS_INLINE became
orphaned after the pager simplification removed the postMessage
bridge. This removes both constants, drops the injection line,
switches the idempotency guard to the viewport meta tag, and
cleans up related test assertions.
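An idempotency guard lets the same post-processing run twice without injecting duplicate markup. A minimal sketch keyed on the viewport meta tag, as the commit describes; the `add_viewport_once` name, the exact tag, and the insertion point after `<head>` are assumptions:

```python
VIEWPORT_META = '<meta name="viewport" content="width=device-width, initial-scale=1">'


def add_viewport_once(html: str) -> str:
    """Inject the viewport meta tag once; return already-processed HTML unchanged."""
    # The viewport tag doubles as the "already processed" marker.
    if 'name="viewport"' in html:
        return html
    # Insert right after the opening tag (assumes a plain, attribute-free <head>).
    return html.replace("<head>", "<head>" + VIEWPORT_META, 1)
```

Using a tag that the processing itself adds as the marker means no separate sentinel comment has to be kept in sync with the injected content.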

* fix: move Path import out of TYPE_CHECKING block in test_visualization

* fix: rename _logger to logger to match codebase convention

* fix: remove unnecessary cast in preview command theme parameter

* refactor: extract DEFAULT_DISPLAY_WIDTH constant and make apply_html_post_processing public

* Update packages/data-designer-config/tests/config/utils/test_visualization.py

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
2026-02-18 15:58:35 -05:00
Johnny Greco
d3c4de76da
feat: add preview, create, and validate CLI commands (#313)
* feat: add preview, create, and validate CLI commands

Add three new top-level CLI commands for the data-designer workflow:
- `data-designer preview` - generate preview datasets for fast iteration
- `data-designer create` - create full datasets and save to disk
- `data-designer validate` - validate configuration files

Also includes:
- Move wait_for_navigation_key() UI primitive from preview.py to ui.py
- Add KeyPressEvent type annotations to all key binding handlers in ui.py
- Refactor cli/utils.py into cli/utils/ package with config_loader module
- Comprehensive test coverage for all new commands

* fix: update pythonjsonlogger import and clean up dev dependencies

- Update pythonjsonlogger import to use newer JsonFormatter API
- Consolidate dev-dependencies into [dependency-groups] dev section
- Remove unnecessary test cli/utils __init__.py

* small E

* address greptile feedback

* organize CLI commands into rich help panels

Group top-level commands under "Generation" and "Setup" panels
for clearer help output.

* refactor config loader to parse files directly and auto-detect config format

- Parse YAML/JSON files into dicts before passing to from_config,
  providing format-specific error messages for parse failures
- Auto-detect DataDesignerConfig format (columns at top level) and
  wrap it into BuilderConfig so users can provide either format
- Clean up Python module loading with try/except/finally for reliable
  sys.modules and sys.path cleanup
- Add comprehensive tests for parsing, validation, and auto-wrapping
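Parsing the file into a dict before handing it to `from_config` is what makes format-specific error messages possible. A standard-library sketch for the JSON branch; `ConfigParseError` and `parse_json_config` are hypothetical names. For YAML, PyYAML's `yaml.safe_load` and `yaml.YAMLError` would fill the same role; that branch is omitted to keep the sketch dependency-free.

```python
import json
from typing import Any


class ConfigParseError(ValueError):
    """Raised when a config file cannot be parsed into a mapping (hypothetical)."""


def parse_json_config(text: str, source: str = "<config>") -> dict[str, Any]:
    """Parse JSON config text into a dict, with a location-aware error message."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ConfigParseError(
            f"{source}: invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"
        ) from exc
    if not isinstance(data, dict):
        raise ConfigParseError(
            f"{source}: expected a JSON object at the top level, got {type(data).__name__}"
        )
    return data
```

`json.JSONDecodeError` already carries `lineno` and `colno`, so the message can point at the offending spot instead of surfacing a generic validation failure.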

* fix sys.path cleanup in config loader and simplify tests

- Use pop(0) instead of remove() to precisely undo the insert(0, ...)
  and avoid accidentally removing a different matching path entry
- Replace MagicMock with real DataDesignerConfigBuilder in tests

* move config format auto-detection into from_config

Centralize the shorthand DataDesignerConfig detection (columns at
top level without a data_designer wrapper) in
DataDesignerConfigBuilder.from_config so all callers benefit, not
just the CLI config loader. Simplify config_loader to delegate file
parsing and format normalization entirely to from_config.
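The shorthand detection described above amounts to a check on the top-level keys. A hypothetical sketch on plain dicts; the real logic lives in `DataDesignerConfigBuilder.from_config`, and any additional rules it applies are not shown in this log:

```python
from typing import Any


def normalize_config_dict(data: dict[str, Any]) -> dict[str, Any]:
    """Wrap a bare DataDesignerConfig-style dict in a data_designer section.

    Shorthand: "columns" at the top level and no "data_designer" key.
    Anything else is assumed to already be in the full builder format.
    """
    if "columns" in data and "data_designer" not in data:
        return {"data_designer": data}
    return data
```

Doing this inside `from_config` rather than in the CLI loader means Python callers that pass a dict get the same normalization.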

* extract GenerationController from CLI commands

Move shared generation logic (preview, validate, create) out of the
individual Typer command functions into a dedicated GenerationController,
matching the existing controller pattern (DownloadController, etc.).
The command functions now delegate to the controller, keeping them as
thin entry points. Tests updated accordingly — command tests verify
delegation while controller tests cover the full behavior.

* harden sys.path cleanup and add explanatory comments

Use sys.path.remove() instead of checking sys.path[0] so cleanup
succeeds even when exec_module inserts entries at index 0. Drop
unnecessary spec=DataDesignerConfigBuilder from test mocks.
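Loading a Python config file with guaranteed cleanup can be sketched with `importlib.util` and a `try`/`finally`. The module naming scheme and the `load_python_config` helper are hypothetical; the `sys.path.remove()` call mirrors the approach this commit settles on.

```python
import importlib.util
import sys
from pathlib import Path
from types import ModuleType
from typing import Union


def load_python_config(path: Union[str, Path]) -> ModuleType:
    """Execute a Python config file as a module, then undo the sys.* changes."""
    path = Path(path).resolve()
    module_name = f"_user_config_{path.stem}"  # hypothetical naming scheme
    spec = importlib.util.spec_from_file_location(module_name, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load a Python module from {path}")
    module = importlib.util.module_from_spec(spec)

    # Let the config import sibling helper files while it runs.
    config_dir = str(path.parent)
    sys.path.insert(0, config_dir)
    sys.modules[module_name] = module
    try:
        spec.loader.exec_module(module)
    finally:
        sys.modules.pop(module_name, None)
        # remove() deletes the first matching entry wherever it now sits, so this
        # still works if exec_module pushed other paths in front of ours.
        try:
            sys.path.remove(config_dir)
        except ValueError:
            pass  # the config itself already removed it
    return module
```

The `finally` block runs even when the config raises, so a broken config file cannot leave `sys.path` or `sys.modules` polluted for the rest of the process.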

* check stdout TTY in preview interactive mode detection

Previously only stdin was checked, so piping stdout (e.g.
`dd preview cfg.yaml | head`) would still attempt interactive
browsing. Now both stdin and stdout must be a TTY.
2026-02-11 14:06:06 -05:00
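The interactive-mode check described in the last commit can be sketched as a small predicate; `is_interactive` is a hypothetical name. With `dd preview cfg.yaml | head`, stdout is a pipe, so `sys.stdout.isatty()` is false and the command falls back to non-interactive output.

```python
import sys
from typing import Optional, TextIO


def is_interactive(stdin: Optional[TextIO] = None, stdout: Optional[TextIO] = None) -> bool:
    """True only when both stdin and stdout are attached to a terminal."""
    # Resolve at call time so tests or redirections that swap sys.stdin/sys.stdout are honored.
    stdin = sys.stdin if stdin is None else stdin
    stdout = sys.stdout if stdout is None else stdout
    # Either stream can be None (e.g. under some GUI launchers); treat that as non-interactive.
    if stdin is None or stdout is None:
        return False
    return stdin.isatty() and stdout.isatty()
```

Accepting the streams as parameters keeps the check testable without a real terminal.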