feat: add preview, create, and validate CLI commands (#313)
* feat: add preview, create, and validate CLI commands
Add three new top-level CLI commands for the data-designer workflow:
- `data-designer preview` - generate preview datasets for fast iteration
- `data-designer create` - create full datasets and save to disk
- `data-designer validate` - validate configuration files
Also includes:
- Move wait_for_navigation_key() UI primitive from preview.py to ui.py
- Add KeyPressEvent type annotations to all key binding handlers in ui.py
- Refactor cli/utils.py into cli/utils/ package with config_loader module
- Comprehensive test coverage for all new commands
* fix: update pythonjsonlogger import and clean up dev dependencies
- Update pythonjsonlogger import to use newer JsonFormatter API
- Consolidate dev-dependencies into [dependency-groups] dev section
- Remove unnecessary test cli/utils __init__.py
* small E
* address greptile feedback
* organize CLI commands into rich help panels
Group top-level commands under "Generation" and "Setup" panels
for clearer help output.
* refactor config loader to parse files directly and auto-detect config format
- Parse YAML/JSON files into dicts before passing to from_config,
providing format-specific error messages for parse failures
- Auto-detect DataDesignerConfig format (columns at top level) and
wrap it into BuilderConfig so users can provide either format
- Clean up Python module loading with try/except/finally for reliable
sys.modules and sys.path cleanup
- Add comprehensive tests for parsing, validation, and auto-wrapping
* fix sys.path cleanup in config loader and simplify tests
- Use pop(0) instead of remove() to precisely undo the insert(0, ...)
and avoid accidentally removing a different matching path entry
- Replace MagicMock with real DataDesignerConfigBuilder in tests
* move config format auto-detection into from_config
Centralize the shorthand DataDesignerConfig detection (columns at
top level without a data_designer wrapper) in
DataDesignerConfigBuilder.from_config so all callers benefit, not
just the CLI config loader. Simplify config_loader to delegate file
parsing and format normalization entirely to from_config.
* extract GenerationController from CLI commands
Move shared generation logic (preview, validate, create) out of the
individual Typer command functions into a dedicated GenerationController,
matching the existing controller pattern (DownloadController, etc.).
The command functions now delegate to the controller, keeping them as
thin entry points. Tests updated accordingly — command tests verify
delegation while controller tests cover the full behavior.
* harden sys.path cleanup and add explanatory comments
Use sys.path.remove() instead of checking sys.path[0] so cleanup
succeeds even when exec_module inserts entries at index 0. Drop
unnecessary spec=DataDesignerConfigBuilder from test mocks.
* check stdout TTY in preview interactive mode detection
Previously only stdin was checked, so piping stdout (e.g.
`dd preview cfg.yaml | head`) would still attempt interactive
browsing. Now both stdin and stdout must be a TTY.
2026-02-11 19:06:06 +00:00
# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0

from __future__ import annotations

from pathlib import Path
from unittest.mock import MagicMock, call, patch

import pytest
import typer

from data_designer.cli.controllers.generation_controller import GenerationController
from data_designer.cli.utils.config_loader import ConfigLoadError
from data_designer.config.config_builder import DataDesignerConfigBuilder
from data_designer.config.errors import InvalidConfigError
2026-02-19 01:17:03 +00:00
from data_designer.config.utils.constants import DEFAULT_DISPLAY_WIDTH
_CTRL = "data_designer.cli.controllers.generation_controller"
_DW = DEFAULT_DISPLAY_WIDTH
def _make_mock_preview_results(num_records: int) -> MagicMock:
    """Create a mock PreviewResults with the given number of records."""
    mock_results = MagicMock()
    mock_results.dataset = MagicMock()
    mock_results.dataset.__len__ = MagicMock(return_value=num_records)
    return mock_results
def _make_mock_create_results(num_records: int, base_path: str = "/output/artifacts/dataset") -> MagicMock:
    """Create a mock CreateResults with the given number of records."""
    mock_results = MagicMock()
    mock_dataset = MagicMock()
    mock_dataset.__len__ = MagicMock(return_value=num_records)
    mock_results.load_dataset.return_value = mock_dataset
    mock_results.artifact_storage.base_dataset_path = base_path
    return mock_results
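Both helpers above set `__len__` explicitly because a bare `MagicMock` supports `len()` but reports 0. The short stdlib-only example below illustrates that behavior. `summarize` is a hypothetical stand-in for code that reports a record count and is not part of the controller.

```python
from typing import Any
from unittest.mock import MagicMock


def summarize(dataset: Any) -> str:
    # Hypothetical stand-in for controller code that reports a record count.
    return f"Generated {len(dataset)} records"


default_mock = MagicMock()  # a bare MagicMock reports a length of 0
sized_mock = MagicMock()
# Same configuration style as the helpers above: replace __len__ with a mock.
sized_mock.__len__ = MagicMock(return_value=5)

default_summary = summarize(default_mock)
sized_summary = summarize(sized_mock)
```

Each `MagicMock` instance gets its own class, so configuring `__len__` on `sized_mock` does not affect `default_mock`.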
# ---------------------------------------------------------------------------
# run_preview tests
# ---------------------------------------------------------------------------
chore: Improve CLI startup with lazy heavy import cleanup (#330)
* perf: defer heavy imports to improve CLI startup time
Move expensive imports (engine, models, controllers) out of the module-level import path so that data-designer --help and other non-generation commands no longer pay the full startup cost.
Key changes:
- Defer controller imports to inside command functions
- Remove eager re-export chains from CLI package __init__ files
- Move default-settings bootstrap into load_config_builder() and DataDesigner.__init__() instead of running at import time
- Add lazy __getattr__ exports in interface/__init__.py
- Replace module-level tokenizer init with cached lazy getter
- Fix ModelProvider import to use config layer instead of engine
- Update test mock paths to match new import locations
Reduces CLI import-time from ~1.67s to ~0.46s.
* perf: defer pandas/numpy in io_helpers and add config_list benchmark
- Replace eager `from lazy_heavy_imports import pd, np` in io_helpers
with module-level __getattr__ (for backwards-compatible external
access / test mocks) and function-level imports in the 3 functions
that actually use them (read_parquet_dataset, smart_load_dataframe,
_convert_to_serializable). Importing io_helpers no longer triggers
pandas/numpy loading.
- Defer heavy imports in list and reset CLI commands into function
bodies to avoid loading repositories, Rich, and prompt_toolkit at
module import time.
- Add `config_list` (data-designer config list) measurement to the
CLI startup benchmark with isolated cold measurement in a separate
venv and a --skip-config-list-check flag.
- Update test mock paths to match new import locations.
* Refine lazy import usage and TYPE_CHECKING cleanup
* Run license header updater on PR-touched files
* fix: update sqlfluff mock target for lazy imports in test_sql
* perf: cache globals() in lazy __getattr__ to avoid repeated lookups
Add globals() caching and explanatory comment to all three lazy
__getattr__ implementations (lazy_heavy_imports, config/__init__,
interface/__init__) so subsequent attribute accesses bypass __getattr__.
* perf: lazy CLI command loading and deferred heavy import evaluations
- Add LazyTyperGroup to defer command module loading until invocation, allowing module-level imports in all CLI command files
- Split DataFrameSeedSource into seed_source_dataframe.py to isolate pandas dependency from other seed source classes
- Move TypeVar/TypeAlias definitions (DataT, NumpyArray1dT, RadomStateT, EngineT) to TYPE_CHECKING blocks with runtime fallbacks
- Wrap module-level constants in lru_cache (phone_number parquet data, jsonschema validator) to defer I/O and heavy imports to first use
- Update test mock targets to patch at usage-site for module-level imports
* refactor: use direct pandas import in seed_source_dataframe
Drop lazy-loading for pandas in DataFrameSeedSource; use direct import
for simplicity.
* update lazy import pattern
* update tests to use lazy import namespace
Switch test modules to import data_designer.lazy_heavy_imports as lazy
and reference heavy libraries through that namespace. This keeps heavy
imports deferred during module import and aligns tests with the new
lazy-import usage pattern.
* tighten import perf test thresholds
Document recent baseline timings and lower the allowed average
import time and timeout so regressions are detected sooner.
* document pandas import requirement
Clarify that Pydantic needs DataFrame resolved at module load and
that keeping the direct import preserves IDE typing support.
* increase timeout time
* use lazy pandas imports in visualization tests
- replace direct pandas usage with lazy.pd in visualization tests to avoid eager imports
- add TYPE_CHECKING pandas import and keep CLI controller imports sorted
* fix lazy pandas runtime usage and preview mocks
Switch sample-record handling to lazy pandas types so runtime paths no longer
depend on TYPE_CHECKING imports. Align preview controller tests to patch the
module-local DataDesigner symbol, preventing real engine invocation in save
results scenarios.
2026-02-18 21:24:15 +00:00
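The commit above moves mock targets to the usage site. That is why the decorators below patch `f"{_CTRL}.DataDesigner"` rather than the class's defining module. The self-contained sketch below shows the underlying `unittest.mock` rule using two throwaway in-memory modules. `engine_demo`, `controller_demo`, and `build` are hypothetical stand-ins, and `exec` is used only to keep the example in one file.

```python
import sys
import types
from unittest.mock import patch

# Hypothetical "engine" module that defines the real class.
engine = types.ModuleType("engine_demo")
exec("class DataDesigner:\n    source = 'real'\n", engine.__dict__)
sys.modules["engine_demo"] = engine

# Hypothetical "controller" module that binds its own reference at import
# time, like `from engine_demo import DataDesigner` at the top of a file.
controller = types.ModuleType("controller_demo")
exec(
    "from engine_demo import DataDesigner\n"
    "def build():\n"
    "    return DataDesigner().source\n",
    controller.__dict__,
)
sys.modules["controller_demo"] = controller

# Patching the definition site leaves the controller's own binding untouched.
with patch("engine_demo.DataDesigner") as mock_cls:
    mock_cls.return_value.source = "mock"
    definition_site_result = controller.build()

# Patching the usage site replaces the name the controller actually looks up.
with patch("controller_demo.DataDesigner") as mock_cls:
    mock_cls.return_value.source = "mock"
    usage_site_result = controller.build()
```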
@patch(f"{_CTRL}.DataDesigner")
@patch(f"{_CTRL}.load_config_builder")
def test_run_preview_success(mock_load_config: MagicMock, mock_dd_cls: MagicMock) -> None:
    """Test successful preview execution in non-interactive mode."""
    mock_builder = MagicMock(spec=DataDesignerConfigBuilder)
    mock_load_config.return_value = mock_builder

    mock_dd = MagicMock()
    mock_dd_cls.return_value = mock_dd
    mock_dd.preview.return_value = _make_mock_preview_results(5)

    controller = GenerationController()
    controller.run_preview(config_source="config.yaml", num_records=5, non_interactive=True)

    mock_load_config.assert_called_once_with("config.yaml")
    mock_dd_cls.assert_called_once()
    mock_dd.preview.assert_called_once_with(mock_builder, num_records=5)
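In `test_run_preview_success`, `mock_load_config` is the first parameter even though the `DataDesigner` patch is listed first. Stacked `@patch` decorators are applied bottom-up, so the decorator nearest the function supplies the first mock. The stdlib-only demonstration below shows this ordering. Patching `os.path.isfile` and `os.path.isdir` is just a convenient, hypothetical choice of targets.

```python
import os.path
from unittest.mock import MagicMock, patch


@patch("os.path.isdir")   # outer decorator -> second mock argument
@patch("os.path.isfile")  # inner decorator -> first mock argument
def observe_order(mock_isfile: MagicMock, mock_isdir: MagicMock) -> tuple[bool, bool]:
    mock_isfile.return_value = True
    mock_isdir.return_value = False
    # Both calls hit the mocks, so the parameter names line up with the targets.
    return os.path.isfile("anything"), os.path.isdir("anything")


result = observe_order()
```

Naming each parameter after its patch target, as the tests here do, makes a swapped order easy to spot.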
@patch(f"{_CTRL}.DataDesigner")
@patch(f"{_CTRL}.load_config_builder")
def test_run_preview_custom_num_records(mock_load_config: MagicMock, mock_dd_cls: MagicMock) -> None:
    """Test preview with a custom number of records."""
    mock_builder = MagicMock(spec=DataDesignerConfigBuilder)
    mock_load_config.return_value = mock_builder

    mock_dd = MagicMock()
    mock_dd_cls.return_value = mock_dd
    mock_dd.preview.return_value = _make_mock_preview_results(20)

    controller = GenerationController()
    controller.run_preview(config_source="config.yaml", num_records=20, non_interactive=True)

    mock_dd.preview.assert_called_once_with(mock_builder, num_records=20)
@patch(f"{_CTRL}.load_config_builder")
def test_run_preview_config_load_error(mock_load_config: MagicMock) -> None:
    """Test preview exits with code 1 when config fails to load."""
    mock_load_config.side_effect = ConfigLoadError("File not found")

    controller = GenerationController()
    with pytest.raises(typer.Exit) as exc_info:
        controller.run_preview(config_source="missing.yaml", num_records=10, non_interactive=True)

    assert exc_info.value.exit_code == 1
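The test above checks only the exit code. Below is a minimal sketch of the controller-side pattern it implies. It assumes `run_preview` catches `ConfigLoadError` and re-raises it as an exit with code 1. Everything here is a local stand-in so the example runs without the project or Typer installed: `CliExit` plays the role of `typer.Exit`, and `run_preview_sketch` and `failing_loader` are hypothetical.

```python
from typing import Callable


class ConfigLoadError(Exception):
    """Local stand-in for the project's ConfigLoadError."""


class CliExit(Exception):
    """Local stand-in for typer.Exit, which likewise exposes the code as `exit_code`."""

    def __init__(self, code: int = 0) -> None:
        super().__init__(code)
        self.exit_code = code


def run_preview_sketch(config_source: str, load_config: Callable[[str], object]) -> object:
    """Load the config, turning loader failures into a non-zero exit."""
    try:
        builder = load_config(config_source)
    except ConfigLoadError as exc:
        # A real CLI would also report `exc` to the user before exiting.
        raise CliExit(code=1) from exc
    # The real controller would go on to run the preview with `builder`.
    return builder


def failing_loader(path: str) -> object:
    """Hypothetical loader that always fails, mirroring the test's side_effect."""
    raise ConfigLoadError(f"File not found: {path}")
```

Chaining with `from exc` keeps the original loader error available as `__cause__` for debugging, while callers and tests only need to look at `exit_code`.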
|
|
|
|
|
|
|
|
|
|
|
chore: Improve CLI startup with lazy heavy import cleanup (#330)
* perf: defer heavy imports to improve CLI startup time
Move expensive imports (engine, models, controllers) out of the module-level import path so that data-designer --help and other non-generation commands no longer pay the full startup cost.
Key changes:
- Defer controller imports to inside command functions
- Remove eager re-export chains from CLI package __init__ files
- Move default-settings bootstrap into load_config_builder() and DataDesigner.__init__() instead of running at import time
- Add lazy __getattr__ exports in interface/__init__.py
- Replace module-level tokenizer init with cached lazy getter
- Fix ModelProvider import to use config layer instead of engine
- Update test mock paths to match new import locations
Reduces CLI import-time from ~1.67s to ~0.46s.
* perf: defer pandas/numpy in io_helpers and add config_list benchmark
- Replace eager `from lazy_heavy_imports import pd, np` in io_helpers
with module-level __getattr__ (for backwards-compatible external
access / test mocks) and function-level imports in the 3 functions
that actually use them (read_parquet_dataset, smart_load_dataframe,
_convert_to_serializable). Importing io_helpers no longer triggers
pandas/numpy loading.
- Defer heavy imports in list and reset CLI commands into function
bodies to avoid loading repositories, Rich, and prompt_toolkit at
module import time.
- Add `config_list` (data-designer config list) measurement to the
CLI startup benchmark with isolated cold measurement in a separate
venv and a --skip-config-list-check flag.
- Update test mock paths to match new import locations.
* Refine lazy import usage and TYPE_CHECKING cleanup
* Run license header updater on PR-touched files
* fix: update sqlfluff mock target for lazy imports in test_sql
* perf: cache globals() in lazy __getattr__ to avoid repeated lookups
Add globals() caching and explanatory comment to all three lazy
__getattr__ implementations (lazy_heavy_imports, config/__init__,
interface/__init__) so subsequent attribute accesses bypass __getattr__.
* perf: lazy CLI command loading and deferred heavy import evaluations
- Add LazyTyperGroup to defer command module loading until invocation, allowing module-level imports in all CLI command files
- Split DataFrameSeedSource into seed_source_dataframe.py to isolate pandas dependency from other seed source classes
- Move TypeVar/TypeAlias definitions (DataT, NumpyArray1dT, RadomStateT, EngineT) to TYPE_CHECKING blocks with runtime fallbacks
- Wrap module-level constants in lru_cache (phone_number parquet data, jsonschema validator) to defer I/O and heavy imports to first use
- Update test mock targets to patch at usage-site for module-level imports
* refactor: use direct pandas import in seed_source_dataframe
Drop lazy-loading for pandas in DataFrameSeedSource; use direct import
for simplicity.
* update lazy import pattern
* update tests to use lazy import namespace
Switch test modules to import data_designer.lazy_heavy_imports as lazy
and reference heavy libraries through that namespace. This keeps heavy
imports deferred during module import and aligns tests with the new
lazy-import usage pattern.
* tighten import perf test thresholds
Document recent baseline timings and lower the allowed average
import time and timeout so regressions are detected sooner.
* document pandas import requirement
Clarify that Pydantic needs DataFrame resolved at module load and
that keeping the direct import preserves IDE typing support.
* increase timeout time
* use lazy pandas imports in visualization tests
- replace direct pandas usage with lazy.pd in visualization tests to avoid eager imports
- add TYPE_CHECKING pandas import and keep CLI controller imports sorted
* fix lazy pandas runtime usage and preview mocks
Switch sample-record handling to lazy pandas types so runtime paths no longer
depend on TYPE_CHECKING imports. Align preview controller tests to patch the
module-local DataDesigner symbol, preventing real engine invocation in save
results scenarios.
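The runtime-path fix above rests on one rule. Names imported under `if TYPE_CHECKING:` exist only for type checkers and IDEs, because `typing.TYPE_CHECKING` is `False` at runtime. The sketch below shows the safe split. It assumes `decimal` as a stand-in for pandas and `importlib.import_module` as a stand-in for the project's `lazy.pd` namespace, and `parse_amount` is a hypothetical name.

```python
import importlib
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only type checkers and IDEs execute this import; at runtime TYPE_CHECKING is False.
    from decimal import Decimal


def parse_amount(text: str) -> "Decimal":
    # The quoted annotation is never evaluated at runtime, so the TYPE_CHECKING import suffices for it.
    # Calling Decimal(text) directly here would raise NameError, so fetch the class via a runtime import.
    decimal_module = importlib.import_module("decimal")
    return decimal_module.Decimal(text)
```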
2026-02-18 21:24:15 +00:00
|
|
|
@patch(f"{_CTRL}.DataDesigner")
|
feat: add preview, create, and validate CLI commands (#313)
* feat: add preview, create, and validate CLI commands
Add three new top-level CLI commands for the data-designer workflow:
- `data-designer preview` - generate preview datasets for fast iteration
- `data-designer create` - create full datasets and save to disk
- `data-designer validate` - validate configuration files
Also includes:
- Move wait_for_navigation_key() UI primitive from preview.py to ui.py
- Add KeyPressEvent type annotations to all key binding handlers in ui.py
- Refactor cli/utils.py into cli/utils/ package with config_loader module
- Comprehensive test coverage for all new commands
* fix: update pythonjsonlogger import and clean up dev dependencies
- Update pythonjsonlogger import to use newer JsonFormatter API
- Consolidate dev-dependencies into [dependency-groups] dev section
- Remove unnecessary test cli/utils __init__.py
* small E
* address greptile feedback
* organize CLI commands into rich help panels
Group top-level commands under "Generation" and "Setup" panels
for clearer help output.
* refactor config loader to parse files directly and auto-detect config format
- Parse YAML/JSON files into dicts before passing to from_config,
providing format-specific error messages for parse failures
- Auto-detect DataDesignerConfig format (columns at top level) and
wrap it into BuilderConfig so users can provide either format
- Clean up Python module loading with try/except/finally for reliable
sys.modules and sys.path cleanup
- Add comprehensive tests for parsing, validation, and auto-wrapping
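This entry describes parsing files by extension and raising format-specific errors; a later entry moves that parsing into `from_config`. Below is a sketch of the approach. `parse_config_file` and `ConfigParseError` are hypothetical names. YAML support assumes PyYAML, which is imported only when a `.yaml` or `.yml` file is actually loaded.

```python
import json
from pathlib import Path
from typing import Any


class ConfigParseError(ValueError):
    """Hypothetical error type carrying a format-specific message."""


def parse_config_file(path: str) -> dict[str, Any]:
    file_path = Path(path)
    suffix = file_path.suffix.lower()
    text = file_path.read_text(encoding="utf-8")
    if suffix == ".json":
        try:
            data = json.loads(text)
        except json.JSONDecodeError as exc:
            raise ConfigParseError(
                f"Invalid JSON in {file_path.name} (line {exc.lineno}, column {exc.colno}): {exc.msg}"
            ) from exc
    elif suffix in {".yaml", ".yml"}:
        import yaml  # PyYAML, imported lazily so JSON-only use never needs it

        try:
            data = yaml.safe_load(text)
        except yaml.YAMLError as exc:
            raise ConfigParseError(f"Invalid YAML in {file_path.name}: {exc}") from exc
    else:
        raise ConfigParseError(f"Unsupported config extension: {suffix!r}")
    # Builder configs are mappings; a list or scalar at the top level is a user error.
    if not isinstance(data, dict):
        raise ConfigParseError(f"Expected a mapping at the top level of {file_path.name}, got {type(data).__name__}")
    return data
```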
* fix sys.path cleanup in config loader and simplify tests
- Use pop(0) instead of remove() to precisely undo the insert(0, ...)
and avoid accidentally removing a different matching path entry
- Replace MagicMock with real DataDesignerConfigBuilder in tests
* move config format auto-detection into from_config
Centralize the shorthand DataDesignerConfig detection (columns at
top level without a data_designer wrapper) in
DataDesignerConfigBuilder.from_config so all callers benefit, not
just the CLI config loader. Simplify config_loader to delegate file
parsing and format normalization entirely to from_config.
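The shorthand detection described here reduces to a key check on the parsed dict. Below is a minimal sketch that assumes the wrapper key is `data_designer`, as the entry states. `normalize_config_dict` is a hypothetical name, and the real `from_config` may handle additional top-level keys.

```python
from typing import Any


def normalize_config_dict(raw: dict[str, Any]) -> dict[str, Any]:
    """Accept the full builder shape or the shorthand DataDesignerConfig shape."""
    # Shorthand: column definitions sit at the top level with no `data_designer` wrapper.
    if "columns" in raw and "data_designer" not in raw:
        return {"data_designer": raw}
    # Already wrapped (or something else that later validation will reject): pass through unchanged.
    return raw
```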
* extract GenerationController from CLI commands
Move shared generation logic (preview, validate, create) out of the
individual Typer command functions into a dedicated GenerationController,
matching the existing controller pattern (DownloadController, etc.).
The command functions now delegate to the controller, keeping them as
thin entry points. Tests updated accordingly — command tests verify
delegation while controller tests cover the full behavior.
* harden sys.path cleanup and add explanatory comments
Use sys.path.remove() instead of checking sys.path[0] so cleanup
succeeds even when exec_module inserts entries at index 0. Drop
unnecessary spec=DataDesignerConfigBuilder from test mocks.
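The loader behaviour described in the `sys.path` entries can be sketched with `importlib.util`. This sketch assumes the overall shape and is not the project's `config_loader`. The function and its default module name are hypothetical, and this version always unregisters the module from `sys.modules` after executing it. The `remove()` call reflects the later entry: it finds the inserted directory wherever it ended up, whereas `pop(0)` assumes nothing was pushed in front of it.

```python
import importlib.util
import sys
from pathlib import Path
from types import ModuleType


def load_python_config(path: str, module_name: str = "_user_config") -> ModuleType:
    """Execute a Python config file as a module, leaving sys.path and sys.modules as found."""
    file_path = Path(path).resolve()
    parent_dir = str(file_path.parent)
    spec = importlib.util.spec_from_file_location(module_name, str(file_path))
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot create an import spec for {file_path}")
    module = importlib.util.module_from_spec(spec)

    # Let the config import helper modules that sit next to it.
    sys.path.insert(0, parent_dir)
    # Registered during execution so code in the file (e.g. dataclasses) can look itself up.
    sys.modules[module_name] = module
    try:
        spec.loader.exec_module(module)
    finally:
        sys.modules.pop(module_name, None)
        # remove() finds our entry even if the executed code pushed other paths in front of it.
        try:
            sys.path.remove(parent_dir)
        except ValueError:
            pass  # the executed code already removed it
    return module
```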
* check stdout TTY in preview interactive mode detection
Previously only stdin was checked, so piping stdout (e.g.
`dd preview cfg.yaml | head`) would still attempt interactive
browsing. Now both stdin and stdout must be a TTY.
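Below is the TTY rule from this entry as a sketch, with a hypothetical function name. Interactive browsing needs a terminal on both ends, and the explicit non-interactive flag always wins. Streams are resolved at call time, so replaced streams such as pytest's captured output are respected.

```python
import sys
from typing import Optional, TextIO


def use_interactive_browser(
    non_interactive: bool,
    stdin: Optional[TextIO] = None,
    stdout: Optional[TextIO] = None,
) -> bool:
    """Decide whether preview should page through records interactively."""
    if non_interactive:
        return False
    # Resolve at call time rather than as default arguments, which would bind the streams at import.
    stdin = sys.stdin if stdin is None else stdin
    stdout = sys.stdout if stdout is None else stdout
    # `preview cfg.yaml | head` has a TTY stdin but a piped stdout, so both must be terminals.
    return stdin.isatty() and stdout.isatty()
```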
2026-02-11 19:06:06 +00:00
|
|
|
@patch(f"{_CTRL}.load_config_builder")
def test_run_preview_generation_fails(mock_load_config: MagicMock, mock_dd_cls: MagicMock) -> None:
    """Test preview exits with code 1 when generation fails."""
    mock_load_config.return_value = MagicMock(spec=DataDesignerConfigBuilder)
    mock_dd = MagicMock()
    mock_dd_cls.return_value = mock_dd
    mock_dd.preview.side_effect = RuntimeError("LLM connection failed")

    controller = GenerationController()
    with pytest.raises(typer.Exit) as exc_info:
        controller.run_preview(config_source="config.yaml", num_records=10, non_interactive=True)

    assert exc_info.value.exit_code == 1

chore: Improve CLI startup with lazy heavy import cleanup (#330)
* perf: defer heavy imports to improve CLI startup time
Move expensive imports (engine, models, controllers) out of the module-level import path so that data-designer --help and other non-generation commands no longer pay the full startup cost.
Key changes:
- Defer controller imports to inside command functions
- Remove eager re-export chains from CLI package __init__ files
- Move default-settings bootstrap into load_config_builder() and DataDesigner.__init__() instead of running at import time
- Add lazy __getattr__ exports in interface/__init__.py
- Replace module-level tokenizer init with cached lazy getter
- Fix ModelProvider import to use config layer instead of engine
- Update test mock paths to match new import locations
Reduces CLI import-time from ~1.67s to ~0.46s.
* perf: defer pandas/numpy in io_helpers and add config_list benchmark
- Replace eager `from lazy_heavy_imports import pd, np` in io_helpers
with module-level __getattr__ (for backwards-compatible external
access / test mocks) and function-level imports in the 3 functions
that actually use them (read_parquet_dataset, smart_load_dataframe,
_convert_to_serializable). Importing io_helpers no longer triggers
pandas/numpy loading.
- Defer heavy imports in list and reset CLI commands into function
bodies to avoid loading repositories, Rich, and prompt_toolkit at
module import time.
- Add `config_list` (data-designer config list) measurement to the
CLI startup benchmark with isolated cold measurement in a separate
venv and a --skip-config-list-check flag.
- Update test mock paths to match new import locations.
* Refine lazy import usage and TYPE_CHECKING cleanup
* Run license header updater on PR-touched files
* fix: update sqlfluff mock target for lazy imports in test_sql
* perf: cache globals() in lazy __getattr__ to avoid repeated lookups
Add globals() caching and explanatory comment to all three lazy
__getattr__ implementations (lazy_heavy_imports, config/__init__,
interface/__init__) so subsequent attribute accesses bypass __getattr__.
* perf: lazy CLI command loading and deferred heavy import evaluations
- Add LazyTyperGroup to defer command module loading until invocation, allowing module-level imports in all CLI command files
- Split DataFrameSeedSource into seed_source_dataframe.py to isolate pandas dependency from other seed source classes
- Move TypeVar/TypeAlias definitions (DataT, NumpyArray1dT, RadomStateT, EngineT) to TYPE_CHECKING blocks with runtime fallbacks
- Wrap module-level constants in lru_cache (phone_number parquet data, jsonschema validator) to defer I/O and heavy imports to first use
- Update test mock targets to patch at usage-site for module-level imports
* refactor: use direct pandas import in seed_source_dataframe
Drop lazy-loading for pandas in DataFrameSeedSource; use direct import
for simplicity.
* update lazy import pattern
* update tests to use lazy import namespace
Switch test modules to import data_designer.lazy_heavy_imports as lazy
and reference heavy libraries through that namespace. This keeps heavy
imports deferred during module import and aligns tests with the new
lazy-import usage pattern.
* tighten import perf test thresholds
Document recent baseline timings and lower the allowed average
import time and timeout so regressions are detected sooner.
* document pandas import requirement
Clarify that Pydantic needs DataFrame resolved at module load and
that keeping the direct import preserves IDE typing support.
* increase timeout time
* use lazy pandas imports in visualization tests
- replace direct pandas usage with lazy.pd in visualization tests to avoid eager imports
- add TYPE_CHECKING pandas import and keep CLI controller imports sorted
* fix lazy pandas runtime usage and preview mocks
Switch sample-record handling to lazy pandas types so runtime paths no longer
depend on TYPE_CHECKING imports. Align preview controller tests to patch the
module-local DataDesigner symbol, preventing real engine invocation in save
results scenarios.
2026-02-18 21:24:15 +00:00
|
|
|
@patch(f"{_CTRL}.DataDesigner")
|
feat: add preview, create, and validate CLI commands (#313)
* feat: add preview, create, and validate CLI commands
Add three new top-level CLI commands for the data-designer workflow:
- `data-designer preview` - generate preview datasets for fast iteration
- `data-designer create` - create full datasets and save to disk
- `data-designer validate` - validate configuration files
Also includes:
- Move wait_for_navigation_key() UI primitive from preview.py to ui.py
- Add KeyPressEvent type annotations to all key binding handlers in ui.py
- Refactor cli/utils.py into cli/utils/ package with config_loader module
- Comprehensive test coverage for all new commands
* fix: update pythonjsonlogger import and clean up dev dependencies
- Update pythonjsonlogger import to use newer JsonFormatter API
- Consolidate dev-dependencies into [dependency-groups] dev section
- Remove unnecessary test cli/utils __init__.py
* small E
* address greptile feedback
* organize CLI commands into rich help panels
Group top-level commands under "Generation" and "Setup" panels
for clearer help output.
* refactor config loader to parse files directly and auto-detect config format
- Parse YAML/JSON files into dicts before passing to from_config,
providing format-specific error messages for parse failures
- Auto-detect DataDesignerConfig format (columns at top level) and
wrap it into BuilderConfig so users can provide either format
- Clean up Python module loading with try/except/finally for reliable
sys.modules and sys.path cleanup
- Add comprehensive tests for parsing, validation, and auto-wrapping
* fix sys.path cleanup in config loader and simplify tests
- Use pop(0) instead of remove() to precisely undo the insert(0, ...)
and avoid accidentally removing a different matching path entry
- Replace MagicMock with real DataDesignerConfigBuilder in tests
* move config format auto-detection into from_config
Centralize the shorthand DataDesignerConfig detection (columns at
top level without a data_designer wrapper) in
DataDesignerConfigBuilder.from_config so all callers benefit, not
just the CLI config loader. Simplify config_loader to delegate file
parsing and format normalization entirely to from_config.
* extract GenerationController from CLI commands
Move shared generation logic (preview, validate, create) out of the
individual Typer command functions into a dedicated GenerationController,
matching the existing controller pattern (DownloadController, etc.).
The command functions now delegate to the controller, keeping them as
thin entry points. Tests updated accordingly — command tests verify
delegation while controller tests cover the full behavior.
* harden sys.path cleanup and add explanatory comments
Use sys.path.remove() instead of checking sys.path[0] so cleanup
succeeds even when exec_module inserts entries at index 0. Drop
unnecessary spec=DataDesignerConfigBuilder from test mocks.
* check stdout TTY in preview interactive mode detection
Previously only stdin was checked, so piping stdout (e.g.
`dd preview cfg.yaml | head`) would still attempt interactive
browsing. Now both stdin and stdout must be a TTY.
2026-02-11 19:06:06 +00:00
|
|
|
@patch(f"{_CTRL}.load_config_builder")
def test_run_preview_no_records_generated(mock_load_config: MagicMock, mock_dd_cls: MagicMock) -> None:
    """Test preview exits with code 1 when dataset is None."""
    mock_load_config.return_value = MagicMock(spec=DataDesignerConfigBuilder)
    mock_dd = MagicMock()
    mock_dd_cls.return_value = mock_dd
    mock_results = MagicMock()
    mock_results.dataset = None
    mock_dd.preview.return_value = mock_results

    controller = GenerationController()
    with pytest.raises(typer.Exit) as exc_info:
        controller.run_preview(config_source="config.yaml", num_records=10, non_interactive=True)

    assert exc_info.value.exit_code == 1

chore: Improve CLI startup with lazy heavy import cleanup (#330)
* perf: defer heavy imports to improve CLI startup time
Move expensive imports (engine, models, controllers) out of the module-level import path so that data-designer --help and other non-generation commands no longer pay the full startup cost.
Key changes:
- Defer controller imports to inside command functions
- Remove eager re-export chains from CLI package __init__ files
- Move default-settings bootstrap into load_config_builder() and DataDesigner.__init__() instead of running at import time
- Add lazy __getattr__ exports in interface/__init__.py
- Replace module-level tokenizer init with cached lazy getter
- Fix ModelProvider import to use config layer instead of engine
- Update test mock paths to match new import locations
Reduces CLI import-time from ~1.67s to ~0.46s.
* perf: defer pandas/numpy in io_helpers and add config_list benchmark
- Replace eager `from lazy_heavy_imports import pd, np` in io_helpers
with module-level __getattr__ (for backwards-compatible external
access / test mocks) and function-level imports in the 3 functions
that actually use them (read_parquet_dataset, smart_load_dataframe,
_convert_to_serializable). Importing io_helpers no longer triggers
pandas/numpy loading.
- Defer heavy imports in list and reset CLI commands into function
bodies to avoid loading repositories, Rich, and prompt_toolkit at
module import time.
- Add `config_list` (data-designer config list) measurement to the
CLI startup benchmark with isolated cold measurement in a separate
venv and a --skip-config-list-check flag.
- Update test mock paths to match new import locations.
* Refine lazy import usage and TYPE_CHECKING cleanup
* Run license header updater on PR-touched files
* fix: update sqlfluff mock target for lazy imports in test_sql
* perf: cache globals() in lazy __getattr__ to avoid repeated lookups
Add globals() caching and explanatory comment to all three lazy
__getattr__ implementations (lazy_heavy_imports, config/__init__,
interface/__init__) so subsequent attribute accesses bypass __getattr__.
* perf: lazy CLI command loading and deferred heavy import evaluations
- Add LazyTyperGroup to defer command module loading until invocation, allowing module-level imports in all CLI command files
- Split DataFrameSeedSource into seed_source_dataframe.py to isolate pandas dependency from other seed source classes
- Move TypeVar/TypeAlias definitions (DataT, NumpyArray1dT, RadomStateT, EngineT) to TYPE_CHECKING blocks with runtime fallbacks
- Wrap module-level constants in lru_cache (phone_number parquet data, jsonschema validator) to defer I/O and heavy imports to first use
- Update test mock targets to patch at usage-site for module-level imports
* refactor: use direct pandas import in seed_source_dataframe
Drop lazy-loading for pandas in DataFrameSeedSource; use direct import
for simplicity.
* update lazy import pattern
* update tests to use lazy import namespace
Switch test modules to import data_designer.lazy_heavy_imports as lazy
and reference heavy libraries through that namespace. This keeps heavy
imports deferred during module import and aligns tests with the new
lazy-import usage pattern.
* tighten import perf test thresholds
Document recent baseline timings and lower the allowed average
import time and timeout so regressions are detected sooner.
* document pandas import requirement
Clarify that Pydantic needs DataFrame resolved at module load and
that keeping the direct import preserves IDE typing support.
* increase timeout time
* use lazy pandas imports in visualization tests
- replace direct pandas usage with lazy.pd in visualization tests to avoid eager imports
- add TYPE_CHECKING pandas import and keep CLI controller imports sorted
* fix lazy pandas runtime usage and preview mocks
Switch sample-record handling to lazy pandas types so runtime paths no longer
depend on TYPE_CHECKING imports. Align preview controller tests to patch the
module-local DataDesigner symbol, preventing real engine invocation in save
results scenarios.
2026-02-18 21:24:15 +00:00
|
|
|
@patch(f"{_CTRL}.DataDesigner")
|
feat: add preview, create, and validate CLI commands (#313)
* feat: add preview, create, and validate CLI commands
Add three new top-level CLI commands for the data-designer workflow:
- `data-designer preview` - generate preview datasets for fast iteration
- `data-designer create` - create full datasets and save to disk
- `data-designer validate` - validate configuration files
Also includes:
- Move wait_for_navigation_key() UI primitive from preview.py to ui.py
- Add KeyPressEvent type annotations to all key binding handlers in ui.py
- Refactor cli/utils.py into cli/utils/ package with config_loader module
- Comprehensive test coverage for all new commands
* fix: update pythonjsonlogger import and clean up dev dependencies
- Update pythonjsonlogger import to use newer JsonFormatter API
- Consolidate dev-dependencies into [dependency-groups] dev section
- Remove unnecessary test cli/utils __init__.py
* small E
* address greptile feedback
* organize CLI commands into rich help panels
Group top-level commands under "Generation" and "Setup" panels
for clearer help output.
* refactor config loader to parse files directly and auto-detect config format
- Parse YAML/JSON files into dicts before passing to from_config,
providing format-specific error messages for parse failures
- Auto-detect DataDesignerConfig format (columns at top level) and
wrap it into BuilderConfig so users can provide either format
- Clean up Python module loading with try/except/finally for reliable
sys.modules and sys.path cleanup
- Add comprehensive tests for parsing, validation, and auto-wrapping
* fix sys.path cleanup in config loader and simplify tests
- Use pop(0) instead of remove() to precisely undo the insert(0, ...)
and avoid accidentally removing a different matching path entry
- Replace MagicMock with real DataDesignerConfigBuilder in tests
* move config format auto-detection into from_config
Centralize the shorthand DataDesignerConfig detection (columns at
top level without a data_designer wrapper) in
DataDesignerConfigBuilder.from_config so all callers benefit, not
just the CLI config loader. Simplify config_loader to delegate file
parsing and format normalization entirely to from_config.
* extract GenerationController from CLI commands
Move shared generation logic (preview, validate, create) out of the
individual Typer command functions into a dedicated GenerationController,
matching the existing controller pattern (DownloadController, etc.).
The command functions now delegate to the controller, keeping them as
thin entry points. Tests updated accordingly — command tests verify
delegation while controller tests cover the full behavior.
* harden sys.path cleanup and add explanatory comments
Use sys.path.remove() instead of checking sys.path[0] so cleanup
succeeds even when exec_module inserts entries at index 0. Drop
unnecessary spec=DataDesignerConfigBuilder from test mocks.
* check stdout TTY in preview interactive mode detection
Previously only stdin was checked, so piping stdout (e.g.
`dd preview cfg.yaml | head`) would still attempt interactive
browsing. Now both stdin and stdout must be a TTY.
2026-02-11 19:06:06 +00:00
|
|
|
@patch(f"{_CTRL}.load_config_builder")
def test_run_preview_empty_dataset(mock_load_config: MagicMock, mock_dd_cls: MagicMock) -> None:
    """Test preview exits with code 1 when dataset is empty."""
    mock_load_config.return_value = MagicMock(spec=DataDesignerConfigBuilder)
    mock_dd = MagicMock()
    mock_dd_cls.return_value = mock_dd
    mock_results = MagicMock()
    mock_results.dataset = MagicMock()
    mock_results.dataset.__len__ = MagicMock(return_value=0)
    mock_dd.preview.return_value = mock_results

    controller = GenerationController()
    with pytest.raises(typer.Exit) as exc_info:
        controller.run_preview(config_source="config.yaml", num_records=10, non_interactive=True)

    assert exc_info.value.exit_code == 1

chore: Improve CLI startup with lazy heavy import cleanup (#330)
* perf: defer heavy imports to improve CLI startup time
Move expensive imports (engine, models, controllers) out of the module-level import path so that data-designer --help and other non-generation commands no longer pay the full startup cost.
Key changes:
- Defer controller imports to inside command functions
- Remove eager re-export chains from CLI package __init__ files
- Move default-settings bootstrap into load_config_builder() and DataDesigner.__init__() instead of running at import time
- Add lazy __getattr__ exports in interface/__init__.py
- Replace module-level tokenizer init with cached lazy getter
- Fix ModelProvider import to use config layer instead of engine
- Update test mock paths to match new import locations
Reduces CLI import-time from ~1.67s to ~0.46s.
* perf: defer pandas/numpy in io_helpers and add config_list benchmark
- Replace eager `from lazy_heavy_imports import pd, np` in io_helpers
with module-level __getattr__ (for backwards-compatible external
access / test mocks) and function-level imports in the 3 functions
that actually use them (read_parquet_dataset, smart_load_dataframe,
_convert_to_serializable). Importing io_helpers no longer triggers
pandas/numpy loading.
- Defer heavy imports in list and reset CLI commands into function
bodies to avoid loading repositories, Rich, and prompt_toolkit at
module import time.
- Add `config_list` (data-designer config list) measurement to the
CLI startup benchmark with isolated cold measurement in a separate
venv and a --skip-config-list-check flag.
- Update test mock paths to match new import locations.
* Refine lazy import usage and TYPE_CHECKING cleanup
* Run license header updater on PR-touched files
* fix: update sqlfluff mock target for lazy imports in test_sql
* perf: cache globals() in lazy __getattr__ to avoid repeated lookups
Add globals() caching and explanatory comment to all three lazy
__getattr__ implementations (lazy_heavy_imports, config/__init__,
interface/__init__) so subsequent attribute accesses bypass __getattr__.
* perf: lazy CLI command loading and deferred heavy import evaluations
- Add LazyTyperGroup to defer command module loading until invocation, allowing module-level imports in all CLI command files
- Split DataFrameSeedSource into seed_source_dataframe.py to isolate pandas dependency from other seed source classes
- Move TypeVar/TypeAlias definitions (DataT, NumpyArray1dT, RadomStateT, EngineT) to TYPE_CHECKING blocks with runtime fallbacks
- Wrap module-level constants in lru_cache (phone_number parquet data, jsonschema validator) to defer I/O and heavy imports to first use
- Update test mock targets to patch at usage-site for module-level imports
* refactor: use direct pandas import in seed_source_dataframe
Drop lazy-loading for pandas in DataFrameSeedSource; use direct import
for simplicity.
* update lazy import pattern
* update tests to use lazy import namespace
Switch test modules to import data_designer.lazy_heavy_imports as lazy
and reference heavy libraries through that namespace. This keeps heavy
imports deferred during module import and aligns tests with the new
lazy-import usage pattern.
* tighten import perf test thresholds
Document recent baseline timings and lower the allowed average
import time and timeout so regressions are detected sooner.
* document pandas import requirement
Clarify that Pydantic needs DataFrame resolved at module load and
that keeping the direct import preserves IDE typing support.
* increase timeout time
* use lazy pandas imports in visualization tests
- replace direct pandas usage with lazy.pd in visualization tests to avoid eager imports
- add TYPE_CHECKING pandas import and keep CLI controller imports sorted
* fix lazy pandas runtime usage and preview mocks
Switch sample-record handling to lazy pandas types so runtime paths no longer
depend on TYPE_CHECKING imports. Align preview controller tests to patch the
module-local DataDesigner symbol, preventing real engine invocation in save
results scenarios.
2026-02-18 21:24:15 +00:00
|
|
|
@patch(f"{_CTRL}.DataDesigner")
|
feat: add preview, create, and validate CLI commands (#313)
* feat: add preview, create, and validate CLI commands
Add three new top-level CLI commands for the data-designer workflow:
- `data-designer preview` - generate preview datasets for fast iteration
- `data-designer create` - create full datasets and save to disk
- `data-designer validate` - validate configuration files
Also includes:
- Move wait_for_navigation_key() UI primitive from preview.py to ui.py
- Add KeyPressEvent type annotations to all key binding handlers in ui.py
- Refactor cli/utils.py into cli/utils/ package with config_loader module
- Comprehensive test coverage for all new commands
* fix: update pythonjsonlogger import and clean up dev dependencies
- Update pythonjsonlogger import to use newer JsonFormatter API
- Consolidate dev-dependencies into [dependency-groups] dev section
- Remove unnecessary test cli/utils __init__.py
* small E
* address greptile feedback
* organize CLI commands into rich help panels
Group top-level commands under "Generation" and "Setup" panels
for clearer help output.
* refactor config loader to parse files directly and auto-detect config format
- Parse YAML/JSON files into dicts before passing to from_config,
providing format-specific error messages for parse failures
- Auto-detect DataDesignerConfig format (columns at top level) and
wrap it into BuilderConfig so users can provide either format
- Clean up Python module loading with try/except/finally for reliable
sys.modules and sys.path cleanup
- Add comprehensive tests for parsing, validation, and auto-wrapping
* fix sys.path cleanup in config loader and simplify tests
- Use pop(0) instead of remove() to precisely undo the insert(0, ...)
and avoid accidentally removing a different matching path entry
- Replace MagicMock with real DataDesignerConfigBuilder in tests
* move config format auto-detection into from_config
Centralize the shorthand DataDesignerConfig detection (columns at
top level without a data_designer wrapper) in
DataDesignerConfigBuilder.from_config so all callers benefit, not
just the CLI config loader. Simplify config_loader to delegate file
parsing and format normalization entirely to from_config.
* extract GenerationController from CLI commands
Move shared generation logic (preview, validate, create) out of the
individual Typer command functions into a dedicated GenerationController,
matching the existing controller pattern (DownloadController, etc.).
The command functions now delegate to the controller, keeping them as
thin entry points. Tests updated accordingly — command tests verify
delegation while controller tests cover the full behavior.
* harden sys.path cleanup and add explanatory comments
Use sys.path.remove() instead of checking sys.path[0] so cleanup
succeeds even when exec_module inserts entries at index 0. Drop
unnecessary spec=DataDesignerConfigBuilder from test mocks.
* check stdout TTY in preview interactive mode detection
Previously only stdin was checked, so piping stdout (e.g.
`dd preview cfg.yaml | head`) would still attempt interactive
browsing. Now both stdin and stdout must be a TTY.
2026-02-11 19:06:06 +00:00
|
|
|
@patch(f"{_CTRL}.load_config_builder")
def test_run_preview_non_interactive_displays_all(mock_load_config: MagicMock, mock_dd_cls: MagicMock) -> None:
    """Test --non-interactive displays all records without interactive browsing."""
    mock_load_config.return_value = MagicMock(spec=DataDesignerConfigBuilder)
    mock_dd = MagicMock()
    mock_dd_cls.return_value = mock_dd
    mock_results = _make_mock_preview_results(3)
    mock_dd.preview.return_value = mock_results

    controller = GenerationController()
    controller.run_preview(config_source="config.yaml", num_records=3, non_interactive=True)

    assert mock_results.display_sample_record.call_count == 3
2026-02-19 01:17:03 +00:00
    mock_results.display_sample_record.assert_has_calls(
        [
            call(index=0, display_width=_DW),
            call(index=1, display_width=_DW),
            call(index=2, display_width=_DW),
        ]
    )

feat: add preview, create, and validate CLI commands (#313)
* feat: add preview, create, and validate CLI commands
Add three new top-level CLI commands for the data-designer workflow:
- `data-designer preview` - generate preview datasets for fast iteration
- `data-designer create` - create full datasets and save to disk
- `data-designer validate` - validate configuration files
Also includes:
- Move wait_for_navigation_key() UI primitive from preview.py to ui.py
- Add KeyPressEvent type annotations to all key binding handlers in ui.py
- Refactor cli/utils.py into cli/utils/ package with config_loader module
- Comprehensive test coverage for all new commands
* fix: update pythonjsonlogger import and clean up dev dependencies
- Update pythonjsonlogger import to use newer JsonFormatter API
- Consolidate dev-dependencies into [dependency-groups] dev section
- Remove unnecessary test cli/utils __init__.py
* small E
* address greptile feedback
* organize CLI commands into rich help panels
Group top-level commands under "Generation" and "Setup" panels
for clearer help output.
* refactor config loader to parse files directly and auto-detect config format
- Parse YAML/JSON files into dicts before passing to from_config,
providing format-specific error messages for parse failures
- Auto-detect DataDesignerConfig format (columns at top level) and
wrap it into BuilderConfig so users can provide either format
- Clean up Python module loading with try/except/finally for reliable
sys.modules and sys.path cleanup
- Add comprehensive tests for parsing, validation, and auto-wrapping
* fix sys.path cleanup in config loader and simplify tests
- Use pop(0) instead of remove() to precisely undo the insert(0, ...)
and avoid accidentally removing a different matching path entry
- Replace MagicMock with real DataDesignerConfigBuilder in tests
* move config format auto-detection into from_config
Centralize the shorthand DataDesignerConfig detection (columns at
top level without a data_designer wrapper) in
DataDesignerConfigBuilder.from_config so all callers benefit, not
just the CLI config loader. Simplify config_loader to delegate file
parsing and format normalization entirely to from_config.
* extract GenerationController from CLI commands
Move shared generation logic (preview, validate, create) out of the
individual Typer command functions into a dedicated GenerationController,
matching the existing controller pattern (DownloadController, etc.).
The command functions now delegate to the controller, keeping them as
thin entry points. Tests updated accordingly — command tests verify
delegation while controller tests cover the full behavior.
* harden sys.path cleanup and add explanatory comments
Use sys.path.remove() instead of checking sys.path[0] so cleanup
succeeds even when exec_module inserts entries at index 0. Drop
unnecessary spec=DataDesignerConfigBuilder from test mocks.
* check stdout TTY in preview interactive mode detection
Previously only stdin was checked, so piping stdout (e.g.
`dd preview cfg.yaml | head`) would still attempt interactive
browsing. Now both stdin and stdout must be a TTY.
2026-02-11 19:06:06 +00:00
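The last bullet above changes how `run_preview` decides whether to browse records interactively. The tests in this file pin the behaviour down: a non-TTY stdin or a piped stdout falls back to printing every record, and a single record is shown directly without a prompt. Below is a minimal sketch of that decision, inferred from those tests and written as a standalone helper. The helper name `should_browse_interactively` and the `FakeStream` stand-in are hypothetical and are not part of the data-designer codebase.

```python
import io
import sys
from typing import Optional, TextIO


def should_browse_interactively(
    num_records: int,
    non_interactive: bool,
    stdin: Optional[TextIO] = None,
    stdout: Optional[TextIO] = None,
) -> bool:
    """Return True only when paging through preview records makes sense.

    Hypothetical helper mirroring the rule in the commit message: both stdin
    (to read key presses) and stdout (to redraw the record view) must be
    attached to a terminal. Streams are resolved at call time, so patching
    the module-level ``sys`` name in tests takes effect.
    """
    if non_interactive or num_records <= 1:
        # Explicit opt-out, or nothing to page through: print records directly.
        return False
    stdin = sys.stdin if stdin is None else stdin
    stdout = sys.stdout if stdout is None else stdout
    return stdin.isatty() and stdout.isatty()


class FakeStream(io.StringIO):
    """In-memory stream with a fixed isatty() answer, for demos and tests."""

    def __init__(self, tty: bool) -> None:
        super().__init__()
        self._tty = tty

    def isatty(self) -> bool:
        return self._tty


# `dd preview cfg.yaml | head`: stdin is a terminal but stdout is a pipe.
print(should_browse_interactively(3, False, FakeStream(True), FakeStream(False)))  # prints False
```

Passing the streams as optional arguments keeps the helper testable without patching, while the `None` defaults still honour a patched `sys` at call time.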
@patch(f"{_CTRL}.sys")
chore: Improve CLI startup with lazy heavy import cleanup (#330)
* perf: defer heavy imports to improve CLI startup time
Move expensive imports (engine, models, controllers) out of the module-level import path so that data-designer --help and other non-generation commands no longer pay the full startup cost.
Key changes:
- Defer controller imports to inside command functions
- Remove eager re-export chains from CLI package __init__ files
- Move default-settings bootstrap into load_config_builder() and DataDesigner.__init__() instead of running at import time
- Add lazy __getattr__ exports in interface/__init__.py
- Replace module-level tokenizer init with cached lazy getter
- Fix ModelProvider import to use config layer instead of engine
- Update test mock paths to match new import locations
Reduces CLI import-time from ~1.67s to ~0.46s.
* perf: defer pandas/numpy in io_helpers and add config_list benchmark
- Replace eager `from lazy_heavy_imports import pd, np` in io_helpers
with module-level __getattr__ (for backwards-compatible external
access / test mocks) and function-level imports in the 3 functions
that actually use them (read_parquet_dataset, smart_load_dataframe,
_convert_to_serializable). Importing io_helpers no longer triggers
pandas/numpy loading.
- Defer heavy imports in list and reset CLI commands into function
bodies to avoid loading repositories, Rich, and prompt_toolkit at
module import time.
- Add `config_list` (data-designer config list) measurement to the
CLI startup benchmark with isolated cold measurement in a separate
venv and a --skip-config-list-check flag.
- Update test mock paths to match new import locations.
* Refine lazy import usage and TYPE_CHECKING cleanup
* Run license header updater on PR-touched files
* fix: update sqlfluff mock target for lazy imports in test_sql
* perf: cache globals() in lazy __getattr__ to avoid repeated lookups
Add globals() caching and explanatory comment to all three lazy
__getattr__ implementations (lazy_heavy_imports, config/__init__,
interface/__init__) so subsequent attribute accesses bypass __getattr__.
* perf: lazy CLI command loading and deferred heavy import evaluations
- Add LazyTyperGroup to defer command module loading until invocation, allowing module-level imports in all CLI command files
- Split DataFrameSeedSource into seed_source_dataframe.py to isolate pandas dependency from other seed source classes
- Move TypeVar/TypeAlias definitions (DataT, NumpyArray1dT, RadomStateT, EngineT) to TYPE_CHECKING blocks with runtime fallbacks
- Wrap module-level constants in lru_cache (phone_number parquet data, jsonschema validator) to defer I/O and heavy imports to first use
- Update test mock targets to patch at usage-site for module-level imports
* refactor: use direct pandas import in seed_source_dataframe
Drop lazy-loading for pandas in DataFrameSeedSource; use direct import
for simplicity.
* update lazy import pattern
* update tests to use lazy import namespace
Switch test modules to import data_designer.lazy_heavy_imports as lazy
and reference heavy libraries through that namespace. This keeps heavy
imports deferred during module import and aligns tests with the new
lazy-import usage pattern.
* tighten import perf test thresholds
Document recent baseline timings and lower the allowed average
import time and timeout so regressions are detected sooner.
* document pandas import requirement
Clarify that Pydantic needs DataFrame resolved at module load and
that keeping the direct import preserves IDE typing support.
* increase timeout time
* use lazy pandas imports in visualization tests
- replace direct pandas usage with lazy.pd in visualization tests to avoid eager imports
- add TYPE_CHECKING pandas import and keep CLI controller imports sorted
* fix lazy pandas runtime usage and preview mocks
Switch sample-record handling to lazy pandas types so runtime paths no longer
depend on TYPE_CHECKING imports. Align preview controller tests to patch the
module-local DataDesigner symbol, preventing real engine invocation in save
results scenarios.
2026-02-18 21:24:15 +00:00
@patch(f"{_CTRL}.DataDesigner")
@patch(f"{_CTRL}.load_config_builder")
def test_run_preview_non_tty_stdin_falls_back_to_non_interactive(
    mock_load_config: MagicMock,
    mock_dd_cls: MagicMock,
    mock_sys: MagicMock,
) -> None:
    """Test non-TTY stdin auto-detects and falls back to non-interactive mode."""
    mock_sys.stdin.isatty.return_value = False
    mock_sys.stdout.isatty.return_value = True
    mock_load_config.return_value = MagicMock(spec=DataDesignerConfigBuilder)
    mock_dd = MagicMock()
    mock_dd_cls.return_value = mock_dd
    mock_results = _make_mock_preview_results(3)
    mock_dd.preview.return_value = mock_results

    controller = GenerationController()
    controller.run_preview(config_source="config.yaml", num_records=3, non_interactive=False)

    assert mock_results.display_sample_record.call_count == 3


@patch(f"{_CTRL}.sys")
@patch(f"{_CTRL}.DataDesigner")
@patch(f"{_CTRL}.load_config_builder")
def test_run_preview_piped_stdout_falls_back_to_non_interactive(
    mock_load_config: MagicMock,
    mock_dd_cls: MagicMock,
    mock_sys: MagicMock,
) -> None:
    """Test piped stdout (e.g. `preview cfg.yaml | head`) falls back to non-interactive."""
    mock_sys.stdin.isatty.return_value = True
    mock_sys.stdout.isatty.return_value = False
    mock_load_config.return_value = MagicMock(spec=DataDesignerConfigBuilder)
    mock_dd = MagicMock()
    mock_dd_cls.return_value = mock_dd
    mock_results = _make_mock_preview_results(3)
    mock_dd.preview.return_value = mock_results

    controller = GenerationController()
    controller.run_preview(config_source="config.yaml", num_records=3, non_interactive=False)

    assert mock_results.display_sample_record.call_count == 3


@patch(f"{_CTRL}.sys")
@patch(f"{_CTRL}.DataDesigner")
@patch(f"{_CTRL}.load_config_builder")
def test_run_preview_single_record_no_interactive(
    mock_load_config: MagicMock,
    mock_dd_cls: MagicMock,
    mock_sys: MagicMock,
) -> None:
    """Test single record is displayed directly without interactive prompt."""
    mock_sys.stdin.isatty.return_value = True
    mock_sys.stdout.isatty.return_value = True
    mock_load_config.return_value = MagicMock(spec=DataDesignerConfigBuilder)
    mock_dd = MagicMock()
    mock_dd_cls.return_value = mock_dd
    mock_results = _make_mock_preview_results(1)
    mock_dd.preview.return_value = mock_results

    controller = GenerationController()
    controller.run_preview(config_source="config.yaml", num_records=1, non_interactive=False)

2026-02-19 01:17:03 +00:00
    mock_results.display_sample_record.assert_called_once_with(index=0, display_width=_DW)


@patch(f"{_CTRL}.wait_for_navigation_key", side_effect=["n", "q"])
@patch(f"{_CTRL}.sys")
|
chore: Improve CLI startup with lazy heavy import cleanup (#330)
* perf: defer heavy imports to improve CLI startup time
Move expensive imports (engine, models, controllers) out of the module-level import path so that data-designer --help and other non-generation commands no longer pay the full startup cost.
Key changes:
- Defer controller imports to inside command functions
- Remove eager re-export chains from CLI package __init__ files
- Move default-settings bootstrap into load_config_builder() and DataDesigner.__init__() instead of running at import time
- Add lazy __getattr__ exports in interface/__init__.py
- Replace module-level tokenizer init with cached lazy getter
- Fix ModelProvider import to use config layer instead of engine
- Update test mock paths to match new import locations
Reduces CLI import-time from ~1.67s to ~0.46s.
* perf: defer pandas/numpy in io_helpers and add config_list benchmark
- Replace eager `from lazy_heavy_imports import pd, np` in io_helpers
with module-level __getattr__ (for backwards-compatible external
access / test mocks) and function-level imports in the 3 functions
that actually use them (read_parquet_dataset, smart_load_dataframe,
_convert_to_serializable). Importing io_helpers no longer triggers
pandas/numpy loading.
- Defer heavy imports in list and reset CLI commands into function
bodies to avoid loading repositories, Rich, and prompt_toolkit at
module import time.
- Add `config_list` (data-designer config list) measurement to the
CLI startup benchmark with isolated cold measurement in a separate
venv and a --skip-config-list-check flag.
- Update test mock paths to match new import locations.
* Refine lazy import usage and TYPE_CHECKING cleanup
* Run license header updater on PR-touched files
* fix: update sqlfluff mock target for lazy imports in test_sql
* perf: cache globals() in lazy __getattr__ to avoid repeated lookups
Add globals() caching and explanatory comment to all three lazy
__getattr__ implementations (lazy_heavy_imports, config/__init__,
interface/__init__) so subsequent attribute accesses bypass __getattr__.
* perf: lazy CLI command loading and deferred heavy import evaluations
- Add LazyTyperGroup to defer command module loading until invocation, allowing module-level imports in all CLI command files
- Split DataFrameSeedSource into seed_source_dataframe.py to isolate pandas dependency from other seed source classes
- Move TypeVar/TypeAlias definitions (DataT, NumpyArray1dT, RadomStateT, EngineT) to TYPE_CHECKING blocks with runtime fallbacks
- Wrap module-level constants in lru_cache (phone_number parquet data, jsonschema validator) to defer I/O and heavy imports to first use
- Update test mock targets to patch at usage-site for module-level imports
* refactor: use direct pandas import in seed_source_dataframe
Drop lazy-loading for pandas in DataFrameSeedSource; use direct import
for simplicity.
* update lazy import pattern
* update tests to use lazy import namespace
Switch test modules to import data_designer.lazy_heavy_imports as lazy
and reference heavy libraries through that namespace. This keeps heavy
imports deferred during module import and aligns tests with the new
lazy-import usage pattern.
* tighten import perf test thresholds
Document recent baseline timings and lower the allowed average
import time and timeout so regressions are detected sooner.
* document pandas import requirement
Clarify that Pydantic needs DataFrame resolved at module load and
that keeping the direct import preserves IDE typing support.
* increase timeout time
* use lazy pandas imports in visualization tests
- replace direct pandas usage with lazy.pd in visualization tests to avoid eager imports
- add TYPE_CHECKING pandas import and keep CLI controller imports sorted
* fix lazy pandas runtime usage and preview mocks
Switch sample-record handling to lazy pandas types so runtime paths no longer
depend on TYPE_CHECKING imports. Align preview controller tests to patch the
module-local DataDesigner symbol, preventing real engine invocation in save
results scenarios.
2026-02-18 21:24:15 +00:00
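The globals() caching described in the lazy `__getattr__` note above can be sketched with a module-level `__getattr__` (PEP 562). This is a minimal illustration, not the project's `lazy_heavy_imports` module. The alias table `_LAZY_IMPORTS` is hypothetical, and the stdlib targets (`json`, `decimal`) stand in for heavy libraries such as pandas or numpy. The module source is `exec`'d into a fresh `ModuleType` only so the sketch runs as a single file.

```python
import types

# Source of a hypothetical lazy-import module. In a real package this would be
# the module file itself; exec() is used here only to keep the sketch self-contained.
_LAZY_MODULE_SOURCE = '''
import importlib

# Hypothetical alias -> real module name (stand-ins for heavy libraries).
_LAZY_IMPORTS = {"json_lib": "json", "decimal_lib": "decimal"}


def __getattr__(name):
    target = _LAZY_IMPORTS.get(name)
    if target is None:
        raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
    module = importlib.import_module(target)
    # Cache in the module namespace: later lookups find it directly and
    # never reach __getattr__ again.
    globals()[name] = module
    return module
'''

lazy_demo = types.ModuleType("lazy_demo")
exec(_LAZY_MODULE_SOURCE, lazy_demo.__dict__)
```

After the first access, the alias lives in the module's namespace, so ordinary attribute lookup finds it and `__getattr__` is no longer invoked for that name.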
@patch(f"{_CTRL}.DataDesigner")
@patch(f"{_CTRL}.load_config_builder")
def test_run_preview_tty_multiple_records_uses_interactive(
    mock_load_config: MagicMock,
    mock_dd_cls: MagicMock,
    mock_sys: MagicMock,
    mock_wait: MagicMock,
) -> None:
    """Test TTY with multiple records triggers interactive mode."""
    mock_sys.stdin.isatty.return_value = True
    mock_sys.stdout.isatty.return_value = True
    mock_load_config.return_value = MagicMock(spec=DataDesignerConfigBuilder)
    mock_dd = MagicMock()
    mock_dd_cls.return_value = mock_dd
    mock_results = _make_mock_preview_results(3)
    mock_dd.preview.return_value = mock_results

    controller = GenerationController()
    controller.run_preview(config_source="config.yaml", num_records=3, non_interactive=False)

    assert mock_results.display_sample_record.call_count == 2
    assert mock_wait.call_count == 2
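The test above drives interactive mode by patching `sys` so that both `stdin` and `stdout` report a TTY. A minimal sketch of that decision follows. `should_browse_interactively` is a hypothetical helper, not the controller's actual method, and the rule that a single record is never paged is an assumption inferred from the test name.

```python
from typing import Protocol


class SupportsIsatty(Protocol):
    def isatty(self) -> bool: ...


def should_browse_interactively(
    stdin: SupportsIsatty, stdout: SupportsIsatty, num_records: int, non_interactive: bool
) -> bool:
    """Hypothetical helper: page through preview records only when it makes sense."""
    # Assumption: an explicit --non-interactive flag or a single record skips paging.
    if non_interactive or num_records <= 1:
        return False
    # Both ends must be terminals; piping stdout (e.g. `| head`) disables paging.
    return stdin.isatty() and stdout.isatty()
```

It would be called as `should_browse_interactively(sys.stdin, sys.stdout, num_records, non_interactive)`; taking the streams as parameters keeps the logic testable without patching `sys`.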
feat: add --save-results option to preview command (#333)
* feat: add --save-report option to preview command
* feat: add save_path option to display_sample_record
Allow saving rendered sample records as HTML or SVG files via an
optional save_path parameter on both the standalone function and
the WithRecordSamplerMixin method.
* feat: replace --save-report with --save-results on preview command
Replace the single-file --save-report option with --save-results, which saves all preview artifacts (dataset parquet, analysis report HTML, and per-record sample HTMLs) into a timestamped directory under the artifact path. Add error handling around the save block, improve timestamp precision to microseconds, and expand test coverage for the new behavior.
* feat: add sample records pager with theme toggle, postMessage bridge, and UI polish
* feat: add dataset metadata subtitle to pager and clean up toolbar layout
* fix: address review findings for preview save-results feature
- Split try/except in generation_controller so report display errors
don't produce misleading "failed to save" messages when not saving
- Add browser HTML path to save success output for discoverability
- Remove 5 unused CSS variables from pager theme constants
- Add "N of M" record counter to pager toolbar
- Add theme/display_width assertions to all preview_command tests
- Add dedicated test for custom theme and display_width passthrough
- Add tests for record counter and CSS variable cleanup
* fix: address code review findings and simplify pager
- Fix critical bug: analysis report now displays to console even when
--save-results is active (was silently dropped via pass statement)
- Fix latent UnboundLocalError in display_sample_record when index is
out of bounds (num_records computed before try block)
- Eliminate duplicated dark CSS between constant and theme listener script
- Simplify sample_records_pager: remove dual-theme system, postMessage
bridge, and responsive media queries; restore GitHub link; reorder
toolbar to put prev/next buttons on the far left
- Narrow except Exception to except OSError in save-results path
- Use case-insensitive extension check and lambda-based re.sub
- Collapse redundant preview command delegation tests into parametrize
- Add missing type annotations and remove tautological assertions
* style: move record counter to far right of pager toolbar
* refactor: remove dead theme-listener script and inline CSS constant
_THEME_LISTENER_SCRIPT and _SAMPLE_RECORD_DARK_CSS_INLINE became
orphaned after the pager simplification removed the postMessage
bridge. This removes both constants, drops the injection line,
switches the idempotency guard to the viewport meta tag, and
cleans up related test assertions.
* fix: move Path import out of TYPE_CHECKING block in test_visualization
* fix: rename _logger to logger to match codebase convention
* fix: remove unnecessary cast in preview command theme parameter
* refactor: extract DEFAULT_DISPLAY_WIDTH constant and make apply_html_post_processing public
* Update packages/data-designer-config/tests/config/utils/test_visualization.py
---------
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
2026-02-18 20:58:35 +00:00
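The `--save-results` notes above describe saving artifacts into a timestamped directory with microsecond precision and choosing the sample-record format through a case-insensitive extension check. The sketch below illustrates both ideas under stated assumptions: the `preview_results_` prefix, the timestamp layout, and both function names are hypothetical, not taken from the project.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def make_results_dir(artifact_path: Path, now: Optional[datetime] = None) -> Path:
    """Create a fresh, timestamped directory for preview artifacts (hypothetical layout)."""
    if now is None:
        now = datetime.now()
    # %f appends microseconds, so two saves in the same second get distinct names.
    results_dir = artifact_path / f"preview_results_{now:%Y%m%d_%H%M%S_%f}"
    # exist_ok=False: never silently write into an earlier run's directory.
    results_dir.mkdir(parents=True, exist_ok=False)
    return results_dir


def sample_record_format(save_path: Path) -> str:
    """Choose the render format from the file extension, ignoring case."""
    suffix = save_path.suffix.lower()
    if suffix in (".html", ".svg"):
        return suffix[1:]
    raise ValueError(f"unsupported extension {save_path.suffix!r}; expected .html or .svg")
```

Lower-casing the suffix once means `record_0.HTML` and `record_0.html` take the same path, which is the point of the case-insensitive check.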
@patch(f"{_CTRL}.create_sample_records_pager")
@patch(f"{_CTRL}.DataDesigner")
@patch(f"{_CTRL}.load_config_builder")
def test_run_preview_calls_to_report_when_analysis_present(
    mock_load_config: MagicMock, mock_dd_cls: MagicMock, mock_create_pager: MagicMock, tmp_path: Path
) -> None:
2026-02-19 01:17:03 +00:00
    """Test that to_report() is called only for file save (not console) when save_results=True."""
    mock_load_config.return_value = MagicMock(spec=DataDesignerConfigBuilder)
    mock_dd = MagicMock()
    mock_dd_cls.return_value = mock_dd
    mock_results = _make_mock_preview_results(3)
    mock_analysis = MagicMock()
    mock_results.analysis = mock_analysis
    mock_dd.preview.return_value = mock_results

    controller = GenerationController()
    controller.run_preview(
        config_source="config.yaml", num_records=3, non_interactive=True, save_results=True, artifact_path=str(tmp_path)
    )
2026-02-19 01:17:03 +00:00
    mock_analysis.to_report.assert_called_once()
    assert mock_analysis.to_report.call_args.kwargs["save_path"].name == "report.html"
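Because the blame view spreads each test's `@patch` decorators across commit blocks, the mapping from decorator to mock argument is easy to lose. Stacked `unittest.mock.patch` decorators apply bottom-up: the one nearest the `def` supplies the first mock argument, which is why `load_config_builder` maps to `mock_load_config` in the tests above. The stdlib targets and function names below are illustrative only; the sketch also shows the `call_args.kwargs` lookup the second test relies on.

```python
import os.path
from pathlib import Path
from unittest.mock import MagicMock, patch


@patch("os.path.isdir")   # outermost decorator -> last mock argument
@patch("os.path.isfile")  # nearest the def -> first mock argument
def check_patch_order(mock_isfile: MagicMock, mock_isdir: MagicMock) -> tuple:
    mock_isfile.return_value = True
    mock_isdir.return_value = False
    # If the argument order were reversed, this would return (False, True).
    return os.path.isfile("x"), os.path.isdir("x")


def saved_report_name() -> str:
    analysis = MagicMock()
    analysis.to_report(save_path=Path("results") / "report.html")
    analysis.to_report.assert_called_once()
    # call_args.kwargs holds the keyword arguments of the most recent call.
    return analysis.to_report.call_args.kwargs["save_path"].name
```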
feat: add --save-results option to preview command (#333)
* feat: add --save-report option to preview command
* feat: add save_path option to display_sample_record
Allow saving rendered sample records as HTML or SVG files via an
optional save_path parameter on both the standalone function and
the WithRecordSamplerMixin method.
* feat: replace --save-report with --save-results on preview command
Replace the single-file --save-report option with --save-results, which saves all preview artifacts (dataset parquet, analysis report HTML, and per-record sample HTMLs) into a timestamped directory under the artifact path. Add error handling around the save block, improve timestamp precision to microseconds, and expand test coverage for the new behavior.
* feat: add sample records pager with theme toggle, postMessage bridge, and UI polish
* feat: add dataset metadata subtitle to pager and clean up toolbar layout
* fix: address review findings for preview save-results feature
- Split try/except in generation_controller so report display errors
don't produce misleading "failed to save" messages when not saving
- Add browser HTML path to save success output for discoverability
- Remove 5 unused CSS variables from pager theme constants
- Add "N of M" record counter to pager toolbar
- Add theme/display_width assertions to all preview_command tests
- Add dedicated test for custom theme and display_width passthrough
- Add tests for record counter and CSS variable cleanup
* fix: address code review findings and simplify pager
- Fix critical bug: analysis report now displays to console even when
--save-results is active (was silently dropped via pass statement)
- Fix latent UnboundLocalError in display_sample_record when index is
out of bounds (num_records computed before try block)
- Eliminate duplicated dark CSS between constant and theme listener script
- Simplify sample_records_pager: remove dual-theme system, postMessage
bridge, and responsive media queries; restore GitHub link; reorder
toolbar to put prev/next buttons on the far left
- Narrow except Exception to except OSError in save-results path
- Use case-insensitive extension check and lambda-based re.sub
- Collapse redundant preview command delegation tests into parametrize
- Add missing type annotations and remove tautological assertions
* style: move record counter to far right of pager toolbar
* refactor: remove dead theme-listener script and inline CSS constant
_THEME_LISTENER_SCRIPT and _SAMPLE_RECORD_DARK_CSS_INLINE became
orphaned after the pager simplification removed the postMessage
bridge. This removes both constants, drops the injection line,
switches the idempotency guard to the viewport meta tag, and
cleans up related test assertions.
* fix: move Path import out of TYPE_CHECKING block in test_visualization
* fix: rename _logger to logger to match codebase convention
* fix: remove unnecessary cast in preview command theme parameter
* refactor: extract DEFAULT_DISPLAY_WIDTH constant and make apply_html_post_processing public
* Update packages/data-designer-config/tests/config/utils/test_visualization.py
---------
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
2026-02-18 20:58:35 +00:00
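The --save-results layout described above (dataset parquet, analysis report HTML, and per-record sample HTMLs in a timestamped directory under the artifact path) can be sketched as a small path-building helper. This is a minimal sketch, not the project's code: `PreviewLayout`, `build_preview_layout`, the `preview_results_` directory prefix, and the exact timestamp format are hypothetical. The file names (`report.html`, `dataset.parquet`, `sample_records/record_{i}.html`) and the `./artifacts` default mirror the assertions in the tests in this file.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from pathlib import Path


@dataclass(frozen=True)
class PreviewLayout:
    """Paths a --save-results preview run would write to (hypothetical helper type)."""

    run_dir: Path
    report: Path
    dataset: Path
    sample_records: list[Path]


def build_preview_layout(
    num_records: int,
    artifact_path: str | None = None,
    now: datetime | None = None,
) -> PreviewLayout:
    """Compute, without creating anything on disk, the artifact paths for one preview run."""
    # Assumed default: ./artifacts under the current working directory.
    base = Path(artifact_path) if artifact_path is not None else Path.cwd() / "artifacts"
    now = datetime.now() if now is None else now
    # %f appends microseconds, so runs started within the same second get distinct directories.
    # The "preview_results_" prefix is a placeholder, not the project's actual naming.
    run_dir = base / f"preview_results_{now.strftime('%Y%m%d_%H%M%S_%f')}"
    sample_records_dir = run_dir / "sample_records"
    return PreviewLayout(
        run_dir=run_dir,
        report=run_dir / "report.html",
        dataset=run_dir / "dataset.parquet",
        sample_records=[sample_records_dir / f"record_{i}.html" for i in range(num_records)],
    )
```

Keeping path computation separate from file writing makes the layout easy to assert on without touching the filesystem.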
@patch(f"{_CTRL}.create_sample_records_pager")
chore: Improve CLI startup with lazy heavy import cleanup (#330)
* perf: defer heavy imports to improve CLI startup time
Move expensive imports (engine, models, controllers) out of the module-level import path so that data-designer --help and other non-generation commands no longer pay the full startup cost.
Key changes:
- Defer controller imports to inside command functions
- Remove eager re-export chains from CLI package __init__ files
- Move default-settings bootstrap into load_config_builder() and DataDesigner.__init__() instead of running at import time
- Add lazy __getattr__ exports in interface/__init__.py
- Replace module-level tokenizer init with cached lazy getter
- Fix ModelProvider import to use config layer instead of engine
- Update test mock paths to match new import locations
Reduces CLI import-time from ~1.67s to ~0.46s.
* perf: defer pandas/numpy in io_helpers and add config_list benchmark
- Replace eager `from lazy_heavy_imports import pd, np` in io_helpers
with module-level __getattr__ (for backwards-compatible external
access / test mocks) and function-level imports in the 3 functions
that actually use them (read_parquet_dataset, smart_load_dataframe,
_convert_to_serializable). Importing io_helpers no longer triggers
pandas/numpy loading.
- Defer heavy imports in list and reset CLI commands into function
bodies to avoid loading repositories, Rich, and prompt_toolkit at
module import time.
- Add `config_list` (data-designer config list) measurement to the
CLI startup benchmark with isolated cold measurement in a separate
venv and a --skip-config-list-check flag.
- Update test mock paths to match new import locations.
* Refine lazy import usage and TYPE_CHECKING cleanup
* Run license header updater on PR-touched files
* fix: update sqlfluff mock target for lazy imports in test_sql
* perf: cache globals() in lazy __getattr__ to avoid repeated lookups
Add globals() caching and explanatory comment to all three lazy
__getattr__ implementations (lazy_heavy_imports, config/__init__,
interface/__init__) so subsequent attribute accesses bypass __getattr__.
* perf: lazy CLI command loading and deferred heavy import evaluations
- Add LazyTyperGroup to defer command module loading until invocation, allowing module-level imports in all CLI command files
- Split DataFrameSeedSource into seed_source_dataframe.py to isolate pandas dependency from other seed source classes
- Move TypeVar/TypeAlias definitions (DataT, NumpyArray1dT, RadomStateT, EngineT) to TYPE_CHECKING blocks with runtime fallbacks
- Wrap module-level constants in lru_cache (phone_number parquet data, jsonschema validator) to defer I/O and heavy imports to first use
- Update test mock targets to patch at usage-site for module-level imports
* refactor: use direct pandas import in seed_source_dataframe
Drop lazy-loading for pandas in DataFrameSeedSource; use direct import
for simplicity.
* update lazy import pattern
* update tests to use lazy import namespace
Switch test modules to import data_designer.lazy_heavy_imports as lazy
and reference heavy libraries through that namespace. This keeps heavy
imports deferred during module import and aligns tests with the new
lazy-import usage pattern.
* tighten import perf test thresholds
Document recent baseline timings and lower the allowed average
import time and timeout so regressions are detected sooner.
* document pandas import requirement
Clarify that Pydantic needs DataFrame resolved at module load and
that keeping the direct import preserves IDE typing support.
* increase timeout time
* use lazy pandas imports in visualization tests
- replace direct pandas usage with lazy.pd in visualization tests to avoid eager imports
- add TYPE_CHECKING pandas import and keep CLI controller imports sorted
* fix lazy pandas runtime usage and preview mocks
Switch sample-record handling to lazy pandas types so runtime paths no longer
depend on TYPE_CHECKING imports. Align preview controller tests to patch the
module-local DataDesigner symbol, preventing real engine invocation in save
results scenarios.
2026-02-18 21:24:15 +00:00
@patch(f"{_CTRL}.DataDesigner")
feat: add --save-results option to preview command (#333)
* feat: add --save-report option to preview command
* feat: add save_path option to display_sample_record
Allow saving rendered sample records as HTML or SVG files via an
optional save_path parameter on both the standalone function and
the WithRecordSamplerMixin method.
* feat: replace --save-report with --save-results on preview command
Replace the single-file --save-report option with --save-results, which saves all preview artifacts (dataset parquet, analysis report HTML, and per-record sample HTMLs) into a timestamped directory under the artifact path. Add error handling around the save block, improve timestamp precision to microseconds, and expand test coverage for the new behavior.
* feat: add sample records pager with theme toggle, postMessage bridge, and UI polish
* feat: add dataset metadata subtitle to pager and clean up toolbar layout
* fix: address review findings for preview save-results feature
- Split try/except in generation_controller so report display errors
don't produce misleading "failed to save" messages when not saving
- Add browser HTML path to save success output for discoverability
- Remove 5 unused CSS variables from pager theme constants
- Add "N of M" record counter to pager toolbar
- Add theme/display_width assertions to all preview_command tests
- Add dedicated test for custom theme and display_width passthrough
- Add tests for record counter and CSS variable cleanup
* fix: address code review findings and simplify pager
- Fix critical bug: analysis report now displays to console even when
--save-results is active (was silently dropped via pass statement)
- Fix latent UnboundLocalError in display_sample_record when index is
out of bounds (num_records computed before try block)
- Eliminate duplicated dark CSS between constant and theme listener script
- Simplify sample_records_pager: remove dual-theme system, postMessage
bridge, and responsive media queries; restore GitHub link; reorder
toolbar to put prev/next buttons on the far left
- Narrow except Exception to except OSError in save-results path
- Use case-insensitive extension check and lambda-based re.sub
- Collapse redundant preview command delegation tests into parametrize
- Add missing type annotations and remove tautological assertions
* style: move record counter to far right of pager toolbar
* refactor: remove dead theme-listener script and inline CSS constant
_THEME_LISTENER_SCRIPT and _SAMPLE_RECORD_DARK_CSS_INLINE became
orphaned after the pager simplification removed the postMessage
bridge. This removes both constants, drops the injection line,
switches the idempotency guard to the viewport meta tag, and
cleans up related test assertions.
* fix: move Path import out of TYPE_CHECKING block in test_visualization
* fix: rename _logger to logger to match codebase convention
* fix: remove unnecessary cast in preview command theme parameter
* refactor: extract DEFAULT_DISPLAY_WIDTH constant and make apply_html_post_processing public
* Update packages/data-designer-config/tests/config/utils/test_visualization.py
---------
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
2026-02-18 20:58:35 +00:00
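Two small techniques named in the commit message above, the case-insensitive extension check for `save_path` and the lambda-based `re.sub`, can be sketched as follows. The helper names, the format mapping, and the `</head>` injection point are assumptions for illustration; the source only says that sample records can be saved as HTML or SVG and that `re.sub` now uses a lambda. A callable replacement matters because a plain replacement string has its backslash escapes processed, so literal backslashes in injected CSS (for example, the `\2014` in a `content:` rule) could be altered.

```python
from __future__ import annotations

import re
from pathlib import Path

# Formats the source mentions for display_sample_record(save_path=...); mapping is illustrative.
_SAVE_FORMATS = {".html": "html", ".svg": "svg"}


def resolve_save_format(save_path: str | Path) -> str:
    """Map a save path to an output format, ignoring the case of its extension."""
    suffix = Path(save_path).suffix.lower()  # "record_0.HTML" -> ".html"
    if suffix not in _SAVE_FORMATS:
        raise ValueError(f"Unsupported extension {suffix!r}; expected .html or .svg")
    return _SAVE_FORMATS[suffix]


def inject_before_head_close(html: str, snippet: str) -> str:
    """Insert snippet just before the first </head> tag, copying it verbatim."""
    # A callable's return value is inserted as-is; a plain string replacement
    # would have backslash escapes (e.g. "\2014") processed by re.sub.
    return re.sub(r"</head>", lambda _match: snippet + "</head>", html, count=1)
```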
@patch(f"{_CTRL}.load_config_builder")
def test_run_preview_save_results_creates_directory_structure(
    mock_load_config: MagicMock,
    mock_dd_cls: MagicMock,
    mock_create_pager: MagicMock,
    tmp_path: Path,
) -> None:
    """Test --save-results saves dataset, report, sample records, and sample_records_browser.html."""
    mock_load_config.return_value = MagicMock(spec=DataDesignerConfigBuilder)
    mock_dd = MagicMock()
    mock_dd_cls.return_value = mock_dd
    mock_results = _make_mock_preview_results(2)
    mock_analysis = MagicMock()
    mock_results.analysis = mock_analysis
    mock_dd.preview.return_value = mock_results

    controller = GenerationController()
    controller.run_preview(
        config_source="config.yaml",
        num_records=2,
        non_interactive=True,
        save_results=True,
        artifact_path=str(tmp_path),
    )
2026-02-19 01:17:03 +00:00
    # Report saved to file only (no console display when save_results=True)
    mock_analysis.to_report.assert_called_once()
    report_save_path = mock_analysis.to_report.call_args.kwargs["save_path"]
    assert report_save_path.parent.parent == tmp_path
    assert report_save_path.name == "report.html"

    # Dataset saved as parquet
    mock_results.dataset.to_parquet.assert_called_once()
    parquet_path = mock_results.dataset.to_parquet.call_args[0][0]
    assert parquet_path.name == "dataset.parquet"
    assert parquet_path.parent == report_save_path.parent
    assert mock_results.display_sample_record.call_count == 2
    sample_records_dir = report_save_path.parent / "sample_records"
    for i in range(2):
        mock_results.display_sample_record.assert_any_call(
            index=i, save_path=sample_records_dir / f"record_{i}.html", theme="dark", display_width=110
        )

    # Sample records browser (pager) generated
    pager_kwargs = mock_create_pager.call_args.kwargs
    assert pager_kwargs["sample_records_dir"] == sample_records_dir
    assert pager_kwargs["num_records"] == 2
    assert "num_columns" in pager_kwargs


@patch(f"{_CTRL}.create_sample_records_pager")
@patch(f"{_CTRL}.DataDesigner")
@patch(f"{_CTRL}.load_config_builder")
def test_run_preview_save_results_default_artifact_path(
    mock_load_config: MagicMock, mock_dd_cls: MagicMock, mock_create_pager: MagicMock
) -> None:
    """Test --save-results with no artifact_path defaults to ./artifacts."""
    mock_load_config.return_value = MagicMock(spec=DataDesignerConfigBuilder)
    mock_dd = MagicMock()
    mock_dd_cls.return_value = mock_dd
    mock_results = _make_mock_preview_results(1)
    mock_analysis = MagicMock()
    mock_results.analysis = mock_analysis
    mock_dd.preview.return_value = mock_results

    controller = GenerationController()
    with patch.object(Path, "mkdir"):
        controller.run_preview(
            config_source="config.yaml",
            num_records=1,
            non_interactive=True,
            save_results=True,
        )
    mock_analysis.to_report.assert_called_once()
    report_save_path = mock_analysis.to_report.call_args.kwargs["save_path"]
    assert report_save_path.parent.parent == Path.cwd() / "artifacts"
    mock_create_pager.assert_called_once()
chore: Improve CLI startup with lazy heavy import cleanup (#330)
* perf: defer heavy imports to improve CLI startup time
Move expensive imports (engine, models, controllers) out of the module-level import path so that data-designer --help and other non-generation commands no longer pay the full startup cost.
Key changes:
- Defer controller imports to inside command functions
- Remove eager re-export chains from CLI package __init__ files
- Move default-settings bootstrap into load_config_builder() and DataDesigner.__init__() instead of running at import time
- Add lazy __getattr__ exports in interface/__init__.py
- Replace module-level tokenizer init with cached lazy getter
- Fix ModelProvider import to use config layer instead of engine
- Update test mock paths to match new import locations
Reduces CLI import-time from ~1.67s to ~0.46s.
* perf: defer pandas/numpy in io_helpers and add config_list benchmark
- Replace eager `from lazy_heavy_imports import pd, np` in io_helpers
with module-level __getattr__ (for backwards-compatible external
access / test mocks) and function-level imports in the 3 functions
that actually use them (read_parquet_dataset, smart_load_dataframe,
_convert_to_serializable). Importing io_helpers no longer triggers
pandas/numpy loading.
- Defer heavy imports in list and reset CLI commands into function
bodies to avoid loading repositories, Rich, and prompt_toolkit at
module import time.
- Add `config_list` (data-designer config list) measurement to the
CLI startup benchmark with isolated cold measurement in a separate
venv and a --skip-config-list-check flag.
- Update test mock paths to match new import locations.
* Refine lazy import usage and TYPE_CHECKING cleanup
* Run license header updater on PR-touched files
* fix: update sqlfluff mock target for lazy imports in test_sql
* perf: cache globals() in lazy __getattr__ to avoid repeated lookups
Add globals() caching and explanatory comment to all three lazy
__getattr__ implementations (lazy_heavy_imports, config/__init__,
interface/__init__) so subsequent attribute accesses bypass __getattr__.
* perf: lazy CLI command loading and deferred heavy import evaluations
- Add LazyTyperGroup to defer command module loading until invocation, allowing module-level imports in all CLI command files
- Split DataFrameSeedSource into seed_source_dataframe.py to isolate pandas dependency from other seed source classes
- Move TypeVar/TypeAlias definitions (DataT, NumpyArray1dT, RadomStateT, EngineT) to TYPE_CHECKING blocks with runtime fallbacks
- Wrap module-level constants in lru_cache (phone_number parquet data, jsonschema validator) to defer I/O and heavy imports to first use
- Update test mock targets to patch at usage-site for module-level imports
* refactor: use direct pandas import in seed_source_dataframe
Drop lazy-loading for pandas in DataFrameSeedSource; use direct import
for simplicity.
* update lazy import pattern
* update tests to use lazy import namespace
Switch test modules to import data_designer.lazy_heavy_imports as lazy
and reference heavy libraries through that namespace. This keeps heavy
imports deferred during module import and aligns tests with the new
lazy-import usage pattern.
* tighten import perf test thresholds
Document recent baseline timings and lower the allowed average
import time and timeout so regressions are detected sooner.
* document pandas import requirement
Clarify that Pydantic needs DataFrame resolved at module load and
that keeping the direct import preserves IDE typing support.
* increase timeout time
* use lazy pandas imports in visualization tests
- replace direct pandas usage with lazy.pd in visualization tests to avoid eager imports
- add TYPE_CHECKING pandas import and keep CLI controller imports sorted
* fix lazy pandas runtime usage and preview mocks
Switch sample-record handling to lazy pandas types so runtime paths no longer
depend on TYPE_CHECKING imports. Align preview controller tests to patch the
module-local DataDesigner symbol, preventing real engine invocation in save
results scenarios.
2026-02-18 21:24:15 +00:00
@patch(f"{_CTRL}.DataDesigner")
@patch(f"{_CTRL}.load_config_builder")
def test_run_preview_skips_report_when_analysis_is_none(mock_load_config: MagicMock, mock_dd_cls: MagicMock) -> None:
    """Test that to_report() is not called when analysis is None."""
    mock_load_config.return_value = MagicMock(spec=DataDesignerConfigBuilder)
    mock_dd = MagicMock()
    mock_dd_cls.return_value = mock_dd
    mock_results = _make_mock_preview_results(3)
    mock_results.analysis = None
    mock_dd.preview.return_value = mock_results

    controller = GenerationController()
feat: add --save-results option to preview command (#333)
* feat: add --save-report option to preview command
* feat: add save_path option to display_sample_record
Allow saving rendered sample records as HTML or SVG files via an
optional save_path parameter on both the standalone function and
the WithRecordSamplerMixin method.
* feat: replace --save-report with --save-results on preview command
Replace the single-file --save-report option with --save-results, which saves all preview artifacts (dataset parquet, analysis report HTML, and per-record sample HTMLs) into a timestamped directory under the artifact path. Add error handling around the save block, improve timestamp precision to microseconds, and expand test coverage for the new behavior.
* feat: add sample records pager with theme toggle, postMessage bridge, and UI polish
* feat: add dataset metadata subtitle to pager and clean up toolbar layout
* fix: address review findings for preview save-results feature
- Split try/except in generation_controller so report display errors
don't produce misleading "failed to save" messages when not saving
- Add browser HTML path to save success output for discoverability
- Remove 5 unused CSS variables from pager theme constants
- Add "N of M" record counter to pager toolbar
- Add theme/display_width assertions to all preview_command tests
- Add dedicated test for custom theme and display_width passthrough
- Add tests for record counter and CSS variable cleanup
* fix: address code review findings and simplify pager
- Fix critical bug: analysis report now displays to console even when
--save-results is active (was silently dropped via pass statement)
- Fix latent UnboundLocalError in display_sample_record when index is
out of bounds (num_records computed before try block)
- Eliminate duplicated dark CSS between constant and theme listener script
- Simplify sample_records_pager: remove dual-theme system, postMessage
bridge, and responsive media queries; restore GitHub link; reorder
toolbar to put prev/next buttons on the far left
- Narrow except Exception to except OSError in save-results path
- Use case-insensitive extension check and lambda-based re.sub
- Collapse redundant preview command delegation tests into parametrize
- Add missing type annotations and remove tautological assertions
* style: move record counter to far right of pager toolbar
* refactor: remove dead theme-listener script and inline CSS constant
_THEME_LISTENER_SCRIPT and _SAMPLE_RECORD_DARK_CSS_INLINE became
orphaned after the pager simplification removed the postMessage
bridge. This removes both constants, drops the injection line,
switches the idempotency guard to the viewport meta tag, and
cleans up related test assertions.
* fix: move Path import out of TYPE_CHECKING block in test_visualization
* fix: rename _logger to logger to match codebase convention
* fix: remove unnecessary cast in preview command theme parameter
* refactor: extract DEFAULT_DISPLAY_WIDTH constant and make apply_html_post_processing public
* Update packages/data-designer-config/tests/config/utils/test_visualization.py
---------
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
2026-02-18 20:58:35 +00:00
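The `--save-results` behaviors listed above (artifacts written into a timestamped directory with microsecond precision, a case-insensitive extension check for sample records, and `except OSError` in place of `except Exception`) can be sketched as below. The directory prefix `preview_results_`, the helper names, and the artifact filenames are hypothetical; only the behaviors themselves come from the commit message.

```python
# Hypothetical helpers sketching the --save-results behaviors; not the project's code.
from datetime import datetime
from pathlib import Path
from typing import Optional

# Formats accepted for rendered sample records (HTML or SVG, per the commit message).
SAMPLE_RECORD_SUFFIXES = {".html", ".svg"}


def make_results_dir(artifact_path: Path, now: Optional[datetime] = None) -> Path:
    """Create a fresh timestamped directory under artifact_path."""
    if now is None:
        now = datetime.now()
    # %f appends microseconds, so two saves within the same second do not collide.
    results_dir = artifact_path / f"preview_results_{now:%Y%m%d_%H%M%S_%f}"
    results_dir.mkdir(parents=True, exist_ok=False)
    return results_dir


def sample_record_format(save_path: Path) -> str:
    """Return 'html' or 'svg', accepting any letter case (e.g. RECORD.HTML)."""
    suffix = save_path.suffix.lower()
    if suffix not in SAMPLE_RECORD_SUFFIXES:
        raise ValueError(f"Unsupported sample record format: {save_path.suffix!r}")
    return suffix.lstrip(".")


def try_save_text(path: Path, content: str) -> bool:
    """Write one artifact; only filesystem errors are reported as a failed save."""
    try:
        path.write_text(content, encoding="utf-8")
    except OSError as exc:
        print(f"Failed to save {path}: {exc}")
        return False
    return True
```

Catching only `OSError` means a genuine bug (for example a `TypeError` while rendering a report) still surfaces instead of being reported as a "failed to save" message.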
    # Implicit assertion: analysis is None (not a mock), so the code must not call
    # None.to_report(). If it does, an AttributeError propagates and the test fails.
    controller.run_preview(config_source="config.yaml", num_records=3, non_interactive=True)

@patch(f"{_CTRL}.create_sample_records_pager")
@patch(f"{_CTRL}.DataDesigner")
@patch(f"{_CTRL}.load_config_builder")
def test_run_preview_save_results_without_analysis(
    mock_load_config: MagicMock, mock_dd_cls: MagicMock, mock_create_pager: MagicMock, tmp_path: Path
) -> None:
    """Test --save-results saves dataset and sample records even when analysis is None."""
    mock_load_config.return_value = MagicMock(spec=DataDesignerConfigBuilder)
    mock_dd = MagicMock()
    mock_dd_cls.return_value = mock_dd
    mock_results = _make_mock_preview_results(2)
    mock_results.analysis = None
    mock_dd.preview.return_value = mock_results

    controller = GenerationController()
    controller.run_preview(
        config_source="config.yaml",
        num_records=2,
        non_interactive=True,
        save_results=True,
        artifact_path=str(tmp_path),
    )

    mock_results.dataset.to_parquet.assert_called_once()
    save_path_calls = [c for c in mock_results.display_sample_record.call_args_list if "save_path" in c.kwargs]
    assert len(save_path_calls) == 2

@patch(f"{_CTRL}.DataDesigner")
|
feat: add --save-results option to preview command (#333)
* feat: add --save-report option to preview command
* feat: add save_path option to display_sample_record
Allow saving rendered sample records as HTML or SVG files via an
optional save_path parameter on both the standalone function and
the WithRecordSamplerMixin method.
* feat: replace --save-report with --save-results on preview command
Replace the single-file --save-report option with --save-results, which saves all preview artifacts (dataset parquet, analysis report HTML, and per-record sample HTMLs) into a timestamped directory under the artifact path. Add error handling around the save block, improve timestamp precision to microseconds, and expand test coverage for the new behavior.
* feat: add sample records pager with theme toggle, postMessage bridge, and UI polish
* feat: add dataset metadata subtitle to pager and clean up toolbar layout
* fix: address review findings for preview save-results feature
- Split try/except in generation_controller so report display errors
don't produce misleading "failed to save" messages when not saving
- Add browser HTML path to save success output for discoverability
- Remove 5 unused CSS variables from pager theme constants
- Add "N of M" record counter to pager toolbar
- Add theme/display_width assertions to all preview_command tests
- Add dedicated test for custom theme and display_width passthrough
- Add tests for record counter and CSS variable cleanup
* fix: address code review findings and simplify pager
- Fix critical bug: analysis report now displays to console even when
--save-results is active (was silently dropped via pass statement)
- Fix latent UnboundLocalError in display_sample_record when index is
out of bounds (num_records computed before try block)
- Eliminate duplicated dark CSS between constant and theme listener script
- Simplify sample_records_pager: remove dual-theme system, postMessage
bridge, and responsive media queries; restore GitHub link; reorder
toolbar to put prev/next buttons on the far left
- Narrow except Exception to except OSError in save-results path
- Use case-insensitive extension check and lambda-based re.sub
- Collapse redundant preview command delegation tests into parametrize
- Add missing type annotations and remove tautological assertions
* style: move record counter to far right of pager toolbar
* refactor: remove dead theme-listener script and inline CSS constant
_THEME_LISTENER_SCRIPT and _SAMPLE_RECORD_DARK_CSS_INLINE became
orphaned after the pager simplification removed the postMessage
bridge. This removes both constants, drops the injection line,
switches the idempotency guard to the viewport meta tag, and
cleans up related test assertions.
* fix: move Path import out of TYPE_CHECKING block in test_visualization
* fix: rename _logger to logger to match codebase convention
* fix: remove unnecessary cast in preview command theme parameter
* refactor: extract DEFAULT_DISPLAY_WIDTH constant and make apply_html_post_processing public
* Update packages/data-designer-config/tests/config/utils/test_visualization.py
---------
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
2026-02-18 20:58:35 +00:00
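The `--save-results` entry above saves three kinds of artifact into a timestamped directory under the artifact path, using microsecond timestamps:

- the dataset parquet,
- the analysis report HTML,
- one HTML file per sample record.

Below is a minimal sketch of that directory layout. The `preview_results_` prefix, the file names, and the helper names are assumptions. Writing the actual parquet and HTML contents is left out, so the sketch needs only the standard library.

```python
from __future__ import annotations

from datetime import datetime, timezone
from pathlib import Path


def results_dir_name(now: datetime) -> str:
    # %f appends microseconds, so two runs started within the same second
    # still get distinct directory names.
    return f"preview_results_{now.strftime('%Y%m%d_%H%M%S_%f')}"


def make_results_dir(artifact_path: str | Path, now: datetime | None = None) -> Path:
    """Create the per-run directory under the artifact path (hypothetical layout)."""
    results_dir = Path(artifact_path) / results_dir_name(now or datetime.now(timezone.utc))
    # exist_ok=False: fail loudly instead of mixing artifacts from two runs.
    results_dir.mkdir(parents=True, exist_ok=False)
    return results_dir


def planned_artifacts(results_dir: Path, num_records: int) -> dict[str, Path]:
    """Map each artifact named in the commit message to a hypothetical file path."""
    paths = {
        "dataset": results_dir / "dataset.parquet",
        "report": results_dir / "analysis_report.html",
    }
    for i in range(num_records):
        paths[f"record_{i}"] = results_dir / "sample_records" / f"record_{i}.html"
    return paths
```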
@patch(f"{_CTRL}.load_config_builder")
def test_run_preview_no_save_when_save_results_false(mock_load_config: MagicMock, mock_dd_cls: MagicMock) -> None:
    """Test that dataset and sample records are not saved when save_results=False."""
    mock_load_config.return_value = MagicMock(spec=DataDesignerConfigBuilder)
    mock_dd = MagicMock()
    mock_dd_cls.return_value = mock_dd
    mock_results = _make_mock_preview_results(3)
    mock_dd.preview.return_value = mock_results

    controller = GenerationController()
    controller.run_preview(config_source="config.yaml", num_records=3, non_interactive=True)

    mock_results.dataset.to_parquet.assert_not_called()
    for c in mock_results.display_sample_record.call_args_list:
        assert "save_path" not in c.kwargs
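The test above checks that `display_sample_record` receives no `save_path` when saving is off. When saving is on, the #333 entries say records can be written as HTML or SVG. They also mention a case-insensitive extension check and a lambda-based `re.sub` in the HTML post-processing.

Below is a minimal sketch of those two techniques. `resolve_save_format`, `inject_css`, and the choice of `</head>` as the insertion point are hypothetical. They are not the project's `display_sample_record` or `apply_html_post_processing`.

```python
from __future__ import annotations

import re
from pathlib import Path

# Hypothetical mapping of accepted extensions to output formats.
_FORMATS = {".html": "html", ".svg": "svg"}


def resolve_save_format(save_path: str | Path) -> str:
    # Lowercase the suffix so "Record_0.HTML" and "record_0.html" are treated alike.
    suffix = Path(save_path).suffix.lower()
    if suffix not in _FORMATS:
        raise ValueError(f"Unsupported extension {suffix!r}; expected .html or .svg")
    return _FORMATS[suffix]


def inject_css(html: str, css: str) -> str:
    # The callable's return value is inserted literally. A plain replacement string
    # would have its backslash escapes processed, so a CSS escape such as `\2014`
    # would not survive intact.
    return re.sub(
        r"</head>",
        lambda match: f"<style>{css}</style>{match.group(0)}",
        html,
        count=1,
        flags=re.IGNORECASE,
    )
```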


@patch(f"{_CTRL}.create_sample_records_pager")
chore: Improve CLI startup with lazy heavy import cleanup (#330)
2026-02-18 21:24:15 +00:00
@patch(f"{_CTRL}.DataDesigner")
feat: add --save-results option to preview command (#333)
2026-02-18 20:58:35 +00:00
@patch(f"{_CTRL}.load_config_builder")
def test_run_preview_save_results_oserror_exits(
    mock_load_config: MagicMock, mock_dd_cls: MagicMock, mock_create_pager: MagicMock, tmp_path: Path
) -> None:
    """Test --save-results exits with code 1 when an OSError occurs."""
    mock_load_config.return_value = MagicMock(spec=DataDesignerConfigBuilder)
    mock_dd = MagicMock()
    mock_dd_cls.return_value = mock_dd
    mock_results = _make_mock_preview_results(2)
    mock_results.analysis = None
    mock_dd.preview.return_value = mock_results
    mock_results.dataset.to_parquet.side_effect = OSError("Disk full")

    controller = GenerationController()
    with pytest.raises(typer.Exit) as exc_info:
        controller.run_preview(
            config_source="config.yaml",
            num_records=2,
            non_interactive=True,
            save_results=True,
            artifact_path=str(tmp_path),
        )

    assert exc_info.value.exit_code == 1


@patch(f"{_CTRL}.create_sample_records_pager")
chore: Improve CLI startup with lazy heavy import cleanup (#330)
2026-02-18 21:24:15 +00:00
@patch(f"{_CTRL}.DataDesigner")
feat: add --save-results option to preview command (#333)
2026-02-18 20:58:35 +00:00
@patch(f"{_CTRL}.load_config_builder")
def test_run_preview_save_results_non_oserror_propagates(
    mock_load_config: MagicMock, mock_dd_cls: MagicMock, mock_create_pager: MagicMock, tmp_path: Path
) -> None:
    """Test --save-results lets non-OSError exceptions propagate."""
    mock_load_config.return_value = MagicMock(spec=DataDesignerConfigBuilder)
    mock_dd = MagicMock()
    mock_dd_cls.return_value = mock_dd
    mock_results = _make_mock_preview_results(2)
    mock_results.analysis = None
    mock_dd.preview.return_value = mock_results
    mock_results.dataset.to_parquet.side_effect = ValueError("Unexpected error")

    controller = GenerationController()
    with pytest.raises(ValueError, match="Unexpected error"):
        controller.run_preview(
            config_source="config.yaml",
            num_records=2,
            non_interactive=True,
            save_results=True,
            artifact_path=str(tmp_path),
        )
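The two `--save-results` error tests above pin down the contract from the #333 entry "Narrow except Exception to except OSError in save-results path". An `OSError` while saving ends the command with exit code 1. Any other exception propagates unchanged.

Below is a minimal sketch of that shape. It assumes a hypothetical `save_preview_results` helper and uses the standard library's `SystemExit` in place of `typer.Exit`, so it runs without Typer.

```python
from collections.abc import Callable


def save_preview_results(write: Callable[[], None], log: Callable[[str], None] = print) -> None:
    """Run the save step, turning only filesystem errors into a clean exit."""
    # Only OSError is caught; anything else (e.g. ValueError) is a bug and propagates.
    try:
        write()
    except OSError as exc:
        # Disk full, permission denied, missing directory: report it and exit with 1.
        log(f"Failed to save preview results: {exc}")
        raise SystemExit(1) from exc
```

Catching `Exception` here would hide programming errors behind a "failed to save" message. The narrow clause keeps those tracebacks visible.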
feat: add preview, create, and validate CLI commands (#313)
* feat: add preview, create, and validate CLI commands
Add three new top-level CLI commands for the data-designer workflow:
- `data-designer preview` - generate preview datasets for fast iteration
- `data-designer create` - create full datasets and save to disk
- `data-designer validate` - validate configuration files
Also includes:
- Move wait_for_navigation_key() UI primitive from preview.py to ui.py
- Add KeyPressEvent type annotations to all key binding handlers in ui.py
- Refactor cli/utils.py into cli/utils/ package with config_loader module
- Comprehensive test coverage for all new commands
* fix: update pythonjsonlogger import and clean up dev dependencies
- Update pythonjsonlogger import to use newer JsonFormatter API
- Consolidate dev-dependencies into [dependency-groups] dev section
- Remove unnecessary test cli/utils __init__.py
* small E
* address greptile feedback
* organize CLI commands into rich help panels
Group top-level commands under "Generation" and "Setup" panels
for clearer help output.
* refactor config loader to parse files directly and auto-detect config format
- Parse YAML/JSON files into dicts before passing to from_config,
providing format-specific error messages for parse failures
- Auto-detect DataDesignerConfig format (columns at top level) and
wrap it into BuilderConfig so users can provide either format
- Clean up Python module loading with try/except/finally for reliable
sys.modules and sys.path cleanup
- Add comprehensive tests for parsing, validation, and auto-wrapping
* fix sys.path cleanup in config loader and simplify tests
- Use pop(0) instead of remove() to precisely undo the insert(0, ...)
and avoid accidentally removing a different matching path entry
- Replace MagicMock with real DataDesignerConfigBuilder in tests
* move config format auto-detection into from_config
Centralize the shorthand DataDesignerConfig detection (columns at
top level without a data_designer wrapper) in
DataDesignerConfigBuilder.from_config so all callers benefit, not
just the CLI config loader. Simplify config_loader to delegate file
parsing and format normalization entirely to from_config.
* extract GenerationController from CLI commands
Move shared generation logic (preview, validate, create) out of the
individual Typer command functions into a dedicated GenerationController,
matching the existing controller pattern (DownloadController, etc.).
The command functions now delegate to the controller, keeping them as
thin entry points. Tests updated accordingly — command tests verify
delegation while controller tests cover the full behavior.
* harden sys.path cleanup and add explanatory comments
Use sys.path.remove() instead of checking sys.path[0] so cleanup
succeeds even when exec_module inserts entries at index 0. Drop
unnecessary spec=DataDesignerConfigBuilder from test mocks.
* check stdout TTY in preview interactive mode detection
Previously only stdin was checked, so piping stdout (e.g.
`dd preview cfg.yaml | head`) would still attempt interactive
browsing. Now both stdin and stdout must be a TTY.
2026-02-11 19:06:06 +00:00
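The config-loader entries above describe two behaviors:

- A shorthand config with `columns` at the top level is wrapped before validation.
- Loading a Python config file temporarily edits `sys.path` and `sys.modules`, and restores both in `finally`.

Below is a minimal sketch of both, under two assumptions. First, the full format nests the shorthand under a `data_designer` key; the message mentions "a data_designer wrapper" but not the schema. Second, a Python config exposes a top-level `config` object. `normalize_config`, `load_python_config`, and `_dd_user_config` are hypothetical names, not the project's `config_loader` or `from_config` API.

```python
from __future__ import annotations

import importlib.util
import sys
from pathlib import Path
from typing import Any


def normalize_config(raw: dict[str, Any]) -> dict[str, Any]:
    """Wrap the shorthand form (columns at top level) in the assumed full layout."""
    if "columns" in raw and "data_designer" not in raw:
        return {"data_designer": raw}
    return raw


def load_python_config(path: Path, module_name: str = "_dd_user_config") -> Any:
    """Execute a Python config file and return its top-level `config` object."""
    parent = str(path.resolve().parent)
    spec = importlib.util.spec_from_file_location(module_name, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load {path}")
    module = importlib.util.module_from_spec(spec)
    # Let the config import sibling files, as `python <path>` would allow.
    sys.path.insert(0, parent)
    sys.modules[module_name] = module
    try:
        spec.loader.exec_module(module)
        return module.config
    finally:
        # remove() finds our entry even if exec_module pushed others in front of it.
        sys.path.remove(parent)
        sys.modules.pop(module_name, None)
```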
# ---------------------------------------------------------------------------
# _browse_records_interactively unit tests
# ---------------------------------------------------------------------------


@patch(f"{_CTRL}.wait_for_navigation_key", side_effect=["n", "n", "q"])
def test_browse_interactively_next_advances(mock_wait: MagicMock) -> None:
    """Test pressing n/enter advances to the next record."""
    mock_results = _make_mock_preview_results(5)
    controller = GenerationController()

    controller._browse_records_interactively(mock_results, 5)

    assert mock_results.display_sample_record.call_count == 3
2026-02-19 01:17:03 +00:00
    mock_results.display_sample_record.assert_has_calls(
        [
            call(index=0, display_width=_DW),
            call(index=1, display_width=_DW),
            call(index=2, display_width=_DW),
        ]
    )
feat: add preview, create, and validate CLI commands (#313)
2026-02-11 19:06:06 +00:00
@patch(f"{_CTRL}.wait_for_navigation_key", side_effect=["q"])
def test_browse_interactively_quit_immediately(mock_wait: MagicMock) -> None:
    """Test pressing 'q' quits after showing only the first record."""
    mock_results = _make_mock_preview_results(5)
    controller = GenerationController()

    controller._browse_records_interactively(mock_results, 5)
2026-02-19 01:17:03 +00:00
    mock_results.display_sample_record.assert_called_once_with(index=0, display_width=_DW)
feat: add preview, create, and validate CLI commands (#313)
2026-02-11 19:06:06 +00:00
@patch(f"{_CTRL}.wait_for_navigation_key", side_effect=["n", "p", "q"])
def test_browse_interactively_previous(mock_wait: MagicMock) -> None:
    """Test 'p' navigates to the previous record."""
    mock_results = _make_mock_preview_results(5)
    controller = GenerationController()

    controller._browse_records_interactively(mock_results, 5)

    assert mock_results.display_sample_record.call_count == 3
2026-02-19 01:17:03 +00:00
    mock_results.display_sample_record.assert_has_calls(
        [
            call(index=0, display_width=_DW),
            call(index=1, display_width=_DW),
            call(index=0, display_width=_DW),
        ]
    )
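The three browsing tests fix the navigation contract. The first record is shown at once, `n` advances, `p` steps back, and `q` stops. The first test's docstring also names Enter as "next"; here the key reader is assumed to map Enter to `n`. The #313 message also requires both stdin and stdout to be TTYs before browsing interactively.

Below is a minimal sketch of both rules. It uses hypothetical names (`browse_records`, `should_browse_interactively`) and passes in callables in place of `wait_for_navigation_key` and `display_sample_record`. Clamping at either end and re-showing the current record on unknown keys are assumptions the tests do not cover.

```python
from __future__ import annotations

import sys
from collections.abc import Callable
from typing import TextIO


def should_browse_interactively(
    non_interactive: bool, stdin: TextIO = sys.stdin, stdout: TextIO = sys.stdout
) -> bool:
    # Piping either stream (e.g. `... | head`) disables browsing: both must be TTYs.
    return not non_interactive and stdin.isatty() and stdout.isatty()


def browse_records(num_records: int, display: Callable[[int], None], next_key: Callable[[], str]) -> None:
    """Show record 0, then move with "n"/"p" until "q" is pressed."""
    if num_records <= 0:
        return
    index = 0
    while True:
        display(index)
        key = next_key()
        if key == "q":
            return
        if key == "n":
            # Assumption: stop at the last record rather than wrapping around.
            index = min(index + 1, num_records - 1)
        elif key == "p":
            index = max(index - 1, 0)
        # Assumption: any other key simply re-shows the current record.
```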
feat: add preview, create, and validate CLI commands (#313)
* feat: add preview, create, and validate CLI commands
Add three new top-level CLI commands for the data-designer workflow:
- `data-designer preview` - generate preview datasets for fast iteration
- `data-designer create` - create full datasets and save to disk
- `data-designer validate` - validate configuration files
Also includes:
- Move wait_for_navigation_key() UI primitive from preview.py to ui.py
- Add KeyPressEvent type annotations to all key binding handlers in ui.py
- Refactor cli/utils.py into cli/utils/ package with config_loader module
- Comprehensive test coverage for all new commands
* fix: update pythonjsonlogger import and clean up dev dependencies
- Update pythonjsonlogger import to use newer JsonFormatter API
- Consolidate dev-dependencies into [dependency-groups] dev section
- Remove unnecessary test cli/utils __init__.py
* small E
* address greptile feedback
* organize CLI commands into rich help panels
Group top-level commands under "Generation" and "Setup" panels
for clearer help output.
* refactor config loader to parse files directly and auto-detect config format
- Parse YAML/JSON files into dicts before passing to from_config,
providing format-specific error messages for parse failures
- Auto-detect DataDesignerConfig format (columns at top level) and
wrap it into BuilderConfig so users can provide either format
- Clean up Python module loading with try/except/finally for reliable
sys.modules and sys.path cleanup
- Add comprehensive tests for parsing, validation, and auto-wrapping
* fix sys.path cleanup in config loader and simplify tests
- Use pop(0) instead of remove() to precisely undo the insert(0, ...)
and avoid accidentally removing a different matching path entry
- Replace MagicMock with real DataDesignerConfigBuilder in tests
* move config format auto-detection into from_config
Centralize the shorthand DataDesignerConfig detection (columns at
top level without a data_designer wrapper) in
DataDesignerConfigBuilder.from_config so all callers benefit, not
just the CLI config loader. Simplify config_loader to delegate file
parsing and format normalization entirely to from_config.
* extract GenerationController from CLI commands
Move shared generation logic (preview, validate, create) out of the
individual Typer command functions into a dedicated GenerationController,
matching the existing controller pattern (DownloadController, etc.).
The command functions now delegate to the controller, keeping them as
thin entry points. Tests updated accordingly — command tests verify
delegation while controller tests cover the full behavior.
* harden sys.path cleanup and add explanatory comments
Use sys.path.remove() instead of checking sys.path[0] so cleanup
succeeds even when exec_module inserts entries at index 0. Drop
unnecessary spec=DataDesignerConfigBuilder from test mocks.
* check stdout TTY in preview interactive mode detection
Previously only stdin was checked, so piping stdout (e.g.
`dd preview cfg.yaml | head`) would still attempt interactive
browsing. Now both stdin and stdout must be a TTY.
2026-02-11 19:06:06 +00:00
|
|
|
|
|
|
|
|
|
|
|
|
|


@patch(f"{_CTRL}.wait_for_navigation_key", side_effect=["p", "q"])
def test_browse_interactively_previous_wraps_to_last(mock_wait: MagicMock) -> None:
    """Test 'p' on the first record wraps to the last record."""
    mock_results = _make_mock_preview_results(3)
    controller = GenerationController()

    controller._browse_records_interactively(mock_results, 3)

    assert mock_results.display_sample_record.call_count == 2
    mock_results.display_sample_record.assert_has_calls(
        [
            call(index=0, display_width=_DW),
            call(index=2, display_width=_DW),
        ]
    )


@patch(f"{_CTRL}.wait_for_navigation_key", side_effect=["n", "n", "n", "q"])
def test_browse_interactively_next_wraps_past_last(mock_wait: MagicMock) -> None:
    """Test 'n' past the last record wraps back to the first."""
    mock_results = _make_mock_preview_results(3)
    controller = GenerationController()

    controller._browse_records_interactively(mock_results, 3)

    assert mock_results.display_sample_record.call_count == 4
    mock_results.display_sample_record.assert_has_calls(
        [
            call(index=0, display_width=_DW),
            call(index=1, display_width=_DW),
            call(index=2, display_width=_DW),
            call(index=0, display_width=_DW),
        ]
    )
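The three browse tests above pin down a navigation contract:

- the first record is shown before any key is read,
- `n` and `p` move forward and back, wrapping at both ends,
- `q` exits without redrawing.

Below is a minimal sketch of a loop that satisfies it. It assumes injected `display` and `wait_for_key` callables as hypothetical stand-ins for `display_sample_record` and `wait_for_navigation_key`. It is not the actual `GenerationController._browse_records_interactively`.

```python
from typing import Callable


def browse_records(
    num_records: int,
    display: Callable[[int], None],
    wait_for_key: Callable[[], str],
) -> None:
    """Show one record at a time; 'n' = next, 'p' = previous, 'q' = quit."""
    if num_records <= 0:
        return
    index = 0
    display(index)  # the first record is shown before any key is read
    while True:
        key = wait_for_key()
        if key == "q":
            break
        if key == "n":
            index = (index + 1) % num_records  # past the last record wraps to 0
        elif key == "p":
            index = (index - 1) % num_records  # Python's % maps -1 to num_records - 1
        else:
            continue  # assumption: unrecognized keys are ignored without redrawing
        display(index)
```

Modulo arithmetic keeps the wrap-around to one expression per direction. In Python, `(0 - 1) % 3` evaluates to `2`, which is exactly the index sequence `[0, 2]` the "previous wraps to last" test expects.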


# ---------------------------------------------------------------------------
# _display_all_records unit test
# ---------------------------------------------------------------------------


def test_display_all_records() -> None:
    """Test _display_all_records displays every record."""
    mock_results = _make_mock_preview_results(3)
    controller = GenerationController()

    controller._display_all_records(mock_results, 3)

    assert mock_results.display_sample_record.call_count == 3
    mock_results.display_sample_record.assert_has_calls(
        [
            call(index=0, display_width=_DW),
            call(index=1, display_width=_DW),
            call(index=2, display_width=_DW),
        ]
    )


# ---------------------------------------------------------------------------
# run_validate tests
# ---------------------------------------------------------------------------
chore: Improve CLI startup with lazy heavy import cleanup (#330)
* perf: defer heavy imports to improve CLI startup time
Move expensive imports (engine, models, controllers) out of the module-level import path so that data-designer --help and other non-generation commands no longer pay the full startup cost.
Key changes:
- Defer controller imports to inside command functions
- Remove eager re-export chains from CLI package __init__ files
- Move default-settings bootstrap into load_config_builder() and DataDesigner.__init__() instead of running at import time
- Add lazy __getattr__ exports in interface/__init__.py
- Replace module-level tokenizer init with cached lazy getter
- Fix ModelProvider import to use config layer instead of engine
- Update test mock paths to match new import locations
Reduces CLI import-time from ~1.67s to ~0.46s.
* perf: defer pandas/numpy in io_helpers and add config_list benchmark
- Replace eager `from lazy_heavy_imports import pd, np` in io_helpers
with module-level __getattr__ (for backwards-compatible external
access / test mocks) and function-level imports in the 3 functions
that actually use them (read_parquet_dataset, smart_load_dataframe,
_convert_to_serializable). Importing io_helpers no longer triggers
pandas/numpy loading.
- Defer heavy imports in list and reset CLI commands into function
bodies to avoid loading repositories, Rich, and prompt_toolkit at
module import time.
- Add `config_list` (data-designer config list) measurement to the
CLI startup benchmark with isolated cold measurement in a separate
venv and a --skip-config-list-check flag.
- Update test mock paths to match new import locations.
* Refine lazy import usage and TYPE_CHECKING cleanup
* Run license header updater on PR-touched files
* fix: update sqlfluff mock target for lazy imports in test_sql
* perf: cache globals() in lazy __getattr__ to avoid repeated lookups
Add globals() caching and explanatory comment to all three lazy
__getattr__ implementations (lazy_heavy_imports, config/__init__,
interface/__init__) so subsequent attribute accesses bypass __getattr__.
* perf: lazy CLI command loading and deferred heavy import evaluations
- Add LazyTyperGroup to defer command module loading until invocation, allowing module-level imports in all CLI command files
- Split DataFrameSeedSource into seed_source_dataframe.py to isolate pandas dependency from other seed source classes
- Move TypeVar/TypeAlias definitions (DataT, NumpyArray1dT, RadomStateT, EngineT) to TYPE_CHECKING blocks with runtime fallbacks
- Wrap module-level constants in lru_cache (phone_number parquet data, jsonschema validator) to defer I/O and heavy imports to first use
- Update test mock targets to patch at usage-site for module-level imports
* refactor: use direct pandas import in seed_source_dataframe
Drop lazy-loading for pandas in DataFrameSeedSource; use direct import
for simplicity.
* update lazy import pattern
* update tests to use lazy import namespace
Switch test modules to import data_designer.lazy_heavy_imports as lazy
and reference heavy libraries through that namespace. This keeps heavy
imports deferred during module import and aligns tests with the new
lazy-import usage pattern.
* tighten import perf test thresholds
Document recent baseline timings and lower the allowed average
import time and timeout so regressions are detected sooner.
* document pandas import requirement
Clarify that Pydantic needs DataFrame resolved at module load and
that keeping the direct import preserves IDE typing support.
* increase timeout time
* use lazy pandas imports in visualization tests
- replace direct pandas usage with lazy.pd in visualization tests to avoid eager imports
- add TYPE_CHECKING pandas import and keep CLI controller imports sorted
* fix lazy pandas runtime usage and preview mocks
Switch sample-record handling to lazy pandas types so runtime paths no longer
depend on TYPE_CHECKING imports. Align preview controller tests to patch the
module-local DataDesigner symbol, preventing real engine invocation in save
results scenarios.
2026-02-18 21:24:15 +00:00
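The lazy `__getattr__` exports with `globals()` caching mentioned above build on the module-level `__getattr__` hook from PEP 562. Below is a minimal sketch that rests on three assumptions:

- A hypothetical mapping from exported names to module paths drives the imports. The real `lazy_heavy_imports` module may be organized differently.
- The module object is built with `types.ModuleType` so the example is self-contained. Its `__dict__` plays the role that `globals()` plays inside a real module file.
- A standard-library module stands in for pandas/numpy.

```python
import importlib
import types
from typing import Any, Dict


def make_lazy_module(name: str, lazy_imports: Dict[str, str]) -> types.ModuleType:
    """Build a module whose listed attributes are imported on first access."""
    module = types.ModuleType(name)

    def __getattr__(attr: str) -> Any:
        # PEP 562: called only when normal attribute lookup on the module fails.
        target = lazy_imports.get(attr)
        if target is None:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        value = importlib.import_module(target)
        # Cache in the module namespace (the equivalent of `globals()[attr] = value`
        # in a real module file) so later lookups never reach __getattr__ again.
        module.__dict__[attr] = value
        return value

    module.__getattr__ = __getattr__
    return module


# Usage: nothing is imported until `lazy.js` is first accessed.
lazy = make_lazy_module("lazy_demo", {"js": "json"})
print(lazy.js.dumps([1, 2]))
```

After the first access, the name lives in the module's own namespace. Subsequent attribute reads are ordinary dictionary hits, which is the repeated-lookup cost the caching commit targets.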

@patch(f"{_CTRL}.DataDesigner")
@patch(f"{_CTRL}.load_config_builder")
def test_run_validate_success(mock_load_config: MagicMock, mock_dd_cls: MagicMock) -> None:
    """Test successful validate execution."""
    mock_builder = MagicMock(spec=DataDesignerConfigBuilder)
    mock_load_config.return_value = mock_builder

    mock_dd = MagicMock()
    mock_dd_cls.return_value = mock_dd
    mock_dd.validate.return_value = None

    controller = GenerationController()
    controller.run_validate(config_source="config.yaml")

    mock_load_config.assert_called_once_with("config.yaml")
    mock_dd_cls.assert_called_once()
    mock_dd.validate.assert_called_once_with(mock_builder)


@patch(f"{_CTRL}.load_config_builder")
def test_run_validate_config_load_error(mock_load_config: MagicMock) -> None:
    """Test validate exits with code 1 when config fails to load."""
    mock_load_config.side_effect = ConfigLoadError("File not found")

    controller = GenerationController()
    with pytest.raises(typer.Exit) as exc_info:
        controller.run_validate(config_source="missing.yaml")

    assert exc_info.value.exit_code == 1


@patch(f"{_CTRL}.DataDesigner")
@patch(f"{_CTRL}.load_config_builder")
def test_run_validate_invalid_config(mock_load_config: MagicMock, mock_dd_cls: MagicMock) -> None:
    """Test validate exits with code 1 when config is invalid."""
    mock_load_config.return_value = MagicMock(spec=DataDesignerConfigBuilder)
    mock_dd = MagicMock()
    mock_dd_cls.return_value = mock_dd
    mock_dd.validate.side_effect = InvalidConfigError("Missing required column")

    controller = GenerationController()
    with pytest.raises(typer.Exit) as exc_info:
        controller.run_validate(config_source="config.yaml")

    assert exc_info.value.exit_code == 1
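The three `run_validate` tests above imply a simple control flow. The controller loads the builder, constructs a `DataDesigner`, and calls `validate`. Either `ConfigLoadError` or `InvalidConfigError` becomes exit code 1. Below is a dependency-free sketch of that flow, with four assumptions:

- Injected `load_config_builder` and `designer_factory` callables replace the module-level symbols the tests patch.
- The exception classes are local stand-ins.
- `SystemExit` stands in for `typer.Exit`.
- The messages are invented.

None of this is the actual `GenerationController` code.

```python
from typing import Any, Callable


class ConfigLoadError(Exception):
    """Local stand-in for the CLI's config-loading error."""


class InvalidConfigError(Exception):
    """Local stand-in for the invalid-configuration error."""


def run_validate(
    config_source: str,
    load_config_builder: Callable[[str], Any],
    designer_factory: Callable[[], Any],
) -> None:
    """Validate a config source; exit with status 1 if loading or validation fails."""
    try:
        builder = load_config_builder(config_source)
    except ConfigLoadError as exc:
        print(f"Error loading config: {exc}")
        # SystemExit stands in for typer.Exit(code=1) to avoid a typer dependency.
        raise SystemExit(1) from exc

    # The designer is constructed only after the config has loaded successfully.
    designer = designer_factory()
    try:
        designer.validate(builder)
    except InvalidConfigError as exc:
        print(f"Invalid config: {exc}")
        raise SystemExit(1) from exc
    print("Config is valid.")
```

Injecting the loader and designer factory achieves what the tests do with `@patch`: each failure path can be driven without touching the filesystem or the engine. Constructing the designer after a successful load is consistent with `test_run_validate_config_load_error` patching only `load_config_builder`.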
- Update test mock targets to patch at usage-site for module-level imports
* refactor: use direct pandas import in seed_source_dataframe
Drop lazy-loading for pandas in DataFrameSeedSource; use direct import
for simplicity.
* update lazy import pattern
* update tests to use lazy import namespace
Switch test modules to import data_designer.lazy_heavy_imports as lazy
and reference heavy libraries through that namespace. This keeps heavy
imports deferred during module import and aligns tests with the new
lazy-import usage pattern.
* tighten import perf test thresholds
Document recent baseline timings and lower the allowed average
import time and timeout so regressions are detected sooner.
* document pandas import requirement
Clarify that Pydantic needs DataFrame resolved at module load and
that keeping the direct import preserves IDE typing support.
* increase timeout time
* use lazy pandas imports in visualization tests
- replace direct pandas usage with lazy.pd in visualization tests to avoid eager imports
- add TYPE_CHECKING pandas import and keep CLI controller imports sorted
* fix lazy pandas runtime usage and preview mocks
Switch sample-record handling to lazy pandas types so runtime paths no longer
depend on TYPE_CHECKING imports. Align preview controller tests to patch the
module-local DataDesigner symbol, preventing real engine invocation in save
results scenarios.
2026-02-18 21:24:15 +00:00
@patch(f"{_CTRL}.DataDesigner")
@patch(f"{_CTRL}.load_config_builder")
def test_run_validate_generic_exception(mock_load_config: MagicMock, mock_dd_cls: MagicMock) -> None:
    """Test validate exits with code 1 on unexpected errors."""
    mock_load_config.return_value = MagicMock(spec=DataDesignerConfigBuilder)
    mock_dd = MagicMock()
    mock_dd_cls.return_value = mock_dd
    mock_dd.validate.side_effect = RuntimeError("Unexpected error")

    controller = GenerationController()
    with pytest.raises(typer.Exit) as exc_info:
        controller.run_validate(config_source="config.yaml")

    assert exc_info.value.exit_code == 1


# ---------------------------------------------------------------------------
# run_create tests
# ---------------------------------------------------------------------------


@patch(f"{_CTRL}.DataDesigner")
@patch(f"{_CTRL}.load_config_builder")
def test_run_create_success(mock_load_config: MagicMock, mock_dd_cls: MagicMock) -> None:
    """Test successful create execution with default artifact path."""
    mock_builder = MagicMock(spec=DataDesignerConfigBuilder)
    mock_load_config.return_value = mock_builder

    mock_dd = MagicMock()
    mock_dd_cls.return_value = mock_dd
    mock_dd.create.return_value = _make_mock_create_results(10)

    controller = GenerationController()
    controller.run_create(config_source="config.yaml", num_records=10, dataset_name="dataset", artifact_path=None)

    mock_load_config.assert_called_once_with("config.yaml")
    mock_dd_cls.assert_called_once_with(artifact_path=Path.cwd() / "artifacts")
    mock_dd.create.assert_called_once_with(mock_builder, num_records=10, dataset_name="dataset")


@patch(f"{_CTRL}.DataDesigner")
@patch(f"{_CTRL}.load_config_builder")
def test_run_create_custom_options(mock_load_config: MagicMock, mock_dd_cls: MagicMock) -> None:
    """Test create with custom --num-records, --dataset-name, and --artifact-path."""
    mock_load_config.return_value = MagicMock(spec=DataDesignerConfigBuilder)
    mock_dd = MagicMock()
    mock_dd_cls.return_value = mock_dd
    mock_dd.create.return_value = _make_mock_create_results(100, "/custom/output/my_data")

    controller = GenerationController()
    controller.run_create(
        config_source="config.py",
        num_records=100,
        dataset_name="my_data",
        artifact_path="/custom/output",
    )

    mock_dd_cls.assert_called_once_with(artifact_path=Path("/custom/output"))
    mock_dd.create.assert_called_once_with(mock_load_config.return_value, num_records=100, dataset_name="my_data")


@patch(f"{_CTRL}.load_config_builder")
def test_run_create_config_load_error(mock_load_config: MagicMock) -> None:
    """Test create exits with code 1 when config fails to load."""
    mock_load_config.side_effect = ConfigLoadError("File not found")

    controller = GenerationController()
    with pytest.raises(typer.Exit) as exc_info:
        controller.run_create(config_source="missing.yaml", num_records=10, dataset_name="dataset", artifact_path=None)

    assert exc_info.value.exit_code == 1


@patch(f"{_CTRL}.DataDesigner")
|
feat: add preview, create, and validate CLI commands (#313)
* feat: add preview, create, and validate CLI commands
Add three new top-level CLI commands for the data-designer workflow:
- `data-designer preview` - generate preview datasets for fast iteration
- `data-designer create` - create full datasets and save to disk
- `data-designer validate` - validate configuration files
Also includes:
- Move wait_for_navigation_key() UI primitive from preview.py to ui.py
- Add KeyPressEvent type annotations to all key binding handlers in ui.py
- Refactor cli/utils.py into cli/utils/ package with config_loader module
- Comprehensive test coverage for all new commands
* fix: update pythonjsonlogger import and clean up dev dependencies
- Update pythonjsonlogger import to use newer JsonFormatter API
- Consolidate dev-dependencies into [dependency-groups] dev section
- Remove unnecessary test cli/utils __init__.py
* small E
* address greptile feedback
* organize CLI commands into rich help panels
Group top-level commands under "Generation" and "Setup" panels
for clearer help output.
* refactor config loader to parse files directly and auto-detect config format
- Parse YAML/JSON files into dicts before passing to from_config,
providing format-specific error messages for parse failures
- Auto-detect DataDesignerConfig format (columns at top level) and
wrap it into BuilderConfig so users can provide either format
- Clean up Python module loading with try/except/finally for reliable
sys.modules and sys.path cleanup
- Add comprehensive tests for parsing, validation, and auto-wrapping
* fix sys.path cleanup in config loader and simplify tests
- Use pop(0) instead of remove() to precisely undo the insert(0, ...)
and avoid accidentally removing a different matching path entry
- Replace MagicMock with real DataDesignerConfigBuilder in tests
* move config format auto-detection into from_config
Centralize the shorthand DataDesignerConfig detection (columns at
top level without a data_designer wrapper) in
DataDesignerConfigBuilder.from_config so all callers benefit, not
just the CLI config loader. Simplify config_loader to delegate file
parsing and format normalization entirely to from_config.
* extract GenerationController from CLI commands
Move shared generation logic (preview, validate, create) out of the
individual Typer command functions into a dedicated GenerationController,
matching the existing controller pattern (DownloadController, etc.).
The command functions now delegate to the controller, keeping them as
thin entry points. Tests updated accordingly — command tests verify
delegation while controller tests cover the full behavior.
* harden sys.path cleanup and add explanatory comments
Use sys.path.remove() instead of checking sys.path[0] so cleanup
succeeds even when exec_module inserts entries at index 0. Drop
unnecessary spec=DataDesignerConfigBuilder from test mocks.
* check stdout TTY in preview interactive mode detection
Previously only stdin was checked, so piping stdout (e.g.
`dd preview cfg.yaml | head`) would still attempt interactive
browsing. Now both stdin and stdout must be a TTY.
2026-02-11 19:06:06 +00:00
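This commit's config-loader bullets describe two steps. A Python config file is loaded as a module, and the loader then undoes its `sys.path` and `sys.modules` changes in a `try/finally`, using `sys.path.remove()` instead of inspecting `sys.path[0]`. The sketch below shows that pattern under stated assumptions. The function name `load_python_config` and the `_user_config_<stem>` module naming are hypothetical and are not the project's actual `config_loader` API.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path
from types import ModuleType


def load_python_config(path: Path) -> ModuleType:
    """Hypothetical loader: execute a .py config file, then undo global side effects."""
    path = Path(path).resolve()
    parent = str(path.parent)
    module_name = f"_user_config_{path.stem}"  # hypothetical naming scheme

    spec = importlib.util.spec_from_file_location(module_name, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load config module from {path}")
    module = importlib.util.module_from_spec(spec)

    # Let the config import sibling modules that live next to it.
    sys.path.insert(0, parent)
    # Register before exec so code that looks itself up in sys.modules works.
    sys.modules[module_name] = module
    try:
        spec.loader.exec_module(module)
        return module
    finally:
        sys.modules.pop(module_name, None)
        # remove() drops the first entry equal to `parent`. Unlike checking
        # sys.path[0], this still works if exec_module inserted other entries at index 0.
        if parent in sys.path:
            sys.path.remove(parent)


# Usage: write a throwaway config module, load it, and confirm nothing leaks.
with tempfile.TemporaryDirectory() as tmp:
    cfg_path = Path(tmp) / "my_config.py"
    cfg_path.write_text("NUM_RECORDS = 10\n")
    path_before = list(sys.path)
    cfg = load_python_config(cfg_path)
    path_after = list(sys.path)
```

The `finally` block runs whether `exec_module` succeeds or raises, so a broken config file cannot leave its directory on `sys.path`.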
@patch(f"{_CTRL}.load_config_builder")
def test_run_create_creation_fails(mock_load_config: MagicMock, mock_dd_cls: MagicMock) -> None:
    """Test create exits with code 1 when dataset creation fails."""
    mock_load_config.return_value = MagicMock(spec=DataDesignerConfigBuilder)
    mock_dd = MagicMock()
    mock_dd_cls.return_value = mock_dd
    mock_dd.create.side_effect = RuntimeError("LLM connection failed")

    controller = GenerationController()
    with pytest.raises(typer.Exit) as exc_info:
        controller.run_create(config_source="config.yaml", num_records=10, dataset_name="dataset", artifact_path=None)

    assert exc_info.value.exit_code == 1
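The `run_create` tests in this file pin down two behaviors of `GenerationController`. A failure inside `create()` becomes an exit with code 1, and `to_report()` runs only when `load_analysis()` returns a value. The sketch below reproduces just those two behaviors, with dependencies injected so it runs without Typer or the engine. `Exit` is a stand-in for `typer.Exit`. `SketchGenerationController`, its constructor, and the `create(builder, num_records=...)` call shape are assumptions, not the real controller.

```python
from typing import Any, Callable
from unittest.mock import MagicMock


class Exit(Exception):
    """Stand-in for typer.Exit so this sketch runs without Typer installed."""

    def __init__(self, code: int = 0) -> None:
        super().__init__(code)
        self.exit_code = code


class SketchGenerationController:
    """Hypothetical controller mirroring only what the run_create tests assert."""

    def __init__(self, load_config_builder: Callable[[str], Any], data_designer_cls: Callable[[], Any]) -> None:
        self._load_config_builder = load_config_builder
        self._data_designer_cls = data_designer_cls

    def run_create(self, config_source: str, num_records: int) -> None:
        builder = self._load_config_builder(config_source)
        designer = self._data_designer_cls()
        try:
            results = designer.create(builder, num_records=num_records)
        except Exception as exc:
            # Any generation failure maps to a non-zero CLI exit code.
            raise Exit(code=1) from exc

        analysis = results.load_analysis()
        # load_analysis() may return None; calling None.to_report() would raise AttributeError.
        if analysis is not None:
            analysis.to_report()


# Usage mirroring test_run_create_skips_report_when_analysis_is_none:
mock_dd_cls = MagicMock()
mock_results = mock_dd_cls.return_value.create.return_value
mock_results.load_analysis.return_value = None
SketchGenerationController(MagicMock(), mock_dd_cls).run_create("config.yaml", 10)
mock_results.load_analysis.assert_called_once()
```

Injecting the loader and the designer class keeps the sketch testable without `@patch`. The real tests instead patch those names on the controller module.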
chore: Improve CLI startup with lazy heavy import cleanup (#330)
* perf: defer heavy imports to improve CLI startup time
Move expensive imports (engine, models, controllers) out of the module-level import path so that data-designer --help and other non-generation commands no longer pay the full startup cost.
Key changes:
- Defer controller imports to inside command functions
- Remove eager re-export chains from CLI package __init__ files
- Move default-settings bootstrap into load_config_builder() and DataDesigner.__init__() instead of running at import time
- Add lazy __getattr__ exports in interface/__init__.py
- Replace module-level tokenizer init with cached lazy getter
- Fix ModelProvider import to use config layer instead of engine
- Update test mock paths to match new import locations
Reduces CLI import-time from ~1.67s to ~0.46s.
* perf: defer pandas/numpy in io_helpers and add config_list benchmark
- Replace eager `from lazy_heavy_imports import pd, np` in io_helpers
with module-level __getattr__ (for backwards-compatible external
access / test mocks) and function-level imports in the 3 functions
that actually use them (read_parquet_dataset, smart_load_dataframe,
_convert_to_serializable). Importing io_helpers no longer triggers
pandas/numpy loading.
- Defer heavy imports in list and reset CLI commands into function
bodies to avoid loading repositories, Rich, and prompt_toolkit at
module import time.
- Add `config_list` (data-designer config list) measurement to the
CLI startup benchmark with isolated cold measurement in a separate
venv and a --skip-config-list-check flag.
- Update test mock paths to match new import locations.
* Refine lazy import usage and TYPE_CHECKING cleanup
* Run license header updater on PR-touched files
* fix: update sqlfluff mock target for lazy imports in test_sql
* perf: cache globals() in lazy __getattr__ to avoid repeated lookups
Add globals() caching and explanatory comment to all three lazy
__getattr__ implementations (lazy_heavy_imports, config/__init__,
interface/__init__) so subsequent attribute accesses bypass __getattr__.
* perf: lazy CLI command loading and deferred heavy import evaluations
- Add LazyTyperGroup to defer command module loading until invocation, allowing module-level imports in all CLI command files
- Split DataFrameSeedSource into seed_source_dataframe.py to isolate pandas dependency from other seed source classes
- Move TypeVar/TypeAlias definitions (DataT, NumpyArray1dT, RadomStateT, EngineT) to TYPE_CHECKING blocks with runtime fallbacks
- Wrap module-level constants in lru_cache (phone_number parquet data, jsonschema validator) to defer I/O and heavy imports to first use
- Update test mock targets to patch at usage-site for module-level imports
* refactor: use direct pandas import in seed_source_dataframe
Drop lazy-loading for pandas in DataFrameSeedSource; use direct import
for simplicity.
* update lazy import pattern
* update tests to use lazy import namespace
Switch test modules to import data_designer.lazy_heavy_imports as lazy
and reference heavy libraries through that namespace. This keeps heavy
imports deferred during module import and aligns tests with the new
lazy-import usage pattern.
* tighten import perf test thresholds
Document recent baseline timings and lower the allowed average
import time and timeout so regressions are detected sooner.
* document pandas import requirement
Clarify that Pydantic needs DataFrame resolved at module load and
that keeping the direct import preserves IDE typing support.
* increase timeout time
* use lazy pandas imports in visualization tests
- replace direct pandas usage with lazy.pd in visualization tests to avoid eager imports
- add TYPE_CHECKING pandas import and keep CLI controller imports sorted
* fix lazy pandas runtime usage and preview mocks
Switch sample-record handling to lazy pandas types so runtime paths no longer
depend on TYPE_CHECKING imports. Align preview controller tests to patch the
module-local DataDesigner symbol, preventing real engine invocation in save
results scenarios.
2026-02-18 21:24:15 +00:00
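This commit's notes describe lazy module-level `__getattr__` exports that cache the resolved object in `globals()`, so later lookups skip `__getattr__` entirely. The sketch below shows that PEP 562 mechanism on a throwaway module built with `types.ModuleType`. Inside a real module file, the cache write would be `globals()[name] = value`, which targets the same dict as the module's `__dict__`. The `_LAZY_IMPORTS` mapping and the use of `json` and `statistics` as stand-ins for pandas and numpy are assumptions for illustration.

```python
import importlib
import types
from typing import Any

# Hypothetical mapping of exported attribute names to the modules that provide them.
# json/statistics stand in for heavy libraries such as pandas/numpy.
_LAZY_IMPORTS = {"js": "json", "stats": "statistics"}


def make_lazy_module(name: str, lazy_imports: dict[str, str]) -> types.ModuleType:
    module = types.ModuleType(name)

    def __getattr__(attr: str) -> Any:
        # PEP 562: only called when normal attribute lookup on the module fails.
        if attr not in lazy_imports:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        value = importlib.import_module(lazy_imports[attr])
        # Cache in the module namespace (globals() in a real module file) so
        # later accesses find the attribute directly and bypass __getattr__.
        module.__dict__[attr] = value
        return value

    module.__getattr__ = __getattr__
    return module


lazy = make_lazy_module("lazy_demo", _LAZY_IMPORTS)
cached_before = "js" in vars(lazy)  # nothing resolved yet
encoded = lazy.js.dumps({"a": 1})   # first access triggers the import
cached_after = "js" in vars(lazy)   # now cached in the module dict
```

Only the attributes that are actually touched get imported. In this sketch, `stats` stays unresolved until some caller asks for it.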
@patch(f"{_CTRL}.DataDesigner")
feat: add preview, create, and validate CLI commands (#313)
2026-02-11 19:06:06 +00:00
@patch(f"{_CTRL}.load_config_builder")
def test_run_create_calls_to_report_when_analysis_present(mock_load_config: MagicMock, mock_dd_cls: MagicMock) -> None:
    """Test that analysis.to_report() is called when load_analysis() returns a value."""
    mock_load_config.return_value = MagicMock(spec=DataDesignerConfigBuilder)
    mock_dd = MagicMock()
    mock_dd_cls.return_value = mock_dd
    mock_results = _make_mock_create_results(10)
    mock_analysis = MagicMock()
    mock_results.load_analysis.return_value = mock_analysis
    mock_dd.create.return_value = mock_results

    controller = GenerationController()
    controller.run_create(config_source="config.yaml", num_records=10, dataset_name="dataset", artifact_path=None)

    mock_results.load_analysis.assert_called_once()
    mock_analysis.to_report.assert_called_once()
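The lazy-import commit retargets mocks so they patch names at the usage site, for example `@patch(f"{_CTRL}.DataDesigner")`, rather than where the class is defined. This matters because `from x import Name` binds a second reference inside the importing module, and patching the defining module leaves that reference untouched. The self-contained sketch below demonstrates this with two throwaway modules, `_demo_engine` and `_demo_controller`. Both modules are hypothetical stand-ins for the engine and the generation controller.

```python
import sys
import types
from unittest.mock import patch

# Hypothetical engine module that defines the heavy class.
engine = types.ModuleType("_demo_engine")
exec(
    "class DataDesigner:\n"
    "    def create(self):\n"
    "        return 'real engine ran'\n",
    engine.__dict__,
)
sys.modules["_demo_engine"] = engine

# Hypothetical controller module that imports the class at module level,
# binding its own name `DataDesigner` in the controller namespace.
controller = types.ModuleType("_demo_controller")
exec(
    "from _demo_engine import DataDesigner\n"
    "def run_create():\n"
    "    return DataDesigner().create()\n",
    controller.__dict__,
)
sys.modules["_demo_controller"] = controller


def patched_result(target: str) -> str:
    """Run the controller with `target` patched and report what it returned."""
    with patch(target) as mock_cls:
        mock_cls.return_value.create.return_value = "mocked"
        return controller.run_create()
```

Calling `patched_result("_demo_engine.DataDesigner")` still reaches the real class. Only `patched_result("_demo_controller.DataDesigner")` swaps in the mock, which is the same reasoning behind patching `{_CTRL}.DataDesigner` in the tests above.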
chore: Improve CLI startup with lazy heavy import cleanup (#330)
2026-02-18 21:24:15 +00:00
@patch(f"{_CTRL}.DataDesigner")
feat: add preview, create, and validate CLI commands (#313)
2026-02-11 19:06:06 +00:00
@patch(f"{_CTRL}.load_config_builder")
def test_run_create_skips_report_when_analysis_is_none(mock_load_config: MagicMock, mock_dd_cls: MagicMock) -> None:
    """Test that to_report() is not called when load_analysis() returns None."""
    mock_load_config.return_value = MagicMock(spec=DataDesignerConfigBuilder)
    mock_dd = MagicMock()
    mock_dd_cls.return_value = mock_dd
    mock_results = _make_mock_create_results(10)
    mock_results.load_analysis.return_value = None
    mock_dd.create.return_value = mock_results

    controller = GenerationController()
    controller.run_create(config_source="config.yaml", num_records=10, dataset_name="dataset", artifact_path=None)
feat: add --save-results option to preview command (#333)
* feat: add --save-report option to preview command
* feat: add save_path option to display_sample_record
Allow saving rendered sample records as HTML or SVG files via an
optional save_path parameter on both the standalone function and
the WithRecordSamplerMixin method.
* feat: replace --save-report with --save-results on preview command
Replace the single-file --save-report option with --save-results, which saves all preview artifacts (dataset parquet, analysis report HTML, and per-record sample HTMLs) into a timestamped directory under the artifact path. Add error handling around the save block, improve timestamp precision to microseconds, and expand test coverage for the new behavior.
* feat: add sample records pager with theme toggle, postMessage bridge, and UI polish
* feat: add dataset metadata subtitle to pager and clean up toolbar layout
* fix: address review findings for preview save-results feature
- Split try/except in generation_controller so report display errors
don't produce misleading "failed to save" messages when not saving
- Add browser HTML path to save success output for discoverability
- Remove 5 unused CSS variables from pager theme constants
- Add "N of M" record counter to pager toolbar
- Add theme/display_width assertions to all preview_command tests
- Add dedicated test for custom theme and display_width passthrough
- Add tests for record counter and CSS variable cleanup
* fix: address code review findings and simplify pager
- Fix critical bug: analysis report now displays to console even when
--save-results is active (was silently dropped via pass statement)
- Fix latent UnboundLocalError in display_sample_record when index is
out of bounds (num_records computed before try block)
- Eliminate duplicated dark CSS between constant and theme listener script
- Simplify sample_records_pager: remove dual-theme system, postMessage
bridge, and responsive media queries; restore GitHub link; reorder
toolbar to put prev/next buttons on the far left
- Narrow except Exception to except OSError in save-results path
- Use case-insensitive extension check and lambda-based re.sub
- Collapse redundant preview command delegation tests into parametrize
- Add missing type annotations and remove tautological assertions
* style: move record counter to far right of pager toolbar
* refactor: remove dead theme-listener script and inline CSS constant
_THEME_LISTENER_SCRIPT and _SAMPLE_RECORD_DARK_CSS_INLINE became
orphaned after the pager simplification removed the postMessage
bridge. This removes both constants, drops the injection line,
switches the idempotency guard to the viewport meta tag, and
cleans up related test assertions.
* fix: move Path import out of TYPE_CHECKING block in test_visualization
* fix: rename _logger to logger to match codebase convention
* fix: remove unnecessary cast in preview command theme parameter
* refactor: extract DEFAULT_DISPLAY_WIDTH constant and make apply_html_post_processing public
* Update packages/data-designer-config/tests/config/utils/test_visualization.py
---------
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
2026-02-18 20:58:35 +00:00

    # load_analysis() returns None, so to_report() must not be called.
    # If the code ignores the None check, an AttributeError propagates and the test fails.
    mock_results.load_analysis.assert_called_once()