Commit graph

6 commits

Author SHA1 Message Date
Eric W. Tramel
8be4ff787f
feat: add RunConfig jinja rendering engine (#557) 2026-04-17 15:06:27 -04:00
Johnny Greco
4a2813654a
fix: always return ISO-8601 from datetime postproc (#484) (#512)
* fix: always return ISO-8601 from datetime postproc (#484)

The DatetimeFormatMixin.postproc heuristics inferred output format from
value distribution, silently stripping date/time components for small
datasets or narrow date ranges. Replace with deterministic ISO-8601
output via vectorized strftime. Users who need custom formats can still
set convert_to on the SamplerColumnConfig.

* docs: update convert_to docstring and add DatetimeFormatMixin docstring

The SamplerColumnConfig.convert_to docstring incorrectly stated that
only "float", "int", or "str" are accepted. Datetime/timedelta samplers
accept strftime format strings. Also document the ISO-8601 default.

* test: add regression test for #484 via DataDesigner.preview API

Captures the exact reproducer from the issue: a single-record datetime
preview through the public DataDesigner.preview() interface must return
a full ISO-8601 timestamp, not a bare year string.

* test: trim redundant datetime tests, align reproducer with issue #484

- Remove postproc_same_day_records (subsumed by same_month + no_convert_to)
- Remove postproc_always_parseable (subsumed by stdlib_fromisoformat)
- Remove all_same_month integration test (subsumed by narrow_range_single_day)
- Update single_record test to use unit="h" matching the issue reproducer

* fix: address review nits — move datetime import to module scope, drop redundant isinstance
2026-04-09 12:50:40 -04:00
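A minimal sketch of the deterministic formatting described in #512. It assumes second-precision ISO-8601 with no timezone. `format_datetimes` and `ISO_8601_FORMAT` are hypothetical names. The list comprehension stands in for the vectorized strftime the commit mentions, which is likely pandas `Series.dt.strftime`, though that is an assumption. The sketch shows one idea: the output pattern is fixed before the data is inspected. A one-record preview therefore keeps its date and time, and `convert_to` is the only way to change the pattern.

```python
from datetime import datetime
from typing import Optional, Sequence

# Fixed ISO-8601 pattern (assumption: second precision, no timezone).
ISO_8601_FORMAT = "%Y-%m-%dT%H:%M:%S"


def format_datetimes(values: Sequence[datetime], convert_to: Optional[str] = None) -> list[str]:
    """Format every value with one pattern chosen up front.

    The pattern never depends on how many values there are or how narrow
    their range is, so a single record formats the same way as many.
    A caller-supplied strftime pattern (the role convert_to plays)
    overrides the ISO-8601 default.
    """
    fmt = ISO_8601_FORMAT if convert_to is None else convert_to
    return [value.strftime(fmt) for value in values]


# Single-record case from the bug report: date and time both survive.
print(format_datetimes([datetime(2024, 3, 5, 14)]))  # ['2024-03-05T14:00:00']
```

A custom format remains opt-in. For example, `format_datetimes(values, convert_to="%Y")` yields bare years only because the caller asked for them.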
Johnny Greco
26a9cf23ac
feat: normalize validator and constraint discriminators (#414)
* feat: normalize validator and constraint discriminators

* docs: add docstring and comment to Constraint base class

Address Greptile review feedback:
- Add docstring to Constraint noting it should not be instantiated directly
- Add comment explaining the rhs fallback behavior in the resolver

* refactor: restore ABC on Constraint base class

* refactor: add explicit None guard in constraint resolver

* Fix legacy numeric sampler constraint detection

* fix: address PR review feedback from nabinchha

- Guard _can_coerce_to_float against inf/nan strings
- Add -> None return type annotations to test functions
- Add comments clarifying ColumnConstraintT vs. ColumnConstraintInputT
- Add tests for tagged constraint round-trip and missing rhs validation
2026-03-13 17:34:23 -04:00
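The inf/nan guard from the review-feedback commit can be sketched as follows. `can_coerce_to_float` is a hypothetical stand-in for the private `_can_coerce_to_float`. Treating only finite results as coercible is an assumption about what the guard checks. The guard is needed because Python's `float()` accepts strings such as "inf", "-Infinity", and "nan", so a bare try/except would classify them as numeric.

```python
import math


def can_coerce_to_float(value: object) -> bool:
    """Return True only if value converts to a finite float (hypothetical helper)."""
    try:
        result = float(value)  # raises for None, "price", and other non-numerics
    except (TypeError, ValueError):
        return False
    # float("inf") and float("nan") succeed, so reject non-finite results explicitly.
    return math.isfinite(result)


for candidate in ["3.5", "inf", "nan", "price"]:
    print(candidate, can_coerce_to_float(candidate))
```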
Johnny Greco
1439bbea7e
chore: Improve CLI startup with lazy heavy import cleanup (#330)
* perf: defer heavy imports to improve CLI startup time

Move expensive imports (engine, models, controllers) out of the module-level import path so that data-designer --help and other non-generation commands no longer pay the full startup cost.

Key changes:
- Defer controller imports to inside command functions
- Remove eager re-export chains from CLI package __init__ files
- Move default-settings bootstrap into load_config_builder() and DataDesigner.__init__() instead of running at import time
- Add lazy __getattr__ exports in interface/__init__.py
- Replace module-level tokenizer init with cached lazy getter
- Fix ModelProvider import to use config layer instead of engine
- Update test mock paths to match new import locations

Reduces CLI import-time from ~1.67s to ~0.46s.

* perf: defer pandas/numpy in io_helpers and add config_list benchmark

- Replace eager `from lazy_heavy_imports import pd, np` in io_helpers
  with module-level __getattr__ (for backwards-compatible external
  access / test mocks) and function-level imports in the 3 functions
  that actually use them (read_parquet_dataset, smart_load_dataframe,
  _convert_to_serializable). Importing io_helpers no longer triggers
  pandas/numpy loading.
- Defer heavy imports in list and reset CLI commands into function
  bodies to avoid loading repositories, Rich, and prompt_toolkit at
  module import time.
- Add `config_list` (data-designer config list) measurement to the
  CLI startup benchmark with isolated cold measurement in a separate
  venv and a --skip-config-list-check flag.
- Update test mock paths to match new import locations.

* Refine lazy import usage and TYPE_CHECKING cleanup

* Run license header updater on PR-touched files

* fix: update sqlfluff mock target for lazy imports in test_sql

* perf: cache globals() in lazy __getattr__ to avoid repeated lookups

Add globals() caching and explanatory comment to all three lazy
__getattr__ implementations (lazy_heavy_imports, config/__init__,
interface/__init__) so subsequent attribute accesses bypass __getattr__.

* perf: lazy CLI command loading and deferred heavy import evaluations

- Add LazyTyperGroup to defer command module loading until invocation, allowing module-level imports in all CLI command files

- Split DataFrameSeedSource into seed_source_dataframe.py to isolate pandas dependency from other seed source classes

- Move TypeVar/TypeAlias definitions (DataT, NumpyArray1dT, RadomStateT, EngineT) to TYPE_CHECKING blocks with runtime fallbacks

- Wrap module-level constants in lru_cache (phone_number parquet data, jsonschema validator) to defer I/O and heavy imports to first use

- Update test mock targets to patch at usage-site for module-level imports

* refactor: use direct pandas import in seed_source_dataframe

Drop lazy-loading for pandas in DataFrameSeedSource; use direct import
for simplicity.

* update lazy import pattern

* update tests to use lazy import namespace

Switch test modules to import data_designer.lazy_heavy_imports as lazy
and reference heavy libraries through that namespace. This keeps heavy
imports deferred during module import and aligns tests with the new
lazy-import usage pattern.

* tighten import perf test thresholds

Document recent baseline timings and lower the allowed average
import time and timeout so regressions are detected sooner.

* document pandas import requirement

Clarify that Pydantic needs DataFrame resolved at module load and
that keeping the direct import preserves IDE typing support.

* increase timeout

* use lazy pandas imports in visualization tests

- replace direct pandas usage with lazy.pd in visualization tests to avoid eager imports
- add TYPE_CHECKING pandas import and keep CLI controller imports sorted

* fix lazy pandas runtime usage and preview mocks

Switch sample-record handling to lazy pandas types so runtime paths no longer
depend on TYPE_CHECKING imports. Align preview controller tests to patch the
module-local DataDesigner symbol, preventing real engine invocation in save
results scenarios.
2026-02-18 16:24:15 -05:00
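Two of the deferral patterns in #330 are sketched here under stated assumptions. The first is a PEP 562 module-level `__getattr__` that imports an alias on first access and caches it in the module namespace. The second is an `lru_cache`-wrapped getter that replaces a module-level constant. All names (`make_lazy_module`, `LAZY_IMPORTS`, `get_validator_schema`) are hypothetical. Stdlib modules stand in for pandas and numpy. This is not the project's `lazy_heavy_imports` module.

```python
import importlib
import types
from functools import lru_cache
from typing import Any

# Hypothetical alias table; stdlib modules stand in for pandas/numpy
# so the sketch runs without third-party packages.
LAZY_IMPORTS: dict[str, str] = {"json": "json", "dec": "decimal"}


def make_lazy_module(name: str, lazy_imports: dict[str, str]) -> types.ModuleType:
    """Build a module whose aliases are imported on first attribute access."""
    module = types.ModuleType(name)

    def __getattr__(attr: str) -> Any:
        # PEP 562: called only when normal lookup in the module namespace fails.
        if attr not in lazy_imports:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        value = importlib.import_module(lazy_imports[attr])
        # Cache in the module namespace (globals() in a real module file) so
        # later lookups find the attribute directly and skip __getattr__.
        module.__dict__[attr] = value
        return value

    module.__getattr__ = __getattr__
    return module


@lru_cache(maxsize=None)
def get_validator_schema() -> dict[str, Any]:
    """Hypothetical cached getter replacing a module-level constant.

    The body stands in for parquet I/O or building a jsonschema validator.
    It runs on the first call only, and later calls reuse the cached result.
    """
    import json  # deferred: imported on first call, not at module import

    return json.loads('{"type": "object", "required": ["id"]}')
```

In a real package, `__getattr__` is defined at the top level of the module file and caches the import with `globals()[name] = value`. Building a `types.ModuleType` by hand only keeps the sketch runnable as one file.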
Johnny Greco
c19f35639f
chore: add publish script and update license headers (#253) 2026-01-28 08:47:34 -05:00
Johnny Greco
ae0665fa16
refactor: slim package refactor into three subpackages (#240)
* remove old structure

* major shuffle

* streamline project configs

* update make commands

* updates to make commands

* remove essentials

* initialize logger in interface

* uv lock

* ignore notepad

* update workflows

* fix e2e project config

* generate colab notebooks

* resolve default model settings in interface

* fix build commands

* update perf import make command

* cleaning up some slop

* update recipes

* move conftest files to tests/

* update subpackage readmes

* streamline config_logging

* use exports

* update perf import usage pattern

* update for IDE behavior with ruff

* remove engine's fixtures file

* add note about lazy imports

* update dependencies

* update docs

* doc fixes

* uv lock

* updates to catch up with main

* clean up makefile

* remove package gitignores

* define deps only once

* isolate tests

* add test for protection rule

* create temp dirs for isolated tests

* catch up to main

* update headers

* re-apply changes

* better result summaries for isolated tests

* move exports into top-level init

* fix client importlib version syntax

* catch up with main
2026-01-27 13:53:20 -05:00