Commit graph

21 commits

Author SHA1 Message Date
Johnny Greco
1439bbea7e
chore: Improve CLI startup with lazy heavy import cleanup (#330)
* perf: defer heavy imports to improve CLI startup time

Move expensive imports (engine, models, controllers) out of the module-level import path so that data-designer --help and other non-generation commands no longer pay the full startup cost.

Key changes:
- Defer controller imports to inside command functions
- Remove eager re-export chains from CLI package __init__ files
- Move default-settings bootstrap into load_config_builder() and DataDesigner.__init__() instead of running at import time
- Add lazy __getattr__ exports in interface/__init__.py
- Replace module-level tokenizer init with cached lazy getter
- Fix ModelProvider import to use config layer instead of engine
- Update test mock paths to match new import locations

Reduces CLI import time from ~1.67s to ~0.46s.
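
A minimal sketch of two of the patterns listed above (imports deferred into command functions, and a cached lazy getter in place of a module-level object). This is not the project's code: `run_generate` and `get_tokenizer` are hypothetical names, and the standard-library `json` and `re` modules stand in for the heavy dependencies.

```python
import functools


def run_generate(payload: str) -> dict:
    """Hypothetical command body: the import cost is paid only when the command runs."""
    import json  # stand-in for a heavy engine or controller module

    return json.loads(payload)


@functools.lru_cache(maxsize=1)
def get_tokenizer():
    """Hypothetical cached getter replacing a module-level tokenizer object."""
    import re  # stand-in for a heavy tokenizer library

    # Built on first call only; later calls return the cached object.
    return re.compile(r"\w+")
```

With this shape, importing the module that defines these functions does not import the heavy dependency; only calling `run_generate` or `get_tokenizer` does.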

* perf: defer pandas/numpy in io_helpers and add config_list benchmark

- Replace eager `from lazy_heavy_imports import pd, np` in io_helpers
  with module-level __getattr__ (for backwards-compatible external
  access / test mocks) and function-level imports in the 3 functions
  that actually use them (read_parquet_dataset, smart_load_dataframe,
  _convert_to_serializable). Importing io_helpers no longer triggers
  pandas/numpy loading.
- Defer heavy imports in list and reset CLI commands into function
  bodies to avoid loading repositories, Rich, and prompt_toolkit at
  module import time.
- Add `config_list` (data-designer config list) measurement to the
  CLI startup benchmark with isolated cold measurement in a separate
  venv and a --skip-config-list-check flag.
- Update test mock paths to match new import locations.

* Refine lazy import usage and TYPE_CHECKING cleanup

* Run license header updater on PR-touched files

* fix: update sqlfluff mock target for lazy imports in test_sql

* perf: cache globals() in lazy __getattr__ to avoid repeated lookups

Add globals() caching and explanatory comment to all three lazy
__getattr__ implementations (lazy_heavy_imports, config/__init__,
interface/__init__) so subsequent attribute accesses bypass __getattr__.
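
For illustration only, the caching idea can be sketched as below, assuming a PEP 562 style module `__getattr__`. In a real module the hook would be a top-level `def __getattr__(name)` that stores the value in `globals()`; here a throwaway module object is built so the sketch runs in one file. `make_lazy_module` and the `js` alias are hypothetical.

```python
import importlib
import types


def make_lazy_module(name: str, lazy_imports: dict) -> types.ModuleType:
    """Build a module whose listed attributes are imported on first access."""
    module = types.ModuleType(name)

    def __getattr__(attr: str):
        if attr in lazy_imports:
            value = importlib.import_module(lazy_imports[attr])
            # Cache in the module namespace (the equivalent of globals()[attr] = value)
            # so later lookups find the attribute directly and skip __getattr__.
            module.__dict__[attr] = value
            return value
        raise AttributeError(f"module {name!r} has no attribute {attr!r}")

    # A module's __getattr__ is looked up in its namespace on attribute misses.
    module.__getattr__ = __getattr__
    return module


lazy = make_lazy_module("lazy_demo", {"js": "json"})
```

Before the first access, `js` is absent from `vars(lazy)`; after `lazy.js` is touched once, it is a regular module attribute.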

* perf: lazy CLI command loading and deferred heavy import evaluations

- Add LazyTyperGroup to defer command module loading until invocation, allowing module-level imports in all CLI command files

- Split DataFrameSeedSource into seed_source_dataframe.py to isolate pandas dependency from other seed source classes

- Move TypeVar/TypeAlias definitions (DataT, NumpyArray1dT, RadomStateT, EngineT) to TYPE_CHECKING blocks with runtime fallbacks

- Wrap module-level constants in lru_cache (phone_number parquet data, jsonschema validator) to defer I/O and heavy imports to first use

- Update test mock targets to patch at usage-site for module-level imports
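
A rough sketch of the `TYPE_CHECKING` block with a runtime fallback mentioned above. The alias `ArrayT` and the function `total` are hypothetical, and the standard-library `array` module stands in for a heavy library such as numpy.

```python
from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:
    # Seen only by type checkers and IDEs; never executed at runtime.
    import array

    ArrayT = array.array
else:
    # Runtime fallback: keeps the name importable without the heavy import.
    ArrayT = Any


def total(values: "ArrayT") -> float:
    """Sum the values; the annotation is a string, so nothing is resolved at runtime."""
    return float(sum(values))
```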

* refactor: use direct pandas import in seed_source_dataframe

Drop lazy-loading for pandas in DataFrameSeedSource; use direct import
for simplicity.

* update lazy import pattern

* update tests to use lazy import namespace

Switch test modules to import data_designer.lazy_heavy_imports as lazy
and reference heavy libraries through that namespace. This keeps heavy
imports deferred during module import and aligns tests with the new
lazy-import usage pattern.

* tighten import perf test thresholds

Document recent baseline timings and lower the allowed average
import time and timeout so regressions are detected sooner.

* document pandas import requirement

Clarify that Pydantic needs DataFrame resolved at module load and
that keeping the direct import preserves IDE typing support.

* increase timeout

* use lazy pandas imports in visualization tests

- replace direct pandas usage with lazy.pd in visualization tests to avoid eager imports
- add TYPE_CHECKING pandas import and keep CLI controller imports sorted

* fix lazy pandas runtime usage and preview mocks

Switch sample-record handling to lazy pandas types so runtime paths no longer
depend on TYPE_CHECKING imports. Align preview controller tests to patch the
module-local DataDesigner symbol, preventing real engine invocation in save
results scenarios.
2026-02-18 16:24:15 -05:00
Eric W. Tramel
8a28232640
feat(engine): env-var switch for async-first models experiment (#280) 2026-02-13 17:28:35 -05:00
Andre Manoel
58734d09f0
test: add provider health checks script and CI workflow (#301)
* test: add e2e health checks for default provider models

Add parametrized tests that verify model connectivity for all
default providers (nvidia, openai, openrouter). Tests check API
key availability and skip when not configured.
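
The skip-when-unconfigured behavior described above could look roughly like this. The environment variable names are assumptions (the commit does not state them), `plan_health_checks` is a hypothetical helper, and plain Python is used here instead of pytest parametrization so the sketch stays self-contained.

```python
import os

# Assumed variable names, for illustration only.
PROVIDER_KEY_ENV = {
    "nvidia": "NVIDIA_API_KEY",
    "openai": "OPENAI_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
}


def plan_health_checks(environ=None) -> dict:
    """Return 'run' for providers with an API key set, 'skip' otherwise."""
    env = os.environ if environ is None else environ
    return {
        provider: "run" if env.get(var) else "skip"
        for provider, var in PROVIDER_KEY_ENV.items()
    }
```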

* chore: move health checks out of e2e tests

- Convert pytest test to standalone script at scripts/health_checks.py
- Add `make health-checks` target
- Add CI workflow (weekly + on release + manual dispatch)
- Remove test_health_checks.py from tests_e2e/

* chore: make health checks non-blocking in CI

* fix: print traceback to stdout to avoid interleaving

* chore: add all provider API keys to health checks CI

Co-authored-by: Cursor <cursoragent@cursor.com>

* chore: remove temporary push trigger from health checks

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-06 15:18:35 -03:00
Johnny Greco
3045208599
fix: normalize license header year format in mcp module (#279)
* fix: normalize license header year format in mcp module

* existing header dates are authoritative
2026-02-02 10:56:35 -05:00
Eric W. Tramel
e6e58e692e
feat: MCP (Model Context Protocol) tool calling integration for LLM columns (#248) 2026-02-02 09:41:58 -05:00
Johnny Greco
63c8dcc11d
chore: simplify publish script by removing redundant rebuild step (#268)
- Remove rebuild_with_tag() function that caused double builds
- Add dedicated delete_local_tag() function for TestPyPI cleanup
- Production workflow now builds once: create local tag -> build -> upload -> push tag
- Tag is only pushed after successful upload, so local tag can be deleted if build fails
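
The ordering described above can be sketched as follows. This is not the publish script itself: `publish` and its injected step callables are hypothetical, and cleaning up the local tag on an upload failure as well as a build failure is an assumption.

```python
def publish(version, create_local_tag, build, upload, push_tag, delete_local_tag):
    """Build once, and push the tag only after a successful upload."""
    create_local_tag(version)
    try:
        build()
        upload()
    except Exception:
        # Nothing has been pushed yet, so the local tag can be removed safely.
        delete_local_tag(version)
        raise
    push_tag(version)
```

Passing the steps in as callables keeps the ordering testable without touching git or a package index.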
2026-01-29 11:55:08 -05:00
Andre Manoel
e46fbd0759
fix: automate README sync for data-designer package builds (#266)
* fix: uv sync or build requires copying README

* update header (script doesn't check it)

* changing path, ensuring proper checks
2026-01-29 13:10:26 -03:00
Johnny Greco
c19f35639f
chore: add publish script and update license headers (#253) 2026-01-28 08:47:34 -05:00
Johnny Greco
ae0665fa16
refactor: slim package refactor into three subpackages (#240)
* remove old structure

* major shuffle

* streamline project configs

* update make commands

* updates to make commands

* remove essentials

* initialize logger in interface

* uv lock

* ignore notepad

* update workflows

* fix e2e project config

* generate colab notebooks

* resolve default model settings in interface

* fix build commands

* update perf import make command

* cleaning up some slop

* update recipes

* move conftest files to tests/

* update subpackage readmes

* streamline config_logging

* use exports

* update perf import usage pattern

* update for IDE behavior with ruff

* remove engine's fixtures file

* add note about lazy imports

* update dependencies

* update docs

* doc fixes

* uv lock

* updates to catch up with main

* clean up makefile

* remove package gitignores

* define deps only once

* isolate tests

* add test for protection rule

* create temp dirs for isolated tests

* catch up to main

* update headers

* reapply changes

* better result summaries for isolated tests

* move exports into top-level init

* fix client importlib version syntax

* catch up with main
2026-01-27 13:53:20 -05:00
Johnny Greco
367de1a063
rename (#214) 2026-01-14 15:26:46 -05:00
Johnny Greco
f8c201e085
chore: update header script to check for diffs (#195)
* update script

* update headers

* refactor a bit and add test script

* update headers

* update for edge case

* update headers

* add step to get file creation date

* use git history to get copyright year

* generation type is printed with inference parameters

* fix unit test
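
One way the "use git history to get copyright year" step might be done, shown as a sketch only: the header script's actual approach is not given in the commit. `earliest_year` and `file_copyright_year` are hypothetical names, and the sketch assumes the year is taken from the oldest commit that touched the file.

```python
import subprocess
from typing import Optional


def earliest_year(git_log_years: str) -> Optional[int]:
    """Pick the smallest year from output with one year per line."""
    years = [int(tok) for tok in git_log_years.split() if tok.isdigit()]
    return min(years) if years else None


def file_copyright_year(path: str) -> Optional[int]:
    """Ask git for the author-date year of every commit that touched `path`."""
    result = subprocess.run(
        ["git", "log", "--follow", "--format=%ad", "--date=format:%Y", "--", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return earliest_year(result.stdout)
```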
2026-01-09 17:10:58 -05:00
Mike Knepper
2cfff52581
feat: Seed reader plugins (#191) 2026-01-09 13:50:47 -06:00
Mike Knepper
32515ba724
style: Sort imports traditionally instead of within sections (#103) 2025-12-08 09:01:58 -06:00
Johnny Greco
42b089e0f4
docs: establish doc templating, building, and strategy (#31)
* initial updates with jupyter tutorials and styling

* filling out some docs

* add blank index

* update docs workflow

* clean up style sheet
2025-11-12 17:04:50 -05:00
Andre Manoel
c0b6ddc145 added headers 2025-11-03 13:48:41 -03:00
Andre Manoel
90d3258dc1 linting etc 2025-11-03 13:48:41 -03:00
Andre Manoel
c1fdd4c15d first test 2025-11-03 13:48:41 -03:00
Johnny Greco
ab65070d9e skip autogenerated version file 2025-10-28 14:17:52 -04:00
Johnny Greco
cde4f33ae4 add check headers option 2025-10-27 19:14:52 -04:00
Johnny Greco
6d9836e2ee add and run pre-commit 2025-10-27 18:10:36 -04:00
Johnny Greco
7ed5e78741 initial port 2025-10-27 14:29:12 -04:00