DataDesigner

mirror of https://github.com/NVIDIA-NeMo/DataDesigner synced 2026-05-24 09:48:29 +00:00

Author	SHA1	Message	Date
Eric W. Tramel	be29f69796	refactor: organize engine progress visualization	2026-05-21 15:29:49 -04:00
Eric W. Tramel	461f261db4	feat: add live multi-model traffic demo Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>	2026-05-21 14:20:31 -04:00
Eric W. Tramel	75dcb5e647	fix: split model usage from column progress Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>	2026-05-21 14:13:16 -04:00
Eric W. Tramel	a065f06c43	fix: place progress completion beside row bars Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>	2026-05-21 14:00:49 -04:00
Eric W. Tramel	b54d02ac92	fix: include units in progress rate headers Label the progress legend's now and average record-rate columns as rec/s and refresh the screenshot. Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>	2026-05-21 13:49:42 -04:00
Eric W. Tramel	3679ad380b	fix: avoid repeated column label in progress legend Show raw column names in the async progress panel legend because the table header already provides the column context, and refresh the PR screenshot. Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>	2026-05-21 13:48:08 -04:00
Eric W. Tramel	33cf915fea	fix: render progress legend without ascii separators Replace the pipe-delimited progress legend with a native spaced layout, keep column alignment via computed widths, mute the header row, and refresh the PR screenshot. Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>	2026-05-21 13:46:07 -04:00
Eric W. Tramel	61125a02d0	fix: align progress panel metrics Render the progress legend as a stable table with live token-rate columns, attribute model usage to active generation columns across async bridge boundaries, and cancel the async scheduler cleanly on KeyboardInterrupt. Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>	2026-05-21 13:37:53 -04:00
Eric W. Tramel	3047c42ebc	fix: smooth throughput panel updates Throttle active TTY redraws, sample rates over larger windows, smooth and fit chart series, bound rate history, and harden panel tests. Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>	2026-05-21 13:05:17 -04:00
Eric W. Tramel	4750dbd717	feat: chart generation throughput Replace sticky progress bars with a bounded ANSI/asciichart throughput panel that plots records per second per generation column. Default progress_bar to enabled and add a local demo config plus screenshot for PR review. Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>	2026-05-20 21:55:16 -04:00
Nabin Mulepati	7c5a7221e0	docs: add VLM long-document understanding dev note and recipes (#579) * Add resources for long-document-understanding-dev-note * added links	2026-04-28 09:59:03 -06:00
dhruvnathawani	1448f9cbda	docs: add text-to-sql dev note (#349) * docs: add text-to-sql devnote * add diagram, update content * correct inconsistencies * docs: address PR #349 feedback and add BIRD benchmark results PR feedback fixes: - Fix Window Functions contradiction: Key Takeaway #1 now uses "Geospatial SQL" (Advanced) instead of "Window Functions" (Intermediate) - Fix score-0 truthiness bug: use `is not none` instead of truthy check in Jinja2 expression columns (inline example + production pipeline) - Soften Code Sandbox language: "A natural next step would be..." instead of "We are actively implementing..." - Cut Gretel reference per mvansegbroeck: replaced with NVIDIA/Nemotron team description - Replace Qwen model references with Nemotron per mvansegbroeck: MODEL_NAME, ASCII diagram labels, Pipeline Overview prose - Rename sdg_qwen_235b.py -> sdg_ndd_text2sql.py per mvansegbroeck - Fix Try It Yourself: use MODEL_ALIAS = "nvidia-text" with default provider pattern (matches structured-outputs dev note), remove unused explicit ModelConfig - Remove placeholder dataset link (#), add "Dataset: Internal" note New content: - Add BIRD Benchmark Results section with bar chart (JPG), data table, BIRD caveat paragraph, and Jocelyn Huang acknowledgement (Nemotron Super EX: 26.77% -> 41.80%, +15 pts, beats GPT-OSS-120B) - Replace "Looking Ahead: Code Sandbox" with broader "Next Steps": Code Sandbox, RL on BIRD via NeMo Gym, schema representation, Spider 2.0 - Add Project Summary table at end of post * docs: address second round of PR #349 feedback - Fix "EHR Systems" -> "Electronic Health Records" in Key Takeaway #1 to match the exact taxonomy string in the code example (greptile) - Add admonition clarifying code snippets are illustrative, not runnable, with link to Enterprise Text-to-SQL Recipe (nabinchha) - Add context before score extraction snippet referencing the five LLMJudgeColumnConfig columns and linking to full recipe (nabinchha) - Add companion file note and 
recipe link to production pipeline details block for prompts.py, rubrics.py, text2sql_seed.json (nabinchha) * docs: address round 2 PR #349 feedback, replace production block with recipe - Fix "EHR Systems" -> "Electronic Health Records" in Key Takeaway #1 to match the exact taxonomy string in the code example (greptile) - Add admonition clarifying inline code snippets are illustrative, with link to runnable Enterprise Text-to-SQL Recipe (nabinchha) - Add context before score extraction snippet referencing the five LLMJudgeColumnConfig columns and linking to full recipe (nabinchha) - Replace production pipeline <details> block (230 lines with phantom imports from prompts.py, rubrics.py, text2sql_seed.json) with snippet include of enterprise_text_to_sql.py recipe — self-contained and runnable, consistent with other merged dev notes (nabinchha) * docs: polish Try It Yourself and Summary sections - Wrap minimal inline example in collapsible <details> dropdown - Rename "A Team Effort" section to "Summary" - Remove redundant Scale/Dialects/Dataset line * docs: add missing sql_dialect sampler to Step 1 code snippet The Step 3/4 prompt templates reference {{ sql_dialect }} but the Step 1 seeding code never defined it, leaving an unresolved Jinja2 variable for readers following along. Add the sql_dialect sampler with a comment explaining the pipeline runs once per dialect. 
* fix ascii diagram * docs: fix BIRD score framing and MySQL dialect wording - Remove specific "60-70%" BIRD claim from intro to avoid contradiction with the 41.80%/38.25% direct-generation results shown later (those higher figures come from specialized systems with schema linking) - Reword MySQL "forbids" to "prompts exclude" -- REGEXP_REPLACE and CONVERT_TZ are valid MySQL functions; the pipeline excluded them for portability, not because the dialect forbids them * docs: move text-to-sql images to assets/ convention and update refs * docs: address text-to-sql devnote review comments - Add devnote to mkdocs nav after Async All the Way Down - Swap Recursive CTEs to Advanced, CASE Expressions to Intermediate (matches recipe) - Fix score extraction truthy check to use 'is not none' (preserves score-0 values) - Drop REPLACE() vs regexp_replace from dialect takeaway (REPLACE is cross-dialect) - Tighten prose: remove 'The key insight:', use actual BIRD number, trim X-not-Y - Fix knowledge dependency count: 8 -> 9 concepts (3x3 in recipe) --------- Signed-off-by: Yev Meyer <ymeyer@nvidia.com> Co-authored-by: Yev Meyer <ymeyer@nvidia.com>	2026-04-14 11:10:14 -07:00
Eric W. Tramel	7891dd53cb	feat: add Hermes Agent rollout support (#500)	2026-04-07 12:39:49 -04:00
Eric W. Tramel	58870bb83f	feat: add ATIF rollout ingestion (#495)	2026-04-06 11:06:14 -04:00
Eric W. Tramel	116184b5e6	docs: consolidated seed reader documentation for filesystem and agent rollout sources (#481) Add comprehensive documentation for DirectorySeedSource, FileContentsSeedSource, and AgentRolloutSeedSource to the seed datasets concept page. Add FileSystemSeedReader plugin authoring guide and Markdown section seed reader recipe. Supersedes #425 and #452. Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-31 13:31:42 -04:00
Eric W. Tramel	a0fb04ee07	feat: agent rollout trace ingestion (#399)	2026-03-20 09:17:35 -04:00
dhruvnathawani	7de879acfa	docs: add Enterprise Text-to-SQL and Search Agent recipes (#395) feat: add Nemotron Super Text-to-SQL and Search Agent recipes Add two new recipes derived from the Nemotron Super post-training pipelines: Nemotron Super Text-to-SQL: - Five-stage pipeline: seeding, prompt generation, schema with distractors, dialect-specific SQL, validation + quality scoring - 14 conditional samplers (10 industries, 50 topics, complexity-gated task types, data quality concepts, knowledge dependencies, 100 style combos) - Dialect-specific prompts for SQLite, MySQL, and PostgreSQL - 5 LLM judges (prompt, SQL, context, data quality, knowledge) with 15 scoring dimensions and flat score extraction columns - Per-dialect syntax validation via CodeValidatorParams Nemotron Super Search Agent: - Four-stage pipeline: Wikidata KG seed paths, two-stage riddle generation (draft + BrowseComp-style obfuscation), Tavily web search trajectories via MCP, structured JSON formatting - Tavily hosted MCP endpoint (streamable_http) -- no local server or extra dependencies beyond data-designer - Full tool-call trace capture (with_trace=ALL_MESSAGES) for SFT data - Built-in demo seeds (3 Wikidata paths) for quick testing Both recipes include ASCII pipeline diagrams, Nemotron Super context in docstrings, dev note links in the markdown pages, and follow existing recipe conventions (PEP 723 metadata, --model-alias/--num-records/ --artifact-path CLI args).	2026-03-11 11:19:58 -07:00
Johnny Greco	f74f25872c	chore: quiet tool call logs and add tool usage statistics (#293) * add tool usage statistics tracking - Add ToolUsageStats class with metrics for tool calls, turns, and statistical aggregates (mean/stddev per generation) - Extend ModelUsageStats to include tool_usage tracking - Update ModelFacade.generate() to track total tool calls and turns - Update tests with tool_call_count method and new assertions * silence noisy mcp logs * log message updates * add tools enabled info message * exclude empty tool_usage from usage stats output * add tool usage summary logging after column generation - Track tool usage snapshots before/after column processing - Log mean tool calls per generation for columns with tools enabled - Add get_tool_usage_snapshot/get_tool_usage_delta methods to ModelRegistry - Remove unused extra_info parameter from progress_tracker.log_start() - Add comprehensive tests for ToolUsageStats * pretty format model usage logs * reuse stubs and fixtures * add merge method to ToolUsageStats for accurate stats aggregation The previous implementation used extend() to combine tool usage stats, but extend() is designed for single generation data. This caused incorrect stddev calculations when merging stats from multiple sources. - Add ToolUsageStats.merge() that properly combines sum-of-squares - Update ModelUsageStats.extend() to use merge() for tool usage - Add tests verifying stddev accuracy after merging * fix tool usage stats missing generations_with_tools count When tracking tool usage after generation, the ToolUsageStats was created without setting generations_with_tools, causing the usage summary to report zeros for calls/gen and turns/gen metrics. 
* fix tool usage delta objects returning incorrect stddev values - Simplify facade API to use tool_usage.extend() directly - Return NaN for stddev when sum of squares wasn't tracked - Add docstring to get_tool_usage_delta explaining NaN behavior - Add comprehensive tests for stddev variance calculation * fix tool usage delta stddev by including sum of squares in deltas Convert sum_of_squares_turns and sum_of_squares_calls from private attributes to public fields, enabling them to be included in delta calculations. This allows get_tool_usage_delta to return objects that compute accurate stddev values instead of NaN. * fix test to use get_tool_usage_snapshot for accurate stddev tracking The test was manually constructing a ToolUsageStats snapshot without sum_of_squares fields, causing stddev to be NaN. Now uses the proper snapshot method that includes all fields needed for delta calculations. * use nvidia-reasoning by default * mean -> average in log message * refactor log indentation to use centralized LOG_INDENT constant - Add LOG_INDENT constant to logging.py for consistent indentation - Replace hardcoded " \|-- " strings across all log statements - Add tool alias and MCP provider info to pre-generation logs - Improve model usage log format for better consistency - Update tests to match new log formats * simplify usage stats dict access in model registry Remove defensive .get() calls and unnecessary type casts since the usage statistics dictionary structure is now guaranteed. 
* walrus baby * simplify tool usage tracking and reduce log verbosity - Remove mean/stddev calculations from ToolUsageStats in favor of simple counts and generation ratios - Add total_generations field to track all tool-enabled generations - Simplify registry log output to show generations ratio (with_tools/total) - Remove per-column tool usage snapshot/delta logging from column builder - Track tool usage for all tool-enabled generations, not just those with calls * format inference parameters as multi-line log output - Add get_formatted_params() method to BaseInferenceParams - Add LOG_DOUBLE_INDENT constant for nested indentation - Update log_pre_generation() to display each parameter on its own line * update tests to use LOG_INDENT constants Align test assertions with the centralized log indentation constants introduced in the logging module refactor. * two-space consistency	2026-02-05 10:14:02 -05:00
Johnny Greco	4e89c2f9f3	standardize recipe script metadata (#292)	2026-02-04 10:43:27 -05:00
Eric W. Tramel	532d21a8d7	feat: add extract_reasoning_content option to LLM columns (#285)	2026-02-03 10:25:24 -05:00
Eric W. Tramel	510761107b	feat: Add TraceType enum for granular trace control (#284)	2026-02-02 19:43:51 -05:00
Eric W. Tramel	7248b9fc8f	Update trace normalization to ChatML content blocks (#283)	2026-02-02 18:22:16 -05:00
Eric W. Tramel	e6e58e692e	feat: MCP (Model Context Protocol) tool calling integration for LLM columns (#248)	2026-02-02 09:41:58 -05:00
Johnny Greco	ae0665fa16	refactor: slim package refactor into three subpackages (#240) * remove old structure * major shuffle * streamline project configs * update make commands * updates to make commands * remove essentials * initialize logger in interface * uv lock * ignore notepad * update workflows * fix e2e project config * generate colab notebooks * resolve default model settings in interface * fix build commands * update perf import make command * cleaning up some slop * update recipes * move conftest files to tests/ * update subpackage readmes * streamline config_logging * use exports * update perf import usage pattern * update for IDE behavior with ruff * remove engine's fixtures file * add note to about lazy imports * update dependencies * update docs * doc fixes * uv lock * updates to catch up with main * clean up makefile * remove package gitignores * define deps only once * isolate tests * add test for protetion rule * create temp dirs for isolated tests * catch up to main * update headers * re apply changes * better result summaries for isolated tests * move exports into top-level init * fix client importlib version syntax * catch up with main	2026-01-27 13:53:20 -05:00
Johnny Greco	910d22dfa0	chore: add make commands to run examples as e2e tests (#199) * update makefile * fix bug	2026-01-12 15:37:00 -05:00
Johnny Greco	57b5f6f798	set up initial recipe section (#114)	2025-12-10 14:51:07 -05:00
Johnny Greco	42b089e0f4	docs: establish doc templating, building, and strategy (#31) * initial updates with jupyter tutorials and styling * filling out some docs * add blank index * update docs workflow * clean up style sheet	2025-11-12 17:04:50 -05:00

27 commits