DataDesigner

mirror of https://github.com/NVIDIA-NeMo/DataDesigner synced 2026-05-24 09:48:29 +00:00

Author	SHA1	Message	Date
Andre Manoel	46358461ee	fix: repair notebook CI (dead model, missing API key, pyarrow type bug) (#348) * fix: repair notebook CI by replacing dead vision model and adding missing API key - Replace `meta/llama-4-scout-17b-16e-instruct` (no longer serving on build.nvidia.com) with `nvidia/nemotron-nano-12b-v2-vl` (project default) in tutorial notebook 4 - Add `OPENROUTER_API_KEY` to the `build-notebooks` workflow so notebooks 5 and 6 (which use OpenRouter for image generation) can authenticate - Regenerate colab notebooks to reflect the model change * fix: handle pyarrow list types in notebook 6 display_image When image columns are loaded from parquet with pyarrow backend, list values are pyarrow ListScalars, not Python lists. The isinstance(x, list) check fails, causing the whole ListScalar to be treated as a single path string (producing filenames ending in `png')]`). Use isinstance(x, str) instead to correctly handle any iterable type.	2026-02-23 13:27:47 -03:00
Nabin Mulepati	8f7a72094a	feat: auto-detect ImageContext format for image-to-image generation (#342) * updates to support image->image * update notebooks * regen colab notebooks * simplify tests	2026-02-20 15:54:42 -05:00
Nabin Mulepati	d8d1e668b0	docs: add image generation documentation and image-to-image editing tutorial (#319)	2026-02-12 14:38:52 -07:00
Nabin Mulepati	8e2fd3286f	feat: add image generation support with multi-modal context (#317)	2026-02-12 14:00:28 -07:00
Andre Manoel	b6d400ef7d	chore: update tutorial notebooks to use dd. notation consistently (#288) - Convert notebook 3 from string-based columns to class specs (dd.SamplerColumnConfig, etc.) - Fix grammar: "is the main object is responsible" → "is the main object responsible" - Remove stray "A" at end of URL in notebook 2 - Remove empty markdown cell in notebook 4 - Add missing data_designer.validate() call in notebook 4 - Regenerate colab notebooks from source	2026-02-03 12:03:32 -03:00
Kirit Thadaka	de7c3ab99a	docs: add deployment, performance tuning guides and streamline gettin… (#277) * docs: add deployment, performance tuning guides and streamline getting started - Add deployment-options.md: Library vs. Microservice decision guide - Add inference-architecture.md: Separation of concerns with LLM servers - Add performance-tuning.md: Concurrency and batching optimization guide - Streamline index.md: Merge installation, add quick example, simplify - Remove quick-start.md: Content merged into welcome page - Remove installation.md: Content merged into welcome page - Update model docs: Add concurrency control sections and cross-references - Update mkdocs.yml: Add new Architecture section to navigation * docs: add tasteful emojis to new documentation pages * docs: consolidate redundant concurrency and troubleshooting content - Remove duplicate max_parallel_requests tables from model-configs.md and inference-parameters.md - Remove duplicate Concurrency Control section from model-configs.md - Simplify Concurrency Control in inference-parameters.md to link to performance-tuning.md - Remove Troubleshooting section from inference-architecture.md (covered in performance-tuning.md) - performance-tuning.md is now the authoritative source for tuning guidance * Simplified doc additions * Switched default model to nemotron 3 nano * Addressed feedback * Added first blog draft	2026-02-02 21:03:58 -08:00
Johnny Greco	ae0665fa16	refactor: slim package refactor into three subpackages (#240) * remove old structure * major shuffle * streamline project configs * update make commands * updates to make commands * remove essentials * initialize logger in interface * uv lock * ignore notepad * update workflows * fix e2e project config * generate colab notebooks * resolve default model settings in interface * fix build commands * update perf import make command * cleaning up some slop * update recipes * move conftest files to tests/ * update subpackage readmes * streamline config_logging * use exports * update perf import usage pattern * update for IDE behavior with ruff * remove engine's fixtures file * add note about lazy imports * update dependencies * update docs * doc fixes * uv lock * updates to catch up with main * clean up makefile * remove package gitignores * define deps only once * isolate tests * add test for protection rule * create temp dirs for isolated tests * catch up to main * update headers * re apply changes * better result summaries for isolated tests * move exports into top-level init * fix client importlib version syntax * catch up with main	2026-01-27 13:53:20 -05:00
Mike Knepper	7b5ea13f8b	Fix stray validate calls in notebooks (#192)	2026-01-08 15:46:20 -06:00
Mike Knepper	6bf7698bc2	refactor: Overhaul to seed datasets (#167)	2026-01-08 11:48:14 -06:00
Nabin Mulepati	3b4e296baf	feat: add OpenRouter as one of the default providers (#161) * Add openrouter as a default provider * Update docs	2026-01-06 10:22:18 -07:00
Johnny Greco	0a60f869c1	docs: just some tutorial notebook tweaks and a docstring update (#150) * update docstring * notebook tweaks * generate colab notebooks	2025-12-18 12:01:50 -05:00
Johnny Greco	6e6efc009f	docs: some updates for nano3 (#149) * some fixes * generate colab notebooks	2025-12-17 18:24:39 -05:00
Nabin Mulepati	8d4c6c12b4	chore: Update nvidia text default model alias to nano v3 (#133)	2025-12-15 15:03:12 -07:00
Nabin Mulepati	8370e4a00b	feat: support native embedding generation (#106) * Add generation type to ModelConfig * pass tests * added generate_text_embeddings * tests * remove sensitive=True old artifact no longer needed * Slight refactor * slight refactor * Added embedding generator * chunk_separator -> chunk_pattern * update tests * rename for consistency * Restructure InferenceParameters -> CompletionInferenceParameters, BaseInferenceParameters, EmbeddingInferenceParameters * Remove purpose from consolidated kwargs * WithModelConfiguration.inference_parameters should be typed with BaseInferenceParameters * Type as WithModelGeneration * Add image generation modality * update return type for generate_kwargs * make generation_type a field of ModelConfig as opposed to a prop resolved based on the type of InferenceParameters * remove regex based chunking from embedding generator * Remove image generation for now * more tests and updates * column_type_is_llm_generated -> column_type_is_model_generated * change set to list: fix flaky tests * CompletionInferenceParameters -> ChatCompletionInferenceParameters for consistency with generation_type * Update docs * fix deprecation warning originating from cli model settings * update display of inference parameters in cli list * save prog on inference parameter * updates for the config builder * update cli readme * update cli for inference parameters * update inference parameter names * flip order of vars * WithCompletion -> WithChatCompletion * specify InferenceParamsT * Update columns.md with EmbeddingColumnConfig info * make generation_type a discriminator field in inference params. add configuration support for max_parallel_requests and timeout * DRY out some stuff in field.py * Update nomenclature. prompt tokens -> input tokens, completion tokens -> output tokens in column statistics for consistency * Add nvidia-embedding and openai-embedding to default model configs * Fix typo in docs * Make generate colab notebooks * fine-tune -> adjust	2025-12-15 11:03:33 -07:00
Andre Manoel	68533c78be	docs: fix links on notebooks and add %%capture on install cell (#134)	2025-12-15 14:41:01 -03:00
Andre Manoel	7fa9a413ac	docs: add option to open notebook directly in Colab (#126)	2025-12-12 15:15:26 -03:00
Mike Knepper	32515ba724	style: Sort imports traditionally instead of within sections (#103)	2025-12-08 09:01:58 -06:00
Nabin Mulepati	1de2262b94	docs: add models module to code reference (#101) * Add example notebook showing how to use image contexts * change 101 -> tutorial * update _README.md with info on the new tutorial * add reference in mkdocs.yml * simplify vlm tutorial * update num_records on tutorials. Update .gitignore * update readme info * add models module to code reference * fix links to generated ipynb * change vlm in example tutorial to llama4-scout	2025-12-05 10:41:43 -07:00
Nabin Mulepati	8ccb724fb3	docs: Add example notebook showing how to use image contexts (#97)	2025-12-04 15:39:58 -07:00
Andre Manoel	6d921c48ba	fix: small typo on text file (#95) Notebooooks Also changing from "Jupytext Format" to "`.py` Format"	2025-12-03 18:31:35 -03:00
Nabin Mulepati	8e3080241b	docs: move models docs to concepts > models (#93)	2025-12-03 14:10:01 -07:00
Andre Manoel	60a898181a	fix: add download links to notebooks (#94)	2025-12-03 18:01:57 -03:00
Andre Manoel	5d4ad10b11	chore: moving notebooks to jupytext and cleaning up workflows (#91) * adding basic jupytext structure Co-authored-by: Johnny Greco <jogreco@nvidia.com> * few fixes * first test for ci * adding error intentionally to check workflow behavior * test calling from other workflows * typo * trying as job instead * couple of fixes * checking path * trying to fix path * wrapping up --------- Co-authored-by: Johnny Greco <jogreco@nvidia.com>	2025-12-03 17:29:07 -03:00

23 commits
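
Note on commit 46358461ee: its message describes an `isinstance(x, list)` check that misses list-like values which are not Python lists, and a fix that tests `isinstance(x, str)` instead. The sketch below illustrates that pattern only; it is not the notebook's actual `display_image` code. `FakeListScalar`, `paths_with_list_check`, and `paths_with_str_check` are hypothetical names, and `FakeListScalar` is a plain-Python stand-in for a pyarrow list value so the example runs without pyarrow installed.

```python
class FakeListScalar:
    """Hypothetical stand-in for a list-like value that is NOT a Python list."""

    def __init__(self, items):
        self._items = list(items)

    def __iter__(self):
        return iter(self._items)

    def __repr__(self):
        return f"FakeListScalar({self._items!r})"


def paths_with_list_check(value):
    # Pattern the commit describes as buggy: anything that is not a
    # Python list is stringified as a whole and treated as one path.
    if isinstance(value, list):
        return [str(p) for p in value]
    return [str(value)]


def paths_with_str_check(value):
    # Pattern the commit describes as the fix: only a str is a single
    # path; any other iterable is expanded element by element.
    if isinstance(value, str):
        return [value]
    return [str(p) for p in value]


if __name__ == "__main__":
    value = FakeListScalar(["a.png", "b.png"])
    print(paths_with_list_check(value))  # one mangled "path"
    print(paths_with_str_check(value))   # two separate paths
```

With the list check, the whole container is converted to a single string, which matches the mangled filenames the commit message mentions; with the string check, a plain string stays a single path and every other iterable is expanded.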