DataDesigner

mirror of https://github.com/NVIDIA-NeMo/DataDesigner synced 2026-05-24 09:48:29 +00:00

Author	SHA1	Message	Date
Johnny Greco	3d9f5185d7	refactor: remove task metadata property (#216) * remove metadata * docs and tests * don't need that test * use static method for generation strategy * update docs * add docstring	2026-01-15 14:12:11 -05:00
Kirit Thadaka	ab660d01d1	docs: Added top models pie chart (#217) * Added top models pie chart * Updated image and added description	2026-01-14 11:54:05 -08:00
Johnny Greco	d962c86843	fix: update example runner command with notebooks dep group (#204) * update dep groups; use in makefile * add quotes to packages in pip command	2026-01-13 11:49:31 -05:00
Johnny Greco	910d22dfa0	chore: add make commands to run examples as e2e tests (#199) * update makefile * fix bug	2026-01-12 15:37:00 -05:00
Johnny Greco	69cd989285	refactor: update required resources treatment and use subclasses over mixins (#184) * removing required resources * fix tests * add get required resources method to base column generator * move classification functions to engine; remove required resources * drop single from subclass names * update model config logging * fix unit test * typo * update type hint * move tests	2026-01-09 14:42:09 -05:00
Mike Knepper	7b5ea13f8b	Fix stray validate calls in notebooks (#192)	2026-01-08 15:46:20 -06:00
Mike Knepper	8e69ab0336	refactor: Plugins rename task to impl (#189)	2026-01-08 13:34:05 -06:00
Mike Knepper	6bf7698bc2	refactor: Overhaul to seed datasets (#167)	2026-01-08 11:48:14 -06:00
Nabin Mulepati	01f8d887f8	chore: deprecate InferenceParameters (#183) * deprecate InferenceParameters * update docs and references	2026-01-08 10:43:02 -07:00
Mike Knepper	1c0bf65cc0	docs: Add extra_headers to model provider docs (#178)	2026-01-07 08:27:36 -06:00
Nabin Mulepati	645c7995b7	Fix documentation on max_tokens (#176)	2026-01-06 16:31:05 -07:00
Nabin Mulepati	3b4e296baf	feat: add OpenRouter as one of the default providers (#161) * Add openrouter as a default provider * Update docs	2026-01-06 10:22:18 -07:00
Mike Knepper	36a174af04	refactor: plugin system updates (#168)	2026-01-06 10:29:47 -06:00
Johnny Greco	b71c6c11a8	docs: fix links and tweak person sampling (#152) * update person sampling * update docstring	2025-12-18 10:10:41 -08:00
Johnny Greco	b635e41033	update docs (#151)	2025-12-18 12:43:29 -05:00
Johnny Greco	0a60f869c1	docs: just some tutorial notebook tweaks and a docstring update (#150) * update doctstring * notebook tweaks * generate colab notebooks	2025-12-18 12:01:50 -05:00
Johnny Greco	6e6efc009f	docs: some updates for nano3 (#149) * some fixes * generate colab notebooks	2025-12-17 18:24:39 -05:00
Andre Manoel	d50a8aef95	docs: add processors (#147) * first draft * adding to code reference as well * docstrings * addressing comments * forgot opening line * docstring too	2025-12-17 15:47:33 -03:00
Nabin Mulepati	8d4c6c12b4	chore: Update nvidia text default model alias to nano v3 (#133)	2025-12-15 15:03:12 -07:00
Nabin Mulepati	3065179f8a	docs: add documentation on how to configure custom model settings (#124) * Add generation type to ModelConfig * pass tests * added generate_text_embeddings * tests * remove sensitive=True old artifact no longer needed * Slight refactor * slight refactor * Added embedding generator * chunk_separator -> chunk_pattern * update tests * rename for consistency * Restructure InferenceParameters -> CompletionInferenceParameters, BaseInferenceParameters, EmbeddingInferenceParameters * Remove purpose from consolidated kwargs * WithModelConfiguration.inference_parameters should should be typed with BaseInferenceParameters * Type as WithModelGeneration * Add image generation modality * update return type for generate_kwargs * make generation_type a field of ModelConfig as opposed to a prop resolved based on the type of InferenceParameters * remove regex based chunking from embedding generator * Remove image generation for now * more tests and updates * column_type_is_llm_generated -> column_type_is_model_generated * change set to list: fix flaky tests * CompletionInferenceParameters -> ChatCompletionInferenceParameters for consistency with generation_type * Update docs * fix deprecation warning originating from cli model settings * update display of inference parameters in cli list * save prog on inference parameter * updates for the ocnfig builder * update cli readme * update cli for inference parmeters * update inference parameter names * flip order of vars * WithCompletion -> WithChatCompletion * specify InferenceParamsT * Update columns.md with EmbeddingColumnConfig info * make generation_type a descriminator field in inference params. add configuration support for max_parallel_requests and timeout * DRY out some stuff in field.py * docs for custom model settings * Update nomenclature. prompt tokens -> input tokens, completion tokens -> output tokens in column statistics for consistency * Add nvidia-embedding and openai-embedding to default model configs * Fix typo in docs * Make generate collab notebooks * Address PR comments	2025-12-15 14:00:31 -07:00
Nabin Mulepati	8370e4a00b	feat: support native embedding generation (#106) * Add generation type to ModelConfig * pass tests * added generate_text_embeddings * tests * remove sensitive=True old artifact no longer needed * Slight refactor * slight refactor * Added embedding generator * chunk_separator -> chunk_pattern * update tests * rename for consistency * Restructure InferenceParameters -> CompletionInferenceParameters, BaseInferenceParameters, EmbeddingInferenceParameters * Remove purpose from consolidated kwargs * WithModelConfiguration.inference_parameters should should be typed with BaseInferenceParameters * Type as WithModelGeneration * Add image generation modality * update return type for generate_kwargs * make generation_type a field of ModelConfig as opposed to a prop resolved based on the type of InferenceParameters * remove regex based chunking from embedding generator * Remove image generation for now * more tests and updates * column_type_is_llm_generated -> column_type_is_model_generated * change set to list: fix flaky tests * CompletionInferenceParameters -> ChatCompletionInferenceParameters for consistency with generation_type * Update docs * fix deprecation warning originating from cli model settings * update display of inference parameters in cli list * save prog on inference parameter * updates for the ocnfig builder * update cli readme * update cli for inference parmeters * update inference parameter names * flip order of vars * WithCompletion -> WithChatCompletion * specify InferenceParamsT * Update columns.md with EmbeddingColumnConfig info * make generation_type a descriminator field in inference params. add configuration support for max_parallel_requests and timeout * DRY out some stuff in field.py * Update nomenclature. prompt tokens -> input tokens, completion tokens -> output tokens in column statistics for consistency * Add nvidia-embedding and openai-embedding to default model configs * Fix typo in docs * Make generate collab notebooks * fine-tune -> adjust	2025-12-15 11:03:33 -07:00
Andre Manoel	68533c78be	docs: fix links on notebooks and add %%capture on install cell (#134)	2025-12-15 14:41:01 -03:00
Andre Manoel	ebc4024830	fix: typo on path to colab notebook (#129)	2025-12-12 15:37:36 -03:00
Andre Manoel	7fa9a413ac	docs: add option to open notebook directly in Colab (#126)	2025-12-12 15:15:26 -03:00
Kirit Thadaka	8d7a073e3a	docs: Updated Person Sampling docs (#120) * Updated Person Sampling docs * Updated mv command * Removed versions * Updated mv command --------- Co-authored-by: Johnny Greco <jogreco@nvidia.com>	2025-12-12 10:43:57 -05:00
Johnny Greco	48fdc8c838	docs: add initial plugin documentation (#107) * add docstrings * add analysis modules * include toc for plugins section * add plugin docs * remove scope creep * Update docs/plugins/example.md Co-authored-by: Nabin Mulepati <nmulepati@nvidia.com> * address feedback --------- Co-authored-by: Nabin Mulepati <nmulepati@nvidia.com>	2025-12-11 16:05:11 -05:00
Johnny Greco	e19bdad41c	fix link and some clean up (#119)	2025-12-10 21:20:22 -05:00
Johnny Greco	57b5f6f798	set up initial recipe section (#114)	2025-12-10 14:51:07 -05:00
Johnny Greco	b100cf2f1f	add footer navigation (#108)	2025-12-09 13:42:06 -05:00
Mike Knepper	32515ba724	style: Sort imports traditionally instead of within sections (#103)	2025-12-08 09:01:58 -06:00
Andre Manoel	275bbbf646	docs: add versioning using `mike` (#102) * initial changes * fix to override, adapting ci	2025-12-08 11:06:24 -03:00
Nabin Mulepati	1de2262b94	docs: add models module to code reference (#101) * Add example notebook showing how to use image contexts * change 101 -> tutorial * update _README.md with info on the new tutorial * add reference in mkdocs.yml * simplify vlm tutorial * update num_records on tutorials. Update .gitignore * update readme info * add models module to code reference * fix links to generated ipynb * change vlm in example tutorial to llama4-scout	2025-12-05 10:41:43 -07:00
Nabin Mulepati	8ccb724fb3	docs: Add example notebook showing how to use image contexts (#97)	2025-12-04 15:39:58 -07:00
Andre Manoel	fa86be1eae	fix: allow docs CI to be manually triggered, better download button (#99)	2025-12-04 14:48:16 -03:00
Andre Manoel	6d921c48ba	fix: small typo on text file (#95) Notebooooks Also changing from "Jupytext Format" to "`.py` Format"	2025-12-03 18:31:35 -03:00
Nabin Mulepati	8e3080241b	docs: move models docs to concepts > models (#93)	2025-12-03 14:10:01 -07:00
Andre Manoel	60a898181a	fix: add download links to notebooks (#94)	2025-12-03 18:01:57 -03:00
Andre Manoel	5d4ad10b11	chore: moving notebooks to jupytext and cleaning up workflows (#91) * adding basic jupytext structure Co-authored-by: Johnny Greco <jogreco@nvidia.com> * few fixes * first test for ci * adding error intentionally to check workflow behavior * test calling from other workflows * typo * trying as job instead * couple of fixes * checking path * trying to fix path * wrapping up --------- Co-authored-by: Johnny Greco <jogreco@nvidia.com>	2025-12-03 17:29:07 -03:00
Johnny Greco	1946410ada	use faker person sampling; links (#86)	2025-12-02 15:31:02 -05:00
Nabin Mulepati	0ed25b3add	chore: fix example notebook drop=False -> drop=True to match comment (#78)	2025-11-25 16:24:41 -07:00
Johnny Greco	060773c2ee	small doc fixes (#67)	2025-11-21 17:39:15 -05:00
Kirit Thadaka	4bee6d9088	docs: remove nemotron personas sampling from docs (for now) (#60) * Update persona docs * Updated person sampling docs based on feedback * remove nemotron personas sampling * Remove nemotron personas sampling * Update docs/concepts/person_sampling.md --------- Co-authored-by: Johnny Greco <jogreco@nvidia.com>	2025-11-21 16:39:00 -05:00
Johnny Greco	585df726ab	docs: some link fixes (#65) * use full links so they work in docs * found more * update link to model configs * swap person sampling for plugins	2025-11-21 16:33:03 -05:00
Andre Manoel	ce0fc0805a	docs: streamlining tutorials (#61) * first attempt * typo * it works! cleaning up * adding trigger again just to run once * cleanup * typo	2025-11-21 16:14:48 -03:00
Johnny Greco	ec98211862	chore: some readme and docs cleanup (#56) * update classifiers * remove commented section for now * update readme badges and links * rename persons section to person sampling	2025-11-20 15:33:55 -05:00
Johnny Greco	14dc495341	docs: some documentation cleanup (#52) * some documentation cleanup * typo	2025-11-19 17:40:14 -05:00
Johnny Greco	362ec51544	docs: sampler params code ref and more (#50) * add sampler params code ref * add persons section * add person from faker sampler	2025-11-19 16:27:40 -05:00
Andre Manoel	01fbf4d848	docs: validators etc. (#45) * got a little help from Claude, will still double check everything * fixing, adding docstrings * forgotten file + overview to tutorial * minor * applying suggestions Co-authored-by: Nabin Mulepati <nmulepati@nvidia.com> Co-authored-by: Johnny Greco <jogreco@nvidia.com> * addressing comments pt1 * addressing comments pt2 * trying something out * fix * typo * trying again * rollback workflow, add download links * minor * adapting notebooks to use fakersampler --------- Co-authored-by: Nabin Mulepati <nmulepati@nvidia.com> Co-authored-by: Johnny Greco <jogreco@nvidia.com>	2025-11-19 17:39:10 -03:00
Nabin Mulepati	cb0b1c6f6a	docs: docs for quickstart, cli, model settings (#37) * vibe it baby * clean up * iterate with claude * Save prog * Update info pipeine * Fix tests * Fix typo * remove redundant overload * Add support for multiple default model providers and config * pull user-defined model configs and providers if available * Added tests for default model settings * save progress * refactor cli to be modular and use OOP * new tests for cli components * config_dir > config_path * simplify list * list tests * stranded commit * tests for commands * tests for field.py * tests for form.py * more tests * deleting providers should delete associated model configs * add readme.md for cli * clean up * Fix tests * feat: (FTUE) pull user-defined (via cli) model configs and providers (#24) * added docs for quick start and default model settings * Updates per chat * update quickstart.md * update default-model-settings.md * add check for interface.py as well * move default model config resolution to src/data_designer/__init__.py * Revert "move default model config resolution to src/data_designer/__init__.py" This reverts commit `806a81dc93`. * docs for cli * update default-model-settings.md * docs for model provider * more docs * add new tests for get provider name * add lru cache * remove non doc related changes * PR feedback * update reset info * tip for settings files * update * update info about default inference providers * DATA_DESIGNER_HOME_DIR -> DATA_DESIGNER_HOME --------- Co-authored-by: Johnny Greco <jogreco@nvidia.com>	2025-11-18 21:28:03 -07:00
Andre Manoel	d0439fe833	docs: adding 101 notebooks (#38) * started porting notebooks, need to wait for person sampler fix to finish * a few more changes * few fixes * lint	2025-11-18 10:34:47 -03:00


53 commits