DataDesigner

mirror of https://github.com/NVIDIA-NeMo/DataDesigner synced 2026-05-24 09:48:29 +00:00

Author	SHA1	Message	Date
Johnny Greco	f8c201e085	chore: update header script to check for diffs (#195) * update script * update headers * refactor a bit and add test script * update headers * update for edge case * update headers * add step to get file creation date * use git history to get copyright year * generation type is printed with inference parameters * fix unit test	2026-01-09 17:10:58 -05:00
Nabin Mulepati	8370e4a00b	feat: support native embedding generation (#106) * Add generation type to ModelConfig * pass tests * added generate_text_embeddings * tests * remove sensitive=True old artifact no longer needed * Slight refactor * slight refactor * Added embedding generator * chunk_separator -> chunk_pattern * update tests * rename for consistency * Restructure InferenceParameters -> CompletionInferenceParameters, BaseInferenceParameters, EmbeddingInferenceParameters * Remove purpose from consolidated kwargs * WithModelConfiguration.inference_parameters should be typed with BaseInferenceParameters * Type as WithModelGeneration * Add image generation modality * update return type for generate_kwargs * make generation_type a field of ModelConfig as opposed to a prop resolved based on the type of InferenceParameters * remove regex based chunking from embedding generator * Remove image generation for now * more tests and updates * column_type_is_llm_generated -> column_type_is_model_generated * change set to list: fix flaky tests * CompletionInferenceParameters -> ChatCompletionInferenceParameters for consistency with generation_type * Update docs * fix deprecation warning originating from cli model settings * update display of inference parameters in cli list * save prog on inference parameter * updates for the config builder * update cli readme * update cli for inference parameters * update inference parameter names * flip order of vars * WithCompletion -> WithChatCompletion * specify InferenceParamsT * Update columns.md with EmbeddingColumnConfig info * make generation_type a discriminator field in inference params. add configuration support for max_parallel_requests and timeout * DRY out some stuff in field.py * Update nomenclature. prompt tokens -> input tokens, completion tokens -> output tokens in column statistics for consistency * Add nvidia-embedding and openai-embedding to default model configs * Fix typo in docs * Make generate collab notebooks * fine-tune -> adjust	2025-12-15 11:03:33 -07:00
Johnny Greco	6e65b106cf	fix: analysis report when there is a column with mixed data types (#131) * column config -> column name when possible * fallback to dtype of first non-null element * add unit tests * add error message info to warning * catch str_ too	2025-12-15 10:36:34 -05:00
Nabin Mulepati	a02f7e0a3e	don't lowercase score names when using it to dynamically create pydantic objects (#122)	2025-12-11 13:52:49 -07:00
Johnny Greco	55c21efece	seed dataset statistics limited to general stats (#32)	2025-11-13 11:34:26 -05:00
Johnny Greco	fdbc012989	feat: 🔌 Initial plugin system implementation (#23) * separate column configs and types * create plugin object * create plugin manager * fix config integration * make base task registry raise on collision false by default * update registry test after raise on collision default update * make analysis work using general stats calculation * default -> builtin * use entry point approach instead * rewire using plugin helpers * add env var to disable plugins * fix tests * update plugin manager tests * add tests for plugin helpers * update license headers * add emoji * not using the pm in the builder code * Update src/data_designer/plugins/manager.py Co-authored-by: Nabin Mulepati <nmulepati@nvidia.com> * Update src/data_designer/plugins/manager.py Co-authored-by: Nabin Mulepati <nmulepati@nvidia.com> * Update src/data_designer/plugins/manager.py Co-authored-by: Nabin Mulepati <nmulepati@nvidia.com> * merge plugin registry into the manager * small pr feedback * client side plugin manager * builtin -> default; move adding plugins to registry * update method names to better match what they do * use register verb for consistency with other registries * thread safety updates; make discover private --------- Co-authored-by: Nabin Mulepati <nmulepati@nvidia.com>	2025-11-11 15:36:52 -05:00
Johnny Greco	7ed5e78741	initial port	2025-10-27 14:29:12 -04:00

7 commits