Commit graph

7 commits

Author SHA1 Message Date
Johnny Greco
f8c201e085
chore: update header script to check for diffs (#195)
* update script

* update headers

* refactor a bit and add test script

* update headers

* update for edge case

* update headers

* add step to get file creation date

* use git history to get copyright year

* generation type is printed with inference parameters

* fix unit test
2026-01-09 17:10:58 -05:00
Nabin Mulepati
8370e4a00b
feat: support native embedding generation (#106)
* Add generation type to ModelConfig

* pass tests

* added generate_text_embeddings

* tests

* remove sensitive=True old artifact no longer needed

* Slight refactor

* slight refactor

* Added embedding generator

* chunk_separator -> chunk_pattern

* update tests

* rename for consistency

* Restructure InferenceParameters -> CompletionInferenceParameters, BaseInferenceParameters, EmbeddingInferenceParameters

* Remove purpose from consolidated kwargs

* WithModelConfiguration.inference_parameters should be typed with BaseInferenceParameters

* Type as WithModelGeneration

* Add image generation modality

* update return type for generate_kwargs

* make generation_type a field of ModelConfig as opposed to a prop resolved based on the type of InferenceParameters

* remove regex based chunking from embedding generator

* Remove image generation for now

* more tests and updates

* column_type_is_llm_generated -> column_type_is_model_generated

* change set to list: fix flaky tests

* CompletionInferenceParameters -> ChatCompletionInferenceParameters for consistency with generation_type

* Update docs

* fix deprecation warning originating from cli model settings

* update display of inference parameters in cli list

* save prog on inference parameter

* updates for the config builder

* update cli readme

* update cli for inference parameters

* update inference parameter names

* flip order of vars

* WithCompletion -> WithChatCompletion

* specify InferenceParamsT

* Update columns.md with EmbeddingColumnConfig info

* make generation_type a discriminator field in inference params. add configuration support for max_parallel_requests and timeout

* DRY out some stuff in field.py

* Update nomenclature. prompt tokens -> input tokens, completion tokens -> output tokens in column statistics for consistency

* Add nvidia-embedding and openai-embedding to default model configs

* Fix typo in docs

* Make generate colab notebooks

* fine-tune -> adjust
2025-12-15 11:03:33 -07:00
Johnny Greco
6e65b106cf
fix: analysis report when there is a column with mixed data types (#131)
* column config -> column name when possible

* fallback to dtype of first non-null element

* add unit tests

* add error message info to warning

* catch str_ too
2025-12-15 10:36:34 -05:00
Nabin Mulepati
a02f7e0a3e
don't lowercase score names when using them to dynamically create pydantic objects (#122) 2025-12-11 13:52:49 -07:00
Johnny Greco
55c21efece
seed dataset statistics limited to general stats (#32) 2025-11-13 11:34:26 -05:00
Johnny Greco
fdbc012989
feat: 🔌 Initial plugin system implementation (#23)
* separate column configs and types

* create plugin object

* create plugin manager

* fix config integration

* make base task registry raise on collision false by default

* update registry test after raise on collision default update

* make analysis work using general stats calculation

* default -> builtin

* use entry point approach instead

* rewire using plugin helpers

* add env var to disable plugins

* fix tests

* update plugin manager tests

* add tests for plugin helpers

* update license headers

* add emoji

* not using the pm in the builder code

* Update src/data_designer/plugins/manager.py

Co-authored-by: Nabin Mulepati <nmulepati@nvidia.com>

* Update src/data_designer/plugins/manager.py

Co-authored-by: Nabin Mulepati <nmulepati@nvidia.com>

* Update src/data_designer/plugins/manager.py

Co-authored-by: Nabin Mulepati <nmulepati@nvidia.com>

* merge plugin registry into the manager

* small pr feedback

* client side plugin manager

* builtin -> default; move adding plugins to registry

* update method names to better match what they do

* use register verb for consistency with other registries

* thread safety updates; make discover private

---------

Co-authored-by: Nabin Mulepati <nmulepati@nvidia.com>
2025-11-11 15:36:52 -05:00
Johnny Greco
7ed5e78741 initial port 2025-10-27 14:29:12 -04:00