Commit graph

53 commits

Author SHA1 Message Date
Johnny Greco
3d9f5185d7
refactor: remove task metadata property (#216)
* remove metadata

* docs and tests

* don't need that test

* use static method for generation strategy

* update docs

* add docstring
2026-01-15 14:12:11 -05:00
Kirit Thadaka
ab660d01d1
docs: Added top models pie chart (#217)
* Added top models pie chart

* Updated image and added description
2026-01-14 11:54:05 -08:00
Johnny Greco
d962c86843
fix: update example runner command with notebooks dep group (#204)
* update dep groups; use in makefile

* add quotes to packages in pip command
2026-01-13 11:49:31 -05:00
Johnny Greco
910d22dfa0
chore: add make commands to run examples as e2e tests (#199)
* update makefile

* fix bug
2026-01-12 15:37:00 -05:00
Johnny Greco
69cd989285
refactor: update required resources treatment and use subclasses over mixins (#184)
* removing required resources

* fix tests

* add get required resources method to base column generator

* move classification functions to engine; remove required resources

* drop single from subclass names

* update model config logging

* fix unit test

* typo

* update type hint

* move tests
2026-01-09 14:42:09 -05:00
Mike Knepper
7b5ea13f8b
Fix stray validate calls in notebooks (#192) 2026-01-08 15:46:20 -06:00
Mike Knepper
8e69ab0336
refactor: Plugins rename task to impl (#189) 2026-01-08 13:34:05 -06:00
Mike Knepper
6bf7698bc2
refactor: Overhaul to seed datasets (#167) 2026-01-08 11:48:14 -06:00
Nabin Mulepati
01f8d887f8
chore: deprecate InferenceParameters (#183)
* deprecate InferenceParameters

* update docs and references
2026-01-08 10:43:02 -07:00
Mike Knepper
1c0bf65cc0
docs: Add extra_headers to model provider docs (#178) 2026-01-07 08:27:36 -06:00
Nabin Mulepati
645c7995b7
Fix documentation on max_tokens (#176) 2026-01-06 16:31:05 -07:00
Nabin Mulepati
3b4e296baf
feat: add OpenRouter as one of the default providers (#161)
* Add openrouter as a default provider

* Update docs
2026-01-06 10:22:18 -07:00
Mike Knepper
36a174af04
refactor: plugin system updates (#168) 2026-01-06 10:29:47 -06:00
Johnny Greco
b71c6c11a8
docs: fix links and tweak person sampling (#152)
* update person sampling

* update docstring
2025-12-18 10:10:41 -08:00
Johnny Greco
b635e41033
update docs (#151) 2025-12-18 12:43:29 -05:00
Johnny Greco
0a60f869c1
docs: just some tutorial notebook tweaks and a docstring update (#150)
* update docstring

* notebook tweaks

* generate colab notebooks
2025-12-18 12:01:50 -05:00
Johnny Greco
6e6efc009f
docs: some updates for nano3 (#149)
* some fixes

* generate colab notebooks
2025-12-17 18:24:39 -05:00
Andre Manoel
d50a8aef95
docs: add processors (#147)
* first draft

* adding to code reference as well

* docstrings

* addressing comments

* forgot opening line

* docstring too
2025-12-17 15:47:33 -03:00
Nabin Mulepati
8d4c6c12b4
chore: Update nvidia text default model alias to nano v3 (#133) 2025-12-15 15:03:12 -07:00
Nabin Mulepati
3065179f8a
docs: add documentation on how to configure custom model settings (#124)
* Add generation type to ModelConfig

* pass tests

* added generate_text_embeddings

* tests

* remove sensitive=True old artifact no longer needed

* Slight refactor

* slight refactor

* Added embedding generator

* chunk_separator -> chunk_pattern

* update tests

* rename for consistency

* Restructure InferenceParameters -> CompletionInferenceParameters, BaseInferenceParameters, EmbeddingInferenceParameters

* Remove purpose from consolidated kwargs

* WithModelConfiguration.inference_parameters should be typed with BaseInferenceParameters

* Type as WithModelGeneration

* Add image generation modality

* update return type for generate_kwargs

* make generation_type a field of ModelConfig as opposed to a prop resolved based on the type of InferenceParameters

* remove regex based chunking from embedding generator

* Remove image generation for now

* more tests and updates

* column_type_is_llm_generated -> column_type_is_model_generated

* change set to list: fix flaky tests

* CompletionInferenceParameters -> ChatCompletionInferenceParameters for consistency with generation_type

* Update docs

* fix deprecation warning originating from cli model settings

* update display of inference parameters in cli list

* save prog on inference parameter

* updates for the config builder

* update cli readme

* update cli for inference parameters

* update inference parameter names

* flip order of vars

* WithCompletion -> WithChatCompletion

* specify InferenceParamsT

* Update columns.md with EmbeddingColumnConfig info

* make generation_type a discriminator field in inference params. add configuration support for max_parallel_requests and timeout

* DRY out some stuff in field.py

* docs for custom model settings

* Update nomenclature. prompt tokens -> input tokens, completion tokens -> output tokens in column statistics for consistency

* Add nvidia-embedding and openai-embedding to default model configs

* Fix typo in docs

* Make generate colab notebooks

* Address PR comments
2025-12-15 14:00:31 -07:00
Nabin Mulepati
8370e4a00b
feat: support native embedding generation (#106)
* Add generation type to ModelConfig

* pass tests

* added generate_text_embeddings

* tests

* remove sensitive=True old artifact no longer needed

* Slight refactor

* slight refactor

* Added embedding generator

* chunk_separator -> chunk_pattern

* update tests

* rename for consistency

* Restructure InferenceParameters -> CompletionInferenceParameters, BaseInferenceParameters, EmbeddingInferenceParameters

* Remove purpose from consolidated kwargs

* WithModelConfiguration.inference_parameters should be typed with BaseInferenceParameters

* Type as WithModelGeneration

* Add image generation modality

* update return type for generate_kwargs

* make generation_type a field of ModelConfig as opposed to a prop resolved based on the type of InferenceParameters

* remove regex based chunking from embedding generator

* Remove image generation for now

* more tests and updates

* column_type_is_llm_generated -> column_type_is_model_generated

* change set to list: fix flaky tests

* CompletionInferenceParameters -> ChatCompletionInferenceParameters for consistency with generation_type

* Update docs

* fix deprecation warning originating from cli model settings

* update display of inference parameters in cli list

* save prog on inference parameter

* updates for the config builder

* update cli readme

* update cli for inference parameters

* update inference parameter names

* flip order of vars

* WithCompletion -> WithChatCompletion

* specify InferenceParamsT

* Update columns.md with EmbeddingColumnConfig info

* make generation_type a discriminator field in inference params. add configuration support for max_parallel_requests and timeout

* DRY out some stuff in field.py

* Update nomenclature. prompt tokens -> input tokens, completion tokens -> output tokens in column statistics for consistency

* Add nvidia-embedding and openai-embedding to default model configs

* Fix typo in docs

* Make generate colab notebooks

* fine-tune -> adjust
2025-12-15 11:03:33 -07:00
Andre Manoel
68533c78be
docs: fix links on notebooks and add %%capture on install cell (#134) 2025-12-15 14:41:01 -03:00
Andre Manoel
ebc4024830
fix: typo on path to colab notebook (#129) 2025-12-12 15:37:36 -03:00
Andre Manoel
7fa9a413ac
docs: add option to open notebook directly in Colab (#126) 2025-12-12 15:15:26 -03:00
Kirit Thadaka
8d7a073e3a
docs: Updated Person Sampling docs (#120)
* Updated Person Sampling docs

* Updated mv command

* Removed versions

* Updated mv command

---------

Co-authored-by: Johnny Greco <jogreco@nvidia.com>
2025-12-12 10:43:57 -05:00
Johnny Greco
48fdc8c838
docs: add initial plugin documentation (#107)
* add docstrings

* add analysis modules

* include toc for plugins section

* add plugin docs

* remove scope creep

* Update docs/plugins/example.md

Co-authored-by: Nabin Mulepati <nmulepati@nvidia.com>

* address feedback

---------

Co-authored-by: Nabin Mulepati <nmulepati@nvidia.com>
2025-12-11 16:05:11 -05:00
Johnny Greco
e19bdad41c
fix link and some clean up (#119) 2025-12-10 21:20:22 -05:00
Johnny Greco
57b5f6f798
set up initial recipe section (#114) 2025-12-10 14:51:07 -05:00
Johnny Greco
b100cf2f1f
add footer navigation (#108) 2025-12-09 13:42:06 -05:00
Mike Knepper
32515ba724
style: Sort imports traditionally instead of within sections (#103) 2025-12-08 09:01:58 -06:00
Andre Manoel
275bbbf646
docs: add versioning using mike (#102)
* initial changes

* fix to override, adapting ci
2025-12-08 11:06:24 -03:00
Nabin Mulepati
1de2262b94
docs: add models module to code reference (#101)
* Add example notebook showing how to use image contexts

* change 101 -> tutorial

* update _README.md with info on the new tutorial

* add reference in mkdocs.yml

* simplify vlm tutorial

* update num_records on tutorials. Update .gitignore

* update readme info

* add models module to code reference

* fix links to generated ipynb

* change vlm in example tutorial to llama4-scout
2025-12-05 10:41:43 -07:00
Nabin Mulepati
8ccb724fb3
docs: Add example notebook showing how to use image contexts (#97) 2025-12-04 15:39:58 -07:00
Andre Manoel
fa86be1eae
fix: allow docs CI to be manually triggered, better download button (#99) 2025-12-04 14:48:16 -03:00
Andre Manoel
6d921c48ba
fix: small typo on text file (#95)
Notebooooks

Also changing from "Jupytext Format" to "`.py` Format"
2025-12-03 18:31:35 -03:00
Nabin Mulepati
8e3080241b
docs: move models docs to concepts > models (#93) 2025-12-03 14:10:01 -07:00
Andre Manoel
60a898181a
fix: add download links to notebooks (#94) 2025-12-03 18:01:57 -03:00
Andre Manoel
5d4ad10b11
chore: moving notebooks to jupytext and cleaning up workflows (#91)
* adding basic jupytext structure

Co-authored-by: Johnny Greco <jogreco@nvidia.com>

* few fixes

* first test for ci

* adding error intentionally to check workflow behavior

* test calling from other workflows

* typo

* trying as job instead

* couple of fixes

* checking path

* trying to fix path

* wrapping up

---------

Co-authored-by: Johnny Greco <jogreco@nvidia.com>
2025-12-03 17:29:07 -03:00
Johnny Greco
1946410ada
use faker person sampling; links (#86) 2025-12-02 15:31:02 -05:00
Nabin Mulepati
0ed25b3add
chore: fix example notebook drop=False -> drop=True to match comment (#78) 2025-11-25 16:24:41 -07:00
Johnny Greco
060773c2ee
small doc fixes (#67) 2025-11-21 17:39:15 -05:00
Kirit Thadaka
4bee6d9088
docs: remove nemotron personas sampling from docs (for now) (#60)
* Update persona docs

* Updated person sampling docs based on feedback

* remove nemotron personas sampling

* Remove nemotron personas sampling

* Update docs/concepts/person_sampling.md

---------

Co-authored-by: Johnny Greco <jogreco@nvidia.com>
2025-11-21 16:39:00 -05:00
Johnny Greco
585df726ab
docs: some link fixes (#65)
* use full links so they work in docs

* found more

* update link to model configs

* swap person sampling for plugins
2025-11-21 16:33:03 -05:00
Andre Manoel
ce0fc0805a
docs: streamlining tutorials (#61)
* first attempt

* typo

* it works! cleaning up

* adding trigger again just to run once

* cleanup

* typo
2025-11-21 16:14:48 -03:00
Johnny Greco
ec98211862
chore: some readme and docs cleanup (#56)
* update classifiers

* remove commented section for now

* update readme badges and links

* rename persons section to person sampling
2025-11-20 15:33:55 -05:00
Johnny Greco
14dc495341
docs: some documentation cleanup (#52)
* some documentation cleanup

* typo
2025-11-19 17:40:14 -05:00
Johnny Greco
362ec51544
docs: sampler params code ref and more (#50)
* add sampler params code ref

* add persons section

* add person from faker sampler
2025-11-19 16:27:40 -05:00
Andre Manoel
01fbf4d848
docs: validators etc. (#45)
* got a little help from Claude, will still double check everything

* fixing, adding docstrings

* forgotten file + overview to tutorial

* minor

* applying suggestions

Co-authored-by: Nabin Mulepati <nmulepati@nvidia.com>
Co-authored-by: Johnny Greco <jogreco@nvidia.com>

* addressing comments pt1

* addressing comments pt2

* trying something out

* fix

* typo

* trying again

* rollback workflow, add download links

* minor

* adapting notebooks to use fakersampler

---------

Co-authored-by: Nabin Mulepati <nmulepati@nvidia.com>
Co-authored-by: Johnny Greco <jogreco@nvidia.com>
2025-11-19 17:39:10 -03:00
Nabin Mulepati
cb0b1c6f6a
docs: docs for quickstart, cli, model settings (#37)
* vibe it baby

* clean up

* iterate with claude

* Save prog

* Update info pipeline

* Fix tests

* Fix typo

* remove redundant overload

* Add support for multiple default model providers and config

* pull user-defined model configs and providers if available

* Added tests for default model settings

* save progress

* refactor cli to be modular and use OOP

* new tests for cli components

* config_dir > config_path

* simplify list

* list tests

* stranded commit

* tests for commands

* tests for field.py

* tests for form.py

* more tests

* deleting providers should delete associated model configs

* add readme.md for cli

* clean up

* Fix tests

* feat: (FTUE) pull user-defined (via cli) model configs and providers  (#24)

* added docs for quick start and default model settings

* Updates per chat

* update quickstart.md

* update default-model-settings.md

* add check for interface.py as well

* move default model config resolution to src/data_designer/__init__.py

* Revert "move default model config resolution to src/data_designer/__init__.py"

This reverts commit 806a81dc93.

* docs for cli

* update default-model-settings.md

* docs for model provider

* more docs

* add new tests for get provider name

* add lru cache

* remove non doc related changes

* PR feedback

* update reset info

* tip for settings files

* update

* update info about default inference providers

* DATA_DESIGNER_HOME_DIR -> DATA_DESIGNER_HOME

---------

Co-authored-by: Johnny Greco <jogreco@nvidia.com>
2025-11-18 21:28:03 -07:00
Andre Manoel
d0439fe833
docs: adding 101 notebooks (#38)
* started porting notebooks, need to wait for person sampler fix to finish

* a few more changes

* few fixes

* lint
2025-11-18 10:34:47 -03:00