mirror of
https://github.com/NVIDIA-NeMo/DataDesigner
synced 2026-05-24 09:48:29 +00:00
Co-authored-by: Johnny Greco <jogreco@nvidia.com>
Quick Start
Get started with Data Designer using its default model providers and configurations. These ship built in, so you can start generating synthetic data immediately.
Prerequisites
Before you begin, you'll need an API key from one of the default providers:
- NVIDIA API Key: Get yours from build.nvidia.com
- OpenAI API Key (optional): Get yours from platform.openai.com
Set your API key as an environment variable:
export NVIDIA_API_KEY="your-api-key-here"
# Or for OpenAI
export OPENAI_API_KEY="your-openai-api-key-here"
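Before running the example, it can help to confirm that at least one key is visible to Python. The following is a minimal sketch using only the standard library; the helper name `find_configured_keys` is hypothetical and not part of Data Designer, and the variable names are the ones exported above.

```python
import os

# Environment variable names taken from the export commands above.
DEFAULT_KEY_VARS = ("NVIDIA_API_KEY", "OPENAI_API_KEY")


def find_configured_keys(env=None):
    """Return the names of the API key variables that are set and non-empty.

    Hypothetical helper for illustration; not part of Data Designer.
    """
    env = os.environ if env is None else env
    return [name for name in DEFAULT_KEY_VARS if env.get(name, "").strip()]


if __name__ == "__main__":
    found = find_configured_keys()
    if found:
        print("API keys found for:", ", ".join(found))
    else:
        print("No API key set; export NVIDIA_API_KEY or OPENAI_API_KEY first.")
```

If the script reports no key, re-run the `export` command in the same shell session that launches Python.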
Example
Below we'll construct a simple Data Designer workflow that generates multilingual greetings.
import os
from data_designer.essentials import (
    CategorySamplerParams,
    DataDesigner,
    DataDesignerConfigBuilder,
    InfoType,
    LLMTextColumnConfig,
    SamplerColumnConfig,
    SamplerType,
)
# Set your API key from build.nvidia.com
# Skip this step if you've already exported your key as an environment variable
os.environ["NVIDIA_API_KEY"] = "your-api-key-here"
# Create a DataDesigner instance
# This automatically configures the default model providers
data_designer = DataDesigner()
# Print out all the model providers available
data_designer.info.display(InfoType.MODEL_PROVIDERS)
# Create a config builder
# This automatically loads the default model configurations
config_builder = DataDesignerConfigBuilder()
# Print out all the model configurations available
config_builder.info.display(InfoType.MODEL_CONFIGS)
# Add a sampler column to randomly select a language
config_builder.add_column(
    SamplerColumnConfig(
        name="language",
        sampler_type=SamplerType.CATEGORY,
        params=CategorySamplerParams(
            values=["English", "Spanish", "French", "German", "Italian"],
        ),
    )
)
# Add an LLM text generation column
# We'll use the built-in 'nvidia-text' model alias
config_builder.add_column(
    LLMTextColumnConfig(
        name="greetings",
        model_alias="nvidia-text",
        prompt="""Write a casual and formal greeting in '{{language}}' language.""",
    )
)
# Run a preview to generate sample records
preview_results = data_designer.preview(config_builder=config_builder)
# Display a sample record
preview_results.display_sample_record()
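Conceptually, each preview record first gets a sampled `language` value, and the prompt's `{{language}}` placeholder is then filled in with that value before the text is sent to the model. The sketch below mimics those two steps with the standard library only. It is an illustration under stated assumptions, not Data Designer's implementation: `sample_category`, `render_prompt`, and `build_prompts` are hypothetical helpers, uniform sampling is assumed, and no model is called.

```python
import random
import re

LANGUAGES = ["English", "Spanish", "French", "German", "Italian"]
PROMPT = "Write a casual and formal greeting in '{{language}}' language."


def sample_category(values, rng):
    """Pick one value uniformly at random (a stand-in for a category sampler)."""
    return rng.choice(values)


def render_prompt(template, record):
    """Replace each {{name}} placeholder with the matching value from the record."""
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda match: str(record[match.group(1)]),
        template,
    )


def build_prompts(num_records, seed=0):
    """Sample a language per record, then render the prompt for each record."""
    rng = random.Random(seed)
    records = [{"language": sample_category(LANGUAGES, rng)} for _ in range(num_records)]
    return [render_prompt(PROMPT, record) for record in records]


if __name__ == "__main__":
    for prompt in build_prompts(3):
        print(prompt)
```

Each printed line is the kind of fully rendered prompt that would produce one value in the `greetings` column.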
🎉 Congratulations, you have successfully run your first iteration of designing synthetic data. Follow along to learn more.
To learn more about the default providers and model configurations available, see the Default Model Settings guide.