mirror of
https://github.com/NVIDIA-NeMo/DataDesigner
synced 2026-05-24 09:48:29 +00:00
Quick Start
Data Designer ships with built-in model providers and model configurations, so you can start generating synthetic data without writing any provider setup yourself. This guide walks through a first run using those defaults.
Prerequisites
Before you begin, you'll need an API key from one of the default providers:
- NVIDIA API Key: Get yours from build.nvidia.com
- OpenAI API Key (optional): Get yours from platform.openai.com
- OpenRouter API Key (optional): Get yours from openrouter.ai
Set your API key as an environment variable:
export NVIDIA_API_KEY="your-api-key-here"
# Or for OpenAI
export OPENAI_API_KEY="your-openai-api-key-here"
# Or for OpenRouter
export OPENROUTER_API_KEY="your-openrouter-api-key-here"
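Before running the example, it can help to confirm which of these keys your Python process can actually see. The following is a minimal sketch using only the standard library. The variable names come from the export commands above; the `PROVIDER_KEYS` mapping and the `configured_providers` helper are hypothetical names introduced for illustration and are not part of Data Designer.

```python
import os

# Environment variable names taken from the export commands above.
# The short labels on the left are illustrative, not Data Designer identifiers.
PROVIDER_KEYS = {
    "nvidia": "NVIDIA_API_KEY",
    "openai": "OPENAI_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
}


def configured_providers(env=None):
    """Return the labels whose API key variable is set and non-empty."""
    env = os.environ if env is None else env
    return [
        label
        for label, variable in PROVIDER_KEYS.items()
        if env.get(variable, "").strip()
    ]


# Report what the current process can see.
found = configured_providers()
if found:
    print("API keys found for:", ", ".join(found))
else:
    print("No provider API keys found in the environment.")
```

If the list comes back empty in a notebook or IDE, the key was probably exported in a different shell session than the one running Python.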
Example
Below we'll construct a simple Data Designer workflow that generates multilingual greetings.
import os
import data_designer.config as dd
from data_designer.interface import DataDesigner
# Set your API key from build.nvidia.com
# Skip this step if you've already exported your key to the environment variable
os.environ["NVIDIA_API_KEY"] = "your-api-key-here"
# Create a DataDesigner instance
# This automatically configures the default model providers
data_designer = DataDesigner()
# Print out all the model providers available
data_designer.info.display(dd.InfoType.MODEL_PROVIDERS)
# Create a config builder
# This automatically loads the default model configurations
config_builder = dd.DataDesignerConfigBuilder()
# Print out all the model configurations available
config_builder.info.display(dd.InfoType.MODEL_CONFIGS)
# Add a sampler column to randomly select a language
config_builder.add_column(
    dd.SamplerColumnConfig(
        name="language",
        sampler_type=dd.SamplerType.CATEGORY,
        params=dd.CategorySamplerParams(
            values=["English", "Spanish", "French", "German", "Italian"],
        ),
    )
)
# Add an LLM text generation column
# We'll use the built-in 'nvidia-text' model alias
config_builder.add_column(
    dd.LLMTextColumnConfig(
        name="greetings",
        model_alias="nvidia-text",
        prompt="""Write a casual and formal greeting in '{{language}}' language.""",
    )
)
# Run a preview to generate sample records
preview_results = data_designer.preview(config_builder=config_builder)
# Display a sample record
preview_results.display_sample_record()
🎉 Congratulations, you've successfully run your first preview and generated sample synthetic records. Read on to learn more.
To learn more about the default providers and model configurations available, see the Default Model Settings guide.
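For intuition about how the two columns in the example fit together, here is a conceptual sketch in plain Python: a value is sampled for the `language` column, then substituted into the `{{language}}` placeholder of the prompt before it would be sent to the model. This is an illustration only, not Data Designer's implementation. The helper names are hypothetical, uniform sampling is an assumption, and simple string replacement stands in for whatever template rendering the library performs.

```python
import random

# Values and prompt copied from the example above.
LANGUAGES = ["English", "Spanish", "French", "German", "Italian"]
PROMPT = "Write a casual and formal greeting in '{{language}}' language."


def sample_language(rng):
    """Pick one category value (assumed uniform for this sketch)."""
    return rng.choice(LANGUAGES)


def render_prompt(template, record):
    """Replace each {{column}} placeholder with that column's value."""
    rendered = template
    for column, value in record.items():
        rendered = rendered.replace("{{" + column + "}}", str(value))
    return rendered


# Build one record the way the example's columns depend on each other:
# the sampler column is filled first, then the prompt references it.
rng = random.Random(0)
record = {"language": sample_language(rng)}
print(render_prompt(PROMPT, record))
```

The point of the sketch is the ordering: any column referenced in a prompt must already have a value in the record by the time the prompt is rendered.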