# Overview

Welcome to the Data Designer tutorial series! These hands-on notebooks will guide you through the core concepts and features of Data Designer, from basic synthetic data generation to advanced techniques like structured outputs and dataset seeding.

## 🚀 Setting Up Your Environment

### Local Setup Best Practices

First, download the tutorial [from the release assets](https://github.com/NVIDIA-NeMo/DataDesigner/releases/latest/download/data_designer_tutorial.zip).
To run the tutorial notebooks locally, we recommend using a virtual environment to manage dependencies:

=== "uv (Recommended)"

    ```bash
    # Extract tutorial notebooks
    unzip data_designer_tutorial.zip
    cd data_designer_tutorial

    # Launch Jupyter
    uv run jupyter notebook
    ```

=== "pip + venv"

    ```bash
    # Extract tutorial notebooks
    unzip data_designer_tutorial.zip
    cd data_designer_tutorial

    # Create Python virtual environment and install required packages
    python -m venv venv
    source venv/bin/activate
    pip install data-designer jupyter

    # Launch Jupyter
    jupyter notebook
    ```

### API Keys and Authentication

Data Designer can interface with various LLM providers. You'll need to set up API keys for the models you want to use:

```bash
# For NVIDIA API Catalog (build.nvidia.com)
export NVIDIA_API_KEY="your-api-key-here"

# For OpenAI
export OPENAI_API_KEY="your-api-key-here"

# For OpenRouter
export OPENROUTER_API_KEY="your-api-key-here"
```

For more information, see the [Welcome](../index.md) page, [Default Model Settings](../concepts/models/default-model-settings.md), and [Configure Model Settings Using The CLI](../concepts/models/configure-model-settings-with-the-cli.md).

## 📚 Tutorial Series

The tutorials are designed to be completed in sequence, building upon concepts introduced in previous notebooks:

### [1. The Basics](1-the-basics.ipynb)

Learn the fundamentals of Data Designer by generating a simple product review dataset. This notebook covers:

- Setting up the `DataDesigner` interface
- Configuring models and inference parameters
- Using built-in samplers (Category, Person, Uniform)
- Generating LLM text columns with dependencies
- Understanding the generation workflow

**Start here if you're new to Data Designer!**

### [2. Structured Outputs and Jinja Expressions](2-structured-outputs-and-jinja-expressions.ipynb)

Explore more advanced data generation capabilities:

- Creating structured JSON outputs with schemas
- Using Jinja expressions for derived columns
- Combining samplers with structured data
- Building complex data dependencies
- Working with nested data structures

### [3. Seeding with an External Dataset](3-seeding-with-a-dataset.ipynb)

Learn how to leverage existing datasets to guide synthetic data generation:

- Loading and using seed datasets
- Sampling from real data distributions
- Combining seed data with LLM generation
- Creating realistic synthetic data based on existing patterns

### [4. Providing Images as Context](4-providing-images-as-context.ipynb)

Learn how to use vision-language models to generate text descriptions from images:

- Processing and converting images to base64 format for model consumption
- Using vision-language models (VLMs) to analyze visual documents
- Generating detailed summaries from document images
- Inspecting and validating vision-based generation results

### [5. Generating Images](5-generating-images.ipynb)

Generate synthetic image data with Data Designer:

- Configuring image-generation models with `ImageInferenceParams`
- Adding image columns with Jinja2 prompts and sampler-driven diversity
- Preview (base64 in dataframe) vs create (images saved to disk, paths in dataframe)
- Displaying generated images in the notebook

### [6. Image-to-Image Editing](6-editing-images-with-image-context.ipynb)

Chain image generation columns to generate and then edit images:

- Generating images from text and then editing them in a follow-up column
- Using `ImageContext` with auto-detection to pass generated images to an editing model
- Combining sampled accessories and settings for varied edits
- Comparing generated vs edited images in preview and create modes

## 📖 Important Documentation Sections

Before diving into the tutorials, familiarize yourself with these key documentation sections:

### Getting Started

- **[Welcome & Installation](../index.md)** - Overview of Data Designer capabilities and installation instructions

### Core Concepts

Understanding these concepts will help you make the most of the tutorials:

- **[Columns](../concepts/columns.md)** - Learn about different column types (Sampler, LLM, Expression, Validation, etc.)
- **[Validators](../concepts/validators.md)** - Understand how to validate generated data with Python, SQL, and remote validators
- **[Person Sampling](../concepts/person_sampling.md)** - Learn how to sample realistic person data with demographic attributes

### Code Reference

Quick reference guides for the main configuration objects:

- **[column_configs](../code_reference/column_configs.md)** - All column configuration types
- **[config_builder](../code_reference/config_builder.md)** - The `DataDesignerConfigBuilder` API
- **[data_designer_config](../code_reference/data_designer_config.md)** - Main configuration schema
- **[validator_params](../code_reference/validator_params.md)** - Validator configuration options
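
Before opening the first notebook, it can help to confirm that the API keys from the "API Keys and Authentication" section are visible to the Python process that Jupyter starts. The snippet below is a minimal sketch, not part of Data Designer: it relies only on the three environment variable names listed in that section, and the helper `find_configured_providers` is a hypothetical name introduced here for illustration.

```python
import os

# Environment variable names taken from the "API Keys and Authentication" section.
PROVIDER_KEY_VARS = {
    "NVIDIA API Catalog": "NVIDIA_API_KEY",
    "OpenAI": "OPENAI_API_KEY",
    "OpenRouter": "OPENROUTER_API_KEY",
}


def find_configured_providers(environ=None):
    """Return the providers whose API key variable is set and non-empty.

    Hypothetical helper for illustration; it is not a Data Designer API.
    Pass a dict as `environ` to check something other than os.environ.
    """
    env = os.environ if environ is None else environ
    return [
        provider
        for provider, var_name in PROVIDER_KEY_VARS.items()
        if env.get(var_name, "").strip()
    ]


if __name__ == "__main__":
    configured = find_configured_providers()
    if configured:
        print("API keys found for:", ", ".join(configured))
    else:
        print("No API keys found; export at least one before running the notebooks.")
```

Running this in a notebook cell reports which providers have a key set. It only checks that the variable exists and is non-empty, so it cannot tell whether the key itself is valid.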