# Overview

Welcome to the Data Designer tutorial series! These hands-on notebooks will guide you through the core concepts and features of Data Designer, from basic synthetic data generation to advanced techniques like structured outputs and dataset seeding.

## 🚀 Setting Up Your Environment

### Local Setup Best Practices

First, download the tutorial [from the release assets](https://github.com/NVIDIA-NeMo/DataDesigner/releases/latest/download/data_designer_tutorial.zip).
To run the tutorial notebooks locally, we recommend using a virtual environment to manage dependencies:

=== "uv (Recommended)"

    ```bash
    # Extract tutorial notebooks
    unzip data_designer_tutorial.zip
    cd data_designer_tutorial

    # Launch Jupyter
    uv run jupyter notebook
    ```

=== "pip + venv"

    ```bash
    # Extract tutorial notebooks
    unzip data_designer_tutorial.zip
    cd data_designer_tutorial

    # Create Python virtual environment and install required packages
    python -m venv venv
    source venv/bin/activate
    pip install data-designer jupyter

    # Launch Jupyter
    jupyter notebook
    ```

### API Keys and Authentication

Data Designer can interface with various LLM providers. Set an API key for each provider whose models you want to use:

```bash
# For NVIDIA API Catalog (build.nvidia.com)
export NVIDIA_API_KEY="your-api-key-here"

# For OpenAI
export OPENAI_API_KEY="your-api-key-here"

# For OpenRouter
export OPENROUTER_API_KEY="your-api-key-here"
```
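
Before launching a notebook, it can help to confirm which of these keys are visible to your Python process. The snippet below is a minimal sketch using only the standard library. The helper name `find_configured_keys` is hypothetical and not part of Data Designer; the variable names are the ones exported above.

```python
import os

# Environment variable names taken from the export commands above.
PROVIDER_KEYS = {
    "NVIDIA API Catalog": "NVIDIA_API_KEY",
    "OpenAI": "OPENAI_API_KEY",
    "OpenRouter": "OPENROUTER_API_KEY",
}


def find_configured_keys(env=None):
    """Map each provider to True if its key is set and non-empty.

    `find_configured_keys` is a hypothetical helper, not a Data Designer API.
    """
    env = os.environ if env is None else env
    return {
        provider: bool(env.get(var, "").strip())
        for provider, var in PROVIDER_KEYS.items()
    }


if __name__ == "__main__":
    for provider, is_set in find_configured_keys().items():
        print(f"{provider}: {'set' if is_set else 'missing'}")
```

Only the providers you plan to use need to report `set`. A key exported in one terminal is not visible to a Jupyter server started from another, so run the check from inside the notebook if in doubt.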

For more information, see the [Welcome](../index.md) page, [Default Model Settings](../concepts/models/default-model-settings.md), and [Configure Model Settings Using The CLI](../concepts/models/configure-model-settings-with-the-cli.md).

## 📚 Tutorial Series

The tutorials are designed to be completed in sequence, building upon concepts introduced in previous notebooks:

### [1. The Basics](1-the-basics.ipynb)

Learn the fundamentals of Data Designer by generating a simple product review dataset. This notebook covers:

- Setting up the `DataDesigner` interface
- Configuring models and inference parameters
- Using built-in samplers (Category, Person, Uniform)
- Generating LLM text columns with dependencies
- Understanding the generation workflow

**Start here if you're new to Data Designer!**

### [2. Structured Outputs, Jinja Expressions, and Conditional Generation](2-structured-outputs-and-jinja-expressions.ipynb)

Explore more advanced data generation capabilities:

- Creating structured JSON outputs with schemas
- Using Jinja expressions for derived columns
- Combining samplers with structured data
- Building complex data dependencies
- Working with nested data structures
- Conditional generation with `skip.when`

### [3. Seeding with an External Dataset](3-seeding-with-a-dataset.ipynb)

Learn how to leverage existing datasets to guide synthetic data generation:

- Loading and using seed datasets
- Sampling from real data distributions
- Combining seed data with LLM generation
- Creating realistic synthetic data based on existing patterns
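
The idea behind sampling from a seed dataset can be illustrated without Data Designer at all. The sketch below is a conceptual example using only the standard library, not the library's seeding API: the seed rows and the `sample_seed_rows` helper are made up for illustration. Drawing rows with replacement keeps value frequencies close to those in the seed data.

```python
import random
from collections import Counter

# Hypothetical seed rows; in practice these would come from your own dataset.
SEED_ROWS = [
    {"category": "electronics", "rating": 5},
    {"category": "electronics", "rating": 4},
    {"category": "electronics", "rating": 2},
    {"category": "books", "rating": 5},
]


def sample_seed_rows(rows, n, seed=0):
    """Draw n rows uniformly with replacement (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return [rng.choice(rows) for _ in range(n)]


samples = sample_seed_rows(SEED_ROWS, n=1000)
counts = Counter(row["category"] for row in samples)

# Roughly three of every four sampled rows should be "electronics",
# mirroring the 3:1 ratio in the seed rows.
print(counts)
```

The notebook shows how Data Designer itself loads and samples seed data; treat this only as intuition for why seeded output resembles the source distribution.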

### [4. Providing Images as Context](4-providing-images-as-context.ipynb)

Learn how to use vision-language models to generate text descriptions from images:

- Processing and converting images to base64 format for model consumption
- Using vision-language models (VLMs) to analyze visual documents
- Generating detailed summaries from document images
- Inspecting and validating vision-based generation results
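
The first step in the list above, converting an image to base64, can be sketched with the standard library alone. This illustrates the general technique rather than the notebook's actual code: the helper names `image_bytes_to_base64` and `to_data_uri` are assumptions, and the exact input format a given model expects may differ.

```python
import base64


def image_bytes_to_base64(image_bytes: bytes) -> str:
    """Encode raw image bytes as an ASCII base64 string (hypothetical helper)."""
    return base64.b64encode(image_bytes).decode("ascii")


def to_data_uri(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Wrap the base64 payload in a data URI, a common way to inline images."""
    return f"data:{mime_type};base64,{image_bytes_to_base64(image_bytes)}"


# The 8-byte PNG signature stands in for a real image file here.
# For a file on disk, read the bytes with pathlib.Path(...).read_bytes().
sample_bytes = b"\x89PNG\r\n\x1a\n"
encoded = image_bytes_to_base64(sample_bytes)

# Decoding recovers the original bytes exactly.
assert base64.b64decode(encoded) == sample_bytes
```

Base64 output is about a third larger than the raw bytes, so large document images may be worth resizing before encoding.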

### [5. Generating Images](5-generating-images.ipynb)

Generate synthetic image data with Data Designer:

- Configuring image-generation models with `ImageInferenceParams`
- Adding image columns with Jinja2 prompts and sampler-driven diversity
- Understanding preview mode (images as base64 in the dataframe) vs. create mode (images saved to disk, paths in the dataframe)
- Displaying generated images in the notebook

### [6. Image-to-Image Editing](6-editing-images-with-image-context.ipynb)

Chain image generation columns to generate and then edit images:

- Generating images from text and then editing them in a follow-up column
- Using `ImageContext` with auto-detection to pass generated images to an editing model
- Combining sampled accessories and settings for varied edits
- Comparing generated vs edited images in preview and create modes

## 📖 Important Documentation Sections

Before diving into the tutorials, familiarize yourself with these key documentation sections:

### Getting Started

- **[Welcome & Installation](../index.md)** - Overview of Data Designer capabilities and installation instructions

### Core Concepts

Understanding these concepts will help you make the most of the tutorials:

- **[Columns](../concepts/columns.md)** - Learn about different column types (Sampler, LLM, Expression, Validation, etc.)
- **[Validators](../concepts/validators.md)** - Understand how to validate generated data with Python, SQL, and remote validators
- **[Person Sampling](../concepts/person_sampling.md)** - Learn how to sample realistic person data with demographic attributes