DataDesigner/docs/notebook_source/_README.md
Kirit Thadaka de7c3ab99a
2026-02-02 21:03:58 -08:00


Overview

Welcome to the Data Designer tutorial series! These hands-on notebooks will guide you through the core concepts and features of Data Designer, from basic synthetic data generation to advanced techniques like structured outputs and dataset seeding.

🚀 Setting Up Your Environment

Local Setup Best Practices

First, download the tutorial archive (data_designer_tutorial.zip) from the release assets. To run the notebooks locally, we recommend using a virtual environment to manage dependencies:

=== "uv (Recommended)"

    ```bash
    # Extract tutorial notebooks
    unzip data_designer_tutorial.zip
    cd data_designer_tutorial

    # Launch Jupyter
    uv run jupyter notebook
    ```

=== "pip + venv"

    ```bash
    # Extract tutorial notebooks
    unzip data_designer_tutorial.zip
    cd data_designer_tutorial

    # Create Python virtual environment and install required packages
    python -m venv venv
    source venv/bin/activate
    pip install data-designer jupyter

    # Launch Jupyter
    jupyter notebook
    ```

API Keys and Authentication

Data Designer can interface with various LLM providers. Set an API key for each provider whose models you plan to use:

```bash
# For NVIDIA API Catalog (build.nvidia.com)
export NVIDIA_API_KEY="your-api-key-here"

# For OpenAI
export OPENAI_API_KEY="your-api-key-here"

# For OpenRouter
export OPENROUTER_API_KEY="your-api-key-here"
```

For more information, see the Welcome page, Default Model Settings, and Configure Model Settings Using the CLI.
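
Before opening a notebook, it can help to confirm which of these keys are actually visible to the Python process. The snippet below is a minimal sketch using only the standard library. The `find_configured_providers` helper is hypothetical and not part of Data Designer; only the environment variable names come from this README.

```python
import os

# Environment variable names as listed in this README.
PROVIDER_KEYS = {
    "NVIDIA API Catalog": "NVIDIA_API_KEY",
    "OpenAI": "OPENAI_API_KEY",
    "OpenRouter": "OPENROUTER_API_KEY",
}


def find_configured_providers(environ=None):
    """Return the providers whose API key variable is set and non-empty.

    Hypothetical helper for illustration; it never prints the key values.
    """
    env = os.environ if environ is None else environ
    return [
        name
        for name, variable in PROVIDER_KEYS.items()
        if env.get(variable, "").strip()
    ]


configured = find_configured_providers()
if configured:
    print("API keys found for:", ", ".join(configured))
else:
    print("No provider API keys found in the environment.")
```

If a key you exported does not show up, the usual cause is that Jupyter was started from a different shell session than the one where the variable was set.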

📚 Tutorial Series

The tutorials are designed to be completed in sequence, building upon concepts introduced in previous notebooks:

1. The Basics

Learn the fundamentals of Data Designer by generating a simple product review dataset. This notebook covers:

  • Setting up the DataDesigner interface
  • Configuring models and inference parameters
  • Using built-in samplers (Category, Person, Uniform)
  • Generating LLM text columns with dependencies
  • Understanding the generation workflow

Start here if you're new to Data Designer!
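
To make the idea of samplers and dependent columns concrete, here is a plain-Python sketch. It deliberately does not use the Data Designer API: the column names, the category values, and the `build_prompt` helper are all hypothetical, and the final string only stands in for the prompt that an LLM text column might receive.

```python
import random

# Hypothetical category values for a product review dataset.
CATEGORIES = ["Electronics", "Books", "Home & Kitchen"]


def build_prompt(row):
    """Build a prompt string from columns that were sampled earlier."""
    return (
        f"Write a short review of a product in the {row['product_category']} "
        f"category with a rating of {row['star_rating']} out of 5."
    )


def sample_row(rng):
    """Sample independent columns first, then derive the dependent one."""
    row = {
        # Category-style sampler: pick one value from a fixed list.
        "product_category": rng.choice(CATEGORIES),
        # Uniform-style sampler: draw a float from a range.
        "star_rating": round(rng.uniform(1.0, 5.0), 1),
    }
    # Dependent column: can only be built after the columns above exist.
    row["review_prompt"] = build_prompt(row)
    return row


rng = random.Random(0)  # fixed seed so repeated runs give the same rows
for _ in range(3):
    print(sample_row(rng))
```

The ordering is the point of the sketch: a column that references other columns has to be generated after them, which is the dependency idea the first notebook introduces.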

2. Structured Outputs and Jinja Expressions

Explore more advanced data generation capabilities:

  • Creating structured JSON outputs with schemas
  • Using Jinja expressions for derived columns
  • Combining samplers with structured data
  • Building complex data dependencies
  • Working with nested data structures
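
As a rough illustration of what a schema provides, the sketch below checks a JSON string against a small set of required fields and types, then derives a new value from the parsed record. It uses only the standard library rather than Data Designer or Jinja, and the field names and helpers are hypothetical.

```python
import json

# Hypothetical schema: field name -> expected Python type.
REVIEW_SCHEMA = {"title": str, "rating": int, "pros": list}


def parse_structured_output(raw, schema=REVIEW_SCHEMA):
    """Parse a JSON object and check each schema field's presence and type."""
    record = json.loads(raw)
    if not isinstance(record, dict):
        raise ValueError("expected a JSON object")
    for field, expected_type in schema.items():
        if field not in record:
            raise ValueError(f"missing field: {field}")
        if not isinstance(record[field], expected_type):
            raise ValueError(f"field {field!r} should be {expected_type.__name__}")
    return record


def summarize(record):
    """Derived value computed from other fields, similar in spirit to an expression column."""
    return f"{record['title']} ({record['rating']}/5, {len(record['pros'])} pros)"


raw_output = '{"title": "Great", "rating": 4, "pros": ["fast", "light"]}'
print(summarize(parse_structured_output(raw_output)))
```

Free-form text would need ad hoc parsing before a derived column could use it; a validated structure lets downstream columns reference fields by name.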

3. Seeding with an External Dataset

Learn how to leverage existing datasets to guide synthetic data generation:

  • Loading and using seed datasets
  • Sampling from real data distributions
  • Combining seed data with LLM generation
  • Creating realistic synthetic data based on existing patterns
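
The core idea of seeding can be sketched without any library: draw rows from an existing dataset, then build the generation prompt from each drawn row. The example below uses only the standard library, and the seed data, column names, and helper functions are hypothetical rather than taken from the notebook.

```python
import csv
import io
import random

# Hypothetical seed data; in practice this would be read from a file.
SEED_CSV = """symptom,department
headache,neurology
chest pain,cardiology
rash,dermatology
"""


def load_seed_rows(text):
    """Parse CSV text into a list of dictionaries, one per seed row."""
    return list(csv.DictReader(io.StringIO(text)))


def sample_seeded_rows(seed_rows, n, rng):
    """Draw n seed rows with replacement and attach a prompt to each."""
    rows = []
    for seed in rng.choices(seed_rows, k=n):
        row = dict(seed)  # copy so the seed data is not mutated
        row["prompt"] = (
            f"Write a patient note about {seed['symptom']} "
            f"for the {seed['department']} department."
        )
        rows.append(row)
    return rows


seeds = load_seed_rows(SEED_CSV)
for row in sample_seeded_rows(seeds, 4, random.Random(0)):
    print(row["prompt"])
```

Sampling with replacement means values that are frequent in the seed data stay frequent in the synthetic rows, which is one simple way to follow the distribution of the real data.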

4. Providing Images as Context

Learn how to use vision-language models to generate text descriptions from images:

  • Processing and converting images to base64 format for model consumption
  • Using vision-language models (VLMs) to analyze visual documents
  • Generating detailed summaries from document images
  • Inspecting and validating vision-based generation results
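
The base64 conversion step can be sketched with the standard library alone. Whether the notebook passes a bare base64 string or a full data URL to the model is not stated here, so treat the data URL format and both helper names below as assumptions and check the notebook for the exact form it uses.

```python
import base64
from pathlib import Path


def image_bytes_to_base64(image_bytes):
    """Encode raw image bytes as an ASCII base64 string."""
    return base64.b64encode(image_bytes).decode("ascii")


def image_file_to_data_url(path, mime_type="image/png"):
    """Read an image file and wrap its base64 content in a data URL.

    Hypothetical helper: the MIME type is supplied by the caller and is
    not detected from the file contents.
    """
    encoded = image_bytes_to_base64(Path(path).read_bytes())
    return f"data:{mime_type};base64,{encoded}"


# Round-trip check on a few bytes standing in for real image data.
sample_bytes = b"\x89PNG\r\n"
encoded_sample = image_bytes_to_base64(sample_bytes)
assert base64.b64decode(encoded_sample) == sample_bytes
```

Base64 output is roughly one third larger than the raw bytes, so large document images may be worth resizing before encoding.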

📖 Important Documentation Sections

Before diving into the tutorials, familiarize yourself with these key documentation sections:

Getting Started

Core Concepts

Understanding these concepts will help you make the most of the tutorials:

  • Columns - Learn about different column types (Sampler, LLM, Expression, Validation, etc.)
  • Validators - Understand how to validate generated data with Python, SQL, and remote validators
  • Person Sampling - Learn how to sample realistic person data with demographic attributes

Code Reference

Quick reference guides for the main configuration objects: