Overview
Welcome to the Data Designer tutorial series! These hands-on notebooks will guide you through the core concepts and features of Data Designer, from basic synthetic data generation to advanced techniques like structured outputs and dataset seeding.
🚀 Setting Up Your Environment
Local Setup Best Practices
First, download the tutorial from the release assets. To run the tutorial notebooks locally, we recommend using a virtual environment to manage dependencies:
=== "uv (Recommended)"
```bash
# Extract tutorial notebooks
unzip data_designer_tutorial.zip
cd data_designer_tutorial
# Launch Jupyter
uv run jupyter notebook
```
=== "pip + venv"
```bash
# Extract tutorial notebooks
unzip data_designer_tutorial.zip
cd data_designer_tutorial
# Create Python virtual environment and install required packages
python -m venv venv
source venv/bin/activate
pip install data-designer jupyter
# Launch Jupyter
jupyter notebook
```
API Keys and Authentication
Data Designer can interface with various LLM providers. You'll need to set up API keys for the models you want to use:
# For NVIDIA API Catalog (build.nvidia.com)
export NVIDIA_API_KEY="your-api-key-here"
# For OpenAI
export OPENAI_API_KEY="your-api-key-here"
# For OpenRouter
export OPENROUTER_API_KEY="your-api-key-here"
For more information, see the Welcome page, Default Model Settings, and Configure Model Settings Using the CLI.
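Before launching the notebooks, it can help to confirm which of these keys are visible to Python. The following is a minimal sketch that uses only the standard library and the variable names from the export commands above; the helper name `find_configured_providers` is hypothetical and not part of Data Designer.

```python
import os

# Environment variable names taken from the export commands above.
PROVIDER_KEYS = {
    "NVIDIA API Catalog": "NVIDIA_API_KEY",
    "OpenAI": "OPENAI_API_KEY",
    "OpenRouter": "OPENROUTER_API_KEY",
}


def find_configured_providers(environ=None):
    """Return the provider names whose API key variable is set and non-empty."""
    env = os.environ if environ is None else environ
    return [name for name, var in PROVIDER_KEYS.items() if env.get(var, "").strip()]


if __name__ == "__main__":
    configured = find_configured_providers()
    if configured:
        print("Keys found for:", ", ".join(configured))
    else:
        print("No provider API keys found in this environment.")
```

Run it in the same shell session (or notebook kernel) where you exported the keys, since environment variables set in one terminal are not visible to processes started elsewhere.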
📚 Tutorial Series
The tutorials are designed to be completed in sequence, building upon concepts introduced in previous notebooks:
1. The Basics
Learn the fundamentals of Data Designer by generating a simple product review dataset. This notebook covers:
- Setting up the `DataDesigner` interface
- Configuring models and inference parameters
- Using built-in samplers (Category, Person, Uniform)
- Generating LLM text columns with dependencies
- Understanding the generation workflow
Start here if you're new to Data Designer!
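To make the idea of samplers feeding a dependent text column concrete, here is a conceptual sketch in plain Python. It does not use Data Designer's API (the notebook covers that); the category values, the rating range, and the function names are assumptions chosen only for illustration.

```python
import random

# Hypothetical category values, chosen only for illustration.
PRODUCT_CATEGORIES = ["Electronics", "Clothing", "Home & Kitchen"]


def sample_record(rng):
    """Draw one record: a categorical value plus a uniformly sampled number."""
    return {
        "product_category": rng.choice(PRODUCT_CATEGORIES),
        "rating": round(rng.uniform(1.0, 5.0), 1),
    }


def build_review_prompt(record):
    """Fill a prompt from sampled values, mimicking a column that depends on others."""
    return (
        f"Write a short review of a {record['product_category']} product "
        f"that the customer rated {record['rating']} out of 5."
    )


rng = random.Random(0)  # fixed seed so repeated runs draw the same values
records = [sample_record(rng) for _ in range(3)]
prompts = [build_review_prompt(record) for record in records]

for prompt in prompts:
    print(prompt)
```

In this sketch, the sampled columns are generated first and the prompt is built from them afterwards, which mirrors the dependency ordering the notebook describes for LLM text columns.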
2. Structured Outputs and Jinja Expressions
Explore more advanced data generation capabilities:
- Creating structured JSON outputs with schemas
- Using Jinja expressions for derived columns
- Combining samplers with structured data
- Building complex data dependencies
- Working with nested data structures
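The following sketch illustrates, in plain Python, what checking a structured JSON output against a schema and then deriving a column from it can look like. It is not Data Designer's implementation: the schema, the sample response, and `parse_structured_output` are all hypothetical, and the f-string merely stands in for a Jinja expression.

```python
import json

# Hypothetical schema: field name -> expected Python type.
REVIEW_SCHEMA = {"title": str, "rating": int, "pros": list}


def parse_structured_output(raw_text, schema):
    """Parse JSON text and check that each schema field exists with the expected type."""
    data = json.loads(raw_text)
    for field, expected_type in schema.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], expected_type):
            raise ValueError(f"field {field!r} should be {expected_type.__name__}")
    return data


# A made-up response standing in for model output.
raw = '{"title": "Solid blender", "rating": 4, "pros": ["quiet", "easy to clean"]}'
review = parse_structured_output(raw, REVIEW_SCHEMA)

# A derived value computed from the structured fields, similar in spirit
# to an expression column.
summary = f"{review['title']} ({review['rating']}/5)"
print(summary)
```

Rejecting malformed records early, as the `ValueError` branches do here, keeps downstream derived columns from silently operating on missing fields.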
3. Seeding with an External Dataset
Learn how to leverage existing datasets to guide synthetic data generation:
- Loading and using seed datasets
- Sampling from real data distributions
- Combining seed data with LLM generation
- Creating realistic synthetic data based on existing patterns
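As a rough illustration of sampling from a seed dataset and combining it with generation, the sketch below loads a small CSV with the standard library and draws rows with replacement, so sampled frequencies tend to follow the seed data. The columns, rows, and function names are invented for this example and do not reflect Data Designer's seeding API.

```python
import csv
import io
import random

# Tiny in-memory stand-in for a seed dataset file (hypothetical columns and rows).
SEED_CSV = """diagnosis,age_group
asthma,child
asthma,adult
diabetes,adult
"""


def load_seed_rows(csv_text):
    """Parse CSV text into a list of row dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def sample_seed_rows(rows, n, rng):
    """Sample n rows with replacement, so common seed values are drawn more often."""
    return [rng.choice(rows) for _ in range(n)]


def build_prompt(row):
    """Combine seed values into a prompt that an LLM column could consume."""
    return f"Write a brief clinical note for a {row['age_group']} patient with {row['diagnosis']}."


seed_rows = load_seed_rows(SEED_CSV)
sampled = sample_seed_rows(seed_rows, 5, random.Random(42))
seeded_prompts = [build_prompt(row) for row in sampled]

for prompt in seeded_prompts:
    print(prompt)
```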
4. Providing Images as Context
Learn how to use vision-language models to generate text descriptions from images:
- Processing and converting images to base64 format for model consumption
- Using vision-language models (VLMs) to analyze visual documents
- Generating detailed summaries from document images
- Inspecting and validating vision-based generation results
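The base64 conversion step can be sketched with the standard library alone. The helper names below are hypothetical, and whether a given model expects a raw base64 string or a data URL is an assumption to verify against the notebook and your provider's documentation.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_base64(path):
    """Read an image file and return its bytes as an ASCII base64 string."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")


def image_to_data_url(path):
    """Wrap the base64 payload in a data URL, guessing the MIME type from the file name."""
    mime_type, _ = mimetypes.guess_type(str(path))
    if mime_type is None:
        # Fall back to a generic type when the extension is not recognized.
        mime_type = "application/octet-stream"
    return f"data:{mime_type};base64,{image_to_base64(path)}"
```

For example, `image_to_data_url("page_1.png")` (a hypothetical file name) would return a string beginning with `data:image/png;base64,`, followed by the encoded file contents.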
📖 Important Documentation Sections
Before diving into the tutorials, familiarize yourself with these key documentation sections:
Getting Started
- Welcome & Installation - Overview of Data Designer capabilities and installation instructions
Core Concepts
Understanding these concepts will help you make the most of the tutorials:
- Columns - Learn about different column types (Sampler, LLM, Expression, Validation, etc.)
- Validators - Understand how to validate generated data with Python, SQL, and remote validators
- Person Sampling - Learn how to sample realistic person data with demographic attributes
Code Reference
Quick reference guides for the main configuration objects:
- column_configs - All column configuration types
- config_builder - The `DataDesignerConfigBuilder` API
- data_designer_config - Main configuration schema
- validator_params - Validator configuration options