DataDesigner/docs/notebook_source/_README.md
Nabin Mulepati 3b4e296baf
feat: add OpenRouter as one of the default providers (#161)
* Add openrouter as a default provider

* Update docs
2026-01-06 10:22:18 -07:00

124 lines
4.6 KiB
Markdown

# Overview
Welcome to the Data Designer tutorial series! These hands-on notebooks will guide you through the core concepts and features of Data Designer, from basic synthetic data generation to advanced techniques like structured outputs and dataset seeding.
## 🚀 Setting Up Your Environment
### Local Setup Best Practices
First, download the tutorial [from the release assets](https://github.com/NVIDIA-NeMo/DataDesigner/releases/latest/download/data_designer_tutorial.zip).
To run the tutorial notebooks locally, we recommend using a virtual environment to manage dependencies:
=== "uv (Recommended)"
```bash
# Extract tutorial notebooks
unzip data_designer_tutorial.zip
cd data_designer_tutorial
# Launch Jupyter
uv run jupyter notebook
```
=== "pip + venv"
```bash
# Extract tutorial notebooks
unzip data_designer_tutorial.zip
cd data_designer_tutorial
# Create Python virtual environment and install required packages
python -m venv venv
source venv/bin/activate
pip install data-designer jupyter
# Launch Jupyter
jupyter notebook
```
### API Keys and Authentication
Data Designer is able to interface with various LLM providers. You'll need to set up API keys for the models you want to use:
```bash
# For NVIDIA API Catalog (build.nvidia.com)
export NVIDIA_API_KEY="your-api-key-here"
# For OpenAI
export OPENAI_API_KEY="your-api-key-here"
# For OpenRouter
export OPENROUTER_API_KEY="your-api-key-here"
```
For more information, check the [Quick Start](../quick-start.md), [Default Model Settings](../concepts/models/default-model-settings.md) and how to [Configure Model Settings Using The CLI](../concepts/models/configure-model-settings-with-the-cli.md).
## 📚 Tutorial Series
The tutorials are designed to be completed in sequence, building upon concepts introduced in previous notebooks:
### [1. The Basics](1-the-basics.ipynb)
Learn the fundamentals of Data Designer by generating a simple product review dataset. This notebook covers:
- Setting up the `DataDesigner` interface
- Configuring models and inference parameters
- Using built-in samplers (Category, Person, Uniform)
- Generating LLM text columns with dependencies
- Understanding the generation workflow
**Start here if you're new to Data Designer!**
### [2. Structured Outputs and Jinja Expressions](2-structured-outputs-and-jinja-expressions.ipynb)
Explore more advanced data generation capabilities:
- Creating structured JSON outputs with schemas
- Using Jinja expressions for derived columns
- Combining samplers with structured data
- Building complex data dependencies
- Working with nested data structures
### [3. Seeding with an External Dataset](3-seeding-with-a-dataset.ipynb)
Learn how to leverage existing datasets to guide synthetic data generation:
- Loading and using seed datasets
- Sampling from real data distributions
- Combining seed data with LLM generation
- Creating realistic synthetic data based on existing patterns
### [4. Providing Images as Context](4-providing-images-as-context.ipynb)
Learn how to use vision-language models to generate text descriptions from images:
- Processing and converting images to base64 format for model consumption
- Using vision-language models (VLMs) to analyze visual documents
- Generating detailed summaries from document images
- Inspecting and validating vision-based generation results
## 📖 Important Documentation Sections
Before diving into the tutorials, familiarize yourself with these key documentation sections:
### Getting Started
- **[Installation](../installation.md)** - Detailed installation instructions for various setups
- **[Welcome Guide](../index.md)** - Overview of Data Designer capabilities and architecture
### Core Concepts
Understanding these concepts will help you make the most of the tutorials:
- **[Columns](../concepts/columns.md)** - Learn about different column types (Sampler, LLM, Expression, Validation, etc.)
- **[Validators](../concepts/validators.md)** - Understand how to validate generated data with Python, SQL, and remote validators
- **[Person Sampling](../concepts/person_sampling.md)** - Learn how to sample realistic person data with demographic attributes
### Code Reference
Quick reference guides for the main configuration objects:
- **[column_configs](../code_reference/column_configs.md)** - All column configuration types
- **[config_builder](../code_reference/config_builder.md)** - The `DataDesignerConfigBuilder` API
- **[data_designer_config](../code_reference/data_designer_config.md)** - Main configuration schema
- **[validator_params](../code_reference/validator_params.md)** - Validator configuration options