Validators are quality assurance mechanisms in Data Designer that check generated content against rules and return structured pass/fail results. They enable automated verification of data for correctness, code quality, and adherence to specifications.
!!! note "Quality Gates for Generated Data"
    Validators act as **quality gates** in your generation pipeline. Use them to filter invalid records, score code quality, verify format compliance, or integrate with external validation services.
## Overview
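Validation columns execute validation logic against target columns and produce structured results indicating whether each record passed validation (`is_valid`), along with validator-specific metadata such as error messages. Data Designer supports three kinds of validation:
1. **Code validation**: Lint generated Python or SQL code for syntax errors and quality issues
2. **Local callable validation**: Run your own Python functions against the generated data
3. **Remote validation**: Send data to HTTP endpoints for external validation services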
## Validator Types
### 🐍 Python Code Validator
The Python code validator runs generated Python code through [Ruff](https://github.com/astral-sh/ruff), a fast Python linter that checks for syntax errors, undefined variables, and code quality issues.
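Ruff is an external tool, so as a rough stand-in the sketch below uses the standard library's `ast` module to show the shape of a per-record check. It only catches syntax errors (not undefined names or style issues), `check_python_syntax` is a hypothetical helper rather than part of Data Designer, and the result keys simply mirror the `is_valid`/`error_messages` format used for the SQL validator on this page.

```python
import ast


def check_python_syntax(code: str) -> dict:
    """Hypothetical helper: report whether `code` parses as Python.

    A stand-in for a real linter such as Ruff; it detects syntax errors only.
    """
    try:
        ast.parse(code)
    except SyntaxError as exc:
        # Include the line number so the message is actionable.
        return {"is_valid": False, "error_messages": f"Line {exc.lineno}: {exc.msg}"}
    return {"is_valid": True, "error_messages": ""}


good = check_python_syntax("def add(a, b):\n    return a + b\n")
bad = check_python_syntax("def add(a, b)\n    return a + b\n")  # missing colon
print(good)
print(bad)
```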
### 🗄️ SQL Code Validator
The SQL code validator supports multiple dialects: `SQL_POSTGRES`, `SQL_ANSI`, `SQL_MYSQL`, `SQL_SQLITE`, `SQL_TSQL`, and `SQL_BIGQUERY`.
**Validation Output:**
Each validated record returns:
- **`is_valid`**: `True` if no parsing errors were found
- **`error_messages`**: Concatenated error descriptions (empty string if valid)
The validator focuses on parsing errors (PRS codes) that indicate malformed SQL. It also checks for common pitfalls like `DECIMAL` definitions without scale parameters.
**Example Validation Result:**
```python
# Valid SQL
{
    "is_valid": True,
    "error_messages": ""
}

# Invalid SQL
{
    "is_valid": False,
    "error_messages": "PRS: Line 1, Position 1: Found unparsable section: 'NOT SQL'"
}
```
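The exact rule the validator applies is not spelled out here; as an illustration, the `DECIMAL` pitfall mentioned above could be approximated with a regular expression. `find_decimal_without_scale` is a hypothetical helper, and the regex is a simplification: a real SQL parser would avoid false positives such as `DECIMAL` appearing inside a string literal.

```python
import re

# Matches DECIMAL when it is NOT followed by an explicit (precision, scale) pair,
# so "DECIMAL" and "DECIMAL(10)" are flagged but "DECIMAL(10, 2)" is not.
_DECIMAL_WITHOUT_SCALE = re.compile(
    r"\bDECIMAL\b(?!\s*\(\s*\d+\s*,\s*\d+\s*\))", re.IGNORECASE
)


def find_decimal_without_scale(sql: str) -> list[str]:
    """Hypothetical helper: return one warning per DECIMAL lacking a scale."""
    warnings = []
    for match in _DECIMAL_WITHOUT_SCALE.finditer(sql):
        # Convert the character offset into a 1-based line number.
        line_no = sql.count("\n", 0, match.start()) + 1
        warnings.append(f"Line {line_no}: DECIMAL defined without a scale parameter")
    return warnings


ddl = """CREATE TABLE prices (
    id INTEGER,
    amount DECIMAL(10),
    tax DECIMAL(5, 2)
);"""
print(find_decimal_without_scale(ddl))
```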
### 🔧 Local Callable Validator
The local callable validator executes a custom Python function for flexible validation logic:
- **Input**: DataFrame containing the target columns
- **Output**: DataFrame with an `is_valid` column (boolean or null)
- **Extra fields**: Any additional output columns become validation metadata
The `output_schema` parameter is optional but recommended: it validates the function's output against a JSON schema, catching unexpected return formats.
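A minimal sketch of such a function, assuming it receives the target columns as a pandas DataFrame and that a hypothetical `price` column is the target. The function name and the `reason` metadata column are illustrative only, and the exact signature Data Designer expects may differ.

```python
import pandas as pd


def validate_prices(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical validation function: a price must be a positive number."""
    # Values that cannot be parsed as numbers become NaN.
    prices = pd.to_numeric(df["price"], errors="coerce")
    is_valid = prices.notna() & (prices > 0)
    return pd.DataFrame(
        {
            "is_valid": is_valid,
            # Any extra column like this one would be kept as validation metadata.
            "reason": ["ok" if ok else "price must be a positive number" for ok in is_valid],
        },
        index=df.index,
    )


records = pd.DataFrame({"price": [19.99, -5, "abc"]})
result = validate_prices(records)
print(result)
```

Returning a DataFrame aligned on the input index keeps each result attached to the record it describes.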
### 🌐 Remote Validator
The remote validator sends data to HTTP endpoints for validation-as-a-service. This is useful when your validation software needs to run on external compute and can be exposed as a service. Examples include:
- External linting services
- Security scanners
- Domain-specific validators
- Proprietary validation systems
!!! note "Authentication"
    Currently, the remote validator can only make unauthenticated API calls. When implementing your own service, rely on network isolation for security. To reach a service that requires authentication, implement a local proxy that handles authentication on the validator's behalf.
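The request and response format a remote validator expects is not documented in this section, so the sketch below assumes a hypothetical JSON contract (`{"data": [...]}` in, one `{"is_valid": ..., ...}` result per record out) and a hypothetical `text` field on each record. It uses only the standard library's `http.server` and, in line with the note above, performs no authentication.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def validate_payload(payload: dict) -> dict:
    """Validate every record in a hypothetical {"data": [...]} request body."""
    results = []
    for record in payload.get("data", []):
        text = str(record.get("text", ""))
        ok = bool(text.strip())
        # One result per input record, in the same order.
        results.append({"is_valid": ok, "reason": "" if ok else "empty text"})
    return {"data": results}


class ValidationHandler(BaseHTTPRequestHandler):
    """Accepts a JSON POST body and responds with JSON validation results."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(validate_payload(payload)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host: str = "127.0.0.1", port: int = 8000) -> None:
    """Start the service. There is no authentication, so keep it on an isolated network."""
    HTTPServer((host, port), ValidationHandler).serve_forever()
```

Calling `serve()` blocks and listens on `127.0.0.1:8000`. Keeping the validation logic in a plain function (`validate_payload`) makes it testable without starting a server.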
**Retry behavior:**
Failed requests are retried with exponential backoff: `delay = retry_backoff^attempt`.
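A minimal sketch of that retry loop, assuming `attempt` counts from 1 for the first retry and that the delay is in seconds (neither is stated here). `call_with_backoff` and `max_retries` are hypothetical names, and the `sleep` parameter exists only so the delays can be inspected without actually waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    request: Callable[[], T],
    max_retries: int = 3,
    retry_backoff: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `request`, waiting retry_backoff ** attempt seconds before each retry."""
    attempt = 0
    while True:
        try:
            return request()
        except Exception:
            attempt += 1
            if attempt > max_retries:
                raise  # give up and surface the last error
            sleep(retry_backoff ** attempt)


# Usage: a request that fails twice before succeeding.
recorded_delays = []
attempts = {"count": 0}


def flaky_request() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


result = call_with_backoff(flaky_request, sleep=recorded_delays.append)
print(result, recorded_delays)  # ok [2.0, 4.0]
```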
**Parallelization:**
Set `max_parallel_requests` to control concurrency. Higher values improve throughput but increase server load. The validator batches requests according to the `batch_size` parameter in the validation column configuration.
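The interaction between `batch_size` and `max_parallel_requests` can be pictured with a thread pool: records are split into batches, and at most `max_parallel_requests` batches are in flight at once. This is a hypothetical illustration (`send_batch` stands in for the HTTP call), not Data Designer's internal scheduler.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def chunk(records: list, batch_size: int) -> list:
    """Split records into consecutive batches of at most batch_size items."""
    return [records[i : i + batch_size] for i in range(0, len(records), batch_size)]


def validate_in_parallel(
    records: list[dict],
    send_batch: Callable[[list[dict]], list[dict]],
    batch_size: int = 4,
    max_parallel_requests: int = 2,
) -> list[dict]:
    """Send batches concurrently, with at most max_parallel_requests at a time."""
    batches = chunk(records, batch_size)
    with ThreadPoolExecutor(max_workers=max_parallel_requests) as pool:
        # pool.map yields results in submission order, so output lines up with input.
        per_batch = list(pool.map(send_batch, batches))
    return [result for batch_results in per_batch for result in batch_results]


def fake_send_batch(batch: list[dict]) -> list[dict]:
    # Stand-in for an HTTP call to a remote validator.
    return [{"is_valid": record["value"] >= 0} for record in batch]


records = [{"value": v} for v in [3, -1, 0, 7, -2]]
print(validate_in_parallel(records, fake_send_batch, batch_size=2))
```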
## Using Validators in Columns
Add validation columns to your configuration with the builder's `add_column` method.
The `target_columns` parameter specifies which columns to validate. All target columns are passed to the validator together (except for code validators, which process each column separately).
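The column-routing rule above can be written as a small function. This is a hypothetical illustration only: `plan_validator_calls` and the `"code"`/`"remote"` kind labels are not Data Designer names.

```python
def plan_validator_calls(target_columns: list[str], validator_kind: str) -> list[list[str]]:
    """Return the column groups passed to the validator, one group per call.

    Code validators get one call per target column; other validators
    (local callable, remote) get all target columns in a single call.
    """
    if validator_kind == "code":
        return [[column] for column in target_columns]
    return [list(target_columns)]


columns = ["solution_a", "solution_b"]
print(plan_validator_calls(columns, "code"))    # [['solution_a'], ['solution_b']]
print(plan_validator_calls(columns, "remote"))  # [['solution_a', 'solution_b']]
```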
### Batch Size Considerations
Larger batch sizes improve efficiency but consume more memory. Suggested starting ranges:
- **Code validators**: 5-20 records (file I/O overhead)
- **Local callable**: 10-50 records (depends on function complexity)
- **Remote validators**: 1-10 records (network latency, server capacity)
Adjust based on:
- Validator computational cost
- Available memory
- Network bandwidth (for remote validators)
- Server rate limits
If your validation logic compares records against each other (for example, a duplicate check), it only sees the records in the current batch, not the full dataset.
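To see why this matters, consider a hypothetical duplicate check (`flag_duplicates_per_batch` is illustrative, not a Data Designer function). The same five values produce different results depending on the batch size, because duplicates that land in different batches are never compared.

```python
def flag_duplicates_per_batch(values: list[str], batch_size: int) -> list[bool]:
    """Mark a value invalid if it already appeared earlier in the SAME batch.

    Duplicates that fall in different batches go unnoticed, because each
    batch is validated independently.
    """
    is_valid = []
    for start in range(0, len(values), batch_size):
        seen = set()  # reset for every batch
        for value in values[start : start + batch_size]:
            is_valid.append(value not in seen)
            seen.add(value)
    return is_valid


values = ["a", "b", "a", "c", "a"]
print(flag_duplicates_per_batch(values, batch_size=2))  # [True, True, True, True, True]
print(flag_duplicates_per_batch(values, batch_size=5))  # [True, True, False, True, False]
```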
**Note**: Code validators always process each target column separately, even when multiple columns are specified. Local callable and remote validators receive all target columns together.