* chore: add __init__.py to engine namespace subpackages
Griffe (used by mkdocstrings) skips directories without __init__.py
when resolving module paths, which prevented the new plugins code
reference from rendering SeedReader, FileSystemSeedReader, and
Processor. Adding empty __init__.py files in engine/resources/,
engine/processing/, and engine/processing/processors/ aligns with
the convention already used in engine/mcp/, engine/models/, etc.
* docs: flesh out docstrings on plugin extension-point classes
Plugin authors now see meaningful descriptions for every field and
method on the bases rendered in the plugins code reference:
- Plugin and PluginType: class docstrings + Attributes tables for
fields and enum members; fix typo in config_qualified_name field
description.
- SingleColumnConfig: document allow_resize.
- ProcessorConfig: document processor_type discriminator.
- SeedSource: document seed_type discriminator.
- FileSystemSeedSource: add class docstring + Attributes table for
path / file_pattern / recursive.
- ColumnGeneratorFullColumn and ColumnGeneratorCellByCell: add
class docstrings explaining when to use each base, plus method
docstrings on the abstract generate() implementations.
* docs: graduate plugins out of experimental mode
Restructures plugin documentation around the now-stable extension
points (column generator, seed reader, processor) and treats plugins
as a first-class story for customizing Data Designer.
- Add code_reference/plugins.md: single-stop reference for the Plugin
object and the config + implementation base classes used by all
three plugin types.
- Add code_reference/generators.md: column generator implementation
base classes, separated from column configs.
- Surface SingleColumnConfig in code_reference/column_configs.md.
- Add plugins/implement.md ("Build Your Own"): per-type implementation
instructions across column generators, seed readers, and processors.
- Add plugins/processor.md: complete processor plugin package example.
- Rewrite plugins/overview.md: open with why plugins exist, drop the
internal-helpers note (PluginRegistry / PluginManager), and focus
the guide on what plugin builders need.
- Refresh plugins/available.md (Catalog) and
plugins/filesystem_seed_reader.md to match the new structure.
- Delete plugins/example.md (replaced by per-type guides).
- Reorder Code Reference nav alphabetically and add the new pages.
- Minor link / wording fixes in concepts/processors.md and
concepts/deployment-options.md.
* docs: simplify plugin docs structure
Replace the overview's how-to walkthrough and the per-type plugin
guides with a single Build Your Own page that covers all three
plugin types side-by-side. Add a dedicated Using Models in Plugins
guide and a seed_readers code reference, and trim the overview down
to what the plugin types are, how to use one, and how discovery
works.
- Rename plugins/implement.md to plugins/build_your_own.md.
- Delete plugins/filesystem_seed_reader.md and plugins/processor.md
(their content is now in build_your_own.md and the per-type code
references).
- Add plugins/models.md for model-backed column generator authoring.
- Add code_reference/seed_readers.md for seed reader implementation
base classes.
- Rewrite plugins/overview.md: shorter intro, type bullets link to
the relevant code reference, drop the multi-step "How do you
create plugins" walkthrough in favor of a single Build a Plugin
pointer, tighten Discovery troubleshooting.
- Refresh plugins/available.md (Available Plugins): point to the
DataDesignerPlugins catalog and explain how to request a community
listing.
- Update cross-page links in concepts/processors.md,
concepts/seed-datasets.md, recipes/plugin_development/markdown_seed_reader.md,
code_reference/plugins.md, and code_reference/generators.md to
match the new structure.
- Update mkdocs.yml nav: rename to Build Your Own, add Using Models,
add seed_readers code reference.
* docs: scroll wide tables horizontally instead of wrapping
Code-heavy reference tables (plugin bases, column generators, etc.)
were wrapping aggressively on narrow viewports, breaking long
identifiers across multiple lines. Switch the table container to
horizontal overflow and prevent code cells from wrapping so
identifiers stay readable.
* docs: address PR #603 review feedback
- Add an Implementation base section to code_reference/processors.md
rendering the engine-side Processor class. This justifies the
engine/processing/__init__.py files added earlier and gives
processor plugin authors an auto-rendered API reference, matching
the pattern used by code_reference/generators.md and seed_readers.md.
- build_your_own.md: replace the placeholder "x" emoji on the
IndexMultiplier example with the actual multiplication sign.
- build_your_own.md: drop the manual `re.compile + apply(lambda)`
pattern in the regex-filter processor in favor of the idiomatic
`Series.str.contains(..., regex=True)`.
- build_your_own.md: add a kernel-restart caveat after the editable
install instructions — PluginRegistry caches discovery on first
import, so notebooks need a fresh kernel to pick up freshly
installed plugins.
- build_your_own.md: state explicitly what `assert_valid_plugin`
checks (config base + plugin-type-appropriate impl base).
- code_reference/plugins.md: link out to the processors code
reference alongside generators and seed_readers.
* docs: split code reference by package
* docs: add interface code reference
* docs: add code reference overviews
* docs: refine code reference pages
* docs: improve code reference tables
* docs: correct reference docstrings
* docs: embed plugin catalog table
* docs: note plugin discovery restart caveat
* docs: explain generator base class choice
* docs: mention async cell generator examples
* docs: clarify plugin model usage
* docs: clarify plugin model aliases
* docs: address plugin review feedback
* docs: update available plugins page
Security
Data Designer can run in two very different trust models:
- Trusted / monolithic: The same user or team writes the config and runs the engine.
- Untrusted / shared execution: One user submits a config and a different process, service, or team executes it.
That distinction matters for features that evaluate user-supplied configuration at runtime, such as Jinja template rendering. In a trusted local workflow, broader template flexibility may be acceptable. In a shared-service deployment, user-supplied Jinja becomes part of the engine's remote code execution surface. A template sandbox escape would execute inside the process running Data Designer.
See Deployment Options for the architectures where that trust boundary changes.
Jinja Rendering Modes
Data Designer exposes the renderer choice through RunConfig:
import data_designer.config as dd

run_config = dd.RunConfig(
    jinja_rendering_engine=dd.JinjaRenderingEngine.SECURE,
)
SECURE is the default. Opt into NATIVE only when you are comfortable treating the config author and the engine operator as the same trust domain.
| Mode | What it uses | Best fit |
|---|---|---|
| SECURE | Data Designer's hardened renderer built on top of Jinja2's sandbox | Shared services, microservices, internal platforms, or any deployment where config submission is separated from execution |
| NATIVE | Jinja2's built-in sandbox with Data Designer's variable whitelist | Local library usage and other trusted, monolithic workflows that want broader Jinja behavior |
!!! warning "Treat untrusted Jinja as a security boundary"
    If many users can submit configs to one engine, or if configs are accepted over an API and executed elsewhere, keep JinjaRenderingEngine.SECURE. In that model, Jinja templates are no longer just prompt-formatting helpers. They are untrusted user programs being evaluated by your engine.
Compatibility Matrix
NATIVE is not an unrestricted Python template engine. The matrix below shows what each mode permits, restricts, or adds on top of Jinja2's standard sandbox behavior.
| Capability | NATIVE | SECURE |
|---|---|---|
| Jinja2 ImmutableSandboxedEnvironment baseline | Yes | Yes |
| References to explicitly provided dataset variables only | Yes | Yes |
| Standard Jinja built-in filter set | Yes | Subset only |
| Data Designer jsonpath filter | Yes | Yes |
| import, macro, set, extends, block support | Yes | No |
| Nested or recursive for loops | Yes | No |
| Unbounded AST complexity | Yes | No |
| Template context sanitized to JSON-compatible types before render | No | Yes |
| Empty, oversized, or built-in-like rendered output is permitted | Yes | No |
What SECURE Adds on Top of Standard Jinja Sandbox
The SECURE renderer uses a hardened environment implemented in the renderer source file on GitHub. Compared with the standard Jinja sandbox, it adds the controls described in the following subsections.
Record Sanitization Before Render
Before rendering, SECURE forces template context through a JSON-compatible serialization step. That means remote templates operate on plain data, not arbitrary Python objects.
# Intended shape for remote template context
record = {
    "user": {
        "name": "alice",
        "roles": ["admin", "reviewer"],
    }
}

# Not the kind of server-side object SECURE wants to expose directly
record = {
    "user": SomePythonObject(...),
}
In a remote execution setting, exposing rich Python objects increases the risk of attribute- and method-based sandbox escapes. Jinja's sandbox security considerations note that the sandbox is not a complete security boundary, and past escapes have included str.format (CVE-2016-10745), str.format_map (CVE-2019-10906), indirect str.format references (CVE-2024-56326), and |attr-based access to format (CVE-2025-27516); PortSwigger's server-side template injection research covers the broader object-traversal pattern.
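As a rough illustration of the sanitization step, the sketch below round-trips a record through the standard `json` module so that only plain data reaches the template. The helper name `sanitize_record` is hypothetical, and rejecting non-serializable values with `TypeError` is an assumption; the actual SECURE renderer may coerce or reject values differently.

```python
import json
from typing import Any


def sanitize_record(record: dict[str, Any]) -> dict[str, Any]:
    """Round-trip a record through JSON so only plain data survives.

    Hypothetical helper, not Data Designer's API. json.dumps raises
    TypeError when a value is not JSON-serializable.
    """
    return json.loads(json.dumps(record))


class SomePythonObject:
    """Stand-in for a rich server-side object."""


# Plain dicts, lists, and strings pass through unchanged.
plain = sanitize_record({"user": {"name": "alice", "roles": ["admin", "reviewer"]}})

# A rich Python object never reaches the template context.
try:
    sanitize_record({"user": SomePythonObject()})
    rejected = False
except TypeError:
    rejected = True
```

The point of the round trip is that the template only ever sees dicts, lists, strings, numbers, booleans, and nulls, so there are no custom attributes or methods to traverse.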
Filter Allowlist
SECURE keeps only a small approved subset of Jinja filters plus the Data Designer jsonpath filter. If a filter is not on that allowlist, the template is rejected. Common excluded filters are:
| Disallowed filters | Why they are excluded in SECURE |
|---|---|
| attr, xmlattr | These add dynamic attribute lookup or attribute-name construction, which widens the object-traversal surface in untrusted templates. |
| map, select, reject, selectattr, rejectattr, groupby, batch, slice, sum | These make templates behave more like a data-processing language and can multiply compute across large inputs. |
| join, format, indent, wordwrap, center, filesizeformat | These expand presentation and composition logic inside the template. SECURE keeps formatting logic narrow so templates stay close to interpolation. |
| default, d, dictsort, count, wordcount, pprint, tojson | These encourage fallback logic, secondary data shaping, or debug-style output inside the template rather than in the engine or config layer. |
| safe, striptags, urlize | These are primarily HTML-oriented output transforms and are unnecessary for server-side dataset rendering. |
Some omitted convenience filters, such as the e alias for escape, are excluded because SECURE uses a small explicit allowlist. The current implementation does not assign each omitted filter its own separate security rationale.
Use NATIVE when full Jinja filter compatibility matters more than the additional restrictions used for untrusted template execution.
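To make the allowlist idea concrete, here is a minimal standard-library sketch that scans a template for filter names and reports any that are not approved. The `ALLOWED_FILTERS` contents are hypothetical (the real allowlist lives in the renderer source), and the regex scan is a simplification; a real implementation would more likely inspect the parsed Jinja AST.

```python
import re

# Hypothetical allowlist for illustration only; not the real SECURE allowlist.
ALLOWED_FILTERS = frozenset({"upper", "lower", "trim", "jsonpath"})

# Matches the contents of {{ ... }} expressions and {% ... %} statements.
_BLOCK = re.compile(r"\{\{(.*?)\}\}|\{%(.*?)%\}", re.DOTALL)
# Matches a filter name that follows a pipe character.
_FILTER = re.compile(r"\|\s*([A-Za-z_]\w*)")


def find_disallowed_filters(template: str) -> set[str]:
    """Return filter names used in the template that are not on the allowlist."""
    used: set[str] = set()
    for match in _BLOCK.finditer(template):
        body = match.group(1) or match.group(2) or ""
        used.update(_FILTER.findall(body))
    return used - ALLOWED_FILTERS


ok = find_disallowed_filters("{{ name | upper }}")
bad = find_disallowed_filters("{{ rows | map(attribute='x') | join(', ') }}")
```

Under these assumptions, a template is accepted only when the returned set is empty; any other result means the whole template is rejected rather than partially rendered.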
Template Features Removed
SECURE rejects import, macro, set, extends, and block.
{% macro render_name(name) %}{{ name }}{% endmacro %}
{{ render_name(customer_name) }}
{% set temp = user_id %}
{{ temp }}
Those features are useful in trusted authoring environments, but they also make user templates more expressive and stateful. In a remote execution model, SECURE intentionally narrows the language so templates stay closer to data interpolation than to a reusable programming layer.
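A pre-render check for these statements could look roughly like the sketch below. It uses only the standard library and a regex over `{% ... %}` tags, which is an assumption made for illustration; the function name `rejected_tags_in` is hypothetical and the real renderer may detect these constructs from the parsed template instead.

```python
import re

# Statement names the text above says SECURE rejects.
REJECTED_TAGS = frozenset({"import", "macro", "set", "extends", "block"})

# Captures the first word of each {% ... %} statement, including the {%- form.
_TAG = re.compile(r"\{%-?\s*([A-Za-z_]+)")


def rejected_tags_in(template: str) -> set[str]:
    """Return the rejected statement names that appear in the template."""
    return set(_TAG.findall(template)) & REJECTED_TAGS


macro_hit = rejected_tags_in("{% macro render_name(name) %}{{ name }}{% endmacro %}")
set_hit = rejected_tags_in("{% set temp = user_id %}{{ temp }}")
clean = rejected_tags_in("{% if a %}{{ a }}{% endif %}")
```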
Loop Restrictions
SECURE rejects recursive loops and nested for loops.
{% for row in rows %}
  {% for item in row %}
    {{ item }}
  {% endfor %}
{% endfor %}
Nested and recursive loops are especially risky in shared execution because they can amplify compute cost and output size in ways that are hard to reason about from the outside.
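The loop rule can be sketched as a depth counter over `for` and `endfor` tags. This is a standard-library approximation with a hypothetical function name; it assumes well-formed tags and is not how the SECURE renderer is confirmed to work, which more plausibly walks the Jinja AST.

```python
import re

# Matches opening and closing for-tags; group 2 holds the rest of an opening tag.
_FOR_TAG = re.compile(r"\{%-?\s*(for\b(.*?)|endfor)\s*-?%\}", re.DOTALL)


def violates_loop_rules(template: str) -> bool:
    """Return True if the template nests for loops or uses a recursive loop."""
    depth = 0
    for match in _FOR_TAG.finditer(template):
        if match.group(1) == "endfor":
            depth -= 1
            continue
        depth += 1
        # Jinja marks recursive loops with a trailing `recursive` keyword.
        uses_recursive = match.group(2).split()[-1:] == ["recursive"]
        if depth > 1 or uses_recursive:
            return True
    return False


nested = "{% for row in rows %}{% for item in row %}{{ item }}{% endfor %}{% endfor %}"
flat = "{% for a in xs %}{{ a }}{% endfor %}{% for b in ys %}{{ b }}{% endfor %}"
recursive = "{% for node in tree recursive %}{{ node }}{% endfor %}"
```

Two sequential loops pass this check because the depth returns to zero between them; only a loop opened inside another loop, or one marked `recursive`, trips it.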
AST Complexity Limits
SECURE statically analyzes the parsed Jinja AST and rejects templates that exceed the current limits of 600 nodes or depth 10.
{% if a %}
  {% if b %}
    {% if c %}
      {{ value }}
    {% endif %}
  {% endif %}
{% endif %}
This is not about any one feature being unsafe by itself. It is about limiting how much control flow and composition untrusted templates can pack into a single server-side render operation, which helps prevent compute bombs in shared execution.
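The kind of static check described here amounts to counting nodes and measuring depth on a parsed tree. The sketch below shows that walk in a generic form and, to stay within the standard library, demonstrates it on Python's own `ast` module rather than on a Jinja AST; that substitution is purely illustrative. The limits come from the text above, while counting the root as depth 1 and treating the limits as inclusive are assumptions. With Jinja2 installed, the same walk could be driven by the `iter_child_nodes()` method on Jinja's AST nodes.

```python
import ast
from typing import Any, Callable, Iterable

MAX_NODES = 600  # limits quoted in the text
MAX_DEPTH = 10


def measure(root: Any, children: Callable[[Any], Iterable[Any]]) -> tuple[int, int]:
    """Return (node_count, max_depth) for a tree, counting the root as depth 1."""
    count, deepest = 0, 0
    stack = [(root, 1)]
    while stack:
        node, depth = stack.pop()
        count += 1
        deepest = max(deepest, depth)
        stack.extend((child, depth + 1) for child in children(node))
    return count, deepest


def within_limits(root: Any, children: Callable[[Any], Iterable[Any]]) -> bool:
    """Return True when the tree stays inside both limits (assumed inclusive)."""
    count, deepest = measure(root, children)
    return count <= MAX_NODES and deepest <= MAX_DEPTH


# Stand-in trees: a tiny Python statement and a 20-level nested list literal.
small_ok = within_limits(ast.parse("x = 1"), ast.iter_child_nodes)
deep_ok = within_limits(ast.parse("[" * 20 + "]" * 20), ast.iter_child_nodes)
```

Because the check runs on the parsed tree before any rendering happens, an over-complex template is rejected without spending render time on it.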
self References Blocked
SECURE rejects references to self.
{{ self }}
The point is to avoid exposing template internals back to the submitter. In a remote setting, even accidental access to those internals is unnecessary surface area.
Rendered Output Guards
SECURE validates rendered output after template execution. It rejects empty output, very large output, and strings that look like Python built-in or function representations.
{{ "" }}
<built-in method ...>
<function ...>
These checks matter because not all bad outcomes come from parse-time behavior. Some templates are syntactically valid but still produce output that is clearly broken, oversized, or revealing internal implementation details.
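A post-render guard along these lines might be sketched as follows. The function name, the `MAX_OUTPUT_CHARS` threshold, the exact regex, and the choice to treat whitespace-only output as empty are all assumptions for illustration; the source does not state the real limits or patterns.

```python
import re
from typing import Optional

# Hypothetical size limit; the real threshold is not stated in the text.
MAX_OUTPUT_CHARS = 10_000

# Matches reprs such as "<built-in method ...>", "<function ...>", "<bound method ...>".
_INTERNAL_REPR = re.compile(
    r"<(built-in (method|function)|bound method|function)\b[^>]*>"
)


def output_problem(rendered: str) -> Optional[str]:
    """Return a reason string if the rendered output should be rejected, else None."""
    if not rendered.strip():
        return "empty"
    if len(rendered) > MAX_OUTPUT_CHARS:
        return "too large"
    if _INTERNAL_REPR.search(rendered):
        return "looks like an internal object representation"
    return None


empty_reason = output_problem("")
leak_reason = output_problem(str(len))  # str(len) is "<built-in function len>"
fine_reason = output_problem("Hello, alice")
```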
Sanitized User-Facing Errors
At the engine boundary, SECURE normalizes most template failures into a generic invalid-template message.
User provided prompt generation template is invalid.
That matters in remote execution because exception details can leak information about server-side implementation, supported objects, or internal execution paths that untrusted users do not need to see.
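One way to picture this boundary is a wrapper that logs the detailed cause server-side and raises a generic error to the submitter. Only the message text comes from this page; the exception class, logger name, and `render_at_boundary` function are hypothetical, and the sketch normalizes every exception whereas the text says the real engine normalizes most template failures.

```python
import logging
from typing import Callable

logger = logging.getLogger("template_rendering")  # hypothetical logger name

GENERIC_MESSAGE = "User provided prompt generation template is invalid."


class UserTemplateError(Exception):
    """Hypothetical exception type shown to the config submitter."""


def render_at_boundary(
    render: Callable[[str, dict], str], template: str, record: dict
) -> str:
    """Call a renderer and replace any failure with a generic user-facing error."""
    try:
        return render(template, record)
    except Exception as exc:
        # Keep the detailed cause in server-side logs only.
        logger.warning("template rendering failed: %r", exc)
        # `from None` suppresses the chained traceback in the user-facing error.
        raise UserTemplateError(GENERIC_MESSAGE) from None


def _failing_renderer(template: str, record: dict) -> str:
    raise AttributeError("secret internal detail")


try:
    render_at_boundary(_failing_renderer, "{{ x }}", {})
    raised = None
except UserTemplateError as err:
    raised = err
```

The submitter sees the same message regardless of which control rejected the template, while operators can still diagnose the failure from the logs.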
These controls exist because the standard sandbox is a good baseline, but shared-service deployments need a narrower and more defensive execution model.
Why This Matters in Multi-User Deployments
The security posture changes as soon as config submission and execution are separated.
Examples:
- A centralized Data Designer service accepts configs from many users.
- An internal platform lets users upload or edit configs that are executed by a background worker.
- A REST API accepts Jinja-containing configs and runs them on server-side infrastructure.
In those environments, templates are no longer just local convenience syntax. They are untrusted input being evaluated by infrastructure the submitter does not control. In practice, that makes Jinja rendering a remote code execution concern, which is why SECURE exists and why it remains the default.
If you are deciding between local library usage and a shared service model, read Deployment Options. The library patterns are often still "trusted" deployments. The shared microservice pattern is not.
When To Use NATIVE
Use NATIVE when all of the following are true:
- The person submitting the config is also the person running the engine, or they are in the same trusted operational boundary.
- You want broader standard Jinja behavior than SECURE allows.
- You understand that this is a flexibility tradeoff, not the safer default.
For example, this is often reasonable in a notebook, local script, or other single-user library workflow.