DataDesigner/docs/devnotes/posts/assets/data-designer-got-skills/trace-skill.html
Johnny Greco 0a7b9e0d6d
docs: Data Designer Got Skills dev note (#457)
* docs: add skeleton for "Data Designer Got Skills" dev note

* create assets folder and add blog directory name

* docs: add Claude Code plugin marketplace configuration

Register the repo as a Claude Code plugin marketplace so users can
install the data-designer skill via `/plugin marketplace add`.

* docs: write first draft of "Data Designer Got Skills" dev note

Full prose for all sections: intro with hero benchmark figure,
agents as first-class users, baseline trace walkthrough, CLI and
skill design, benchmark results (228 sessions), getting started
with marketplace and npx install paths, and what's next.

* docs: add error breakdown table and minor refinements

* docs: add sdg and data-designer keywords to plugin metadata

* docs: refine CLI framing, reduce em dashes, slop guard pass

* docs: fix grammar in dev note (serial comma, double-which clause)

* update hero image

* docs: swap hero image, move benchmark figure, minor wording tweaks

* docs: add narrative lead-in to skill trace summary

* docs: refine quality bullet, streamline getting started modes

* remove old image

* slope-guard tweaks
2026-03-24 21:03:00 -04:00


<style>
/* ── Nuke ALL Material-injected pseudo-elements on our details ── */
.md-typeset details.session-trace > summary,
.md-typeset .st-body details > summary {
  padding-left: 0 !important; /* Material adds 2.4em for its icon */
  position: relative;
}
.md-typeset details.session-trace > summary::before,
.md-typeset details.session-trace > summary::after,
.md-typeset .st-body details > summary::before,
.md-typeset .st-body details > summary::after {
  display: none !important;
  content: none !important;
}
/* ── Arrow icon (real <span>, not pseudo-element) ── */
.st-arrow {
  display: inline-block;
  font-size: .55em;
  transition: transform .2s;
  flex-shrink: 0;
  color: var(--md-default-fg-color--light);
}
details[open] > summary > .st-arrow { transform: rotate(90deg); }
/* ── Session trace container ── */
.md-typeset details.session-trace { border: 1px solid var(--md-default-fg-color--lightest); border-radius: .4rem; margin: .6rem 0; overflow: hidden; }
.md-typeset details.session-trace > summary { padding: .65rem 1rem !important; background: var(--md-code-bg-color); cursor: pointer; display: flex; align-items: center; gap: .5rem; flex-wrap: wrap; list-style: none; user-select: none; font-size: .85rem; }
.st-badge { padding: .1rem .4rem; border-radius: .2rem; font-size: .68rem; font-weight: 700; text-transform: uppercase; letter-spacing: .04em; }
.st-badge.skill { background: #76B900; color: #000; }
.st-badge.baseline { background: #f59e0b; color: #000; }
.st-name { font-weight: 600; }
.st-stats { display: inline-flex; gap: .3rem; flex-wrap: wrap; margin-left: auto; }
.st-stat { padding: .08rem .4rem; border-radius: 1rem; font-size: .68rem; font-family: var(--md-code-font-family); background: var(--md-default-fg-color--lightest); white-space: nowrap; }
.st-stat.err { background: rgba(239,68,68,.15); color: #ef4444; }
.st-stat.zero { background: rgba(118,185,0,.12); color: #76B900; }
/* ── Trace body / steps ── */
.st-body { padding: .25rem 0; }
.st-step { padding: .4rem 1rem; display: flex; gap: .6rem; align-items: baseline; border-top: 1px solid var(--md-default-fg-color--lightest); border-left: 3px solid transparent; font-size: .84rem; }
.st-step:first-child { border-top: none; }
.st-step.err { background: rgba(239,68,68,.06); border-left-color: #ef4444; }
.st-step.ok { border-left-color: #76B900; }
.st-num { min-width: 1.5rem; font-size: .7rem; color: var(--md-default-fg-color--lighter); font-family: var(--md-code-font-family); text-align: right; flex-shrink: 0; }
.st-tool { padding: .05rem .35rem; border-radius: .2rem; font-size: .68rem; font-family: var(--md-code-font-family); font-weight: 600; min-width: 2.8rem; text-align: center; flex-shrink: 0; white-space: nowrap; }
.st-tool.bash { background: #3b82f6; color: #fff; }
.st-tool.read { background: #8b5cf6; color: #fff; }
.st-tool.write { background: #10b981; color: #fff; }
.st-tool.edit { background: #f59e0b; color: #000; }
.st-tool.grep { background: #ec4899; color: #fff; }
.st-tool.glob { background: #a855f7; color: #fff; }
.st-tool.agent { background: #06b6d4; color: #000; }
.st-tool.msg { background: transparent; border: 1px solid var(--md-default-fg-color--lighter); color: var(--md-default-fg-color--light); font-weight: 500; font-style: italic; }
.st-desc { flex: 1; min-width: 0; }
.st-desc code { font-size: .78rem; }
/* ── Inner collapsible blocks ── */
.md-typeset .st-body details.st-expand { margin: .25rem 0 .1rem; border: none; box-shadow: none; }
.md-typeset .st-body details.st-expand > summary { cursor: pointer; color: var(--md-default-fg-color--light); font-size: .76rem; padding: .15rem 0 !important; list-style: none; min-height: 0; font-weight: normal; }
.md-typeset .st-body details.st-expand > summary:hover { color: var(--md-default-fg-color); }
.md-typeset .st-body details.st-expand > div { margin-top: .2rem; padding: .5rem .75rem; background: var(--md-code-bg-color); border-radius: .25rem; font-size: .73rem; font-family: var(--md-code-font-family); max-height: 30rem; overflow: auto; line-height: 1.45; white-space: pre; }
</style>
<!-- ── 1. Typical Skill Session ── -->
<details class="session-trace" open>
<summary>
<span class="st-arrow">&#9654;</span>
<span class="st-badge skill">Skill</span>
<span class="st-name">Typical Skill Session</span>
<span class="st-stats">
<span class="st-stat">9 tools</span>
<span class="st-stat zero">0 errors</span>
<span class="st-stat">92s</span>
</span>
</summary>
<div class="st-body">
<div class="st-step">
<span class="st-tool msg">Agent</span>
<span class="st-desc"><em>I&#x27;ll start by learning about the Data Designer library to understand the available column types, samplers, validators, and processors.</em></span>
</div>
<div class="st-step">
<span class="st-num">1</span>
<span class="st-tool bash">Bash</span>
<span class="st-desc">
<code>data-designer agent context 2&gt;&amp;1</code><details class="st-expand"><summary><span class="st-arrow">&#9654;</span> output (7,427 chars)</summary><div>Data Designer v0.5.4rc2
## Config Module
The config module contains all user-facing configuration types. Do not search other modules in the library.
config_root: .venv/lib/python3.13/site-packages/data_designer/config
builder: {config_root}/config_builder.py
base: {config_root}/base.py (read for inherited fields shared by columns and processors)
All config types are accessible via: import data_designer.config as dd
## Types
### columns
file: {config_root}/column_configs.py
type                      description
------------------------- ---------------------------------------------------------------------------------
CustomColumnConfig        Configuration for custom user-defined column generators.
EmbeddingColumnConfig     Configuration for embedding generation columns.
ExpressionColumnConfig    Configuration for derived columns using Jinja2 expressions.
ImageColumnConfig         Configuration for image generation columns.
LLMCodeColumnConfig       Configuration for code generation columns using Large Language Models.
LLMJudgeColumnConfig      Configuration for LLM-as-a-judge quality assessment and scoring columns.
LLMStructuredColumnConfig Configuration for structured JSON generation columns using Large Language Models.
LLMTextColumnConfig       Configuration for text generation columns using Large Language Models.
SamplerColumnConfig       Configuration for columns generated using numerical samplers.
SeedDatasetColumnConfig   Configuration for columns sourced from seed datasets.
ValidationColumnConfig    Configuration for validation columns that validate existing columns.
### constraints
file: {config_root}/sampler_constraints.py
type                       description
-------------------------- --------------------------------------------------------------------------
ColumnInequalityConstraint Constrain a sampler column to be less/greater than another sampler column.
ScalarInequalityConstraint Constrain a sampler column to be less/greater than a scalar value.
### processors
file: {config_root}/processors.py
type                           description
------------------------------ ---------------------------------------------------------------------------------
DropColumnsProcessorConfig     Drop columns from the output dataset (prefer ``drop=True`` in the column config).
SchemaTransformProcessorConfig Configuration for transforming the dataset schema using Jinja2 templates.
### samplers
file: {config_root}/sampler_params.py
type                          description
----------------------------- -------------------------------------------------------------------------------------
BernoulliSamplerParams        Parameters for sampling from a Bernoulli distribution.
BernoulliMixtureSamplerParams Parameters for sampling from a Bernoulli mixture distribution.
BinomialSamplerParams         Parameters for sampling from a Binomial distribution.
CategorySamplerParams         Parameters for categorical sampling with optional probability weighting.
DatetimeSamplerParams         Parameters for uniform datetime sampling within a specified range.
GaussianSamplerParams         Parameters for sampling from a Gaussian (Normal) distribution.
PersonSamplerParams           Parameters for sampling synthetic person data with demographic attributes.
PersonFromFakerSamplerParams  Parameters for sampling synthetic person data with demographic attributes from Faker.
PoissonSamplerParams          Parameters for sampling from a Poisson distribution.
ScipySamplerParams            Parameters for sampling from any scipy.stats continuous or discrete distribution.
SubcategorySamplerParams      Parameters for subcategory sampling conditioned on a parent category column.
TimeDeltaSamplerParams        Parameters for sampling time deltas relative to a reference datetime column.
UniformSamplerParams          Parameters for sampling from a continuous Uniform distribution.
UUIDSamplerParams             Parameters for generating UUID (Universally Unique Identifier) values.
### validators
file: {config_root}/validator_params.py
type                         description
---------------------------- -----------------------------------------------------------------------------------------------------
CodeValidatorParams          Configuration for code validation. Supports Python and SQL code validation.
LocalCallableValidatorParams Configuration for local callable validation. Expects a function to be passed that validates the data.
RemoteValidatorParams        Configuration for remote validation. Sends data to a remote endpoint for validation.
## Model Aliases
default_provider: nvidia
model_alias      model                             generation_type provider
---------------- --------------------------------- --------------- --------
nvidia-embedding nvidia/llama-3.2-nv-embedqa-1b-v2 embedding       nvidia
nvidia-reasoning openai/gpt-oss-20b                chat-completion nvidia
nvidia-text      nvidia/nemotron-3-nano-30b-a3b    chat-completion nvidia
nvidia-vision    nvidia/nemotron-nano-12b-v2-vl    chat-completion nvidia
openai-embedding text-embedding-3-large            embedding       openai
openai-reasoning openai/openai/gpt-5.2             chat-completion openai
openai-text      us/azure/openai/gpt-4.1           chat-completion openai
openai-vision    openai/openai/gpt-5.2             chat-completion openai
## Persona Datasets
locale     size    installed
---------- ------- ---------
en_IN      2.39 GB True
en_SG      0.30 GB True
en_US      1.24 GB True
hi_Deva_IN 4.14 GB True
hi_Latn_IN 2.7 GB  True
ja_JP      1.69 GB True
pt_BR      2.33 GB True
## Commands
command_pattern                            description
------------------------------------------ -------------------------------------------------------------------
data-designer agent context                Prints output from all agent subcommands to bootstrap context.
data-designer agent types [family]         Type names, descriptions, and source files for one or all families.
data-designer agent state model-aliases    Model aliases and usability status.
data-designer agent state persona-datasets Persona locales and install status.</div></details>
</span>
</div>
<div class="st-step">
<span class="st-tool msg">Agent</span>
<span class="st-desc"><em>Now let me inspect the specific config schemas I&#x27;ll need.</em></span>
</div>
<div class="st-step">
<span class="st-num">2</span>
<span class="st-tool read">Read</span>
<span class="st-desc">
<code>data_designer/config/base.py</code>
</span>
</div>
<div class="st-step">
<span class="st-num">3</span>
<span class="st-tool read">Read</span>
<span class="st-desc">
<code>data_designer/config/column_configs.py</code>
</span>
</div>
<div class="st-step">
<span class="st-num">4</span>
<span class="st-tool read">Read</span>
<span class="st-desc">
<code>data_designer/config/sampler_params.py</code>
</span>
</div>
<div class="st-step">
<span class="st-num">5</span>
<span class="st-tool read">Read</span>
<span class="st-desc">
<code>data_designer/config/validator_params.py</code>
</span>
</div>
<div class="st-step">
<span class="st-num">6</span>
<span class="st-tool read">Read</span>
<span class="st-desc">
<code>data_designer/config/config_builder.py</code>
</span>
</div>
<div class="st-step">
<span class="st-num">7</span>
<span class="st-tool read">Read</span>
<span class="st-desc">
<code>data_designer/config/processors.py</code>
</span>
</div>
<div class="st-step">
<span class="st-tool msg">Agent</span>
<span class="st-desc">
<details class="st-expand"><summary><span class="st-arrow">&#9654;</span> message (863 chars)</summary><div>Now I have a complete understanding of the library. Here are my key design decisions:
**Schema:**
1. `difficulty` — Category sampler: beginner, intermediate, advanced (equal weights)
2. `subtopic` — Category sampler: 7 data science subtopics
3. `instruction` — LLM text column: generates a natural language task description conditioned on difficulty + subtopic
4. `solution` — LLM code column (Python): generates a complete solution conditioned on instruction + difficulty + subtopic
5. `code_validation` — Validation column (code validator, Python): checks syntactic correctness of the solution
6. `quality_score` — LLM judge column: scores correctness, code quality, and instruction clarity on 1-5 scale
**Diversity axes:** difficulty x subtopic combinations ensure broad coverage. The LLM prompts emphasize varying libraries, complexity, and operation types.</div></details>
</span>
</div>
<div class="st-step">
<span class="st-num">8</span>
<span class="st-tool write">Write</span>
<span class="st-desc">
<code>text_to_python.py</code><details class="st-expand"><summary><span class="st-arrow">&#9654;</span> written content (7,540 chars)</summary><div># /// script
# dependencies = [
#   &quot;data-designer&quot;,
# ]
# ///
import data_designer.config as dd
def load_config_builder() -&gt; dd.DataDesignerConfigBuilder:
    config_builder = dd.DataDesignerConfigBuilder()
    # 1. Difficulty level
    config_builder.add_column(
        dd.SamplerColumnConfig(
            name=&quot;difficulty&quot;,
            sampler_type=&quot;category&quot;,
            params=dd.CategorySamplerParams(
                values=[&quot;beginner&quot;, &quot;intermediate&quot;, &quot;advanced&quot;],
            ),
        )
    )
    # 2. Subtopic
    config_builder.add_column(
        dd.SamplerColumnConfig(
            name=&quot;subtopic&quot;,
            sampler_type=&quot;category&quot;,
            params=dd.CategorySamplerParams(
                values=[
                    &quot;data cleaning&quot;,
                    &quot;exploratory analysis&quot;,
                    &quot;aggregation and groupby operations&quot;,
                    &quot;visualization with matplotlib/seaborn&quot;,
                    &quot;statistical testing&quot;,
                    &quot;feature engineering&quot;,
                    &quot;working with messy or missing data&quot;,
                ],
            ),
        )
    )
    # 3. Natural language instruction
    config_builder.add_column(
        dd.LLMTextColumnConfig(
            name=&quot;instruction&quot;,
            model_alias=&quot;nvidia-text&quot;,
            system_prompt=(
                &quot;You are an expert data science educator who writes clear, realistic Python coding tasks. &quot;
                &quot;Your instructions should be specific enough that a developer can implement a complete solution &quot;
                &quot;without ambiguity. Vary the libraries used (pandas, numpy, scipy, sklearn, matplotlib, seaborn, etc.) &quot;
                &quot;and the operations required to match the difficulty level.&quot;
            ),
            prompt=(
                &quot;Write a single, self-contained Python coding instruction for a {{ difficulty }}-level data science task &quot;
                &quot;in the subtopic of &#x27;{{ subtopic }}&#x27;.\n\n&quot;
                &quot;Requirements:\n&quot;
                &quot;- The task should be realistic and something a data scientist would actually encounter.\n&quot;
                &quot;- For beginner tasks, focus on simple operations with clear steps.\n&quot;
                &quot;- For intermediate tasks, combine multiple operations or require domain knowledge.\n&quot;
                &quot;- For advanced tasks, involve complex logic, performance considerations, or sophisticated techniques.\n&quot;
                &quot;- The instruction should describe what the code should do, including any expected input/output behavior.\n&quot;
                &quot;- Do NOT include the solution — only the task description.\n\n&quot;
                &quot;Return only the instruction text, nothing else.&quot;
            ),
        )
    )
    # 4. Python solution
    config_builder.add_column(
        dd.LLMCodeColumnConfig(
            name=&quot;solution&quot;,
            model_alias=&quot;nvidia-text&quot;,
            code_lang=&quot;python&quot;,
            system_prompt=(
                &quot;You are an expert Python data scientist. Write clean, well-structured, and correct Python code. &quot;
                &quot;The code must be completely self-contained — include all necessary imports, sample data creation, &quot;
                &quot;and function definitions. Do not rely on external files or databases. &quot;
                &quot;Use realistic variable names and follow PEP 8 conventions.&quot;
            ),
            prompt=(
                &quot;Write a complete, self-contained Python solution for the following {{ difficulty }}-level &quot;
                &quot;data science task in the area of &#x27;{{ subtopic }}&#x27;.\n\n&quot;
                &quot;Task:\n{{ instruction }}\n\n&quot;
                &quot;Requirements:\n&quot;
                &quot;- Include all necessary imports at the top.\n&quot;
                &quot;- Create any sample/mock data needed so the code runs independently.\n&quot;
                &quot;- The code must be syntactically valid and runnable as-is.\n&quot;
                &quot;- Match the complexity to the {{ difficulty }} level.\n&quot;
                &quot;- Add brief inline comments explaining key steps.\n\n&quot;
                &quot;Return only the Python code.&quot;
            ),
        )
    )
    # 5. Code validation
    config_builder.add_column(
        dd.ValidationColumnConfig(
            name=&quot;code_validation&quot;,
            target_columns=[&quot;solution&quot;],
            validator_type=&quot;code&quot;,
            validator_params=dd.CodeValidatorParams(code_lang=&quot;python&quot;),
        )
    )
    # 6. LLM-as-a-judge quality scoring
    config_builder.add_column(
        dd.LLMJudgeColumnConfig(
            name=&quot;quality_score&quot;,
            model_alias=&quot;nvidia-text&quot;,
            prompt=(
                &quot;Evaluate the following Python data science coding task and its solution.\n\n&quot;
                &quot;Difficulty: {{ difficulty }}\n&quot;
                &quot;Subtopic: {{ subtopic }}\n\n&quot;
                &quot;Instruction:\n{{ instruction }}\n\n&quot;
                &quot;Solution:\n{{ solution }}\n\n&quot;
                &quot;Score each dimension below on a 1-5 scale using the provided rubrics.&quot;
            ),
            scores=[
                dd.Score(
                    name=&quot;correctness&quot;,
                    description=(
                        &quot;Does the solution correctly implement what the instruction asks for? &quot;
                        &quot;Consider whether the code logic is sound, handles edge cases appropriately, &quot;
                        &quot;and would produce the expected output.&quot;
                    ),
                    options={
                        1: &quot;Completely incorrect or does not address the instruction.&quot;,
                        2: &quot;Major logical errors; partially addresses the instruction.&quot;,
                        3: &quot;Mostly correct with minor bugs or missing edge cases.&quot;,
                        4: &quot;Correct implementation with only trivial issues.&quot;,
                        5: &quot;Fully correct, handles edge cases, and produces expected output.&quot;,
                    },
                ),
                dd.Score(
                    name=&quot;code_quality&quot;,
                    description=(
                        &quot;Assess the code quality including readability, proper use of libraries, &quot;
                        &quot;PEP 8 compliance, appropriate variable naming, and idiomatic Python/pandas usage.&quot;
                    ),
                    options={
                        1: &quot;Very poor quality: unreadable, no structure, bad naming.&quot;,
                        2: &quot;Below average: some structure but many style issues.&quot;,
                        3: &quot;Acceptable: readable but could improve in idiom or style.&quot;,
                        4: &quot;Good: clean, well-structured, follows conventions.&quot;,
                        5: &quot;Excellent: idiomatic, elegant, and highly readable.&quot;,
                    },
                ),
                dd.Score(
                    name=&quot;instruction_clarity&quot;,
                    description=(
                        &quot;How clear and unambiguous is the instruction? Could a competent developer &quot;
                        &quot;implement the solution from the instruction alone without guessing?&quot;
                    ),
                    options={
                        1: &quot;Extremely vague or incomprehensible.&quot;,
                        2: &quot;Unclear with significant ambiguity.&quot;,
                        3: &quot;Understandable but some details are missing or ambiguous.&quot;,
                        4: &quot;Clear with only minor room for interpretation.&quot;,
                        5: &quot;Crystal clear, specific, and unambiguous.&quot;,
                    },
                ),
            ],
        )
    )
    return config_builder
</div></details><div>File created successfully at: ./text_to_python.py</div>
</span>
</div>
<div class="st-step">
<span class="st-tool msg">Agent</span>
<span class="st-desc"><em>Now let me validate the config.</em></span>
</div>
<div class="st-step ok">
<span class="st-num">9</span>
<span class="st-tool bash">Bash</span>
<span class="st-desc">
<code>data-designer validate text_to_python.py 2&gt;&amp;1</code><div> ────────────────────────── Data Designer Validate ──────────────────────────
Config: text_to_python.py
[22:15:26] [INFO] ✅ Validation passed
✅ Configuration is valid</div>
</span>
</div>
<div class="st-step">
<span class="st-tool msg">Agent</span>
<span class="st-desc">
<details class="st-expand"><summary><span class="st-arrow">&#9654;</span> message (1,025 chars)</summary><div>Validation passed. Here&#x27;s a summary of the dataset pipeline:
| Column | Type | Description |
|---|---|---|
| `difficulty` | Category sampler | beginner, intermediate, advanced (uniform) |
| `subtopic` | Category sampler | 7 data science subtopics (data cleaning, exploratory analysis, aggregation/groupby, visualization, statistical testing, feature engineering, messy/missing data) |
| `instruction` | LLM text | Natural language task description conditioned on difficulty + subtopic |
| `solution` | LLM code (Python) | Complete self-contained Python solution for the instruction |
| `code_validation` | Code validator | Checks the solution for syntactic correctness |
| `quality_score` | LLM judge | Scores correctness, code_quality, and instruction_clarity (1-5 scale with structured reasoning) |
To generate the dataset, run:
```
data-designer create text_to_python.py --num-records &lt;N&gt; --dataset-name text-to-python
```
Note: generation can take a long time for large record counts (50+), so choose `N` accordingly.</div></details>
</span>
</div>
</div>
</details>