Commit graph

28 commits

Author SHA1 Message Date
Nabin Mulepati
52d42fe1b7
feat: add audio and video context (#701)
* feat: add audio and video context

Add audio/video context config models and canonical media helpers.

Translate canonical media blocks for OpenAI-compatible clients while preserving URL media as URLs. Reject unsupported audio/video blocks in the Anthropic adapter.

Refs #671

* fix: harden media context review gaps

Preserve extensionless HTTP(S) audio and video URLs as URL media, reject local path-looking audio/video context values, and reject provider-specific audio/video blocks in the Anthropic adapter.
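The classification described above can be sketched roughly as follows. This is only an illustration of this intermediate step (a later bullet in the same commit adds local path support); the function names, the path prefixes, and the suffix list are all assumptions, not the project's actual helpers.

```python
import re
from urllib.parse import urlparse

# Hypothetical patterns for illustration; the real helpers may use different rules.
_DATA_URI = re.compile(r"^data:(?:audio|video)/[\w.+-]+;base64,", re.IGNORECASE)
_PATH_PREFIXES = ("./", "../", "~/", "file://")
_MEDIA_SUFFIXES = (".wav", ".mp3", ".mp4")


def classify_media_value(value: str) -> str:
    """Return 'data_uri', 'url', 'path_like', or 'other' for a context value."""
    if _DATA_URI.match(value):
        return "data_uri"
    parsed = urlparse(value)
    # Any HTTP(S) URL with a host is treated as URL media, even with no file extension.
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    if value.startswith(_PATH_PREFIXES) or value.lower().endswith(_MEDIA_SUFFIXES):
        return "path_like"
    # Anything else (for example raw base64) is left to other validation.
    return "other"


def validate_media_value(value: str) -> str:
    """Reject values that look like local paths; return the kind otherwise."""
    kind = classify_media_value(value)
    if kind == "path_like":
        raise ValueError(f"local path-looking media value is not supported: {value!r}")
    return kind
```

For example, `validate_media_value("https://example.com/stream/abc123")` returns `"url"` under these assumptions, while `validate_media_value("./clip.wav")` raises `ValueError`.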

Refs #671

* test: add audio video context smoke notebook

Add a Jupytext source notebook and generated Colab artifact that exercise audio/video context URL, base64, local path rejection, OpenAI-compatible payload translation, and Anthropic unsupported-media handling.

Refs #671

* test: make media context notebook end to end

Rewrite the audio/video smoke notebook to run a full Data Designer preview against a local OpenAI-compatible HTTP server. Assert the generated dataset, captured endpoint payload, URL/base64 translation, and local path rejection through the interface pipeline.

Refs #671

* test: remove media context notebook from docs

Move the generated audio/video context E2E notebook out of the PR docs surface and keep it locally under the main checkout's .scratch directory.

Refs #671

* harden multimodal media context handling

* address media context review notes

Remove unused URL-specific media helpers, share the base64 data URI parser in Anthropic translation, align AudioContext validation messaging, and update config docs for audio/video contexts.

Refs #671

* docs: update media context guidance

* refactor: consolidate media helpers

* support local audio and video paths

* refactor: combine media path checks

* address media context review feedback

* remove openai media preflight

* sync generated colab notebooks

* align media local path autodetection
2026-05-22 11:54:40 -06:00
Andre Manoel
b6de38d894
docs: remove docs code reference (#674)
2026-05-21 18:29:18 -04:00
Johnny Greco
8b8d748446
docs: graduate plugins out of experimental mode (#603)
* chore: add __init__.py to engine namespace subpackages

Griffe (used by mkdocstrings) skips directories without __init__.py
when resolving module paths, which prevented the new plugins code
reference from rendering SeedReader, FileSystemSeedReader, and
Processor. Adding empty __init__.py files in engine/resources/,
engine/processing/, and engine/processing/processors/ aligns with
the convention already used in engine/mcp/, engine/models/, etc.

* docs: flesh out docstrings on plugin extension-point classes

Plugin authors now see meaningful descriptions for every field and
method on the bases rendered in the plugins code reference:

- Plugin and PluginType: class docstrings + Attributes tables for
  fields and enum members; fix typo in config_qualified_name field
  description.
- SingleColumnConfig: document allow_resize.
- ProcessorConfig: document processor_type discriminator.
- SeedSource: document seed_type discriminator.
- FileSystemSeedSource: add class docstring + Attributes table for
  path / file_pattern / recursive.
- ColumnGeneratorFullColumn and ColumnGeneratorCellByCell: add
  class docstrings explaining when to use each base, plus method
  docstrings on the abstract generate() implementations.

* docs: graduate plugins out of experimental mode

Restructures plugin documentation around the now-stable extension
points (column generator, seed reader, processor) and treats plugins
as a first-class story for customizing Data Designer.

- Add code_reference/plugins.md: single-stop reference for the Plugin
  object and the config + implementation base classes used by all
  three plugin types.
- Add code_reference/generators.md: column generator implementation
  base classes, separated from column configs.
- Surface SingleColumnConfig in code_reference/column_configs.md.
- Add plugins/implement.md ("Build Your Own"): per-type implementation
  instructions across column generators, seed readers, and processors.
- Add plugins/processor.md: complete processor plugin package example.
- Rewrite plugins/overview.md: open with why plugins exist, drop the
  internal-helpers note (PluginRegistry / PluginManager), and focus
  the guide on what plugin builders need.
- Refresh plugins/available.md (Catalog) and
  plugins/filesystem_seed_reader.md to match the new structure.
- Delete plugins/example.md (replaced by per-type guides).
- Reorder Code Reference nav alphabetically and add the new pages.
- Minor link / wording fixes in concepts/processors.md and
  concepts/deployment-options.md.

* docs: simplify plugin docs structure

Replace the overview's how-to walkthrough and the per-type plugin
guides with a single Build Your Own page that covers all three
plugin types side-by-side. Add a dedicated Using Models in Plugins
guide and a seed_readers code reference, and trim the overview down
to what the plugin types are, how to use one, and how discovery
works.

- Rename plugins/implement.md to plugins/build_your_own.md.
- Delete plugins/filesystem_seed_reader.md and plugins/processor.md
  (their content is now in build_your_own.md and the per-type code
  references).
- Add plugins/models.md for model-backed column generator authoring.
- Add code_reference/seed_readers.md for seed reader implementation
  base classes.
- Rewrite plugins/overview.md: shorter intro, type bullets link to
  the relevant code reference, drop the multi-step "How do you
  create plugins" walkthrough in favor of a single Build a Plugin
  pointer, tighten Discovery troubleshooting.
- Refresh plugins/available.md (Available Plugins): point to the
  DataDesignerPlugins catalog and explain how to request a community
  listing.
- Update cross-page links in concepts/processors.md,
  concepts/seed-datasets.md, recipes/plugin_development/markdown_seed_reader.md,
  code_reference/plugins.md, and code_reference/generators.md to
  match the new structure.
- Update mkdocs.yml nav: rename to Build Your Own, add Using Models,
  add seed_readers code reference.

* docs: scroll wide tables horizontally instead of wrapping

Code-heavy reference tables (plugin bases, column generators, etc.)
were wrapping aggressively on narrow viewports, breaking long
identifiers across multiple lines. Switch the table container to
horizontal overflow and prevent code cells from wrapping so
identifiers stay readable.

* docs: address PR #603 review feedback

- Add an Implementation base section to code_reference/processors.md
  rendering the engine-side Processor class. This justifies the
  engine/processing/__init__.py files added earlier and gives
  processor plugin authors an auto-rendered API reference, matching
  the pattern used by code_reference/generators.md and seed_readers.md.
- build_your_own.md: replace the placeholder "x" emoji on the
  IndexMultiplier example with the actual multiplication sign.
- build_your_own.md: drop the manual `re.compile + apply(lambda)`
  pattern in the regex-filter processor in favor of the idiomatic
  `Series.str.contains(..., regex=True)`.
- build_your_own.md: add a kernel-restart caveat after the editable
  install instructions — PluginRegistry caches discovery on first
  import, so notebooks need a fresh kernel to pick up freshly
  installed plugins.
- build_your_own.md: state explicitly what `assert_valid_plugin`
  checks (config base + plugin-type-appropriate impl base).
- code_reference/plugins.md: link out to the processors code
  reference alongside generators and seed_readers.

* docs: split code reference by package

* docs: add interface code reference

* docs: add code reference overviews

* docs: refine code reference pages

* docs: improve code reference tables

* docs: correct reference docstrings

* docs: embed plugin catalog table

* docs: note plugin discovery restart caveat

* docs: explain generator base class choice

* docs: mention async cell generator examples

* docs: clarify plugin model usage

* docs: clarify plugin model aliases

* docs: address plugin review feedback

* docs: update available plugins page
2026-05-06 18:12:44 -04:00
Nabin Mulepati
a9af365e8e
feat: add skip.when conditional column generation (#502)
* plan: add skip_when for conditional column generation (#479)

Adds implementation plan for a `skip_when` field on `SingleColumnConfig`
that enables conditional column generation. When the Jinja2 expression
evaluates truthy, the cell is set to None and the generator is skipped.
Skips auto-propagate through the DAG to downstream columns.
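A minimal sketch of the planned behavior, assuming columns are already in topological order. Plain Python callables stand in for the Jinja2 expression, and `Column`, `requires`, and `run_row` are hypothetical names used only for this illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Column:
    """Hypothetical stand-in for a column config; not the real SingleColumnConfig."""
    name: str
    generate: Callable[[dict], Any]
    requires: tuple = ()
    # Stand-in for the Jinja2 skip expression: truthy result means "skip this cell".
    skip_when: Optional[Callable[[dict], bool]] = None


def run_row(columns: list, record: dict) -> dict:
    """Fill one record column by column, skipping gated and downstream cells."""
    skipped = set()
    for col in columns:  # assumed to be in dependency order
        upstream_skipped = any(dep in skipped for dep in col.requires)
        gated = col.skip_when is not None and bool(col.skip_when(record))
        if upstream_skipped or gated:
            record[col.name] = None  # the generator is never called
            skipped.add(col.name)
            continue
        record[col.name] = col.generate(record)
    return record


columns = [
    Column("review", lambda r: f"review of {r['product']}",
           skip_when=lambda r: not r["in_stock"]),
    Column("summary", lambda r: f"summary: {r['review']}", requires=("review",)),
]
```

With these columns, a record whose `in_stock` is false ends up with both `review` and `summary` set to `None`, because the skip on `review` propagates to `summary`.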

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* plan: remove HopChain example from skip_when plan

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* plan: replace HopChain example with generic product review example

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* plan: add open questions on skip sentinel value and row filtering

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* plan: major revision — SkipConfig model, sync engine support, decouple propagation

- Introduce SkipConfig(when, value) as nested model on SingleColumnConfig
- Move propagate_skip to SingleColumnConfig as independent field, fixing
  bug where columns with no SkipConfig couldn't participate in propagation
- Add full sync engine implementation (Steps 4a-4d) covering both
  _fan_out_with_threads and _run_full_column_generator dispatch paths
- Add serialization boundary stripping for both DatasetBatchManager (sync)
  and RowGroupBufferManager (async)
- Simplify architecture diagrams for readability
- Update all references, design decisions, verification plan

Made-with: Cursor

* updates

* plan: document get_required_columns for skip propagation

- Explain why propagation must not use get_upstream_columns() once
  skip.when adds DAG edges; add _required_columns and
  get_required_columns() to the execution graph plan
- Point async _run_cell at get_required_columns for parity with sync
- Clarify DropSkippedRowsProcessorConfig vs stripping __skipped__ for
  DataFrames; tighten resolved-questions wording
- Extend DAG/graph verification with gating_col regression case

Refs #479

Made-with: Cursor

* plan: centralize __skipped__ handling in skip_provenance

- Document new skip_provenance.py (key constant, read/write/strip API)
- Point sync builder, async scheduler, and batch buffers at shared helpers
- Strip metadata before every DataFrame from buffer dicts, including
  FULL_COLUMN active subsets
- Split §3 into skip_evaluator vs skip_provenance; extend verification

Refs #479

Made-with: Cursor

* plan: align doc title with SkipConfig / skip.when

Drop legacy skip_when naming in headings and #362 cross-reference.

Refs #479

Made-with: Cursor

* plan: address review — delimiter validation, centralized error handling, caller-owns-deserialization

- SkipConfig._validate_when_syntax now checks find_undeclared_variables
  is non-empty, rejecting expressions without {{ }} delimiters that
  would silently skip every row
- evaluate_skip_when centralizes try/except so both sync and async
  engines get identical fail-safe behavior on eval errors
- evaluate_skip_when takes a single pre-deserialized record; caller
  runs deserialize_json_values once and passes to both skip eval and
  generator (no double deserialization, no redundant parameter)
- Update _should_skip_cell, async _run_cell, Files Modified table,
  and verification section accordingly

Refs #479

Made-with: Cursor

* plan: add get_side_effect_columns accessor to execution graph spec

Document _side_effects_by_producer inverse map and
get_side_effect_columns() accessor on ExecutionGraph, needed by
_write_skip_to_record / apply_skip_to_record to clear __trace,
__reasoning_content, etc. on skip. Added to both Step 2b metadata
section and Files Modified table.

The __skipped__ leak into active_df (greptile's other P1) was already
fixed in 70463789 via strip_skip_metadata_from_records.

Refs #479

Made-with: Cursor

* add skip.when conditional column generation

Introduce SkipConfig on SingleColumnConfig to gate column generation
with a Jinja2 expression. Columns can be skipped by expression or by
upstream propagation (propagate_skip flag).

- SkipConfig: Pydantic model with config-time syntax/delimiter/variable
  validation and cached column extraction from the Jinja2 AST
- skip_evaluator: runtime expression evaluation via NativeSandboxedEnvironment
  with fail-safe error handling (skip on expected failures)
- skip_provenance: centralized __skipped__ record tracking shared by
  sync builder, async scheduler, and buffer managers
- DAG/ExecutionGraph: skip.columns wired as dependency edges in both
  topological sort and static execution graph
- Validation: validate_skip_references checks reference existence,
  sampler/seed scope, and allow_resize conflicts
- Sync builder: cell-by-cell and full-column skip with merge-back
- Async scheduler: cell and batch skip with live-buffer provenance

Made-with: Cursor

* fix review findings for skip.when implementation

- Add skip evaluation to _fan_out_with_async (was missing, causing
  skipped rows to still be sent to the LLM)
- Preserve __skipped__ provenance on non-skipped records after
  full-column generation so multi-hop propagation works
- Use single live-buffer reference in _run_batch skip loop for
  consistency with _run_cell
- Move Template import to TYPE_CHECKING and reorder import blocks
- Replace O(n²) sum() with itertools.chain in dag.py
- Add set_required_columns/set_propagate_skip/set_skip_config
  setters to ExecutionGraph for symmetry with existing API
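The `sum()` to `itertools.chain` change mentioned in the list above can be illustrated as below. The mapping and column names are made up for the example and are not taken from dag.py.

```python
from itertools import chain

# Hypothetical mapping of producer column -> side-effect columns.
side_effects_by_column = {
    "review": ["review__trace"],
    "summary": ["summary__trace", "summary__reasoning_content"],
    "score": [],
}

# Quadratic: every "+" copies the accumulated list into a new one.
flattened_with_sum = sum(side_effects_by_column.values(), [])

# Linear: chain.from_iterable walks each inner list exactly once.
flattened_with_chain = list(chain.from_iterable(side_effects_by_column.values()))
```

Both forms produce the same flat list; only the amount of copying differs as the number of inner lists grows.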

Made-with: Cursor

* add conditional generation with skip recipe and refactor skip helpers

Add a new recipe demonstrating skip.when patterns (expression gate,
propagation, opt-out) with a customer support ticket pipeline.

Also extract _should_skip_record in async_scheduler, remove the
redundant propagate_skip param from should_skip_by_propagation, and
pass a precomputed all_side_effects set through the DAG sort.

Made-with: Cursor

* updates

* fixes

* remove recipe > inject conditional gen into existing tutorial

* regen colab notebooks

* fix: handle missing execution graph in _column_can_skip

Return False when the graph has not been initialized instead of raising,
since skip logic cannot apply before generators are set up.

Made-with: Cursor

* parametrize some tests

* public before private

* slight refactor for readability

* parametrize some tests

* minor fixes

* rename internal skip tracker key name

* clarify intent in comment

* when skipped _run_cell should return skipped value even though the consumer doesn't currently care about it

* remove inline import

* minor refactor for clarity

* fix: preserve skip metadata across replace_buffer and exclude allow_resize from skip branch

Two bugs in the sequential engine's _run_full_column_generator:

1. replace_buffer(df.to_dict()) erased __internal_skipped_columns in
   three code paths (MultiColumnConfig, non-skip-aware, has_skipped=False
   fallthrough), breaking propagate_skip for downstream columns when an
   independent FULL_COLUMN generator ran between skip-setting and
   propagating columns.

2. _column_can_skip returned True for allow_resize=True columns via
   propagation, causing the skip-aware merge path to raise on the 1:1
   row-count check for 1:N generators.

- Add restore_skip_metadata helper to skip_tracker.py
- Guard _column_can_skip against allow_resize=True columns
- Refactor _run_full_column_generator into three focused methods
- Remove dead allow_resize / _log_resize_if_changed from skip path
- Remove redundant _require_graph() calls in skip helpers
- Add single_column_config_by_name cached property
- Add integration tests for both bugs and unit tests for the helper

Made-with: Cursor

* address review comments on skip.when PR (#502)

- Extract shared skip decision logic (_should_skip_cell / _should_skip_record)
  into should_skip_column_for_record() in skip_evaluator.py so both sync and
  async engines call the same function (andreatgretel review comment)
- Extend SkipConfig self-reference validation to cover side-effect columns
  (e.g. review__trace on the review column) — previously only checked
  self.name, now checks self.name | self.side_effect_columns
- Add async engine integration tests for skip paths: cell-by-cell with
  propagation and full-column batch skip (exercises _run_cell / _run_batch)
- Fix test_allow_resize_column_not_blocked_by_upstream_skip to use default
  propagate_skip=True so it actually exercises the allow_resize guard
- Move get_skipped_column_names from skip_tracker to skip_evaluator (sole
  production consumer)

Made-with: Cursor

* address cr feedback

* Fix issue with full-column generation messing up order of skipped rows

* add skip conditional generation edge case tests

- test_skip_evaluator: parametrized should_skip_column_for_record covering
  propagation, expression gates, short-circuiting, and disabled propagation
- test_execution_graph: skip metadata accessors (get_skip_config,
  should_propagate_skip, get_required_columns, get_side_effect_columns,
  resolve_side_effect, skip.when DAG edges)
- test_dataset_builder: chained transitive propagation (4 levels),
  two independent skip gates, custom skip.value, row count preservation

Made-with: Cursor

* fix: make expression jinja validator private

Rename assert_expression_valid_jinja to _assert_expression_valid_jinja
to match the private naming convention used by other model validators.

Made-with: Cursor

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 09:31:50 -06:00
Nabin Mulepati
313f347363
chore: simplify tutorial 4 image dataset and use default model config (#403)
* chore: simplify tutorial 4 image dataset and use default model config

Switch from the large ColPali dataset (52 GB) to rokmr/pets (~23 MB)
for faster downloads in the vision tutorial. Use the default
nvidia-vision model alias instead of a custom ModelConfig block.

* regen colab notebooks
2026-03-13 12:26:41 -06:00
Andre Manoel
46358461ee
fix: repair notebook CI (dead model, missing API key, pyarrow type bug) (#348)
* fix: repair notebook CI by replacing dead vision model and adding missing API key

- Replace `meta/llama-4-scout-17b-16e-instruct` (no longer serving on
  build.nvidia.com) with `nvidia/nemotron-nano-12b-v2-vl` (project default)
  in tutorial notebook 4
- Add `OPENROUTER_API_KEY` to the `build-notebooks` workflow so notebooks
  5 and 6 (which use OpenRouter for image generation) can authenticate
- Regenerate colab notebooks to reflect the model change

* fix: handle pyarrow list types in notebook 6 display_image

When image columns are loaded from parquet with pyarrow backend,
list values are pyarrow ListScalars, not Python lists. The
isinstance(x, list) check fails, causing the whole ListScalar to be
treated as a single path string (producing filenames ending in
`png')]`). Use isinstance(x, str) instead to correctly handle any
iterable type.
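The difference between the two checks can be sketched with a stand-in iterable. `ListScalarLike` is a hypothetical class used here in place of a real pyarrow scalar, and both helper functions are illustrative rather than the notebook's actual code.

```python
class ListScalarLike:
    """Stand-in for a non-list iterable, such as a list value loaded via pyarrow."""

    def __init__(self, items):
        self._items = list(items)

    def __iter__(self):
        return iter(self._items)

    def __repr__(self):
        return f"ListScalarLike({self._items!r})"


def paths_with_list_check(value):
    """Old-style check: anything that is not a Python list is treated as one path."""
    if isinstance(value, list):
        return list(value)
    return [str(value)]


def paths_with_str_check(value):
    """New-style check: only a str is a single path; any other iterable is expanded."""
    if isinstance(value, str):
        return [value]
    return [str(item) for item in value]


value = ListScalarLike(["a.png", "b.png"])
```

Here `paths_with_list_check(value)` returns a single bogus path built from the object's repr, while `paths_with_str_check(value)` returns the two individual paths.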
2026-02-23 13:27:47 -03:00
Nabin Mulepati
8f7a72094a
feat: auto-detect ImageContext format for image-to-image generation (#342)
* updates to support image->image

* update notebooks

* regen colab notebooks

* simplify tests
2026-02-20 15:54:42 -05:00
Nabin Mulepati
d8d1e668b0
docs: add image generation documentation and image-to-image editing tutorial (#319) 2026-02-12 14:38:52 -07:00
Nabin Mulepati
8e2fd3286f
feat: add image generation support with multi-modal context (#317) 2026-02-12 14:00:28 -07:00
Andre Manoel
b6d400ef7d
chore: update tutorial notebooks to use dd. notation consistently (#288)
- Convert notebook 3 from string-based columns to class specs (dd.SamplerColumnConfig, etc.)
- Fix grammar: "is the main object is responsible" → "is the main object responsible"
- Remove stray "A" at end of URL in notebook 2
- Remove empty markdown cell in notebook 4
- Add missing data_designer.validate() call in notebook 4
- Regenerate colab notebooks from source
2026-02-03 12:03:32 -03:00
Kirit Thadaka
de7c3ab99a
docs: add deployment, performance tuning guides and streamline gettin… (#277)
* docs: add deployment, performance tuning guides and streamline getting started

- Add deployment-options.md: Library vs. Microservice decision guide
- Add inference-architecture.md: Separation of concerns with LLM servers
- Add performance-tuning.md: Concurrency and batching optimization guide
- Streamline index.md: Merge installation, add quick example, simplify
- Remove quick-start.md: Content merged into welcome page
- Remove installation.md: Content merged into welcome page
- Update model docs: Add concurrency control sections and cross-references
- Update mkdocs.yml: Add new Architecture section to navigation

* docs: add tasteful emojis to new documentation pages

* docs: consolidate redundant concurrency and troubleshooting content

- Remove duplicate max_parallel_requests tables from model-configs.md and inference-parameters.md
- Remove duplicate Concurrency Control section from model-configs.md
- Simplify Concurrency Control in inference-parameters.md to link to performance-tuning.md
- Remove Troubleshooting section from inference-architecture.md (covered in performance-tuning.md)
- performance-tuning.md is now the authoritative source for tuning guidance

* Simplified doc additions

* Switched default model to nemotron 3 nano

* Addressed feedback

* Added first blog draft
2026-02-02 21:03:58 -08:00
Johnny Greco
ae0665fa16
refactor: slim package refactor into three subpackages (#240)
* remove old structure

* major shuffle

* streamline project configs

* update make commands

* updates to make commands

* remove essentials

* initialize logger in interface

* uv lock

* ignore notepad

* update workflows

* fix e2e project config

* generate colab notebooks

* resolve default model settings in interface

* fix build commands

* update perf import make command

* cleaning up some slop

* update recipes

* move conftest files to tests/

* update subpackage readmes

* streamline config_logging

* use exports

* update perf import usage pattern

* update for IDE behavior with ruff

* remove engine's fixtures file

* add note about lazy imports

* update dependencies

* update docs

* doc fixes

* uv lock

* updates to catch up with main

* clean up makefile

* remove package gitignores

* define deps only once

* isolate tests

* add test for protection rule

* create temp dirs for isolated tests

* catch up to main

* update headers

* re apply changes

* better result summaries for isolated tests

* move exports into top-level init

* fix client importlib version syntax

* catch up with main
2026-01-27 13:53:20 -05:00
Mike Knepper
7b5ea13f8b
Fix stray validate calls in notebooks (#192) 2026-01-08 15:46:20 -06:00
Mike Knepper
6bf7698bc2
refactor: Overhaul to seed datasets (#167) 2026-01-08 11:48:14 -06:00
Nabin Mulepati
3b4e296baf
feat: add OpenRouter as one of the default providers (#161)
* Add openrouter as a default provider

* Update docs
2026-01-06 10:22:18 -07:00
Johnny Greco
0a60f869c1
docs: just some tutorial notebook tweaks and a docstring update (#150)
* update docstring

* notebook tweaks

* generate colab notebooks
2025-12-18 12:01:50 -05:00
Johnny Greco
6e6efc009f
docs: some updates for nano3 (#149)
* some fixes

* generate colab notebooks
2025-12-17 18:24:39 -05:00
Nabin Mulepati
8d4c6c12b4
chore: Update nvidia text default model alias to nano v3 (#133) 2025-12-15 15:03:12 -07:00
Nabin Mulepati
8370e4a00b
feat: support native embedding generation (#106)
* Add generation type to ModelConfig

* pass tests

* added generate_text_embeddings

* tests

* remove sensitive=True old artifact no longer needed

* Slight refactor

* slight refactor

* Added embedding generator

* chunk_separator -> chunk_pattern

* update tests

* rename for consistency

* Restructure InferenceParameters -> CompletionInferenceParameters, BaseInferenceParameters, EmbeddingInferenceParameters

* Remove purpose from consolidated kwargs

* WithModelConfiguration.inference_parameters should be typed with BaseInferenceParameters

* Type as WithModelGeneration

* Add image generation modality

* update return type for generate_kwargs

* make generation_type a field of ModelConfig as opposed to a prop resolved based on the type of InferenceParameters

* remove regex based chunking from embedding generator

* Remove image generation for now

* more tests and updates

* column_type_is_llm_generated -> column_type_is_model_generated

* change set to list: fix flaky tests

* CompletionInferenceParameters -> ChatCompletionInferenceParameters for consistency with generation_type

* Update docs

* fix deprecation warning originating from cli model settings

* update display of inference parameters in cli list

* save prog on inference parameter

* updates for the config builder

* update cli readme

* update cli for inference parameters

* update inference parameter names

* flip order of vars

* WithCompletion -> WithChatCompletion

* specify InferenceParamsT

* Update columns.md with EmbeddingColumnConfig info

* make generation_type a discriminator field in inference params. add configuration support for max_parallel_requests and timeout

* DRY out some stuff in field.py

* Update nomenclature. prompt tokens -> input tokens, completion tokens -> output tokens in column statistics for consistency

* Add nvidia-embedding and openai-embedding to default model configs

* Fix typo in docs

* Make generate colab notebooks

* fine-tune -> adjust
2025-12-15 11:03:33 -07:00
Andre Manoel
68533c78be
docs: fix links on notebooks and add %%capture on install cell (#134) 2025-12-15 14:41:01 -03:00
Andre Manoel
7fa9a413ac
docs: add option to open notebook directly in Colab (#126) 2025-12-12 15:15:26 -03:00
Mike Knepper
32515ba724
style: Sort imports traditionally instead of within sections (#103) 2025-12-08 09:01:58 -06:00
Nabin Mulepati
1de2262b94
docs: add models module to code reference (#101)
* Add example notebook showing how to use image contexts

* change 101 -> tutorial

* update _README.md with info on the new tutorial

* add reference in mkdocs.yml

* simplify vlm tutorial

* update num_records on tutorials. Update .gitignore

* update readme info

* add models module to code reference

* fix links to generated ipynb

* change vlm in example tutorial to llama4-scout
2025-12-05 10:41:43 -07:00
Nabin Mulepati
8ccb724fb3
docs: Add example notebook showing how to use image contexts (#97) 2025-12-04 15:39:58 -07:00
Andre Manoel
6d921c48ba
fix: small typo on text file (#95)
Notebooooks

Also changing from "Jupytext Format" to "`.py` Format"
2025-12-03 18:31:35 -03:00
Nabin Mulepati
8e3080241b
docs: move models docs to concepts > models (#93) 2025-12-03 14:10:01 -07:00
Andre Manoel
60a898181a
fix: add download links to notebooks (#94) 2025-12-03 18:01:57 -03:00
Andre Manoel
5d4ad10b11
chore: moving notebooks to jupytext and cleaning up workflows (#91)
* adding basic jupytext structure

Co-authored-by: Johnny Greco <jogreco@nvidia.com>

* few fixes

* first test for ci

* adding error intentionally to check workflow behavior

* test calling from other workflows

* typo

* trying as job instead

* couple of fixes

* checking path

* trying to fix path

* wrapping up

---------

Co-authored-by: Johnny Greco <jogreco@nvidia.com>
2025-12-03 17:29:07 -03:00