Elgato_dark/DataDesigner

Fork 0

mirror of https://github.com/NVIDIA-NeMo/DataDesigner synced 2026-05-24 09:48:29 +00:00

Eric W. Tramel c0a4dcbb85

CI / End to end test (Python 3.13 on macos-latest) (push) Blocked by required conditions

Details

CI / End to end test (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions

Details

CI / End to end test (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions

Details

CI / End to end test (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions

Details

CI / End to end test (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions

Details

CI / Lint and Format Check (push) Blocked by required conditions

Details

CI / Check License Headers (push) Blocked by required conditions

Details

CI / End to end test (Python 3.10 on macos-latest) (push) Blocked by required conditions

Details

CI / Validate dispatched SHA (push) Waiting to run

Details

CI / Test Config (Python 3.10 on macos-latest) (push) Blocked by required conditions

Details

CI / Test Config (Python 3.11 on macos-latest) (push) Blocked by required conditions

Details

CI / Test Config (Python 3.12 on macos-latest) (push) Blocked by required conditions

Details

CI / Test Config (Python 3.13 on macos-latest) (push) Blocked by required conditions

Details

CI / Test Config (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions

Details

CI / End to end test (Python 3.11 on macos-latest) (push) Blocked by required conditions

Details

CI / Test Config (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions

Details

CI / Test Config (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions

Details

CI / Test Config (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions

Details

CI / Test Engine (Python 3.10 on macos-latest) (push) Blocked by required conditions

Details

CI / End to end test (Python 3.12 on macos-latest) (push) Blocked by required conditions

Details

CI / Test Engine (Python 3.11 on macos-latest) (push) Blocked by required conditions

Details

CI / Test Engine (Python 3.12 on macos-latest) (push) Blocked by required conditions

Details

CI / Test Engine (Python 3.13 on macos-latest) (push) Blocked by required conditions

Details

CI / Test Engine (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions

Details

CI / Test Engine (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions

Details

CI / Test Engine (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions

Details

CI / Test Engine (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions

Details

CI / Test Interface (Python 3.10 on macos-latest) (push) Blocked by required conditions

Details

CI / Test Interface (Python 3.11 on macos-latest) (push) Blocked by required conditions

Details

CI / Test Interface (Python 3.12 on macos-latest) (push) Blocked by required conditions

Details

CI / Test Interface (Python 3.13 on macos-latest) (push) Blocked by required conditions

Details

CI / Test Interface (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions

Details

CI / Test Interface (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions

Details

CI / Test Interface (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions

Details

CI / Test Interface (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions

Details

CI / Coverage Check (Python 3.11) (push) Blocked by required conditions

Details

CI / Test (Python 3.10 on macos-latest) (push) Blocked by required conditions

Details

CI / Test (Python 3.11 on macos-latest) (push) Blocked by required conditions

Details

CI / Test (Python 3.12 on macos-latest) (push) Blocked by required conditions

Details

CI / Test (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions

Details

CI / Test (Python 3.13 on macos-latest) (push) Blocked by required conditions

Details

CI / Test (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions

Details

CI / Test (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions

Details

CI / Test (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions

Details

feat: implement async scheduling admission control (#661 )

2026-05-20 20:58:05 -04:00

5.7 KiB

Raw Permalink Blame History

Models

The model subsystem provides a unified interface for LLM access: chat completions, embeddings, and image generation. It handles client creation, retry, request admission, usage tracking, and MCP tool integration.

Source: packages/data-designer-engine/src/data_designer/engine/models/

Overview

The model subsystem is layered:

ModelRegistry (lazy facade-per-alias)
  └── ModelFacade (completion, embeddings, image gen, MCP tool loops)
        └── ModelRequestExecutor (request admission + provider execution)
              └── ModelClient (OpenAI-compatible or Anthropic adapter)
                    └── RetryTransport (httpx-level retries)

Generators never interact with HTTP clients directly. They request a ModelFacade by alias from the ModelRegistry, which handles lazy construction, request-resource canonicalization, and shared adaptive request admission state.

Key Components

ModelClient (Protocol)

Defines the contract: sync/async chat, embeddings, image generation, supports_* capability checks, close / aclose. Two implementations:

OpenAICompatibleClient — native httpx adapter for OpenAI-compatible endpoints (NIM, vLLM, etc.)
AnthropicClient — native httpx adapter for the Anthropic Messages API

Client Factory

create_model_client routes by provider type to the appropriate adapter. Optionally wraps with:

RetryTransport — httpx-level retries via httpx_retries.RetryTransport. HttpModelClient sets strip_rate_limit_codes=True for the async client and False for the sync client (http_model_client.py), which controls whether 429 responses are eligible for transport-layer retries.
ModelRequestExecutor — maps model-call attempts to request-admission items, acquires request leases, invokes the provider client, and releases the exact lease on every terminal path.

Request Admission

RequestAdmissionController manages provider/model/domain request resources. AdaptiveRequestAdmissionController adds AIMD (Additive Increase, Multiplicative Decrease) adaptation per RequestDomain (chat, embedding, image, healthcheck) under the provider/model static cap.

ModelRequestExecutor wraps each provider call with a request-admission lease and feeds success or rate-limit outcomes back to the controller. RequestResourceResolver owns canonical provider/model/domain identity so aliases that target the same endpoint share request capacity.

When rampup_seconds is configured, ThrottleManager starts new domains at one concurrent request, climbs linearly toward the peak, and aborts to normal AIMD behavior on the first 429.

ModelFacade

The primary interface for generators. Holds a ModelConfig, ModelClient, optional MCPRegistry, and ModelUsageStats.

completion / acompletion — consolidates kwargs from inference params + provider extras, calls the client, tracks usage
embeddings / aembeddings — embedding generation
image_generation / aimage_generation — image generation
MCP tool loops — when a tool config is active, processes tool calls from completions via MCPFacade, feeds results back, and tracks tool usage stats

ModelRegistry

Lazy ModelFacade construction per alias. Registers shared request-admission state across all facades for coordinated provider/model/domain capacity. Provides get_model_usage_stats and log_model_usage for post-build reporting.

Usage Tracking

ModelUsageStats aggregates TokenUsageStats, RequestUsageStats, ToolUsageStats, and ImageUsageStats per model. Tracked on every successful or failed request for cost and performance visibility.

Data Flow

Generator requests a model by alias from ModelRegistry
Registry lazily creates ModelFacade with the appropriate client and request-admission executor
Generator calls completion() with prompt/messages
ModelFacade builds kwargs, calls ModelRequestExecutor
Request admission acquires a provider/model/domain lease, delegates to ModelClient
ModelClient makes the HTTP request through RetryTransport
Response flows back; usage is tracked; if MCP tools are configured, tool calls are executed and results fed back for another completion round

Design Decisions

Facade pattern hides HTTP, retry, request admission, and MCP complexity from generators. Generators see completion() and get back parsed results.
AIMD request admission at the application layer rather than relying solely on HTTP retries. This provides smoother throughput under rate limits: the transport layer still handles many transient failures, while adaptive request admission adjusts concurrency to avoid sustained 429 storms.
429 handling depends on sync vs async HttpModelClient — The async client uses strip_rate_limit_codes=True, so 429s are not retried at the transport layer and rate-limit signals reach ModelRequestExecutor / request admission quickly. The sync client uses strip_rate_limit_codes=False, so 429s may still be retried transparently at the transport layer before surfacing to callers.
Distribution-valued inference parameters (temperature, top_p as UniformDistribution or ManualDistribution) enable controlled randomness across a dataset without per-row config changes.
Lazy facade construction avoids health-checking or connecting to models that are configured but never used in a particular generation run.

Cross-References

System Architecture — where models fit in the stack
Engine Layer — how generators use models
MCP — tool execution integrated into completions
Config Layer — ModelConfig and ModelProvider definitions

5.7 KiB Raw Permalink Blame History

Models

Overview

Key Components

ModelClient (Protocol)

Client Factory

Request Admission

ModelFacade

ModelRegistry

Usage Tracking

Data Flow

Design Decisions

Cross-References

5.7 KiB

Raw Permalink Blame History