Commit graph

5 commits

Author SHA1 Message Date
Nabin Mulepati
71997624b3
feat: track reasoning token usage (#670)
Some checks are pending
CI / Test Engine (Python 3.12 on macos-latest) (push) Waiting to run
CI / Test Engine (Python 3.13 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.10 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.11 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.12 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.10 on ubuntu-latest) (push) Waiting to run
CI / Test Interface (Python 3.13 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.11 on ubuntu-latest) (push) Waiting to run
CI / Test Interface (Python 3.12 on ubuntu-latest) (push) Waiting to run
CI / Test Interface (Python 3.13 on ubuntu-latest) (push) Waiting to run
CI / Coverage Check (Python 3.11) (push) Waiting to run
CI / Test (Python 3.12 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.13 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Config (Python 3.11 on macos-latest) (push) Waiting to run
CI / Test Config (Python 3.12 on macos-latest) (push) Waiting to run
CI / Test Config (Python 3.13 on macos-latest) (push) Waiting to run
CI / Test Config (Python 3.10 on ubuntu-latest) (push) Waiting to run
CI / Test Config (Python 3.11 on ubuntu-latest) (push) Waiting to run
CI / Test Config (Python 3.12 on ubuntu-latest) (push) Waiting to run
CI / Test Config (Python 3.13 on ubuntu-latest) (push) Waiting to run
CI / Test Engine (Python 3.10 on macos-latest) (push) Waiting to run
CI / Test Engine (Python 3.11 on macos-latest) (push) Waiting to run
CI / Test Engine (Python 3.10 on ubuntu-latest) (push) Waiting to run
CI / Test Engine (Python 3.11 on ubuntu-latest) (push) Waiting to run
CI / Test Engine (Python 3.12 on ubuntu-latest) (push) Waiting to run
CI / Test Engine (Python 3.13 on ubuntu-latest) (push) Waiting to run
* feat: track reasoning token usage

Capture provider-reported reasoning-token breakdowns alongside output tokens without changing output token totals. Carry the field through model usage aggregation and add coverage for parsing, facade tracking, and deltas.

Refs #665

* fix: show reasoning tokens in usage summary

Include reasoning token counts in the local model usage summary while preserving output and total token semantics. Telemetry remains unchanged.

Refs #665

* fix: estimate missing reasoning token counts

When providers return reasoning content without a numeric usage breakdown, estimate reasoning tokens from that content while preserving provider-reported output and total token counts.

Refs #665

* fix: track reasoning token count source

* fix: simplify reasoning token source

* fix: omit unknown reasoning tokens from logs

* refactor: clarify reasoning token count helpers

* test: move token counting tests

* fix: enforce reasoning token source

* fix: address reasoning usage review
2026-05-18 12:15:31 -06:00
Nabin Mulepati
8e2fd3286f
feat: add image generation support with multi-modal context (#317) 2026-02-12 14:00:28 -07:00
Johnny Greco
f74f25872c
chore: quiet tool call logs and add tool usage statistics (#293)
* add tool usage statistics tracking

- Add ToolUsageStats class with metrics for tool calls, turns, and
  statistical aggregates (mean/stddev per generation)
- Extend ModelUsageStats to include tool_usage tracking
- Update ModelFacade.generate() to track total tool calls and turns
- Update tests with tool_call_count method and new assertions

* silence noisy mcp logs

* log message updates

* add tools enabled info message

* exclude empty tool_usage from usage stats output

* add tool usage summary logging after column generation

- Track tool usage snapshots before/after column processing
- Log mean tool calls per generation for columns with tools enabled
- Add get_tool_usage_snapshot/get_tool_usage_delta methods to ModelRegistry
- Remove unused extra_info parameter from progress_tracker.log_start()
- Add comprehensive tests for ToolUsageStats

* pretty format model usage logs

* reuse stubs and fixtures

* add merge method to ToolUsageStats for accurate stats aggregation

The previous implementation used extend() to combine tool usage stats,
but extend() is designed for single generation data. This caused
incorrect stddev calculations when merging stats from multiple sources.

- Add ToolUsageStats.merge() that properly combines sum-of-squares
- Update ModelUsageStats.extend() to use merge() for tool usage
- Add tests verifying stddev accuracy after merging

* fix tool usage stats missing generations_with_tools count

When tracking tool usage after generation, the ToolUsageStats was
created without setting generations_with_tools, causing the usage
summary to report zeros for calls/gen and turns/gen metrics.

* fix tool usage delta objects returning incorrect stddev values

- Simplify facade API to use tool_usage.extend() directly
- Return NaN for stddev when sum of squares wasn't tracked
- Add docstring to get_tool_usage_delta explaining NaN behavior
- Add comprehensive tests for stddev variance calculation

* fix tool usage delta stddev by including sum of squares in deltas

Convert sum_of_squares_turns and sum_of_squares_calls from private
attributes to public fields, enabling them to be included in delta
calculations. This allows get_tool_usage_delta to return objects that
compute accurate stddev values instead of NaN.

* fix test to use get_tool_usage_snapshot for accurate stddev tracking

The test was manually constructing a ToolUsageStats snapshot without
sum_of_squares fields, causing stddev to be NaN. Now uses the proper
snapshot method that includes all fields needed for delta calculations.

* use nvidia-reasoning by default

* mean -> average in log message

* refactor log indentation to use centralized LOG_INDENT constant

- Add LOG_INDENT constant to logging.py for consistent indentation
- Replace hardcoded "  |-- " strings across all log statements
- Add tool alias and MCP provider info to pre-generation logs
- Improve model usage log format for better consistency
- Update tests to match new log formats

* simplify usage stats dict access in model registry

Remove defensive .get() calls and unnecessary type casts since
the usage statistics dictionary structure is now guaranteed.

* walrus baby

* simplify tool usage tracking and reduce log verbosity

- Remove mean/stddev calculations from ToolUsageStats in favor of simple
  counts and generation ratios
- Add total_generations field to track all tool-enabled generations
- Simplify registry log output to show generations ratio (with_tools/total)
- Remove per-column tool usage snapshot/delta logging from column builder
- Track tool usage for all tool-enabled generations, not just those with calls

* format inference parameters as multi-line log output

- Add get_formatted_params() method to BaseInferenceParams
- Add LOG_DOUBLE_INDENT constant for nested indentation
- Update log_pre_generation() to display each parameter on its own line

* update tests to use LOG_INDENT constants

Align test assertions with the centralized log indentation
constants introduced in the logging module refactor.

* two-space consistency
2026-02-05 10:14:02 -05:00
Johnny Greco
c19f35639f
chore: add publish script and update license headers (#253) 2026-01-28 08:47:34 -05:00
Johnny Greco
ae0665fa16
refactor: slim package refactor into three subpackages (#240)
* remove old structure

* major shuffle

* streamline project configs

* update make commands

* updates to make commands

* remove essentials

* initialize logger in interface

* uv lock

* ignore notepad

* update workflows

* fix e2e project config

* generate colab notebooks

* resolve default model settings in interface

* fix build commands

* update perf import make command

* cleaning up some slop

* update recipes

* move conftest files to tests/

* update subpackage readmes

* streamline config_logging

* use exports

* update perf import usage pattern

* update for IDE behavior with ruff

* remove engine's fixtures file

* add note to about lazy imports

* update dependencies

* update docs

* doc fixes

* uv lock

* updates to catch up with main

* clean up makefile

* remove package gitignores

* define deps only once

* isolate tests

* add test for protetion rule

* create temp dirs for isolated tests

* catch up to main

* update headers

* re apply changes

* better result summaries for isolated tests

* move exports into top-level init

* fix client importlib version syntax

* catch up with main
2026-01-27 13:53:20 -05:00
Renamed from tests/engine/models/test_usage.py (Browse further)