
Track per-run token usage and cost in meta.json #60

Open
RandomOscillations wants to merge 1 commit into main from feat/token-usage-tracking


@RandomOscillations
Collaborator

Summary

  • Thread actual token counts from API responses through provider → facade → reasoning → engine → meta.json
  • Each provider's simple_call_async now returns (dict, TokenUsage) with real token counts from the API response
  • Two-pass reasoning captures tokens from both pivotal (Pass 1) and routine (Pass 2) calls; the engine accumulates totals across chunks and computes an estimated USD cost via pricing.py
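
A minimal sketch of the data flow described above, for readers who want to see the shape of the tuple return. The field names on TokenUsage and BatchTokenUsage, the FakeProvider class, the reason_two_pass helper, and the simple_call_async signature shown here are all assumptions for illustration; the actual definitions live in providers/base.py and reasoning.py and may differ.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class TokenUsage:
    """Token counts for a single API call (field names are assumed)."""
    input_tokens: int = 0
    output_tokens: int = 0


@dataclass
class BatchTokenUsage:
    """Per-pass totals for a batch of agents (field names are assumed)."""
    pivotal_input_tokens: int = 0
    pivotal_output_tokens: int = 0
    routine_input_tokens: int = 0
    routine_output_tokens: int = 0

    def add_pivotal(self, usage: TokenUsage) -> None:
        self.pivotal_input_tokens += usage.input_tokens
        self.pivotal_output_tokens += usage.output_tokens

    def add_routine(self, usage: TokenUsage) -> None:
        self.routine_input_tokens += usage.input_tokens
        self.routine_output_tokens += usage.output_tokens


class FakeProvider:
    """Hypothetical stand-in; a real provider reads counts from the API response."""

    async def simple_call_async(self, prompt: str) -> tuple[dict, TokenUsage]:
        # Counts are faked from the prompt's word count, purely for illustration.
        usage = TokenUsage(input_tokens=len(prompt.split()), output_tokens=1)
        return {"echo": prompt}, usage


async def reason_two_pass(provider: FakeProvider, prompt: str) -> tuple[dict, BatchTokenUsage]:
    """Run Pass 1 (pivotal) and Pass 2 (routine), recording usage for each."""
    batch = BatchTokenUsage()
    pivotal, pivotal_usage = await provider.simple_call_async(prompt)
    batch.add_pivotal(pivotal_usage)
    routine, routine_usage = await provider.simple_call_async("classify: " + prompt)
    batch.add_routine(routine_usage)
    return {"pivotal": pivotal, "routine": routine}, batch


result, batch = asyncio.run(reason_two_pass(FakeProvider(), "agent sees news"))
```

Keeping the pivotal and routine counters separate is what allows the per-pass breakdown to be priced against two different models later.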

Changes

  • providers/base.py — TokenUsage dataclass, updated abstract signature
  • providers/openai.py — Extract usage from Responses API (input_tokens/output_tokens) and Chat Completions API (prompt_tokens/completion_tokens)
  • providers/claude.py — Extract usage from response.usage
  • llm.py — Pass-through tuple return from simple_call_async
  • models/simulation.py — Token fields on ReasoningResponse
  • reasoning.py — BatchTokenUsage dataclass, capture tokens in two-pass flow, accumulate in batch_reason_agents
  • engine.py — Running totals, _compute_cost() using pricing.py, write cost block to meta.json
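
The two OpenAI field-naming schemes listed above can be normalized in one helper, roughly as sketched here. The extract_usage function is hypothetical, the responses are faked with SimpleNamespace instead of real SDK objects, and falling back to zero counts when usage is missing is an assumption about how the null-usage case is handled.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class TokenUsage:
    """Token counts for a single API call (field names are assumed)."""
    input_tokens: int = 0
    output_tokens: int = 0


def extract_usage(response) -> TokenUsage:
    """Read token counts from a response object, whichever naming it uses."""
    usage = getattr(response, "usage", None)
    if usage is None:
        # Assumed behavior: a missing usage block yields zero counts.
        return TokenUsage()

    # Responses API naming first, then Chat Completions naming.
    input_tokens = getattr(usage, "input_tokens", None)
    if input_tokens is None:
        input_tokens = getattr(usage, "prompt_tokens", None)

    output_tokens = getattr(usage, "output_tokens", None)
    if output_tokens is None:
        output_tokens = getattr(usage, "completion_tokens", None)

    return TokenUsage(input_tokens or 0, output_tokens or 0)


# Fake responses standing in for the two API styles and the null case.
responses_style = SimpleNamespace(usage=SimpleNamespace(input_tokens=120, output_tokens=30))
chat_style = SimpleNamespace(usage=SimpleNamespace(prompt_tokens=80, completion_tokens=20))
no_usage = SimpleNamespace(usage=None)
```

Checking for None explicitly, and not relying on truthiness, keeps a legitimate count of 0 from triggering the fallback lookup.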

meta.json output

"cost": {
  "pivotal_input_tokens": 1234567,
  "pivotal_output_tokens": 456789,
  "routine_input_tokens": 234567,
  "routine_output_tokens": 89012,
  "total_input_tokens": 1469134,
  "total_output_tokens": 545801,
  "pivotal_model": "gpt-5",
  "routine_model": "gpt-5-mini",
  "estimated_usd": 5.1234
}
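
For reference, a block with this shape could be assembled from the per-pass totals roughly as follows. This is a sketch under stated assumptions: the price table, the model names model-a and model-b, and the helper names call_cost and build_cost_block are made up for illustration and are not taken from pricing.py or engine.py. Returning None for estimated_usd when a model has no known price is also an assumption.

```python
from typing import Optional

# Hypothetical USD prices per one million tokens. These are NOT real rates.
PRICES_PER_MTOK = {
    "model-a": {"input": 2.00, "output": 8.00},
    "model-b": {"input": 0.50, "output": 2.00},
}


def call_cost(model: str, input_tokens: int, output_tokens: int) -> Optional[float]:
    """Cost in USD for one model's tokens, or None if the model is unpriced."""
    price = PRICES_PER_MTOK.get(model)
    if price is None:
        return None
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


def build_cost_block(
    pivotal_model: str,
    routine_model: str,
    pivotal_in: int,
    pivotal_out: int,
    routine_in: int,
    routine_out: int,
) -> dict:
    """Assemble a cost block with per-pass counts, totals, and estimated USD."""
    pivotal_usd = call_cost(pivotal_model, pivotal_in, pivotal_out)
    routine_usd = call_cost(routine_model, routine_in, routine_out)
    if pivotal_usd is None or routine_usd is None:
        estimated = None  # assumed handling for an unknown model
    else:
        estimated = round(pivotal_usd + routine_usd, 4)
    return {
        "pivotal_input_tokens": pivotal_in,
        "pivotal_output_tokens": pivotal_out,
        "routine_input_tokens": routine_in,
        "routine_output_tokens": routine_out,
        "total_input_tokens": pivotal_in + routine_in,
        "total_output_tokens": pivotal_out + routine_out,
        "pivotal_model": pivotal_model,
        "routine_model": routine_model,
        "estimated_usd": estimated,
    }


block = build_cost_block("model-a", "model-b", 1234567, 456789, 234567, 89012)
```

Deriving the totals inside the builder, not storing them separately, keeps total_input_tokens and total_output_tokens consistent with the per-pass fields by construction.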

Test plan

  • All 618 existing tests pass with updated return types
  • 6 new provider token extraction tests (OpenAI Responses, Chat Completions, Claude, null usage)
  • 3 new engine tests (chunk accumulation, meta.json cost output, unknown model handling)
  • ruff check and ruff format clean
  • Manual: run a small simulation, verify meta.json contains correct cost block
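
The manual check in the last item could be partly scripted. The sketch below is a hypothetical helper, not part of this PR: it only verifies that the totals equal the sum of the pivotal and routine counts and that estimated_usd is not negative, using the key names from the meta.json example above. The sample file it writes is a stand-in for a real run's output.

```python
import json
import tempfile
from pathlib import Path


def check_cost_block(meta_path: Path) -> list[str]:
    """Return a list of problems found in the cost block (empty if none)."""
    cost = json.loads(meta_path.read_text()).get("cost")
    if cost is None:
        return ["meta.json has no 'cost' block"]

    problems = []
    for side in ("input", "output"):
        expected = cost[f"pivotal_{side}_tokens"] + cost[f"routine_{side}_tokens"]
        actual = cost[f"total_{side}_tokens"]
        if actual != expected:
            problems.append(f"total_{side}_tokens is {actual}, expected {expected}")

    usd = cost.get("estimated_usd")
    if usd is not None and usd < 0:
        problems.append("estimated_usd is negative")
    return problems


# Demo against a temporary file that mirrors the example cost block.
sample = {
    "cost": {
        "pivotal_input_tokens": 1234567,
        "pivotal_output_tokens": 456789,
        "routine_input_tokens": 234567,
        "routine_output_tokens": 89012,
        "total_input_tokens": 1469134,
        "total_output_tokens": 545801,
        "estimated_usd": 5.1234,
    }
}
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "meta.json"
    path.write_text(json.dumps(sample))
    problems = check_cost_block(path)

    # A deliberately inconsistent copy, to show what a failure looks like.
    broken = json.loads(json.dumps(sample))
    broken["cost"]["total_input_tokens"] = 1
    path.write_text(json.dumps(broken))
    broken_problems = check_cost_block(path)
```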

Closes #59

🤖 Generated with Claude Code

Thread actual token counts from API responses through
provider → facade → reasoning → engine → meta.json.

Each provider's simple_call_async now returns (dict, TokenUsage).
Two-pass reasoning captures tokens from both pivotal (Pass 1) and
routine (Pass 2) calls. The engine accumulates totals across chunks
and computes estimated USD cost via pricing.py, writing a cost block
to meta.json with per-pass token breakdowns.

Closes #59

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@claude

claude bot commented Feb 8, 2026

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.


Development

Successfully merging this pull request may close these issues.

Track per-run token usage and cost
