claude-code: [FEATURE] Expose session token usage to MCP servers and hooks for cost attribution

anthropics/claude-code#49588, fetched 2026-04-17 08:36:54
Comments: 0 · Participants: 1 · Timeline events: 5 (labeled ×4, cross-referenced ×1) · Reactions: 0


Preflight Checklist

  • I have searched existing requests and this specific feature hasn't been requested yet
  • This is a single feature request (not multiple features)

Problem Statement

Teams running agentic development workflows (structured multi-phase pipelines like requirements → architecture → implementation → QA) have no way to measure per-phase token consumption programmatically. This blocks three practical needs:

  1. Budget estimation — predicting what a workflow will cost before starting it. We've built an estimator agent that produces structural predictions (map phases to token budgets), but it can't self-calibrate without actuals.
  2. Cost attribution — knowing which workflow phase or project consumed what share of spend. Leadership asks "what did it cost to spec out Feature X?" and the answer today is "we don't know."
  3. Right-sizing workflows — understanding whether expensive phases (e.g., Opus on architecture) are worth the cost premium over cheaper alternatives. Without data, this is guesswork.

The data exists internally: --debug "api" logs totalUsage per API call, and OpenTelemetry events include input_tokens, output_tokens, cache_read_tokens, cache_creation_tokens, and cost_usd (as noted in #47045). None of this, however, is accessible to MCP servers, hooks, or external tooling in a structured way.

Related Issues

  • #47045 — Requests token usage in SubagentStop hook payload. Same underlying problem, scoped to hook payloads.
  • #47720 — Requests token usage visibility and threshold hooks for Pro/Max plan users. Same underlying problem, scoped to the desktop app.

This request is broader: expose cumulative session token usage as a queryable value that MCP servers and hooks can read at any point during a session — not just at subagent stop or session end.

Proposed Solution

Expose a session usage query — either as a built-in MCP tool or as a field accessible to hooks — that returns the current session's cumulative token consumption:

{
  "session_id": "...",
  "input_tokens": 1240000,
  "output_tokens": 380000,
  "cache_read_input_tokens": 890000,
  "cache_creation_input_tokens": 42000,
  "cost_usd": 24.80,
  "model_breakdown": {
    "claude-opus-4-6": { "input": 800000, "output": 300000, "cost": 21.50 },
    "claude-sonnet-4-6": { "input": 440000, "output": 80000, "cost": 3.30 }
  }
}
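To make the proposed shape concrete, here is a minimal consumer-side sketch in Python. The field names come from the example payload above; the class names, the parser, and the consistency check are hypothetical illustrations, and the check assumes that the per-model "input", "output", and "cost" figures are meant to sum to the session totals (they do in the example, but the proposal does not state it).

```python
import json
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ModelUsage:
    input: int
    output: int
    cost: float


@dataclass(frozen=True)
class SessionUsage:
    session_id: str
    input_tokens: int
    output_tokens: int
    cache_read_input_tokens: int
    cache_creation_input_tokens: int
    cost_usd: float
    model_breakdown: Dict[str, ModelUsage]


def parse_session_usage(raw: str) -> SessionUsage:
    """Parse a JSON payload shaped like the proposed example."""
    data = json.loads(raw)
    breakdown = {
        model: ModelUsage(**fields)
        for model, fields in data.get("model_breakdown", {}).items()
    }
    return SessionUsage(
        session_id=data["session_id"],
        input_tokens=data["input_tokens"],
        output_tokens=data["output_tokens"],
        cache_read_input_tokens=data["cache_read_input_tokens"],
        cache_creation_input_tokens=data["cache_creation_input_tokens"],
        cost_usd=data["cost_usd"],
        model_breakdown=breakdown,
    )


def breakdown_matches_totals(usage: SessionUsage, cost_tolerance: float = 0.005) -> bool:
    """Assumption: per-model figures add up to the session totals."""
    models = usage.model_breakdown.values()
    return (
        sum(m.input for m in models) == usage.input_tokens
        and sum(m.output for m in models) == usage.output_tokens
        and abs(sum(m.cost for m in models) - usage.cost_usd) <= cost_tolerance
    )
```

For the example figures the check holds: 800000 + 440000 = 1240000 input tokens, 300000 + 80000 = 380000 output tokens, and 21.50 + 3.30 = 24.80 USD.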

Ideal: A built-in MCP resource or tool (e.g., claude://session/usage) that any MCP server can query. This lets workflow orchestrator plugins (like the rigor plugin or similar) log usage at phase boundaries by querying before and after each phase.

Acceptable alternative: Include cumulative usage in all hook payloads (Stop, SubagentStop, and a new periodic/on-demand hook). This is what #47045 requests for SubagentStop specifically.
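If usage were included in hook payloads, a hook script could turn each event into a log record along these lines. Hooks already receive a JSON object on stdin; the "usage" key and its contents are hypothetical here (they are what this request asks for), and the function name is made up for illustration.

```python
import json
from typing import Optional

USAGE_KEYS = (
    "input_tokens",
    "output_tokens",
    "cache_read_input_tokens",
    "cache_creation_input_tokens",
    "cost_usd",
)


def handle_hook_input(raw: str) -> Optional[str]:
    """Return one JSON log line for a hook payload, or None if no usage is present."""
    payload = json.loads(raw)
    usage = payload.get("usage")  # HYPOTHETICAL field: not part of hook payloads today
    if not isinstance(usage, dict):
        return None
    record = {
        "session_id": payload.get("session_id"),
        "hook_event_name": payload.get("hook_event_name"),
    }
    # Missing counters default to 0 so every record has the same columns.
    record.update({key: usage.get(key, 0) for key in USAGE_KEYS})
    return json.dumps(record, sort_keys=True)


# In an actual hook script one would feed it stdin, for example:
#   import sys
#   line = handle_hook_input(sys.stdin.read())
#   if line is not None:
#       print(line)
```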

Minimum viable: Expose a local file or socket that updates with cumulative usage as the session progresses. Even a JSON file at a known path that updates every N seconds would be sufficient — MCP servers can read files.
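For the file-based variant, a reader mostly needs to tolerate the file being absent or caught mid-write. A minimal sketch, assuming a JSON file with the payload shape shown earlier; the path is whatever location Claude Code would publish, so it is passed in rather than guessed.

```python
import json
from pathlib import Path
from typing import Optional


def read_usage_snapshot(path: Path) -> Optional[dict]:
    """Read the cumulative-usage file, returning None if it is missing or incomplete."""
    try:
        text = path.read_text(encoding="utf-8")
    except FileNotFoundError:
        return None  # session not started, or feature not available
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None  # likely read in the middle of an update; caller can retry
    return data if isinstance(data, dict) else None
```

A caller would treat None as "no fresh data" and keep its last good snapshot, which is enough for phase-boundary logging as long as the file is refreshed every few seconds.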

Why Not Just Use OTel?

The OTel workaround described in #47045 (run a local OTLP receiver, match events by session ID) works but requires running additional infrastructure. For a team of developers each running agentic workflows, asking everyone to configure OTel exporters, run receivers, and parse events is a high barrier compared to "query an MCP tool."
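For comparison, the "match events by session ID" step of the OTel route looks roughly like the sketch below, and it only runs after an exporter and receiver are already in place. It assumes the receiver has decoded each event into a flat attribute mapping; the counter names are the ones listed in the problem statement, while the "session.id" attribute key is an assumption and is therefore a parameter.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping

FIELDS = (
    "input_tokens",
    "output_tokens",
    "cache_read_tokens",
    "cache_creation_tokens",
    "cost_usd",
)


def sum_usage_by_session(
    events: Iterable[Mapping[str, object]],
    session_key: str = "session.id",  # assumed attribute name
) -> Dict[str, Dict[str, float]]:
    """Accumulate usage counters per session from decoded event attributes."""
    totals: Dict[str, Dict[str, float]] = defaultdict(lambda: dict.fromkeys(FIELDS, 0.0))
    for attrs in events:
        session_id = attrs.get(session_key)
        if session_id is None:
            continue  # event cannot be attributed to a session
        for field in FIELDS:
            # Attribute values may arrive as strings, so normalise to float.
            totals[str(session_id)][field] += float(attrs.get(field, 0))
    return dict(totals)
```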

Alternative Solutions

Without this feature, the only options for per-phase cost measurement are:

  1. HTTP proxy in front of the API — intercept all API traffic, parse responses for usage. Works but adds a critical-path dependency and failure point for what should be observability.
  2. OTel exporter + local receiver — works per #47045's workaround, but high setup cost per developer.
  3. Manual logging — developer notes token counts from the UI before/after each phase. Tedious, error-prone, doesn't scale.

All three are workarounds for data that Claude Code already has internally.

Priority

Medium — significant impact on teams managing agentic budgets, but workarounds exist

Feature Category

Hooks and extensibility

Use Case Example

Scenario: agentic workflow cost tracking

  1. A workflow orchestrator MCP server (rigor, or similar) runs a multi-phase pipeline: requirements → architecture → implementation → QA.
  2. At each phase transition, the orchestrator queries claude://session/usage and logs the delta since the last transition.
  3. After the workflow completes, the orchestrator has per-phase token counts: "requirements: 180K tokens, architecture: 290K tokens, implementation: 1.2M tokens, QA: 240K tokens."
  4. An estimator agent compares these actuals against its predictions and adjusts its base cost model for more accurate future estimates.

Without the usage query, step 2 is impossible without a proxy or OTel infrastructure.
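The delta logging in step 2 can be sketched as follows. The tracker takes any callable that returns the cumulative counters; that callable stands in for the proposed claude://session/usage query, which does not exist today, and the class and method names are hypothetical.

```python
from typing import Callable, Dict, Mapping

COUNTERS = (
    "input_tokens",
    "output_tokens",
    "cache_read_input_tokens",
    "cache_creation_input_tokens",
    "cost_usd",
)


class PhaseUsageTracker:
    """Records per-phase usage as the difference between cumulative snapshots."""

    def __init__(self, query_usage: Callable[[], Mapping[str, float]]) -> None:
        self._query_usage = query_usage  # stand-in for the proposed usage query
        self._last = self._snapshot()    # baseline taken when the workflow starts
        self.phases: Dict[str, Dict[str, float]] = {}

    def _snapshot(self) -> Dict[str, float]:
        usage = self._query_usage()
        return {key: usage.get(key, 0) for key in COUNTERS}

    def end_phase(self, name: str) -> Dict[str, float]:
        """Call at each phase transition; returns usage since the previous one."""
        now = self._snapshot()
        delta = {key: now[key] - self._last[key] for key in COUNTERS}
        self._last = now
        self.phases[name] = delta
        return delta
```

Because the counters are cumulative, the orchestrator needs only one query per transition, and the per-phase deltas sum back to the session totals.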

Additional Context

  • We've spec'd a Claude Code plugin (aidash) that would use this data for local cost tracking, team reporting, and estimator calibration. The plugin currently requires an HTTP proxy because there is no other way to get per-request token counts. If session usage were queryable via MCP, the proxy layer would become unnecessary and the plugin would reduce to a single MCP server.
  • Multiple teams are likely building similar cost-tracking tools — #47045 describes one. A first-party usage query would prevent N teams from each building their own proxy/OTel infrastructure to extract data Claude Code already has.

extent analysis

TL;DR

Exposing a session usage query as a built-in MCP tool or a field accessible to hooks can provide a structured way to measure per-phase token consumption programmatically.

Guidance

  • The proposed solution suggests exposing a session usage query, either as a built-in MCP tool (e.g., claude://session/usage) or as a field accessible to hooks, to return the current session's cumulative token consumption.
  • To verify the effectiveness of this solution, teams can use the query to log usage at phase boundaries and compare actuals against predictions to adjust their base cost models.
  • An acceptable alternative is to include cumulative usage in all hook payloads, which would provide a way to measure token consumption without requiring additional infrastructure.
  • The minimum viable solution is to expose a local file or socket that updates with cumulative usage as the session progresses, allowing MCP servers to read the data.
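The "compare actuals against predictions" step could take many forms; the issue does not describe the estimator's model. One simple possibility, shown purely as an assumption, is a per-phase correction multiplier that moves toward the observed actual-to-estimate ratio after each run.

```python
from typing import Dict, Mapping


def calibrate(
    multipliers: Mapping[str, float],
    base_estimates: Mapping[str, float],
    actuals: Mapping[str, float],
    learning_rate: float = 0.5,
) -> Dict[str, float]:
    """Nudge each phase's multiplier toward actual / base_estimate.

    base_estimates are the estimator's uncorrected token predictions; a
    calibrated prediction is base_estimate * multiplier. All names here
    are hypothetical.
    """
    updated = dict(multipliers)
    for phase, estimate in base_estimates.items():
        if estimate <= 0 or phase not in actuals:
            continue  # nothing to learn from for this phase
        ratio = actuals[phase] / estimate
        previous = multipliers.get(phase, 1.0)
        # Exponential smoothing keeps one unusual run from dominating.
        updated[phase] = (1 - learning_rate) * previous + learning_rate * ratio
    return updated
```

For instance, if architecture was estimated at 200K tokens and actually used 290K, the ratio is 1.45, and with a learning rate of 0.5 a starting multiplier of 1.0 moves to 1.225.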

Notes

The solution relies on the existence of internal data, such as totalUsage per API call and OpenTelemetry events, which are not currently accessible to MCP servers or external tooling in a structured way.

Recommendation

Apply the proposed solution of exposing a session usage query as a built-in MCP tool or a field accessible to hooks, as it provides a structured way to measure per-phase token consumption programmatically and addresses the needs of teams managing agentic budgets.
