claude-code - [BUG] OTel api_request model attribute drops [1m] suffix while runtime serves 1M context: Cost-by-model dashboard misattributes ~50% of Opus spend

Preflight Checklist

  • I have searched existing issues and this hasn't been reported yet (closest neighbours mapped in Additional Information — none cover the OTel/metrics surface specifically)
  • This is a single bug report (the telemetry-only angle is the unit — user-facing UI/picker symptoms are referenced as context, not bundled)
  • I am using the latest version of Claude Code (2.1.153, but the bug spans 2.1.145 → 2.1.153 in my data)

What's Wrong?

In claude_code_cost_usage_USD_total and the api_request Loki event emitted by Claude Code's OpenTelemetry exporter, the model attribute drops the [1m] suffix for a large fraction of requests that actually ran on 1M context. The result is that the Anthropic-shipped "Claude Code" Grafana dashboard (claude-code-overview) splits Opus 4.7 spend across two phantom variants — claude-opus-4-7[1m] and claude-opus-4-7 — even though in reality the runtime served every one of those requests on 1M context.

In my 7-day org metrics:

| model label | cost |
| --- | --- |
| claude-opus-4-7[1m] | $595.86 (52.67%) |
| claude-opus-4-7 | $530.66 (46.91%) |
| claude-haiku-4-5-20251001 | $2.93 |
| claude-sonnet-4-6 | $1.76 |

The pie chart suggests almost half my Opus spend was 200k. It wasn't. Hard evidence from the session JSONLs (which use the same stripped label) below.

Evidence: the "200k" sessions ran way over 200k context

For each session, I parsed every assistant message's usage block and computed the peak cache_read_input_tokens + cache_creation_input_tokens + input_tokens in a single request. A model with a 200k context window cannot accept a prompt above 200k — the Anthropic API rejects it. Yet:

| session_id (anon) | OTel model label | peak single-request input |
| --- | --- | --- |
| session-A | claude-opus-4-7 | 879,437 tok |
| session-B | claude-opus-4-7 | 511,255 tok |
| session-C | claude-opus-4-7 | 225,980 tok |
| session-D | claude-opus-4-7[1m] (control) | 177,046 tok |

Sessions A, B, C are all labeled as the 200k variant in OTel and in the JSONL, but each one served at least one request that could not have fit in a 200k window. They must have been served on 1M context. The label is wrong.

(Session-D is a control: labeled [1m], peak under 200k — so we can't distinguish it from a true 200k session by tokens alone, but its label is consistent with the runtime.)
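The reasoning behind the table can be written down as a small check. This is a minimal sketch that uses only the four sessions listed above; the `label_contradicted` helper and the constant name are illustrative, and the 200,000-token limit is the figure this report assumes for the non-[1m] variant.

```python
# Assumption from the report: a model without the [1m] suffix has a 200k window.
STANDARD_CONTEXT_LIMIT = 200_000


def label_contradicted(model_label: str, peak_input_tokens: int) -> bool:
    """Return True when a label without [1m] sits on a request too large for 200k."""
    has_suffix = model_label.endswith("[1m]")
    return (not has_suffix) and peak_input_tokens > STANDARD_CONTEXT_LIMIT


# (session, OTel model label, peak single-request input tokens) from the table above.
sessions = [
    ("session-A", "claude-opus-4-7", 879_437),
    ("session-B", "claude-opus-4-7", 511_255),
    ("session-C", "claude-opus-4-7", 225_980),
    ("session-D", "claude-opus-4-7[1m]", 177_046),
]

verdicts = {sid: label_contradicted(label, peak) for sid, label, peak in sessions}

for sid, contradicted in verdicts.items():
    print(sid, "label contradicted by token count" if contradicted else "label consistent")
```

The check is one-directional: a peak under 200k proves nothing about which variant served the request, which is why session-D only works as a control.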

Pattern: concentrated in VS Code + v2.1.150-era sessions

Sliced by model × terminal_type × service_version, query_source="main", my org-wide 7-day spend on the "no [1m]" Opus 4.7 label is:

| model | version | terminal | cost |
| --- | --- | --- | --- |
| claude-opus-4-7 | 2.1.150 | vscode | $442.54 |
| claude-opus-4-7 | 2.1.153 | vscode | $41.94 |
| claude-opus-4-7 | 2.1.150 | kitty | $10.54 |
| claude-opus-4-7 | 2.1.152 | kitty | $0.19 |

vs. the same slice labeled [1m]:

| model | version | terminal | cost |
| --- | --- | --- | --- |
| claude-opus-4-7[1m] | 2.1.145 | vscode | $266.09 |
| claude-opus-4-7[1m] | 2.1.146 | vscode | $149.97 |
| claude-opus-4-7[1m] | 2.1.145 | kitty | $24.33 |
| claude-opus-4-7[1m] | 2.1.150 | vscode | $0.25 |
| claude-opus-4-7[1m] | 2.1.150 | kitty | $20.92 |
| claude-opus-4-7[1m] | 2.1.153 | vscode | $14.74 |
| claude-opus-4-7[1m] | 2.1.153 | kitty | $27.26 |

Reading: in v2.1.145/146, vscode-launched sessions emitted the suffix correctly. Starting at v2.1.150, vscode sessions almost exclusively emit the stripped label (v2.1.150 vscode [1m] cost = $0.25, no-suffix cost = $442.54). v2.1.153 partially recovered ($14.74 with suffix vs $41.94 without). kitty fares much better: across all versions, most kitty cost carries the [1m] suffix ($72.51 with suffix vs $10.73 without), although v2.1.150 still shows $10.54 under the stripped label.

So the regression is VS-Code-terminal-shaped and starts around v2.1.149/150.
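To make the version-by-terminal pattern easier to read, the two tables can be folded into a stripped-label share per slice. A minimal sketch using only the figures reported above; the dictionary layout and the `stripped_share` helper are illustrative and not part of Claude Code or its exporter.

```python
# 7-day cost in USD keyed by (service_version, terminal_type), copied from the tables above.
stripped = {  # model label "claude-opus-4-7"
    ("2.1.150", "vscode"): 442.54,
    ("2.1.153", "vscode"): 41.94,
    ("2.1.150", "kitty"): 10.54,
    ("2.1.152", "kitty"): 0.19,
}
suffixed = {  # model label "claude-opus-4-7[1m]"
    ("2.1.145", "vscode"): 266.09,
    ("2.1.146", "vscode"): 149.97,
    ("2.1.145", "kitty"): 24.33,
    ("2.1.150", "vscode"): 0.25,
    ("2.1.150", "kitty"): 20.92,
    ("2.1.153", "vscode"): 14.74,
    ("2.1.153", "kitty"): 27.26,
}


def stripped_share(key):
    """Fraction of a slice's Opus cost that carries the stripped label, or None if no data."""
    without = stripped.get(key, 0.0)
    with_suffix = suffixed.get(key, 0.0)
    total = without + with_suffix
    return without / total if total else None


for key in sorted(set(stripped) | set(suffixed)):
    version, terminal = key
    print(f"{version:8} {terminal:7} stripped share = {stripped_share(key):.1%}")
```

On the reported figures, v2.1.150 vscode comes out above 99% stripped and v2.1.153 vscode at roughly 74%, while v2.1.145/146 sit at 0%.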

query_source split confirms it's mainly main-thread interactive use:

| model | query_source | cost |
| --- | --- | --- |
| claude-opus-4-7 | main | $495.18 |
| claude-opus-4-7 | auxiliary | $35.49 |
| claude-opus-4-7 | subagent | $0.17 |
| claude-opus-4-7[1m] | main | $534.50 |
| claude-opus-4-7[1m] | auxiliary | $40.63 |
| claude-opus-4-7[1m] | subagent | $20.50 |

Not subagent spawn, not aux task — main-thread interactive sessions are the surface where the label gets stripped.

What Should Happen?

The OTel model attribute emitted on api_request events and the dimension on claude_code_cost_usage_USD_total should carry the same string the runtime actually used to call the Anthropic API. If the request went out with model="claude-opus-4-7[1m]", the telemetry should record claude-opus-4-7[1m]. Stripping the [1m] suffix on the telemetry-side codepath while keeping it on the API-side codepath produces silently misleading cost dashboards — including the dashboard Anthropic ships in the Grafana Cloud "Claude Code" integration.

Ideally the attribute would be sourced from the API response's message.model (server-confirmed) rather than from a client-side KY$/AZ-style recomputation. That eliminates the divergence by construction and matches the "served vs requested" distinction #62521 has been asking for on a different surface.

The session .jsonl message.model field has the same problem (an 879k-token request is logged with message.model = "claude-opus-4-7"), so fixing OTel and JSONL with the same fix would be ideal.

Error Messages/Logs

No errors. The bug is silent: the request goes out at 1M, the response comes back, the agent works fine, but the telemetry label is wrong.

Direct quote of an api_request Loki line from session-C (v2.1.153, vscode), showing label vs. actual content:

labels: {model: "claude-opus-4-7", service_version: "2.1.153", terminal_type: "vscode", query_source: "repl_main_thread"}
content: cache_read_tokens=220493, cache_creation_tokens=1174, input_tokens=1, cost_usd=0.132664

cache_read_tokens=220493 on a request a 200k model is supposed to have rejected. The model that served it was clearly the 1M variant; the label says otherwise.
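The arithmetic behind that claim can be spelled out by summing the token fields of the quoted line. A small sketch, assuming the `key=value` layout shown above and the 200,000-token limit this report uses for the non-[1m] variant; the variable names are illustrative.

```python
import re

# The content portion of the api_request line quoted above.
content = (
    "cache_read_tokens=220493, cache_creation_tokens=1174, "
    "input_tokens=1, cost_usd=0.132664"
)

# Split the comma-separated key=value pairs into a dict of strings.
fields = dict(re.findall(r"(\w+)=([\d.]+)", content))

# Everything the model had to read for this request.
total_input = (
    int(fields["cache_read_tokens"])
    + int(fields["cache_creation_tokens"])
    + int(fields["input_tokens"])
)

STANDARD_CONTEXT_LIMIT = 200_000  # assumption stated in the report
overshoot = total_input - STANDARD_CONTEXT_LIMIT

print("total input tokens:", total_input)
print("tokens above a 200k window:", overshoot)
```

The cache read alone already exceeds 200k, so the conclusion does not depend on how cache creation tokens are counted against the window.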

Steps to Reproduce

  1. Have Claude Code 2.1.150 → 2.1.153 installed, opted into 1M context (Max plan, no CLAUDE_CODE_DISABLE_1M_CONTEXT, first-party API).
  2. Launch a session from the VS Code integrated terminal (TERM_PROGRAM=vscode).
  3. Use Opus 4.7 for non-trivial work that accumulates ≥220k tokens of context (e.g., cached system prompt + project CLAUDE.md + a few hundred KB of files referenced over a long thread).
  4. Stream cost/api_request events to a Prometheus + Loki backend via OTel (Grafana Cloud integration works fine for this).
  5. Open the Anthropic-shipped "Claude Code" dashboard (claude-code-overview). The Cost-by-model donut will show a claude-opus-4-7 slice alongside the expected claude-opus-4-7[1m] slice.
  6. Pick a session that landed in the no-suffix slice. Open its .jsonl. Grep message.usage. You'll see cache_read+input totals well above 200k, which proves the runtime served 1M even though the label says 200k.

Alternate quick repro (no OTel pipeline needed):

# Pick any session you ran heavy on Opus 4.7 from VS Code on v2.1.150+
python3 -c "
import collections, json, sys
peak = 0
models = collections.Counter()
with open(sys.argv[1], 'rb') as f:
    for line in f:
        try:
            d = json.loads(line)
        except ValueError:  # skip blank or malformed lines
            continue
        m = d.get('message') if isinstance(d, dict) else None
        if not isinstance(m, dict):
            continue
        u = m.get('usage') or {}
        tot = sum(u.get(k) or 0 for k in ('cache_read_input_tokens', 'cache_creation_input_tokens', 'input_tokens'))
        peak = max(peak, tot)
        # also count model labels
        if m.get('model'):
            models[m['model']] += 1
print('peak single-request input tokens:', peak)
print('model labels:', dict(models))
" ~/.claude/projects/<encoded-cwd>/<session-id>.jsonl

If peak > 200000 but the JSONL grep for "model" shows only claude-opus-4-7 (no [1m]), this issue is firing.
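The same test can be run over every session at once instead of one file at a time. This is a sketch under the assumptions already used in this report: session files live at `~/.claude/projects/<encoded-cwd>/<session-id>.jsonl`, each line is a JSON object with `message.usage` and `message.model`, and 200,000 tokens is the limit for the non-[1m] variant. The function names are hypothetical helpers, not part of Claude Code.

```python
import json
from collections import Counter
from pathlib import Path

USAGE_KEYS = ("cache_read_input_tokens", "cache_creation_input_tokens", "input_tokens")


def scan_session(path):
    """Return (peak single-request input tokens, Counter of message.model labels)."""
    peak = 0
    labels = Counter()
    with open(path, "rb") as f:
        for line in f:
            try:
                record = json.loads(line)
            except ValueError:  # blank, truncated, or non-JSON line
                continue
            message = record.get("message") if isinstance(record, dict) else None
            if not isinstance(message, dict):
                continue
            usage = message.get("usage")
            if isinstance(usage, dict):
                peak = max(peak, sum(usage.get(k) or 0 for k in USAGE_KEYS))
            model = message.get("model")
            if model:
                labels[model] += 1
    return peak, labels


def is_affected(peak, labels, limit=200_000):
    """True when the peak exceeds the limit but no label carries the [1m] suffix."""
    return bool(labels) and peak > limit and not any(m.endswith("[1m]") for m in labels)


def find_affected(root="~/.claude/projects"):
    """Yield (path, peak, labels) for every session file that trips the check."""
    for path in sorted(Path(root).expanduser().glob("*/*.jsonl")):
        peak, labels = scan_session(path)
        if is_affected(peak, labels):
            yield path, peak, labels
```

Calling `list(find_affected())` returns the sessions whose logs show the mismatch; an empty list means no session under that directory exceeded 200k without a [1m] label.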

Claude Model

Opus 4.7 (1M context variant, served correctly; only the telemetry label is wrong)

Is this a regression?

Yes (telemetry side). For VS Code terminal sessions, the [1m] suffix appears in OTel/JSONL labels reliably on v2.1.145–148, then breaks on v2.1.149/150 and remains partially broken through v2.1.153. Non-VSCode terminals (kitty) are not affected at the same rate. The runtime context window itself is not regressed — the API requests still go out at 1M.

Last Working Version

2.1.148 for VS Code terminal launches (last version where claude_code_cost_usage_USD_total{terminal_type="vscode", model="claude-opus-4-7[1m]"} dominated and the stripped-suffix variant was a rounding error).

Claude Code Version

2.1.153 (current). Bug observed across 2.1.150, 2.1.152, 2.1.153 in my own data.

Platform

Anthropic API (direct, first-party — no ANTHROPIC_BASE_URL, no Bedrock, no Vertex).

Operating System

Linux (Arch, kernel 7.0.x-zen).

Terminal/Shell

VS Code integrated terminal (TERM_PROGRAM=vscode), zsh. kitty is affected at a much lower rate; most spend from sessions launched there carries the [1m] suffix.

Additional Information

Why this matters

The Anthropic-shipped Claude Code Grafana integration dashboard (claude-code-overview) is what organizations on the Team plan use to track Claude Code spend per member, per project, per model. With this bug, the Cost-by-model donut, the Token usage charts, the Avg cost per API call panel — every panel that splits by model — is misleading by a factor that scales with the share of VS-Code-driven sessions in the org. For a heavy VS Code user the misattribution is roughly 50/50.

Concrete impact on my own org dashboard: looks like I'm splitting my Opus spend roughly half between the 200k and 1M variants, with the 200k variant being a slightly cheaper slice on the donut. That's the opposite of true. 100% of my Opus spend went through 1M context. The only reason I caught it was that one session's cache_read_input_tokens peaked at 879k tok in a single request, which is structurally impossible on a 200k model.
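For illustration, here is what the Cost-by-model breakdown would look like if the stripped label were folded back into the [1m] variant. A minimal sketch on the 7-day figures reported earlier; it relies on this report's claim that every no-suffix Opus 4.7 request was served on 1M context, and the `RELABEL` mapping is a hypothetical post-processing step, not something the dashboard or exporter provides.

```python
from collections import defaultdict

# 7-day org cost in USD by OTel model label, as reported above.
observed = {
    "claude-opus-4-7[1m]": 595.86,
    "claude-opus-4-7": 530.66,
    "claude-haiku-4-5-20251001": 2.93,
    "claude-sonnet-4-6": 1.76,
}

# Assumption from the report: all no-suffix Opus 4.7 spend actually ran on 1M context.
RELABEL = {"claude-opus-4-7": "claude-opus-4-7[1m]"}

corrected = defaultdict(float)
for label, cost in observed.items():
    corrected[RELABEL.get(label, label)] += cost

total = sum(observed.values())
for label, cost in sorted(corrected.items(), key=lambda item: -item[1]):
    print(f"{label:28} ${cost:9.2f}  ({cost / total:.2%})")
```

The relabeling only holds for an org where the 200k variant was never selected on purpose; elsewhere the two labels cannot be merged without per-request token evidence.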

Closest related issues

This bug shares root structure with several open/closed reports — client-side model label diverges from server-served model — but the OTel/metrics surface specifically has no existing report:

  • #56508 CLOSED without fix — Sonnet 4.6 selected, JSONL logs every assistant turn as claude-sonnet-4-5-20250929. Same divergence shape, alias axis instead of [1m] axis. JSONL evidence pattern is identical to what I'm reporting here for OTel.
  • #53327 OPEN — Picker shows Opus 4.7 (200K), runtime is 1M, 18% of Max plan burned on one prompt before the user noticed. UI surface of the same root issue.
  • #62521 OPEN — Maps the entire family ("active runtime config not observable from inside agent turn") and references #56508, #53327, #50714, #57804, #44819. Proposes a GetSessionContext() style fix.
  • #60913 OPEN — Opposite failure mode: v2.1.145 sometimes sends literal claude-opus-4-7[1m] as the model ID and the API returns 404, silent fallback to 200k. So somewhere in the model-string handling there's both an over-strip (this issue) and an over-keep (#60913) code path.
  • #61730 CLOSED — Side-panel navigation silently downgrades 1M to 200k. Different mechanism, same observability gap.

The likely shared fix is what #62521 asks for: source the model attribute on observability events (OTel and JSONL alike) from message.model in the API response, not from the client-side requested string. That's the only signal that survives both the strip-the-suffix bug and the resume-state-clobber bugs in #61068 / #61730.

What I'd find useful as a response

  1. Confirmation of the OTel-side mislabeling (separate from the UI-side mislabeling in #53327 — there are at least two places where the suffix gets stripped).
  2. A position on whether OTel model and JSONL message.model will be aligned to the server-confirmed message.model, so the Anthropic-shipped Grafana dashboard becomes truthful again for VS Code users.
  3. Either an Anthropic-side correction backfill for affected windows on the Team plan, or a known-issue banner on the dashboard, so customers don't make capacity / cost decisions off a wrong donut.

Meta-note

Drafted by Claude Code (this very session — Opus 4.7 on 1M, served correctly) after the user reviewed the evidence chain. Cross-checked against gh search issues for OTel / telemetry / metric-label terminology to confirm no existing report covers this specific surface.
