claude-code - 💡(How to fix) Fix [BUG] OTel api_request model attribute drops [1m] suffix while runtime serves 1M context — Cost-by-model dashboard misattributes ~50% of Opus spend

claude-code · 2026-05-29 06:37:50


Preflight Checklist

I have searched existing issues and this hasn't been reported yet (closest neighbours mapped in Additional Information — none cover the OTel/metrics surface specifically)
This is a single bug report (the telemetry-only angle is the unit — user-facing UI/picker symptoms are referenced as context, not bundled)
I am using the latest version of Claude Code (2.1.153; my data covers 2.1.145 → 2.1.153, and the bug shows up from 2.1.150 onward)

What's Wrong?

In claude_code_cost_usage_USD_total and the api_request Loki event emitted by Claude Code's OpenTelemetry exporter, the model attribute drops the [1m] suffix for a large fraction of requests that actually ran on 1M context. As a result, the Anthropic-shipped "Claude Code" Grafana dashboard (claude-code-overview) splits Opus 4.7 spend across two labels, claude-opus-4-7[1m] and claude-opus-4-7, as if they were two different variants, even though the runtime served every one of those requests on 1M context.

In my 7-day org metrics:

model label	cost
`claude-opus-4-7[1m]`	$595.86 (52.67%)
`claude-opus-4-7`	$530.66 (46.91%)
`claude-haiku-4-5-20251001`	$2.93
`claude-sonnet-4-6`	$1.76
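
The percentages in the table can be recomputed from the dollar figures alone. The sketch below does that and also folds the two Opus labels into one key, which is what the donut would show if the suffix were not being dropped. `fold_context_suffix` is a hypothetical helper written for this illustration, not part of Claude Code or the dashboard.

```python
# Costs in USD, copied from the 7-day table above.
cost_by_label = {
    "claude-opus-4-7[1m]": 595.86,
    "claude-opus-4-7": 530.66,
    "claude-haiku-4-5-20251001": 2.93,
    "claude-sonnet-4-6": 1.76,
}


def fold_context_suffix(label: str) -> str:
    """Hypothetical helper: drop a trailing "[1m]" so both Opus labels share a key."""
    return label[: -len("[1m]")] if label.endswith("[1m]") else label


total = sum(cost_by_label.values())
shares = {label: 100 * cost / total for label, cost in cost_by_label.items()}

# Re-aggregate with the suffix folded away.
folded = {}
for label, cost in cost_by_label.items():
    key = fold_context_suffix(label)
    folded[key] = folded.get(key, 0.0) + cost

for label, pct in shares.items():
    print(f"{label:28s} {pct:6.2f}%")
print(f"Opus 4.7, both labels combined: ${folded['claude-opus-4-7']:.2f}")
```

Folding the labels is only a reading aid for this one org, where the report argues that all Opus traffic ran on 1M context. It would hide a genuine 200k/1M split elsewhere.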

The pie chart suggests almost half my Opus spend was 200k. It wasn't. Hard evidence from the session JSONLs (which use the same stripped label) below.

Evidence: the "200k" sessions ran way over 200k context

For each session, I parsed every assistant message's usage block and computed the peak cache_read_input_tokens + cache_creation_input_tokens + input_tokens in a single request. A model with a 200k context window cannot accept a prompt above 200k — the Anthropic API rejects it. Yet:

session_id (anon)	OTel `model` label	peak single-request input
session-A	`claude-opus-4-7`	879,437 tok
session-B	`claude-opus-4-7`	511,255 tok
session-C	`claude-opus-4-7`	225,980 tok
session-D	`claude-opus-4-7[1m]` (control)	177,046 tok

Sessions A, B, C are all labeled as the 200k variant in OTel and in the JSONL, but each one served at least one request that could not have fit in a 200k window. They must have been served on 1M context. The label is wrong.

(Session-D is a control: labeled [1m], peak under 200k — so we can't distinguish it from a true 200k session by tokens alone, but its label is consistent with the runtime.)
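
The argument above reduces to a single check: a label without [1m] attached to a request larger than 200k tokens contradicts itself. A minimal sketch of that check over the four sessions in the table, assuming the 200,000-token limit stated in this report (`label_contradicts_usage` is a name made up for the example):

```python
LIMIT_200K = 200_000  # nominal window of the non-[1m] variant, per this report

# (session, OTel model label, peak single-request input tokens) from the table above.
sessions = [
    ("session-A", "claude-opus-4-7", 879_437),
    ("session-B", "claude-opus-4-7", 511_255),
    ("session-C", "claude-opus-4-7", 225_980),
    ("session-D", "claude-opus-4-7[1m]", 177_046),
]


def label_contradicts_usage(label: str, peak_tokens: int) -> bool:
    """True when a label without [1m] sits on a request too large for a 200k window."""
    return not label.endswith("[1m]") and peak_tokens > LIMIT_200K


mislabeled = [name for name, label, peak in sessions
              if label_contradicts_usage(label, peak)]
print(mislabeled)
```

The check is one-sided by design: it can prove a stripped label wrong, but it cannot confirm a label on a session that never crossed 200k, which is exactly the caveat noted for session-D.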

Pattern: concentrated in VS Code + v2.1.150-era sessions

Sliced by model × terminal_type × service_version, query_source="main", my org-wide 7-day spend on the "no [1m]" Opus 4.7 label is:

model	version	terminal	cost
`claude-opus-4-7`	2.1.150	vscode	$442.54
`claude-opus-4-7`	2.1.153	vscode	$41.94
`claude-opus-4-7`	2.1.150	kitty	$10.54
`claude-opus-4-7`	2.1.152	kitty	$0.19

vs. the same slice labeled [1m]:

model	version	terminal	cost
`claude-opus-4-7[1m]`	2.1.145	vscode	$266.09
`claude-opus-4-7[1m]`	2.1.146	vscode	$149.97
`claude-opus-4-7[1m]`	2.1.145	kitty	$24.33
`claude-opus-4-7[1m]`	2.1.150	vscode	$0.25
`claude-opus-4-7[1m]`	2.1.150	kitty	$20.92
`claude-opus-4-7[1m]`	2.1.153	vscode	$14.74
`claude-opus-4-7[1m]`	2.1.153	kitty	$27.26

Reading: in v2.1.145/146, vscode-launched sessions emitted the suffix correctly. Starting at v2.1.150, vscode sessions almost exclusively emit the stripped label (v2.1.150 vscode [1m] cost = $0.25, no-suffix cost = $442). v2.1.153 partially recovered ($14.74 with suffix vs $41.94 without). kitty is consistently better — almost all kitty cost carries the [1m] suffix across all versions.

So the regression is VS-Code-terminal-shaped and starts around v2.1.149/150.

query_source split confirms it's mainly main-thread interactive use:

model	query_source	cost
`claude-opus-4-7`	main	$495.18
`claude-opus-4-7`	auxiliary	$35.49
`claude-opus-4-7`	subagent	$0.17
`claude-opus-4-7[1m]`	main	$534.50
`claude-opus-4-7[1m]`	auxiliary	$40.63
`claude-opus-4-7[1m]`	subagent	$20.50

Not subagent spawn, not aux task — main-thread interactive sessions are the surface where the label gets stripped.

What Should Happen?

The OTel model attribute emitted on api_request events and the dimension on claude_code_cost_usage_USD_total should carry the same string the runtime actually used to call the Anthropic API. If the request went out with model="claude-opus-4-7[1m]", the telemetry should record claude-opus-4-7[1m]. Stripping the [1m] suffix on the telemetry-side codepath while keeping it on the API-side codepath produces silently misleading cost dashboards — including the dashboard Anthropic ships in the Grafana Cloud "Claude Code" integration.

Ideally the attribute would be sourced from the API response's message.model (server-confirmed) rather than from a client-side KY$/AZ-style recomputation. That eliminates the divergence by construction and matches the "served vs requested" distinction #62521 has been asking for on a different surface.

The session .jsonl message.model field has the same problem (an 879k-token request is logged with message.model = "claude-opus-4-7"); fixing OTel and JSONL with the same fix would be ideal.
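
The proposal can be sketched as a small labeling rule: prefer the model string the server reports, fall back to the requested string, and record which one was used. Everything here is hypothetical, including the function names and the extra `model_requested` and `model_label_source` attributes; it is not Claude Code's exporter code. It also carries over the report's assumption that the server-reported string distinguishes the 1M variant, which is not verified here.

```python
from typing import Optional


def telemetry_model_label(requested_model: str, response_model: Optional[str]) -> str:
    """Prefer the server-reported model string; fall back to the requested one."""
    return response_model or requested_model


def api_request_attributes(requested_model: str, response: dict) -> dict:
    """Build hypothetical api_request attributes that keep both views of the model."""
    served = response.get("model")  # assumed field on the API response message
    return {
        "model": telemetry_model_label(requested_model, served),
        "model_requested": requested_model,  # hypothetical extra attribute
        "model_label_source": "response" if served else "request",
    }


# A client-side label that lost its suffix is corrected by the response value.
print(api_request_attributes("claude-opus-4-7", {"model": "claude-opus-4-7[1m]"}))
```

Keeping the requested string next to the served one means a dashboard can still show when the two diverge, instead of silently picking one.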

Error Messages/Logs

No errors. The bug is silent: the request goes out at 1M, the response comes back, the agent works fine, but the telemetry label is wrong.

Direct quote of an api_request Loki line from session-C (v2.1.153, vscode), showing label vs. actual content:

labels: {model: "claude-opus-4-7", service_version: "2.1.153", terminal_type: "vscode", query_source: "repl_main_thread"}
content: cache_read_tokens=220493, cache_creation_tokens=1174, input_tokens=1, cost_usd=0.132664

cache_read_tokens=220493 on a request a 200k model is supposed to have rejected. The model that served it was clearly the 1M variant; the label says otherwise.
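
Summing the three token fields from that line gives the full prompt size for the request. The sketch below parses the quoted content string and does the addition; `parse_fields` is an illustrative helper that assumes the simple "key=value, key=value" layout shown above.

```python
# Content string quoted from the session-C api_request line above.
content = "cache_read_tokens=220493, cache_creation_tokens=1174, input_tokens=1, cost_usd=0.132664"


def parse_fields(text: str) -> dict:
    """Split 'key=value, key=value' pairs into a dict of strings."""
    return dict(pair.strip().split("=", 1) for pair in text.split(","))


fields = parse_fields(content)
prompt_tokens = sum(
    int(fields[key])
    for key in ("cache_read_tokens", "cache_creation_tokens", "input_tokens")
)
print(prompt_tokens, prompt_tokens > 200_000)
```

The total comes to 221,668 tokens, so this single request already exceeds a 200k window even though its label is claude-opus-4-7.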

Steps to Reproduce

1. Have Claude Code 2.1.150 → 2.1.153 installed, opted into 1M context (Max plan, no CLAUDE_CODE_DISABLE_1M_CONTEXT, first-party API).
2. Launch a session from the VS Code integrated terminal (TERM_PROGRAM=vscode).
3. Use Opus 4.7 for non-trivial work that accumulates ≥220k tokens of context (e.g., cached system prompt + project CLAUDE.md + a few hundred KB of files referenced over a long thread).
4. Stream cost/api_request events to a Prometheus + Loki backend via OTel (Grafana Cloud integration works fine for this).
5. Open the Anthropic-shipped "Claude Code" dashboard (claude-code-overview). The Cost-by-model donut will show a claude-opus-4-7 slice alongside the expected claude-opus-4-7[1m] slice.
6. Pick a session that landed in the no-suffix slice. Open its .jsonl. Grep message.usage. You'll see cache_read+input totals well above 200k, which proves the runtime served 1M even though the label says 200k.

Alternate quick repro (no OTel pipeline needed):

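# Pick any session you ran heavy on Opus 4.7 from VS Code on v2.1.150+
python3 -c "
import collections, json, sys
peak = 0
models = collections.Counter()
for line in open(sys.argv[1], 'rb'):
    # Skip lines that are not valid JSON (e.g. a partially written last line).
    try: d = json.loads(line)
    except ValueError: continue
    msg = d.get('message') if isinstance(d, dict) else None
    if not isinstance(msg, dict): continue
    u = msg.get('usage') or {}
    # Prompt size of one request: cache reads + cache writes + uncached input.
    tot = (u.get('cache_read_input_tokens') or 0) + (u.get('cache_creation_input_tokens') or 0) + (u.get('input_tokens') or 0)
    if tot > peak: peak = tot
    # also count model labels
    if msg.get('model'): models[msg['model']] += 1
print('peak single-request input tokens:', peak)
print('model labels:', dict(models))
" ~/.claude/projects/<encoded-cwd>/<session-id>.jsonl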

If peak > 200000 but the JSONL grep for "model" shows only claude-opus-4-7 (no [1m]), this issue is firing.
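
The same check can be run across every session at once. This is a sketch that assumes the ~/.claude/projects/<encoded-cwd>/<session-id>.jsonl layout and the message.usage / message.model fields described in this report; `scan_session` and `suspicious_sessions` are names invented for the example.

```python
import json
from collections import Counter
from pathlib import Path

LIMIT_200K = 200_000  # nominal window of the non-[1m] variant, per this report
TOKEN_KEYS = ("cache_read_input_tokens", "cache_creation_input_tokens", "input_tokens")


def scan_session(path):
    """Return (peak single-request input tokens, Counter of message.model values)."""
    peak, models = 0, Counter()
    with open(path, "rb") as fh:
        for line in fh:
            try:
                record = json.loads(line)
            except ValueError:
                continue  # tolerate partial or non-JSON lines
            message = record.get("message") if isinstance(record, dict) else None
            if not isinstance(message, dict):
                continue
            usage = message.get("usage")
            if isinstance(usage, dict):
                peak = max(peak, sum(usage.get(key) or 0 for key in TOKEN_KEYS))
            if message.get("model"):
                models[message["model"]] += 1
    return peak, models


def suspicious_sessions(root):
    """Yield (path, peak, labels) for sessions over 200k with no [1m] label at all."""
    for path in sorted(Path(root).expanduser().rglob("*.jsonl")):
        peak, models = scan_session(path)
        if peak > LIMIT_200K and not any(m.endswith("[1m]") for m in models):
            yield path, peak, sorted(models)
```

Calling `list(suspicious_sessions("~/.claude/projects"))` would then return one tuple per session whose peak request exceeds 200k while no record in the file carries a [1m] label.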

Claude Model

Opus 4.7 (1M context variant, served correctly; only the telemetry label is wrong)

Is this a regression?

Yes (telemetry side). For VS Code terminal sessions, the [1m] suffix appears in OTel/JSONL labels reliably on v2.1.145–148, then breaks on v2.1.149/150 and remains partially broken through v2.1.153. Non-VSCode terminals (kitty) are not affected at the same rate. The runtime context window itself is not regressed — the API requests still go out at 1M.

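Last Working Version

2.1.148 for VS Code terminal launches (last version where claude_code_cost_usage_USD_total{terminal_type="vscode", model="claude-opus-4-7[1m]"} dominated and the stripped-suffix variant was a rounding error).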

Claude Code Version

2.1.153 (current). Bug observed across 2.1.150, 2.1.152, 2.1.153 in my own data.

Platform

Anthropic API (direct, first-party — no ANTHROPIC_BASE_URL, no Bedrock, no Vertex).

Operating System

Linux (Arch, kernel 7.0.x-zen).

Terminal/Shell

VS Code integrated terminal (TERM_PROGRAM=vscode), zsh. kitty is not affected at the same rate; sessions launched there carry the [1m] suffix consistently.

Additional Information

Why this matters

The Anthropic-shipped Claude Code Grafana integration dashboard (claude-code-overview) is what organizations on the Team plan use to track Claude Code spend per member, per project, per model. With this bug, the Cost-by-model donut, the Token usage charts, the Avg cost per API call panel — every panel that splits by model — is misleading by a factor that scales with the share of VS-Code-driven sessions in the org. For a heavy VS Code user the misattribution is roughly 50/50.

Concrete impact on my own org dashboard: it looks like I'm splitting my Opus spend roughly half and half between the 200k and 1M variants, with the 200k variant being a slightly smaller slice on the donut. That's the opposite of true. 100% of my Opus spend went through 1M context. The only reason I caught it was that one session's cache_read_input_tokens peaked at 879k tokens in a single request, which is structurally impossible on a 200k model.

Closest related issues

This bug shares root structure with several open/closed reports — client-side model label diverges from server-served model — but the OTel/metrics surface specifically has no existing report:

#56508 CLOSED without fix — Sonnet 4.6 selected, JSONL logs every assistant turn as claude-sonnet-4-5-20250929. Same divergence shape, alias axis instead of [1m] axis. JSONL evidence pattern is identical to what I'm reporting here for OTel.
#53327 OPEN — Picker shows Opus 4.7 (200K), runtime is 1M, 18% of Max plan burned on one prompt before the user noticed. UI surface of the same root issue.
#62521 OPEN — Maps the entire family ("active runtime config not observable from inside agent turn") and references #56508, #53327, #50714, #57804, #44819. Proposes a GetSessionContext() style fix.
#60913 OPEN — Opposite failure mode: v2.1.145 sometimes sends literal claude-opus-4-7[1m] as the model ID and the API returns 404, silent fallback to 200k. So somewhere in the model-string handling there's both an over-strip (this issue) and an over-keep (#60913) code path.
#61730 CLOSED — Side-panel navigation silently downgrades 1M to 200k. Different mechanism, same observability gap.

The likely shared fix is what #62521 asks for: source the model attribute on observability events (OTel and JSONL alike) from message.model in the API response, not from the client-side requested string. That's the only signal that survives both the strip-the-suffix bug and the resume-state-clobber bugs in #61068 / #61730.

What I'd find useful as a response

Confirmation of the OTel-side mislabeling (separate from the UI-side mislabeling in #53327 — there are at least two places where the suffix gets stripped).
A position on whether OTel model and JSONL message.model will be aligned to the server-confirmed message.model, so the Anthropic-shipped Grafana dashboard becomes truthful again for VS Code users.
Either an Anthropic-side correction backfill for affected windows on the Team plan, or a known-issue banner on the dashboard, so customers don't make capacity / cost decisions off a wrong donut.

Meta-note

Drafted by Claude Code (this very session — Opus 4.7 on 1M, served correctly) after the user reviewed the evidence chain. Cross-checked against gh search issues for OTel / telemetry / metric-label terminology to confirm no existing report covers this specific surface.
