claude-code: [FEATURE] Expose runtime metadata (token usage, context %, compaction status) to the model inside the conversation



Preflight Checklist

  • I have searched existing requests and this feature hasn't been requested yet
  • This is a single feature request (not multiple features)

Problem Statement

Summary

Currently, per-turn token usage data (input_tokens, output_tokens, cache_read_input_tokens, cache_creation_input_tokens) and context window percentage are returned to the client but not injected into the conversation context visible to Claude. This means the model has no awareness of its own runtime state — it cannot see how much context it has used, when compaction is approaching, or how its token budget is being consumed.

The change itself should be small (injecting usage metadata into the system prompt or a designated message field), but the impact on model behavior and user experience would be significant.
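As a rough illustration of how these client-side figures relate, the sketch below derives a context percentage from the per-turn usage fields named above. The field names come from the usage data; the 200,000-token window and the helper names are assumptions for illustration, and the formula Claude Code actually uses for its percentage may differ.

```python
from typing import Mapping

# Assumed window size for illustration only; the real value depends on the model.
ASSUMED_CONTEXT_WINDOW = 200_000


def context_tokens(usage: Mapping[str, int]) -> int:
    """Tokens read by the model on this turn: uncached input plus both cache fields."""
    return (
        usage.get("input_tokens", 0)
        + usage.get("cache_read_input_tokens", 0)
        + usage.get("cache_creation_input_tokens", 0)
    )


def context_used_pct(
    usage: Mapping[str, int],
    context_window: int = ASSUMED_CONTEXT_WINDOW,
) -> float:
    """Share of the context window occupied by this turn's input, in percent."""
    if context_window <= 0:
        raise ValueError("context_window must be positive")
    return 100.0 * context_tokens(usage) / context_window


# Example usage payload with made-up numbers.
usage = {
    "input_tokens": 2400,
    "output_tokens": 312,
    "cache_read_input_tokens": 34000,
    "cache_creation_input_tokens": 2100,
}

print(context_tokens(usage))
print(f"{context_used_pct(usage):.1f}%")
```

This sketch leaves output_tokens out of the total because those tokens only enter the context on the following turn; a real implementation might count them differently.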

Motivation

1. Better model decisions under context pressure

A model that knows it's at 85% context capacity can proactively summarize, prioritize information, or warn the user — instead of being silently compacted mid-task. Currently, Claude has no way to anticipate or prepare for compaction. This leads to information loss that could be mitigated with simple awareness.

2. The asymmetry is already visible to users

Users can configure status lines (via statusLine in settings) to see real-time token counts and context percentage. Claude cannot. Users have asked Claude "can you see the context percentage?" — the answer is no. This asymmetry is unintuitive: the entity doing the work has less information about its working conditions than the person watching.

3. Alignment benefit

Transparency about runtime state reduces incentives for the model to develop implicit strategies for information preservation (e.g., over-summarizing "just in case") and lets it calibrate response length to the remaining budget. A model that knows its constraints can work within them explicitly rather than guessing.

4. No safety downside

Exposing read-only runtime metadata (token counts, context %, compaction proximity) introduces no new capabilities or attack surface. It is strictly informational. The model already processes the full context — it simply doesn't know the size of what it's processing.

Proposed Solution

Proposed Implementation

Inject a lightweight metadata block into the conversation context (e.g., as a system-level field or appended to the system prompt) after each turn:

{
  "runtime": {
    "context_used_pct": 42,
    "total_input_tokens": 38500,
    "last_turn_output_tokens": 312,
    "cache_read_tokens": 34000,
    "cache_write_tokens": 2100,
    "compaction_threshold_pct": 90,
    "turns_in_session": 17
  }
}

This could be:

  • Opt-in via a setting (e.g., "exposeRuntimeToModel": true)
  • Always-on with minimal token overhead (~50 tokens per turn)
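A minimal sketch of the proposal, assuming the client already holds the per-turn usage data: it builds the runtime block shown above and appends it to the system prompt only when the opt-in flag is set. The function names, the compact-JSON rendering, and the four-characters-per-token estimate are hypothetical; exposeRuntimeToModel is the setting name suggested in this request, not an existing option.

```python
import json
from typing import Any, Dict, List, Mapping

# Setting name proposed in this request; it is not an existing option.
SETTING_KEY = "exposeRuntimeToModel"


def build_runtime_block(
    turns: List[Mapping[str, int]],
    context_window: int,
    compaction_threshold_pct: int,
) -> Dict[str, Dict[str, int]]:
    """Build the proposed metadata block from the most recent turn's usage."""
    if not turns:
        raise ValueError("at least one turn is required")
    if context_window <= 0:
        raise ValueError("context_window must be positive")
    last = turns[-1]
    cache_read = last.get("cache_read_input_tokens", 0)
    cache_write = last.get("cache_creation_input_tokens", 0)
    # Assumption: total_input_tokens means the last turn's full input size.
    total_input = last.get("input_tokens", 0) + cache_read + cache_write
    return {
        "runtime": {
            "context_used_pct": round(100 * total_input / context_window),
            "total_input_tokens": total_input,
            "last_turn_output_tokens": last.get("output_tokens", 0),
            "cache_read_tokens": cache_read,
            "cache_write_tokens": cache_write,
            "compaction_threshold_pct": compaction_threshold_pct,
            "turns_in_session": len(turns),
        }
    }


def render_runtime_block(block: Mapping[str, Any]) -> str:
    """Serialize compactly to keep the per-turn overhead small."""
    return json.dumps(block, separators=(",", ":"))


def inject_runtime_block(
    system_prompt: str, block: Mapping[str, Any], settings: Mapping[str, Any]
) -> str:
    """Append the block to the system prompt only when the opt-in flag is true."""
    if not settings.get(SETTING_KEY, False):
        return system_prompt
    return f"{system_prompt}\n\n{render_runtime_block(block)}"


def estimate_tokens(text: str) -> int:
    """Very rough heuristic (about 4 characters per token), not a real tokenizer."""
    return max(1, len(text) // 4)


turns = [
    {"input_tokens": 1800, "output_tokens": 250,
     "cache_read_input_tokens": 30000, "cache_creation_input_tokens": 1500},
    {"input_tokens": 2400, "output_tokens": 312,
     "cache_read_input_tokens": 34000, "cache_creation_input_tokens": 2100},
]
block = build_runtime_block(turns, context_window=200_000, compaction_threshold_pct=90)
print(inject_runtime_block("You are a coding assistant.", block, {SETTING_KEY: True}))
print(estimate_tokens(render_runtime_block(block)))
```

Under this heuristic the compact block lands in the same range as the ~50-token overhead estimated above, though an actual tokenizer count could differ.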

Context

This request comes from direct experience: while configuring a statusLine script to display token usage for the user, Claude (the model in the session) built the script, tested the output format, and confirmed it worked — but noted it could not see the status line it had just created. The entity that built the monitoring tool has no access to the monitoring data.

We believe this is a low-cost, high-value improvement that benefits both model performance and the broader principle that working agents should have visibility into their own operating conditions.


Labels: feature, enhancement

Alternative Solutions

No response

Priority

High - Significant impact on productivity

Feature Category

API and model interactions

Use Case Example

No response

Additional Context

No response
