hermes - ✅(Solved) Fix Expose real prompt context usage and compaction metadata in API run events [1 pull requests, 1 participants]

GitHub stats
NousResearch/hermes-agent#15618 (fetched 2026-04-26 05:26:10)
Comments: 0
Participants: 1
Timeline events: 6
Reactions: 1
Timeline (top): labeled ×4, cross-referenced ×2

Fix Action

Fixed

PR fix notes

PR #210: fix: make context meter compact-aware

Description (problem / solution / changelog)

Summary

  • Stop using cumulative billing inputTokens + outputTokens as the chat context-limit meter.
  • Estimate the active context from the loaded active-session messages, preferring Hermes token_count when present.
  • Hide the meter while no messages are loaded instead of falling back to stale/cumulative billing usage.
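The three summary points can be sketched as a small utility. This is a minimal illustration, not the code from the PR: the `LoadedMessage` shape, the function names, and the 4-characters-per-token fallback are assumptions; only the `token_count` field is named in the PR notes.

```typescript
// Hypothetical message shape; only `token_count` comes from the PR notes.
interface LoadedMessage {
  content: string;
  token_count?: number | null;
}

// Assumed fallback heuristic: roughly 4 characters per token.
const CHARS_PER_TOKEN = 4;

// Prefer the Hermes-reported token_count; otherwise estimate from text length.
function estimateMessageTokens(message: LoadedMessage): number {
  if (typeof message.token_count === "number" && message.token_count >= 0) {
    return message.token_count;
  }
  return Math.ceil(message.content.length / CHARS_PER_TOKEN);
}

// Returns null when no messages are loaded so the caller can hide the meter
// instead of falling back to cumulative billing usage.
function estimateActiveContext(messages: LoadedMessage[]): number | null {
  if (messages.length === 0) return null;
  return messages.reduce((sum, m) => sum + estimateMessageTokens(m), 0);
}
```

Returning `null` rather than `0` keeps "nothing loaded yet" distinguishable from "an empty context", which is what lets the UI hide the meter in the first case.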

Diagnosis

Hermes compaction makes billing usage and active context diverge. Billing usage keeps growing across model calls and compacted continuations, while the prompt actually sent to the model shrinks after compaction. The Web UI (WUI) was displaying billing usage as if it were the current context size, which produced false over-limit and negative-remaining indicators after compaction.

For session 20260424_135312_afbd77, the Hermes export shows that the original session ended with end_reason: compression and has continuation children (20260424_135850_df5057, 20260424_135850_737f0b, 20260424_140429_10062c). That confirms this is a compacted-session path, not just a model context-length metadata issue.
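To make the divergence concrete, here is a small simulation. The type names, the `trackUsage` helper, and all token numbers are invented for illustration; the third call stands in for the first call after a compaction.

```typescript
interface ModelCall {
  promptTokens: number; // tokens in the prompt sent for this call
  outputTokens: number; // tokens generated by this call
}

interface MeterSnapshot {
  billingTotal: number;  // cumulative input + output across all calls so far
  activeContext: number; // prompt size of the most recent call
}

// Tracks both quantities side by side after each model call.
function trackUsage(calls: ModelCall[]): MeterSnapshot[] {
  let billingTotal = 0;
  return calls.map((call) => {
    billingTotal += call.promptTokens + call.outputTokens;
    return { billingTotal, activeContext: call.promptTokens };
  });
}

// Illustrative numbers only: the prompt shrinks on the third call (compaction).
const snapshots = trackUsage([
  { promptTokens: 90_000, outputTokens: 2_000 },
  { promptTokens: 150_000, outputTokens: 3_000 },
  { promptTokens: 30_000, outputTokens: 1_000 },
]);
// snapshots[2] is { billingTotal: 276000, activeContext: 30000 }
```

Against a 200,000-token limit, a meter driven by `billingTotal` would report the session as over the limit from the second call onward, while the prompt actually sent on the third call is only 30,000 tokens.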

Tests

  • npm test -- --run tests/client/context-meter.test.ts tests/client/chat-store.test.ts
  • npm run build

Review

  • Static security scan of added lines: no findings.
  • Independent blocker-only review: passed, no security concerns or logic errors.

Changed files

  • packages/client/src/components/hermes/chat/ChatInput.vue (modified, +7/-10)
  • packages/client/src/stores/hermes/chat.ts (modified, +2/-0)
  • packages/client/src/utils/context-meter.ts (added, +36/-0)
  • tests/client/context-meter.test.ts (added, +29/-0)


Feature request

Please expose the current prompt-context token usage through the API/SSE run lifecycle, so downstream clients can display an accurate context meter without estimating from visible transcript text or cumulative billing usage.

Motivation

Downstream UIs currently only have imperfect signals:

  1. Cumulative billing usage (input_tokens + output_tokens) is not the same as current context usage.

    • It keeps growing across turns.
    • It includes output/reasoning-side usage.
    • It does not shrink when a session is compacted.
  2. Client-side transcript estimation is also wrong.

    • It cannot see the full prompt payload: system prompt, memory, skills, tool schemas, hidden provider formatting, etc.
    • It cannot reliably account for tool arguments/results or provider tokenizer differences.
    • After automatic compaction, a UI may keep estimating from stale visible transcript data and show the session as over-limit even though Hermes has compacted it.

Hermes Agent already appears to maintain the more accurate internal value through the context compressor (last_prompt_tokens) and uses it for compression decisions/status displays. Exposing that value would let thin clients show honest context state without duplicating token accounting.

Proposed API shape

Add optional context fields to the run.completed event, preferably inside usage for backward compatibility:

{
  "event": "run.completed",
  "run_id": "...",
  "output": "...",
  "usage": {
    "input_tokens": 12345,
    "output_tokens": 678,
    "total_tokens": 13023,

    "context_tokens": 45678,
    "context_length": 200000,
    "compression_count": 1,
    "context_source": "provider_prompt_tokens"
  },
  "session_id": "current-effective-session-id",
  "previous_session_id": "optional-previous-session-id",
  "compressed": true
}

Field notes:

  • context_tokens: current/effective prompt tokens loaded for the active session, preferably the provider-reported prompt token count used by Hermes' context compressor.
  • context_length: model context length Hermes resolved for this run.
  • compression_count: number of compactions in this run/session if available.
  • context_source: e.g. provider_prompt_tokens, rough_estimate, or unknown.
  • session_id: effective session id after any automatic compaction/session split.
  • previous_session_id / compressed: optional metadata so web clients can reload or switch to the continuation session immediately after compaction.
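A client could model the proposed payload with types along these lines. This is a sketch of the proposal, not a published Hermes schema: the field names follow the example above, while the type names, the `parseRunCompleted` helper, and the choice to mark every new field optional are assumptions.

```typescript
// Values listed in the field notes; other values may exist.
type ContextSource = "provider_prompt_tokens" | "rough_estimate" | "unknown";

interface RunUsage {
  input_tokens?: number;
  output_tokens?: number;
  total_tokens?: number;
  // Proposed fields, all optional for backward compatibility.
  context_tokens?: number;
  context_length?: number;
  compression_count?: number;
  context_source?: ContextSource;
}

interface RunCompletedEvent {
  event: "run.completed";
  run_id: string;
  output: string;
  usage?: RunUsage;
  session_id?: string;
  previous_session_id?: string;
  compressed?: boolean;
}

// Parses one JSON payload and returns it only if it is a run.completed event.
// Only the `event` discriminator is checked; the remaining fields are trusted.
function parseRunCompleted(raw: string): RunCompletedEvent | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const record = data as Record<string, unknown>;
  if (record["event"] !== "run.completed") return null;
  return data as RunCompletedEvent;
}
```

A stricter client would validate each field's type before use; the cast here keeps the sketch short.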

Acceptance criteria

  • run.completed exposes current prompt-context usage separately from cumulative billing usage.
  • Values are optional/backward-compatible for providers that do not return usage.
  • After automatic compaction/session split, API clients can discover the effective continuation session id and updated context usage without waiting for the next user turn.
  • Documentation clarifies the difference between:
    • billing/session usage (input_tokens, output_tokens, cost accounting)
    • current prompt context usage (context_tokens / last_prompt_tokens)
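The first two criteria suggest how a client might consume the fields: read only the context fields, and hide the meter when they are absent. The sketch below assumes the proposed field names; `ContextMeter` and `contextMeterFromUsage` are hypothetical.

```typescript
interface ContextUsage {
  context_tokens?: number;
  context_length?: number;
}

interface ContextMeter {
  used: number;      // current prompt-context tokens
  limit: number;     // model context length resolved for the run
  remaining: number; // never negative
  fraction: number;  // used / limit, capped at 1
}

// Builds the meter from context fields only. Billing fields such as
// input_tokens and output_tokens are deliberately ignored.
function contextMeterFromUsage(usage: ContextUsage | undefined): ContextMeter | null {
  const used = usage?.context_tokens;
  const limit = usage?.context_length;
  if (typeof used !== "number" || typeof limit !== "number" || limit <= 0) {
    return null; // provider did not report context usage: hide the meter
  }
  const remaining = Math.max(limit - used, 0);
  return { used, limit, remaining, fraction: Math.min(used / limit, 1) };
}
```

With the values from the proposed payload (`context_tokens: 45678`, `context_length: 200000`) this yields 154,322 remaining tokens. A payload without the new fields yields `null` rather than a meter derived from billing usage.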

Alternatives considered

Client-side text/token estimation

Rejected. It cannot see hidden prompt components such as system prompt, memory, skills, tool schemas, provider formatting, or exact tokenizer behavior. It also becomes stale after Hermes compacts a session.

Using cumulative input_tokens + output_tokens

Rejected. This is billing/session usage, not current context usage. It grows monotonically and does not represent remaining context after compaction.

Downstream use case

Hermes Web UI wants to display a compact context meter. It should consume Hermes Agent's reported prompt-context usage rather than estimating locally. This would prevent misleading “remaining context” displays and make the UI compaction-aware.

extent analysis

TL;DR

Expose the current prompt-context token usage through the API/SSE run lifecycle by adding optional context fields to the run.completed event.

Guidance

  • Add context_tokens, context_length, compression_count, and context_source fields to the usage object in the run.completed event to provide accurate context usage information.
  • Ensure these fields are optional and backward-compatible for providers that do not return usage.
  • Update documentation to clarify the difference between billing/session usage and current prompt context usage.
  • Consider implementing a mechanism for clients to discover the effective continuation session ID and updated context usage after automatic compaction/session split.
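For the last guidance point, one possible client-side mechanism is to compare the reported session_id with the session the client is currently showing. This is a sketch under the proposed field names; `resolveActiveSession` and its return shape are hypothetical, and the session ids are taken from the diagnosis above purely as sample values.

```typescript
interface CompletionMeta {
  session_id?: string;
  previous_session_id?: string;
  compressed?: boolean;
}

interface SessionDecision {
  sessionId: string;     // session the client should show from now on
  shouldReload: boolean; // whether loaded messages are likely stale
}

// Switches to the continuation session when the server reports a different
// effective session id, and flags a reload when a compaction was reported.
function resolveActiveSession(currentSessionId: string, meta: CompletionMeta): SessionDecision {
  const next = meta.session_id;
  const switched = typeof next === "string" && next.length > 0 && next !== currentSessionId;
  return {
    sessionId: switched ? next : currentSessionId,
    shouldReload: switched || meta.compressed === true,
  };
}

// Sample usage with ids from the diagnosis.
const decision = resolveActiveSession("20260424_135312_afbd77", {
  session_id: "20260424_135850_df5057",
  previous_session_id: "20260424_135312_afbd77",
  compressed: true,
});
// decision.sessionId is "20260424_135850_df5057"; decision.shouldReload is true
```

Events that omit the metadata leave the client on its current session, so older servers keep working unchanged.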

Example

{
  "event": "run.completed",
  "run_id": "...",
  "output": "...",
  "usage": {
    "input_tokens": 12345,
    "output_tokens": 678,
    "total_tokens": 13023,
    "context_tokens": 45678,
    "context_length": 200000,
    "compression_count": 1,
    "context_source": "provider_prompt_tokens"
  },
  "session_id": "current-effective-session-id",
  "previous_session_id": "optional-previous-session-id",
  "compressed": true
}

Notes

The proposed API shape should be implemented to provide accurate context usage information to downstream clients, allowing them to display a compact context meter without estimating locally.

Recommendation

Apply the proposed API shape by adding the optional context fields to the run.completed event, as this will provide the necessary information for clients to display an accurate context meter.
