hermes - 💡(How to fix) Fix [Performance] Runaway context growth causes very high latency on multi-tool turns


[Performance] Runaway context growth causes very high latency on multi-tool turns

Summary

In practice, a single turn can take minutes once the context has grown large and multi-tool loops keep running within that same turn.

Observed impact (from local logs)

  • Total turns analyzed: 181
  • P95 turn duration: 586.6s
  • API calls: 803
  • Avg input tokens: 56,986.7
  • P95 input tokens: 105,943.6
  • Large-context events: 845

Expected behavior

  • Deterministic/simple queries should use a strict fast-path (single-shot where possible).
  • Context compaction should trigger earlier, before token counts grow to extremes.
  • Hard per-turn budget guardrails (tool depth / API loops) should stop the turn and return a checkpointed partial response.
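
To make the second and third expectations concrete, here is a minimal sketch of a per-turn budget with early compaction and a checkpointed partial result. Every name in it (`TurnBudget`, `TurnResult`, `run_turn`, `call_model`, `compact`, `estimate_tokens`) and every threshold is hypothetical and chosen for illustration; none of it is taken from the Hermes codebase, and the compaction step is only a placeholder for a real summarizer.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class TurnBudget:
    max_api_calls: int = 8            # hypothetical hard cap on model calls per turn
    compact_at_tokens: int = 40_000   # hypothetical threshold for early compaction


@dataclass
class TurnResult:
    text: str
    partial: bool    # True when the budget stopped the turn early
    api_calls: int


def estimate_tokens(context: List[str]) -> int:
    # Crude stand-in for a tokenizer: roughly 4 characters per token.
    return sum(len(message) for message in context) // 4


def compact(context: List[str], keep_last: int = 2) -> List[str]:
    # Placeholder compaction: replace older messages with a one-line marker.
    if len(context) <= keep_last:
        return context
    dropped = len(context) - keep_last
    return [f"[summary of {dropped} earlier messages]"] + context[-keep_last:]


def run_turn(
    query: str,
    call_model: Callable[[List[str]], Tuple[str, bool]],
    budget: TurnBudget,
) -> TurnResult:
    """Run one turn; call_model returns (output, done) for the given context."""
    context = [query]
    last_output = ""
    for n in range(1, budget.max_api_calls + 1):
        # Compact before the call, so the model never sees an oversized context.
        if estimate_tokens(context) > budget.compact_at_tokens:
            context = compact(context)
        last_output, done = call_model(context)
        context.append(last_output)
        if done:
            return TurnResult(last_output, partial=False, api_calls=n)
    # Budget exhausted: return the latest output as a checkpoint instead of looping on.
    return TurnResult(last_output, partial=True, api_calls=budget.max_api_calls)
```

With this shape, a caller can check `result.partial` and either show the checkpoint to the user or offer to continue in a new turn, so the wall-clock time of any single turn stays bounded by `max_api_calls`.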

Actual behavior

  • Tool chains and API loops keep running even when the context is very large.
  • Turn wall-clock time stretches to multiple minutes in user-facing flows.

Reproduction pattern

  1. Ask an operational query that triggers multiple tool calls.
  2. Let the turn continue without a hard stop or checkpoint.
  3. The context grows with each call, and subsequent model calls slow down substantially.
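
The growth in step 3 can be sketched with a small simulation. It assumes that every model call resends the full accumulated context and that each tool result is appended unmodified; both are assumptions about how the loop behaves, not confirmed details of Hermes. The function name `simulate_turn` and all token counts are made up for illustration and are not taken from the logs above.

```python
from typing import List


def simulate_turn(base_tokens: int, tokens_per_tool_result: int, api_calls: int) -> List[int]:
    """Return the input size of each model call when tool results pile up in context."""
    sizes = []
    context = base_tokens
    for _ in range(api_calls):
        sizes.append(context)              # the whole context is sent on every call
        context += tokens_per_tool_result  # the tool result is appended afterwards
    return sizes


# Illustrative numbers only.
sizes = simulate_turn(base_tokens=10_000, tokens_per_tool_result=5_000, api_calls=10)
print(sizes[-1])   # 55000 input tokens on the last call
print(sum(sizes))  # 325000 input tokens processed across the turn
```

Under these assumptions the per-call input grows linearly with the number of calls, while the total tokens processed in the turn grow quadratically, which is consistent with later calls in a long tool chain becoming much slower than the first ones.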

Related existing discussions (possible overlap)

Notes

I understand there are existing compression/fast-path mechanisms, but real-world latency remains severe in my workflow, so I am filing this as a user-impact regression/perf issue.

Environment

  • OS: Windows 11
  • Hermes profile: default
