- Deterministic/simple queries should use a strict fast-path (single-shot where possible). - Earlier context compaction before extreme token growth. - Hard per-turn budget guardrails (tool depth / API loops) with checkpointed partial response.

hermes - 💡(How to fix) Fix [Performance] Runaway context growth causes very high latency on multi-tool turns

StepCodex · 2026-05-30T03:12:40Z

[hermes] In practical usage, turn latency becomes minutes when context grows large and multi-tool loops continue in the same turn. In practical usage, turn latency becomes minutes when context grows large and multi-tool loops continue in the same turn. # [Performance] Runaway context growth causes very high latency on multi-tool turns ## Summary In practical usage, turn latency becomes minutes when context grows large and multi-tool loops continue in the same turn. ## Observed impact (from local logs) - Total turns analyzed: 181 - P95 turn duration: 586.6s - API calls: 803 - Avg input tokens: 56,986.7 - P95 input tokens: 105,943.6 - Large-context events: 845 ## Expected behavior - Deterministic/simple queries should use a strict fast-path (single-shot where possible). - Earlier context compaction before extreme token growth. - Hard per-turn budget guardrails (tool depth / API loops) with checkpointed partial response. ## Actual behavior - Tool chains and API loops continue with very large contexts. - Turn wall-clock expands to multi-minute in user-facing flows. ## Reproduction pattern 1. Ask operational query that triggers multiple tool calls. 2. Let turn continue without hard stop/checkpoint. 3. Context grows repeatedly; subsequent model calls slow down substantially. ## Related existing discussions (possible overlap) - https://github.com/NousResearch/hermes-agent/issues/23811 - https://github.com/NousResearch/hermes-agent/pull/21470 - https://github.com/NousResearch/hermes-agent/issues/16671 - https://github.com/NousResearch/hermes-agent/issues/6839 ## Notes I understand there are existing compression/fast-path mechanisms, but real-world latency remains severe in my workflow, so filing this as a user-impact regression/perf issue. ## Environment - OS: Windows 11 - Hermes profile: default

hermes2026-05-30 03:12:40

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

In practical usage, turn latency becomes minutes when context grows large and multi-tool loops continue in the same turn.

Root Cause

In practical usage, turn latency becomes minutes when context grows large and multi-tool loops continue in the same turn.

RAW_BUFFERClick to expand / collapse

[Performance] Runaway context growth causes very high latency on multi-tool turns

Summary

In practical usage, turn latency becomes minutes when context grows large and multi-tool loops continue in the same turn.

Observed impact (from local logs)

Total turns analyzed: 181
P95 turn duration: 586.6s
API calls: 803
Avg input tokens: 56,986.7
P95 input tokens: 105,943.6
Large-context events: 845

Expected behavior

Deterministic/simple queries should use a strict fast-path (single-shot where possible).
Earlier context compaction before extreme token growth.
Hard per-turn budget guardrails (tool depth / API loops) with checkpointed partial response.

Actual behavior

Tool chains and API loops continue with very large contexts.
Turn wall-clock expands to multi-minute in user-facing flows.

Reproduction pattern

Ask operational query that triggers multiple tool calls.
Let turn continue without hard stop/checkpoint.
Context grows repeatedly; subsequent model calls slow down substantially.

Related existing discussions (possible overlap)

Notes

I understand there are existing compression/fast-path mechanisms, but real-world latency remains severe in my workflow, so filing this as a user-impact regression/perf issue.

Environment

OS: Windows 11
Hermes profile: default

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

FAQ

Expected behavior

Deterministic/simple queries should use a strict fast-path (single-shot where possible).
Earlier context compaction before extreme token growth.
Hard per-turn budget guardrails (tool depth / API loops) with checkpointed partial response.

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

hermes - 💡(How to fix) Fix [Performance] Runaway context growth causes very high latency on multi-tool turns

Recommended Tools

GitHub issue graph ai analysis

Root Cause

[Performance] Runaway context growth causes very high latency on multi-tool turns

Summary

Observed impact (from local logs)

Expected behavior

Actual behavior

Reproduction pattern

Related existing discussions (possible overlap)

Notes

Environment

FAQ

Expected behavior

Still need to ship something?

TRENDING