hermes - Compression failure hangs main response loop indefinitely (HTTP 400 on auxiliary call causes silent 88-min stall)

Summary

When the auxiliary compression worker fails repeatedly (HTTP 400 from the provider), the main agent response loop hangs indefinitely instead of falling back gracefully. Real-world impact: an 88-minute silent hang on a Telegram gateway session, with subsequent user messages queued but never processed until manual restart.

Repro

  1. Configure auxiliary.compression.provider: auto in ~/.hermes/config.yaml so compression inherits the main model.
  2. Use a main model whose API rejects a parameter the compression worker sends. In our case, claude-opus-4-7 rejects the hardcoded temperature parameter with:
    HTTP 400: `temperature` is deprecated for this model.
  3. Build up a long-running session past the auto-compression threshold (~85% of the context window, ~287K tokens in our case).
  4. Send a new user message that triggers compression.

Expected

  • Compression fails once the retry budget is exhausted
  • A warning is logged
  • The main response continues with uncompressed history (or skips compression on this turn)
  • User gets a reply
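The expected behavior amounts to treating compression as optional: spend the retry budget, log a warning, and hand the uncompressed history back to the response loop. The wrapper below is a minimal sketch of that idea, not hermes' actual code; `compress_or_passthrough` and the `compress` callable are hypothetical stand-ins for the auxiliary summarization call.

```python
import logging
from typing import Callable, List

logger = logging.getLogger("compression_sketch")


def compress_or_passthrough(
    history: List[dict],
    compress: Callable[[List[dict]], List[dict]],
    max_attempts: int = 3,
) -> List[dict]:
    """Try to compress `history`; after `max_attempts` failures, return it unchanged.

    Hypothetical helper: `compress` stands in for the auxiliary summarization
    call. A failure here never prevents the caller from producing a reply.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return compress(history)
        except Exception as exc:  # e.g. an HTTP 400 raised by the provider client
            last_error = exc
            logger.warning("compression attempt %d/%d failed: %s", attempt, max_attempts, exc)
    # Retry budget exhausted: warn once more and skip compression for this turn.
    logger.warning("skipping compression this turn after %d attempts: %s", max_attempts, last_error)
    return history
```

The key design point is that the function always returns a usable history, so the caller has no reason to hold a lock or re-schedule compression while the user waits.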

Actual

  • Compression auxiliary call fails with HTTP 400
  • Retries 3x, then logs "Session summarization failed after 3 attempts"
  • Outer scheduler re-fires compression every ~5 minutes
  • Main response loop holds a lock waiting for compression to succeed
  • No user-visible response is ever produced
  • 88 minutes of "Auxiliary auto-detect: using main provider..." log entries with no progress
  • 6 follow-up user messages received, batched, never processed
  • Only resolved by killing the gateway process

Evidence (anonymized log excerpt)

2026-05-12 00:00:46  inbound message: msg='...'
2026-05-12 00:01:31  Auxiliary auto-detect: using main provider anthropic (claude-opus-4-7)
2026-05-12 00:05:44  Auxiliary auto-detect: using main provider anthropic (claude-opus-4-7)
2026-05-12 00:10:47  Auxiliary auto-detect: using main provider anthropic (claude-opus-4-7)
2026-05-12 00:15:50  Auxiliary auto-detect: using main provider anthropic (claude-opus-4-7)
... (repeats every ~5 min for 88 minutes) ...
2026-05-12 01:30:43  [manual restart]

errors.log shows the same "`temperature` is deprecated" HTTP 400 recurring across the preceding weeks; the underlying failure kept happening but was never escalated to a user-visible error.

Workaround (already applied locally)

Pin every auxiliary.* slot in config.yaml to a model that does not reject the hardcoded temperature param (e.g. anthropic/claude-sonnet-4-5). This is the same fix previously needed for auxiliary.vision.
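This manual pinning could in principle be automated, which is roughly what the third suggested fix asks for: keep using the main model under `provider: auto`, but route to a known-good model for a while after the main model rejects an auxiliary call. The class below is a hypothetical sketch; `AuxiliaryModelResolver`, `record_rejection`, and the one-hour default TTL are illustrative assumptions, not existing hermes APIs.

```python
import logging
import time
from typing import Callable, Dict, Optional

logger = logging.getLogger("aux_resolver_sketch")


class AuxiliaryModelResolver:
    """Hypothetical resolver for auxiliary slots configured with `provider: auto`.

    Returns the main model by default. If that model rejected an auxiliary call
    within the last `ttl_seconds`, returns the pinned fallback and logs a warning.
    """

    def __init__(
        self,
        fallback_model: str,
        ttl_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.fallback_model = fallback_model
        self.ttl_seconds = ttl_seconds
        self._clock = clock  # injectable so the TTL logic can be tested
        self._rejected_at: Dict[str, float] = {}

    def record_rejection(self, model: str) -> None:
        """Record that an auxiliary request to `model` was rejected (e.g. HTTP 400)."""
        self._rejected_at[model] = self._clock()

    def resolve(self, main_model: str) -> str:
        rejected: Optional[float] = self._rejected_at.get(main_model)
        if rejected is not None and self._clock() - rejected < self.ttl_seconds:
            logger.warning(
                "auxiliary auto-resolve: %s recently rejected an auxiliary call; using %s",
                main_model,
                self.fallback_model,
            )
            return self.fallback_model
        return main_model
```

Because the rejection expires after the TTL, the main model gets retried later, so a provider-side fix would be picked up without a config change.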

Root cause hypothesis

Two layers:

  1. The auxiliary client sends a hardcoded temperature to providers that have deprecated it. The auxiliary call site likely needs a model-capability check before including the parameter (the same class of bug as the prior vision_analyze issue).
  2. Compression failure is not treated as fatal enough. When compression repeatedly fails, the gateway's response pipeline should:
    • degrade to sending uncompressed (or partially compressed) history
    • or surface a hard error to the user
    • never silently hang
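For the first layer, a capability check could be as small as a lookup before the request kwargs are assembled. This is a sketch under stated assumptions: `REJECTS_TEMPERATURE` and `build_auxiliary_kwargs` are hypothetical names, and the table lists only the one model this report confirms rejects `temperature`. It also combines both options from the first suggested fix: `temperature` is omitted by default, and stripped for known-incompatible models even when a caller passes one.

```python
from typing import Any, Dict, FrozenSet, Optional

# Hypothetical capability table of models known to reject `temperature`.
# Only claude-opus-4-7 is confirmed by this report; anything else should come
# from the provider's own documentation.
REJECTS_TEMPERATURE: FrozenSet[str] = frozenset({"claude-opus-4-7"})


def build_auxiliary_kwargs(
    model: str,
    messages: list,
    temperature: Optional[float] = None,  # omitted unless explicitly requested
    **extra: Any,
) -> Dict[str, Any]:
    """Assemble auxiliary request kwargs, dropping `temperature` where it would be rejected."""
    kwargs: Dict[str, Any] = {"model": model, "messages": messages, **extra}
    # Accept both "anthropic/claude-opus-4-7" and bare "claude-opus-4-7".
    bare_name = model.split("/")[-1]
    if temperature is not None and bare_name not in REJECTS_TEMPERATURE:
        kwargs["temperature"] = temperature
    return kwargs
```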

The second behavior is the dangerous one: a misconfiguration becomes an invisible production outage.
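One way to guarantee compression can never stall a reply is to put a single wall-clock deadline around the whole workflow, retries included, as the second suggested fix proposes. The sketch below assumes an asyncio-based pipeline; `compress_with_deadline`, `workflow`, and `total_budget_s` are hypothetical names. `asyncio.wait_for` cancels the inner workflow on timeout, so any lock it holds should be released through normal `async with` cleanup.

```python
import asyncio
import logging
from typing import Awaitable, Callable, List

logger = logging.getLogger("compression_deadline_sketch")


async def compress_with_deadline(
    history: List[dict],
    workflow: Callable[[List[dict]], Awaitable[List[dict]]],
    total_budget_s: float,
) -> List[dict]:
    """Run the entire compression workflow (all retries) under one deadline.

    If the workflow raises or exceeds `total_budget_s`, log a warning and
    return the uncompressed history so the response loop can still reply.
    """
    try:
        return await asyncio.wait_for(workflow(history), timeout=total_budget_s)
    except asyncio.TimeoutError:
        logger.warning("compression abandoned after %.1fs; continuing uncompressed", total_budget_s)
    except Exception as exc:
        logger.warning("compression failed (%s); continuing uncompressed", exc)
    return history
```

This complements a per-call timeout rather than replacing it: the per-call limit bounds each attempt, while the outer deadline bounds the sum of attempts and any re-scheduling.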

Suggested fixes

  1. Strip temperature from auxiliary requests when the target model is known to reject it (or omit it by default and let the model's own defaults apply).
  2. Add a hard wall-clock timeout on the entire compression workflow (not just the per-call timeout). After N minutes total, abandon compression and proceed.
  3. Make provider: auto resolution warn loudly if the main model has a known incompatibility with the auxiliary call shape, or better, route auxiliary calls to a known-good default model when auto resolves to a model that has recently rejected an auxiliary call.
  4. Surface a user-visible error in the gateway if a response has been pending for more than gateway_timeout_warning seconds; currently nothing fires.
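For the fourth fix, a gateway could race each pending response against a timer and tell the user once the threshold passes, without cancelling the response itself. This is a sketch assuming an asyncio-based gateway; `await_with_pending_warning` and `notify_user` are hypothetical, and `warn_after_s` stands in for the gateway_timeout_warning setting.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def await_with_pending_warning(
    response: Awaitable[T],
    warn_after_s: float,
    notify_user: Callable[[str], Awaitable[None]],
) -> T:
    """Await `response`; if it is still pending after `warn_after_s`, notify the user once.

    The response is not cancelled: `asyncio.wait` with a timeout only stops
    waiting, so the reply is still delivered if it eventually completes.
    """
    task = asyncio.ensure_future(response)
    done, _pending = await asyncio.wait({task}, timeout=warn_after_s)
    if not done:
        await notify_user(
            f"Still working: no response after {warn_after_s:.0f}s. "
            "Background processing may be stuck."
        )
    return await task
```

In the incident above, a watchdog like this would have turned the 88-minute silent stall into a visible warning within the configured threshold, even though the underlying hang would remain until the other fixes land.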

Environment

  • hermes-agent: current main (deployed locally)
  • Gateway: Telegram
  • Main model: anthropic/claude-opus-4-7
  • Auxiliary slots: provider: auto (except vision, which was already pinned to claude-sonnet-4-5 after a similar incident on 2026-04-22)

Happy to PR the workaround into the default config template if useful.