hermes: Compression failure hangs main response loop indefinitely (HTTP 400 on auxiliary causes silent 88-min stall)

Summary

When the auxiliary compression worker fails repeatedly (HTTP 400 from the provider), the main agent response loop hangs indefinitely instead of falling back gracefully. Real-world impact: an 88-minute silent hang on a Telegram gateway session, with subsequent user messages queued but never processed until manual restart.

Repro

1. Configure `auxiliary.compression.provider: auto` in `~/.hermes/config.yaml` so compression inherits the main model.
2. Use a main model whose API rejects a parameter the compression worker sends. In our case, `claude-opus-4-7` rejects the hardcoded `temperature` parameter with:
   ```
   HTTP 400: `temperature` is deprecated for this model.
   ```
3. Build up a long-running session past the auto-compression threshold (~85% of context window, ~287K tokens in our case).
4. Send a new user message that triggers compression.

Expected

- Compression fails after the retry budget is exhausted
- A warning is logged
- The main response continues with uncompressed history (or skips compression on this turn)
- The user gets a reply
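
The expected behavior above can be sketched roughly as follows. This is a minimal illustration, not the hermes implementation: `history_for_turn`, the `compress` callable, and the message shape are hypothetical names introduced here, and the default of 3 attempts simply mirrors the retry count seen in the logs.

```python
import logging
from typing import Callable, List

logger = logging.getLogger("compression_fallback")

Message = dict  # hypothetical message shape, e.g. {"role": ..., "content": ...}


def history_for_turn(
    history: List[Message],
    compress: Callable[[List[Message]], List[Message]],
    max_attempts: int = 3,
) -> List[Message]:
    """Return compressed history, or the original history if compression keeps failing."""
    for attempt in range(1, max_attempts + 1):
        try:
            return compress(history)
        except Exception as exc:  # e.g. an HTTP 400 from the auxiliary provider
            logger.warning(
                "compression attempt %d/%d failed: %s", attempt, max_attempts, exc
            )
    # Retry budget exhausted: skip compression for this turn instead of
    # blocking the reply.
    logger.warning("compression skipped for this turn; using uncompressed history")
    return history
```

The key property is that the function always returns something the main loop can send. If the uncompressed history no longer fits the context window, the main call may still fail, but it fails visibly rather than hanging.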

Actual

- Compression auxiliary call fails with HTTP 400
- Retries 3x, logs `Session summarization failed after 3 attempts`
- Outer scheduler re-fires compression every ~5 minutes
- Main response loop holds a lock waiting for compression to succeed
- No user-visible response is ever produced
- 88 minutes of `Auxiliary auto-detect: using main provider...` log entries with no progress
- 6 follow-up user messages received, batched, never processed
- Only resolved by killing the gateway process

Evidence (anonymized log excerpt)

2026-05-12 00:00:46  inbound message: msg='...'
2026-05-12 00:01:31  Auxiliary auto-detect: using main provider anthropic (claude-opus-4-7)
2026-05-12 00:05:44  Auxiliary auto-detect: using main provider anthropic (claude-opus-4-7)
2026-05-12 00:10:47  Auxiliary auto-detect: using main provider anthropic (claude-opus-4-7)
2026-05-12 00:15:50  Auxiliary auto-detect: using main provider anthropic (claude-opus-4-7)
... (repeats every ~5 min for 88 minutes) ...
2026-05-12 01:30:43  [manual restart]

`errors.log` shows the same "`temperature` is deprecated" HTTP 400 multiple times across the preceding weeks: the underlying failure was recurring but never escalated to a user-visible error.

Workaround (already applied locally)

Pin every `auxiliary.*` slot in `config.yaml` to a model that does not reject the hardcoded `temperature` param (e.g. `anthropic/claude-sonnet-4-5`). This is the same fix previously needed for `auxiliary.vision`.
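
To check that no slot was missed, something like the sketch below could be run against the parsed config. It assumes the YAML has already been loaded into a plain dict and that each slot lives under `auxiliary.<slot>.provider`, as in the repro step. The helper name and the `model` key in the example are hypothetical, as is the assumption that a slot with no explicit provider falls back to `auto`.

```python
from typing import Any, Dict, List


def unpinned_auxiliary_slots(config: Dict[str, Any]) -> List[str]:
    """List auxiliary slots that still resolve their provider automatically."""
    auxiliary = config.get("auxiliary") or {}
    unpinned = []
    for slot, settings in sorted(auxiliary.items()):
        # Assumption: a slot with no explicit provider behaves like "auto".
        provider = (settings or {}).get("provider", "auto")
        if provider == "auto":
            unpinned.append(slot)
    return unpinned


# Example shape only; the "model" key is an assumption about the config schema.
example_config = {
    "auxiliary": {
        "compression": {"provider": "auto"},
        "vision": {"provider": "anthropic", "model": "claude-sonnet-4-5"},
    }
}
still_auto = unpinned_auxiliary_slots(example_config)
```

An empty result means every auxiliary slot is pinned and none inherits the main model.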

Root cause hypothesis

Two layers:

1. Auxiliary client sends hardcoded `temperature` to providers that have deprecated it. The auxiliary call site likely needs a model-capability check before including the param (same class of bug as the prior `vision_analyze` issue).
2. Compression failure is not fatal enough. When compression repeatedly fails, the gateway's response pipeline should:
   - degrade to sending uncompressed (or partially compressed) history
   - or surface a hard error to the user
   - never silently hang

The second layer is the dangerous one: a misconfiguration becomes an invisible production outage.
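
For the first layer, a capability check at the call site might look like the sketch below. The function name, the request dict shape, and the deny-list are all hypothetical. The only fact taken from this report is that `claude-opus-4-7` rejects `temperature`.

```python
from typing import Any, Dict, List, Optional

# Hypothetical deny-list. In practice this would more likely come from model
# metadata, or be learned from a previous 400 response.
MODELS_REJECTING_TEMPERATURE = {"claude-opus-4-7"}


def build_auxiliary_request(
    model: str,
    messages: List[Dict[str, str]],
    temperature: Optional[float] = None,
) -> Dict[str, Any]:
    """Build an auxiliary request, omitting temperature where it is rejected."""
    request: Dict[str, Any] = {"model": model, "messages": messages}
    # Only include the parameter when it was requested AND the model accepts it;
    # otherwise the provider's default applies.
    if temperature is not None and model not in MODELS_REJECTING_TEMPERATURE:
        request["temperature"] = temperature
    return request
```

Omitting the parameter by default and letting model defaults apply would avoid the need for a deny-list altogether.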

Suggested fixes

1. Strip `temperature` from auxiliary requests when the target model is known to reject it (or omit it by default; let model defaults apply).
2. Add a hard wall-clock timeout on the entire compression workflow (not just the per-call timeout). After N minutes total, abandon compression and proceed.
3. Make `provider: auto` resolution warn loudly if the main model has a known incompatibility with the auxiliary call shape. Better still, isolate auxiliary to a known-good default model when `auto` resolves to a model that has rejected an auxiliary call recently.
4. Surface a user-visible error in the gateway if a response has been pending more than `gateway_timeout_warning` seconds. Currently nothing fires.
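
The wall-clock timeout on the whole compression workflow could be sketched with `asyncio.wait_for`, assuming the workflow is (or can be wrapped as) a coroutine. Here `compress` stands for the entire workflow including its internal retries. `compress_with_deadline` and its parameters are hypothetical names, not existing hermes APIs.

```python
import asyncio
import logging
from typing import Awaitable, Callable, List

logger = logging.getLogger("compression_deadline")


async def compress_with_deadline(
    history: List[dict],
    compress: Callable[[List[dict]], Awaitable[List[dict]]],
    deadline_seconds: float,
) -> List[dict]:
    """Run the whole compression workflow under one wall-clock deadline."""
    try:
        # wait_for cancels the compression coroutine once the deadline passes.
        return await asyncio.wait_for(compress(history), timeout=deadline_seconds)
    except asyncio.TimeoutError:
        logger.warning("compression exceeded %.1fs; proceeding", deadline_seconds)
    except Exception as exc:
        logger.warning("compression failed: %s; proceeding", exc)
    # Either way, the main loop gets a history back and can produce a reply.
    return history
```

Because the deadline wraps the workflow rather than each call, repeated retries and scheduler re-fires cannot add up to an unbounded wait.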

Environment

- hermes-agent: current main (deployed locally)
- Gateway: Telegram
- Main model: `anthropic/claude-opus-4-7`
- Auxiliary slots: `provider: auto` (except vision, which was already pinned to `claude-sonnet-4-5` after a similar prior incident on 2026-04-22)

Happy to PR the workaround into the default config template if useful.
