hermes - 💡(How to fix) Fix [Bug]: Auxiliary client HTTP timeout defaults to 30s, causing compression to fail with slow local LLMs



Bug Description

When a local LLM is used for context compression (via /compress or automatic compression), the compression fails repeatedly, and the server reports "request cancelled after 30s, potentially a client-side timeout." The OpenAI SDK client used by the auxiliary client module is constructed without a timeout parameter, and the effective HTTP timeout appears to be 30 seconds. For slow local LLMs (e.g., ~40 tokens/sec), generating a compression summary takes longer than that, so the HTTP connection is terminated before generation completes.

The issue manifests as the local LLM server stopping generation every 30 seconds, then Hermes retrying, which stops again after exactly 30 seconds, creating an infinite loop.
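To make the timing concrete, here is a back-of-the-envelope sketch of how many tokens fit into a given timeout at a given generation speed. The function names are hypothetical, and the 5,000-token summary size is an assumption loosely based on the n_decoded values in the llama.cpp log, not a measured figure. Prompt processing time is ignored.

```python
def max_tokens_before_timeout(tokens_per_second: float, timeout_seconds: float) -> float:
    """Upper bound on tokens the server can generate before the client gives up."""
    return tokens_per_second * timeout_seconds


def generation_seconds(summary_tokens: int, tokens_per_second: float) -> float:
    """Time needed to generate a summary of the given length."""
    return summary_tokens / tokens_per_second


# ~40 tokens/sec is the speed quoted in the report.
budget_30s = max_tokens_before_timeout(40.0, 30.0)
# 5000 tokens is an assumed summary size for illustration only.
needed = generation_seconds(5000, 40.0)

print(f"tokens that fit in 30s: {budget_30s:.0f}")
print(f"seconds needed for 5000 tokens: {needed:.0f}")
```

Under these assumptions, a 30-second limit leaves room for about 1,200 tokens, while a 5,000-token summary would need about 125 seconds, which is also slightly above the 120-second default for auxiliary.compression.timeout.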

Note: This issue report was drafted with assistance from Hermes Agent itself.

Steps to Reproduce

1. Configure Hermes to use a local LLM as the main provider (in my case, Qwen3.5-27b Q6_K_L running on a local llama.cpp server)
2. Ensure auxiliary.compression.provider is set to auto (default) so it falls back to the main local LLM
3. Run a long enough session to trigger automatic context compression, or manually trigger it with /compress
4. Observe that the compression process fails repeatedly every 30 seconds
5. On the LLM server side, see log messages like: request cancelled after 30s, potentially a client-side timeout; please check your client's code

Expected Behavior

The compression should complete successfully as long as the summary is generated within the auxiliary.compression.timeout value from config.yaml (default 120 seconds), rather than being cut off after 30 seconds.
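As a rough illustration of the expected lookup, the sketch below resolves a per-task timeout from a parsed config.yaml. The helper name resolve_task_timeout and the dict layout are assumptions made for this example; only the auxiliary.compression.timeout key and its 120-second default come from the report, and this is not the actual Hermes implementation.

```python
from typing import Any, Mapping

# Default stated in the report for auxiliary.compression.timeout.
DEFAULT_TASK_TIMEOUT = 120.0


def resolve_task_timeout(
    config: Mapping[str, Any],
    task: str,
    default: float = DEFAULT_TASK_TIMEOUT,
) -> float:
    """Return auxiliary.<task>.timeout in seconds, or the default if unset."""
    auxiliary = config.get("auxiliary") or {}
    task_config = auxiliary.get(task) or {}
    return float(task_config.get("timeout", default))


# Hypothetical parsed config with a raised compression timeout.
config = {"auxiliary": {"compression": {"provider": "auto", "timeout": 600}}}
print(resolve_task_timeout(config, "compression"))
print(resolve_task_timeout({}, "compression"))
```

The point is that whatever value this lookup returns should be the limit that governs the HTTP request, not a separate, shorter one.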

Actual Behavior

The compression fails repeatedly every 30 seconds. The local LLM server logs show:

    request cancelled after 30s, potentially a client-side timeout; please check your client's code
    cancel task, id_task = XXXXX

The Hermes client retries, but hits the same 30-second cutoff each time, creating an infinite loop.
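The failure mode can be imitated with the standard library alone. The sketch below is a stand-in, not Hermes or llama.cpp: a local HTTP server sleeps to simulate slow generation, and a client retries with a timeout shorter than that delay. All names and the scaled-down durations (0.5 s of "generation" against a 0.1 s timeout) are assumptions for illustration.

```python
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

GENERATION_SECONDS = 0.5  # stand-in for a slow summary generation


class SlowHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        time.sleep(GENERATION_SECONDS)  # simulate the model still generating
        body = b"summary"
        try:
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        except OSError:
            pass  # the client already gave up and closed the connection

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


# Bypass any proxy environment variables so the request stays on loopback.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch(url, timeout):
    """Return the response body, or None if the request timed out or failed."""
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.read().decode()
    except OSError:  # covers TimeoutError and urllib.error.URLError
        return None


server = ThreadingHTTPServer(("127.0.0.1", 0), SlowHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_address[1]}/"

# Every retry hits the same cutoff because the timeout never changes.
attempts = [fetch(url, timeout=0.1) for _ in range(3)]
# A timeout longer than the generation time lets the request finish.
patient = fetch(url, timeout=5.0)

server.shutdown()
server.server_close()
print(attempts, patient)
```

Retrying with an unchanged timeout cannot succeed when the work itself takes longer than the timeout, which matches the repeated 30-second cancellations in the server log.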

Affected Component

Agent Core (conversation loop, context compression, memory)

Messaging Platform (if gateway-related)

N/A (CLI only)

Debug Report

    Report       https://paste.rs/qTUy5
    agent.log    https://paste.rs/485R6
    gateway.log  https://paste.rs/SYKww

Operating System

Ubuntu 24.04.4 LTS (kernel 6.8.0-117-generic)

Python Version

3.11.15

Hermes Version

v0.15.1 (2026.5.29)

Additional Logs / Traceback (optional)

LLM server (llama.cpp) output showing the 30-second cutoff pattern:
    
    43.21.227.281 I slot print_timing: id  0 | task 13772 | n_decoded =   4619, tg =  41.41 t/s
    43.24.248.161 I slot print_timing: id  0 | task 13772 | n_decoded =   4763, tg =  41.58 t/s
    43.27.280.280 I slot print_timing: id  0 | task 13772 | n_decoded =   4902, tg =  41.69 t/s
    43.30.318.314 I slot print_timing: id  0 | task 13772 | n_decoded =   5046, tg =  41.83 t/s
    43.30.462.376 I srv  params_from_: Chat format: peg-native
    43.30.570.851 W srv          next: request cancelled after 30s, potentially a client-side timeout; please check your client's code
    43.30.570.874 W srv          stop: cancel task, id_task = 13772

Root Cause Analysis (optional)

The root cause appears to be in agent/auxiliary_client.py. The OpenAI SDK client is constructed without a timeout parameter in multiple locations:
- Line ~1458: OpenAI(api_key=api_key, base_url=base_url, **extra)
- Line ~1495: OpenAI(api_key=api_key, base_url=base_url, **extra)
- Line ~1895: OpenAI(api_key=api_key, base_url=base_url)
- Line ~3173: AsyncOpenAI(**async_kwargs) (async_kwargs lacks timeout)

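The effective HTTP timeout on these clients appears to be 30 seconds when none is specified. (The openai Python SDK documents a default request timeout of 10 minutes, so the 30-second value may originate elsewhere in how the client is configured, for example in the **extra kwargs; this is unconfirmed.) While auxiliary.compression.timeout in config.yaml is set to a higher value (120s by default), that value is used as an application-level deadline and does not seem to be passed to the OpenAI SDK client constructor, so the 30-second HTTP-level timeout fires first.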

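The call_llm() function in auxiliary_client.py does read the task-specific timeout via _get_task_timeout(task), but it applies it at the request level rather than the client level. In the openai Python SDK a per-request timeout normally overrides the client-level one, so the observed 30-second cutoff suggests the task timeout is not reaching the HTTP request on this code path.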

Proposed Fix (optional)

My Hermes Agent suggested and applied a workaround by adding timeout=600.0 to the OpenAI/AsyncOpenAI client constructors in agent/auxiliary_client.py. This resolved the issue for my setup. I'm sharing this as what worked for me, not as a prescriptive fix.
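A minimal sketch of the workaround's shape, assuming the constructor arguments are assembled in one place. The helper build_client_kwargs, the placeholder API key, and the base URL are hypothetical and do not come from agent/auxiliary_client.py; the only SDK fact relied on is that the OpenAI and AsyncOpenAI constructors accept a timeout argument.

```python
from typing import Any, Dict, Optional


def build_client_kwargs(
    api_key: str,
    base_url: str,
    timeout_seconds: Optional[float] = None,
    **extra: Any,
) -> Dict[str, Any]:
    """Assemble constructor kwargs, adding an explicit timeout when one is given."""
    kwargs: Dict[str, Any] = {"api_key": api_key, "base_url": base_url, **extra}
    if timeout_seconds is not None:
        # Do not overwrite a timeout the caller already supplied via **extra.
        kwargs.setdefault("timeout", float(timeout_seconds))
    return kwargs


# Placeholder values for a local llama.cpp server (assumed, not from the report).
kwargs = build_client_kwargs("sk-local", "http://localhost:8080/v1", timeout_seconds=600.0)
print(kwargs)

# With the openai package installed, the client would then be built as:
#   from openai import OpenAI
#   client = OpenAI(**kwargs)
```

A hard-coded 600.0 worked for my setup; passing the configured task timeout instead would keep the HTTP limit and auxiliary.compression.timeout in agreement.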

Are you willing to submit a PR for this?

  • I'd like to fix this myself and submit a PR
