hermes - [Bug]: Auxiliary client HTTP timeout defaults to 30s, causing compression to fail with slow local LLMs

Bug Description

When using a local LLM for context compression (via /compress or automatic compression), the compression fails repeatedly with the server reporting "request cancelled after 30s, potentially a client-side timeout." This happens because the OpenAI SDK client used by the auxiliary client module is constructed without a timeout parameter, so it defaults to 30 seconds. For slow local LLMs (e.g., ~40 tokens/sec), generating a compression summary takes longer than 30 seconds, causing the HTTP connection to be terminated before generation completes.
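To put the 30-second limit in perspective: at a steady decode rate r tokens/s, a request capped at T seconds can return at most about r × T tokens. The sketch below is plain arithmetic with hypothetical helper names. It ignores prompt-processing and network time, so the real budget is somewhat smaller.

```python
def max_tokens_within(timeout_s: float, tokens_per_second: float) -> int:
    """Upper bound on output tokens a request can stream before timeout_s elapses.

    Ignores prompt processing and network overhead, so the real budget is smaller.
    """
    return int(timeout_s * tokens_per_second)


def seconds_to_generate(num_tokens: int, tokens_per_second: float) -> float:
    """Wall-clock time to decode num_tokens at a steady rate."""
    return num_tokens / tokens_per_second


RATE = 40.0  # ~40 tokens/sec, the decode speed cited in this report

print(max_tokens_within(30.0, RATE))    # 1200: budget under the 30 s cutoff
print(max_tokens_within(120.0, RATE))   # 4800: budget under the 120 s compression timeout
print(seconds_to_generate(2000, RATE))  # 50.0: a hypothetical 2,000-token summary
```

At ~40 tokens/sec, any summary longer than roughly 1,200 tokens cannot finish inside a 30-second window, while the configured 120-second compression timeout would allow about four times as much.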

Each attempt is cut off after exactly 30 seconds. Hermes then retries, the retry is cut off at the same point, and compression never completes, effectively an infinite loop.

Note: This issue report was drafted with assistance from Hermes Agent itself.

Steps to Reproduce

1. Configure Hermes to use a local LLM as the main provider (in my case, Qwen3.5-27b Q6_K_L served by a local llama.cpp server)
2. Ensure auxiliary.compression.provider is set to auto (default) so it falls back to the main local LLM
3. Run a long enough session to trigger automatic context compression, or manually trigger it with /compress
4. Observe that the compression process fails repeatedly every 30 seconds
5. On the LLM server side, see log messages like: request cancelled after 30s, potentially a client-side timeout; please check your client's code

Expected Behavior

The compression should complete successfully, respecting the auxiliary.compression.timeout value from config.yaml (default 120 seconds): the request should only be cut off if generating the summary exceeds that configured limit, not after a fixed 30 seconds.

Actual Behavior

The compression fails repeatedly every 30 seconds. The local LLM server logs show:

request cancelled after 30s, potentially a client-side timeout; please check your client's code
cancel task, id_task = XXXXX

The Hermes client retries, but hits the same 30-second cutoff each time, creating an infinite loop.

Affected Component

Agent Core (conversation loop, context compression, memory)

Messaging Platform (if gateway-related)

N/A (CLI only)

Debug Report

Report       https://paste.rs/qTUy5
agent.log    https://paste.rs/485R6
gateway.log  https://paste.rs/SYKww

Operating System

Ubuntu 24.04.4 LTS (kernel 6.8.0-117-generic)

Python Version

3.11.15

Hermes Version

v0.15.1 (2026.5.29)

Additional Logs / Traceback (optional)

LLM server (llama.cpp) output showing the 30-second cutoff pattern:
    
    43.21.227.281 I slot print_timing: id  0 | task 13772 | n_decoded =   4619, tg =  41.41 t/s
    43.24.248.161 I slot print_timing: id  0 | task 13772 | n_decoded =   4763, tg =  41.58 t/s
    43.27.280.280 I slot print_timing: id  0 | task 13772 | n_decoded =   4902, tg =  41.69 t/s
    43.30.318.314 I slot print_timing: id  0 | task 13772 | n_decoded =   5046, tg =  41.83 t/s
    43.30.462.376 I srv  params_from_: Chat format: peg-native
    43.30.570.851 W srv          next: request cancelled after 30s, potentially a client-side timeout; please check your client's code
    43.30.570.874 W srv          stop: cancel task, id_task = 13772

Root Cause Analysis (optional)

The root cause appears to be in agent/auxiliary_client.py. The OpenAI SDK client is constructed without a timeout parameter in multiple locations:
- Line ~1458: OpenAI(api_key=api_key, base_url=base_url, **extra)
- Line ~1495: OpenAI(api_key=api_key, base_url=base_url, **extra)
- Line ~1895: OpenAI(api_key=api_key, base_url=base_url)
- Line ~3173: AsyncOpenAI(**async_kwargs) (async_kwargs lacks timeout)
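Since the same omission appears at four call sites (three sync, one async), one way to keep them from drifting is to build the constructor kwargs in one place. The sketch below uses a hypothetical helper, aux_client_kwargs, that does not exist in Hermes; api_key, base_url, timeout and extra mirror the arguments visible in the constructor calls listed above. OpenAI and AsyncOpenAI both accept api_key, base_url and timeout as constructor arguments.

```python
from typing import Any, Optional


def aux_client_kwargs(
    api_key: str,
    base_url: str,
    timeout: float,
    extra: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Build constructor kwargs shared by the OpenAI and AsyncOpenAI call sites.

    Hypothetical helper (not part of Hermes). extra is merged first, so the
    explicit api_key/base_url/timeout always win and timeout is never dropped.
    """
    kwargs: dict[str, Any] = dict(extra or {})
    kwargs.update(api_key=api_key, base_url=base_url, timeout=timeout)
    return kwargs


# Usage at each call site (requires the openai package):
#   from openai import OpenAI, AsyncOpenAI
#   client = OpenAI(**aux_client_kwargs(api_key, base_url, timeout, extra))
#   async_client = AsyncOpenAI(**aux_client_kwargs(api_key, base_url, timeout, extra))
```

Letting the explicit timeout override anything in extra is a design choice for this sketch; if extra is meant to carry user overrides, the merge order would need to be reversed.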

The observed HTTP-level cutoff is 30 seconds. While auxiliary.compression.timeout in config.yaml is set to a higher value (120s by default), that value is used as an application-level deadline but is not passed to the OpenAI SDK client constructor, so the 30-second HTTP timeout fires first. Note that the openai-python SDK's own documented default is 600 seconds (with a 5-second connect timeout), so the 30-second limit presumably comes from elsewhere in the client setup rather than from the SDK itself.

The call_llm() function in auxiliary_client.py does read the task-specific timeout via _get_task_timeout(task), but that value does not appear to reach the HTTP request: in the openai-python SDK a per-request timeout overrides the client-level default, so if it were applied to the request, the 30-second cutoff would not occur.
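For reference on the precedence question: in the openai-python SDK, a timeout passed on an individual call (for example client.chat.completions.create(..., timeout=...)) or via client.with_options(timeout=...) replaces the client's default for that request, and the client default is 600 s unless set in the constructor. The sketch below models only that resolution rule; resolve_timeout is a hypothetical name, not an SDK function, and None here means "not passed" (in the real SDK, an explicit None disables the timeout).

```python
from typing import Optional

SDK_DEFAULT_TIMEOUT_S = 600.0  # openai-python's documented default total timeout


def resolve_timeout(client_timeout: Optional[float], request_timeout: Optional[float]) -> float:
    """Model which timeout applies to one request.

    Hypothetical helper. None means "not passed" in this model, unlike the
    real SDK, where an explicit None means "no timeout".
    """
    if request_timeout is not None:
        return request_timeout  # per-request value wins
    if client_timeout is not None:
        return client_timeout  # otherwise the constructor's value
    return SDK_DEFAULT_TIMEOUT_S  # otherwise the SDK default
```

Under this rule, a task timeout that actually reached the request would override the client default, which is why a persistent 30-second cutoff points at the configured value not being forwarded.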

Proposed Fix (optional)

My Hermes Agent suggested and applied a workaround by adding timeout=600.0 to the OpenAI/AsyncOpenAI client constructors in agent/auxiliary_client.py. This resolved the issue for my setup. I'm sharing this as what worked for me, not as a prescriptive fix.
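Rather than a fixed timeout=600.0, one option is to pass the configured auxiliary timeout to the constructor so the HTTP limit and the application-level deadline agree. The sketch below assumes config.yaml has already been parsed into a nested mapping and uses a hypothetical helper name, aux_task_timeout; inside Hermes, the existing _get_task_timeout(task) may already return this value.

```python
from collections.abc import Mapping
from typing import Any


def aux_task_timeout(config: Mapping[str, Any], task: str, default_s: float = 120.0) -> float:
    """Return auxiliary.<task>.timeout from a parsed config.yaml mapping.

    Hypothetical helper: the key layout mirrors the auxiliary.compression.timeout
    setting named in this report, and default_s matches its stated 120 s default.
    Missing, empty, or null entries fall back to default_s.
    """
    aux = config.get("auxiliary") or {}
    task_cfg = aux.get(task) or {}
    value = task_cfg.get("timeout")
    return float(value) if value is not None else default_s


# Example wiring at a call site (requires the openai package):
#   from openai import OpenAI
#   client = OpenAI(api_key=api_key, base_url=base_url,
#                   timeout=aux_task_timeout(config, "compression"))
```

With this wiring, a user running a slow local model only needs to raise auxiliary.compression.timeout, instead of relying on a hardcoded ceiling.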

Are you willing to submit a PR for this?

I'd like to fix this myself and submit a PR
