hermes - ✅ (Solved) Limit auxiliary LLM concurrency and retry amplification [2 pull requests, 1 participant]

INONONO66 · 2026-05-10T17:07:54Z


GitHub stats

NousResearch/hermes-agent#23324•Fetched 2026-05-11 03:29:59

Author: INONONO66
Participants: INONONO66
Timeline (top): cross-referenced ×3, labeled ×3

Auxiliary LLM tasks such as title generation and context compression can amplify transient provider failures. Each active session can spawn background daemon work, and those calls may retry/fallback independently. During OpenAI/Codex instability this can create bursts of requests that trip downstream proxy breakers and affect main user-facing requests.

Fixed

Linked PR (open, not merged; its description says "Related to #23324"): refactor(cron): DRY media extensions from gateway.platforms.base (https://github.com/NousResearch/hermes-agent/pull/23346)
Fixing PR (open, not merged; its description says "Fixes #23324"): fix(agent): cap auxiliary LLM concurrency per task (https://github.com/NousResearch/hermes-agent/pull/23510)

PR fix notes

PR #23346: refactor(cron): DRY media extensions from gateway.platforms.base

Repository: NousResearch/hermes-agent
Author: mehmetkr-31
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/23346

Description (problem / solution / changelog)

Moves _VIDEO_EXTS and _IMAGE_EXTS to the canonical location in gateway/platforms/base.py and imports them from cron/scheduler.py instead of duplicating. This prevents drift when new formats are added (e.g. .heic images) and removes the "keep in sync" comment hazard.

Related to #23324.

Changed files

cron/scheduler.py (modified, +3/-4)
gateway/platforms/base.py (modified, +3/-3)
scripts/release.py (modified, +1/-0)

PR #23510: fix(agent): cap auxiliary LLM concurrency per task

Repository: NousResearch/hermes-agent
Author: konsisumer
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/23510

Description (problem / solution / changelog)

Cap concurrent in-flight auxiliary LLM calls per task to prevent retry/fallback storms from amplifying provider incidents.

What changed and why

agent.auxiliary_client.call_llm and async_call_llm now read auxiliary.<task>.max_concurrency from config and acquire a per-task semaphore around the entire call (including retries and provider fallback). Background work like title generation and context compression can spawn one call per active session — during provider degradation those calls fan out across retries and trip downstream rate limits, so the cap stops the amplification.
Sync limits use threading.BoundedSemaphore; async limits use asyncio.Semaphore keyed by (task, event-loop) so each loop gets its own. Semaphores are rebuilt when the configured limit changes, so live config edits take effect.
Default behavior is unchanged (no limit), so existing setups don't regress. The session-search task already has its own concurrency cap inside tools/session_search_tool.py; that cap is left untouched.
cli-config.yaml.example and website/docs/user-guide/configuration.md document the knob with recommended values for title_generation and compression.
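
The sync half of the change described above can be illustrated with a minimal sketch. This is not the code from the PR: `CONFIG`, `read_limit`, `get_semaphore`, and `call_with_cap` are hypothetical names, and the config is an in-memory dict standing in for the real config file. It only shows the general technique of a cached per-task `threading.BoundedSemaphore` that is rebuilt when the configured limit changes and that wraps the whole call, retries included.

```python
import threading

# Hypothetical in-memory config; a real setup would load this from a config file.
CONFIG = {"auxiliary": {"title_generation": {"max_concurrency": 2}}}

_lock = threading.Lock()
_semaphores = {}  # task name -> (limit, BoundedSemaphore)


def read_limit(task):
    """Return a positive int limit, or None (no cap) when unset or invalid."""
    raw = CONFIG.get("auxiliary", {}).get(task, {}).get("max_concurrency")
    if isinstance(raw, bool) or not isinstance(raw, int) or raw <= 0:
        return None
    return raw


def get_semaphore(task):
    """Return the cached semaphore for a task, rebuilding it if the limit changed."""
    limit = read_limit(task)
    if limit is None:
        return None
    with _lock:
        cached = _semaphores.get(task)
        if cached is None or cached[0] != limit:
            cached = (limit, threading.BoundedSemaphore(limit))
            _semaphores[task] = cached
        return cached[1]


def call_with_cap(task, fn, *args, **kwargs):
    """Run fn, including any retries it performs internally, under the task's cap."""
    sem = get_semaphore(task)
    if sem is None:
        return fn(*args, **kwargs)  # default: no limit
    with sem:  # the slot is released even if fn raises
        return fn(*args, **kwargs)
```

Holding the semaphore around the entire call, rather than around each individual attempt, is what keeps retries and fallbacks from multiplying the number of in-flight requests per task.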

How to test

pytest tests/agent/test_auxiliary_concurrency.py -q — the new tests cover: config parsing (invalid / zero / negative / missing values), semaphore caching + rebuild on limit change, sync concurrency enforcement under 6 concurrent threads with limit=2, async enforcement under 6 concurrent tasks with limit=2, and semaphore release on exception so the cap doesn't deadlock after failures.
pytest tests/agent/test_auxiliary_client.py -q — full existing auxiliary suite still passes (two unrelated Codex-timeout tests are pre-existing flakes; verified they fail on main without these changes too).
Manual: set auxiliary.title_generation.max_concurrency: 2 in ~/.hermes/cli-config.yaml, open multiple sessions that all trigger first-exchange title generation, and watch the provider logs — only two title calls run at a time.
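
The async enforcement check described above (6 concurrent tasks, limit=2) can be approximated with the following sketch. It does not reproduce the PR's test file; `get_async_semaphore` and `measure_peak` are hypothetical helpers, and the provider call is replaced by a short sleep. The sketch keys semaphores by the event loop object through a `WeakKeyDictionary`, which is one possible way to give each loop its own semaphore.

```python
import asyncio
import weakref

# loop -> {task name: asyncio.Semaphore}; entries vanish when a loop is collected.
_async_semaphores = weakref.WeakKeyDictionary()


def get_async_semaphore(task, limit):
    """Return one semaphore per (task, running event loop). Hypothetical helper."""
    loop = asyncio.get_running_loop()
    per_loop = _async_semaphores.setdefault(loop, {})
    if task not in per_loop:
        per_loop[task] = asyncio.Semaphore(limit)
    return per_loop[task]


async def measure_peak(n_calls=6, limit=2):
    """Launch n_calls fake auxiliary calls and return the peak number in flight."""
    in_flight = 0
    peak = 0

    async def fake_call():
        nonlocal in_flight, peak
        async with get_async_semaphore("title_generation", limit):
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for the provider round trip
            in_flight -= 1

    await asyncio.gather(*(fake_call() for _ in range(n_calls)))
    return peak


if __name__ == "__main__":
    # With the defaults, the peak should never exceed the limit of 2.
    print(asyncio.run(measure_peak()))
```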

What platforms tested on

macOS on darwin-arm64 (local)

Fixes #23324

Changed files

agent/auxiliary_client.py (modified, +160/-0)
cli-config.yaml.example (modified, +19/-0)
tests/agent/test_auxiliary_concurrency.py (added, +288/-0)
website/docs/user-guide/configuration.md (modified, +30/-0)

Summary

Evidence

agent/title_generator.py spawns background title generation after the first exchange.
agent/auxiliary_client.py centralizes auxiliary LLM calls and can retry/fallback across providers for payment, rate-limit, auth, and connection failures.
No global semaphore/queue was found for auxiliary LLM calls, so concurrent Discord/session activity can multiply retries.

Proposed changes

Add global/per-task concurrency limits for auxiliary LLM calls, starting with title generation and context compression.
Add backoff/jitter and respect upstream Retry-After when present.
Allow auxiliary tasks to use a dedicated provider/model separate from main interactive agent traffic.
Add config to disable or fail-fast auxiliary tasks when providers are degraded.
Ensure auxiliary failures do not trigger broad fallback storms or user-visible main-loop failures.
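
As an illustration of the backoff/jitter item in the list above, here is a minimal sketch of a delay calculation. It is an assumption about how such a policy could look, not the project's implementation: `retry_delay` and its parameters (`base`, `cap`, `rng`) are hypothetical, and only the delay-seconds form of `Retry-After` is handled (the HTTP-date form is ignored).

```python
import random


def retry_delay(attempt, retry_after=None, base=0.5, cap=30.0, rng=random.random):
    """Seconds to wait before retry number `attempt` (0-based).

    Uses capped exponential backoff with full jitter. When the upstream
    supplied a numeric Retry-After value, the result is never shorter than it.
    """
    backoff = min(cap, base * (2 ** attempt))
    jittered = rng() * backoff  # full jitter: uniform in [0, backoff)
    if retry_after is not None:
        try:
            server_hint = float(retry_after)
        except ValueError:
            server_hint = 0.0  # HTTP-date form is not parsed in this sketch
        return max(server_hint, jittered)
    return jittered
```

Jitter spreads retries from many sessions over time so they do not arrive in synchronized bursts, and honoring the server's hint avoids retrying before the provider has said it is ready.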

Acceptance criteria

Configurable concurrency limit for title generation and compression auxiliary calls.
Tests cover concurrent auxiliary calls and verify the limit is enforced.
Retry/fallback behavior respects Retry-After and avoids immediate retry storms.
Documentation explains how to isolate auxiliary providers or disable auxiliary calls during provider incidents.
