hermes - ✅ (Solved) Limit auxiliary LLM concurrency and retry amplification [2 pull requests, 1 participant]

GitHub stats
NousResearch/hermes-agent#23324 · Fetched 2026-05-11 03:29:59
Comments: 0 · Participants: 1 · Timeline: 6 · Reactions: 0
Timeline (top): cross-referenced ×3, labeled ×3

Root Cause

Auxiliary LLM calls had no global semaphore or queue. Each active session can spawn background daemon work such as title generation and context compression, and each of those calls retries and falls back across providers independently. During OpenAI/Codex instability the retries multiply into request bursts that trip downstream proxy breakers and degrade main user-facing requests.

Fix Action

Fixed

PR fix notes

PR #23346: refactor(cron): DRY media extensions from gateway.platforms.base

Description (problem / solution / changelog)

Moves _VIDEO_EXTS and _IMAGE_EXTS to the canonical location in gateway/platforms/base.py and imports them from cron/scheduler.py instead of duplicating. This prevents drift when new formats are added (e.g. .heic images) and removes the "keep in sync" comment hazard.

Related to #23324.

Changed files

  • cron/scheduler.py (modified, +3/-4)
  • gateway/platforms/base.py (modified, +3/-3)
  • scripts/release.py (modified, +1/-0)

PR #23510: fix(agent): cap auxiliary LLM concurrency per task

Description (problem / solution / changelog)

Cap concurrent in-flight auxiliary LLM calls per task to prevent retry/fallback storms from amplifying provider incidents.

What changed and why

  • agent.auxiliary_client.call_llm and async_call_llm now read auxiliary.<task>.max_concurrency from config and acquire a per-task semaphore around the entire call (including retries and provider fallback). Background work like title generation and context compression can spawn one call per active session. During provider degradation those calls fan out across retries and trip downstream rate limits; the cap bounds that amplification.
  • Sync limits use threading.BoundedSemaphore; async limits use asyncio.Semaphore keyed by (task, event-loop) so each loop gets its own. Semaphores are rebuilt when the configured limit changes, so live config edits take effect.
  • Default behavior is unchanged (no limit) so existing setups don't regress. The session-search task already has its own concurrency cap inside tools/session_search_tool.py; this PR leaves that cap untouched.
  • cli-config.yaml.example and website/docs/user-guide/configuration.md document the knob with recommended values for title_generation and compression.
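
The semaphore handling described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from agent/auxiliary_client.py: the registry names and the helpers get_sync_semaphore, get_async_semaphore, limited, and limited_async are hypothetical, and keying the async cache by id(loop) is a simplification (ids can be reused after a loop is garbage-collected).

```python
import asyncio
import threading
from contextlib import asynccontextmanager, contextmanager

# Hypothetical registries; the real module's internals are not shown in the PR notes.
_sync_lock = threading.Lock()
_sync_sems = {}   # task -> (limit, threading.BoundedSemaphore)
_async_sems = {}  # (task, id(loop)) -> (limit, asyncio.Semaphore)


def get_sync_semaphore(task, limit):
    """Return the cached semaphore for `task`, rebuilding it if `limit` changed."""
    with _sync_lock:
        cached = _sync_sems.get(task)
        if cached is None or cached[0] != limit:
            cached = (limit, threading.BoundedSemaphore(limit))
            _sync_sems[task] = cached
        return cached[1]


def get_async_semaphore(task, limit):
    """Same idea, but one semaphore per (task, running event loop)."""
    key = (task, id(asyncio.get_running_loop()))
    cached = _async_sems.get(key)
    if cached is None or cached[0] != limit:
        cached = (limit, asyncio.Semaphore(limit))
        _async_sems[key] = cached
    return cached[1]


@contextmanager
def limited(task, limit):
    """Hold one slot for the whole call, retries and fallback included."""
    if limit is None:  # no limit configured: behave as before
        yield
        return
    sem = get_sync_semaphore(task, limit)
    sem.acquire()
    try:
        yield
    finally:
        sem.release()  # runs on exceptions too, so failures cannot leak slots


@asynccontextmanager
async def limited_async(task, limit):
    """Async counterpart of `limited`."""
    if limit is None:
        yield
        return
    async with get_async_semaphore(task, limit):
        yield
```

A caller would wrap its whole retry loop, for example `with limited("title_generation", 2): ...`, so that a call waiting on retries still occupies its slot. Because the context manager keeps a reference to the semaphore it acquired, a rebuild after a config change does not disturb calls that are already in flight.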

How to test

  • pytest tests/agent/test_auxiliary_concurrency.py -q — the new tests cover: config parsing (invalid / zero / negative / missing values), semaphore caching + rebuild on limit change, sync concurrency enforcement under 6 concurrent threads with limit=2, async enforcement under 6 concurrent tasks with limit=2, and semaphore release on exception so the cap doesn't deadlock after failures.
  • pytest tests/agent/test_auxiliary_client.py -q — full existing auxiliary suite still passes (two unrelated Codex-timeout tests are pre-existing flakes; verified they fail on main without these changes too).
  • Manual: set auxiliary.title_generation.max_concurrency: 2 in ~/.hermes/cli-config.yaml, open multiple sessions that all trigger first-exchange title generation, and watch the provider logs — only two title calls run at a time.
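
The sync enforcement check described above (6 threads, limit=2) could look something like the sketch below. It is not the content of tests/agent/test_auxiliary_concurrency.py. The helper names parse_max_concurrency and peak_concurrency are hypothetical, and treating invalid, zero, negative, or missing values as "no limit" is an assumption based on the PR's statement that the default is unlimited.

```python
import threading
import time


def parse_max_concurrency(raw):
    """Return a positive int limit, or None (no limit) for anything else.

    Assumption: invalid, zero, negative, and missing values all mean "no limit".
    """
    if isinstance(raw, bool) or not isinstance(raw, int) or raw <= 0:
        return None
    return raw


def peak_concurrency(limit, workers=6, fail=False):
    """Run `workers` threads through a semaphore and return the peak overlap."""
    sem = threading.BoundedSemaphore(limit)
    lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def fake_aux_call():
        try:
            with sem:  # released automatically, even when the body raises
                with lock:
                    state["active"] += 1
                    state["peak"] = max(state["peak"], state["active"])
                try:
                    time.sleep(0.02)  # stand-in for the provider round trip
                    if fail:
                        raise RuntimeError("simulated provider error")
                finally:
                    with lock:
                        state["active"] -= 1
        except RuntimeError:
            pass  # the slot was already released when the `with sem` block exited

    threads = [threading.Thread(target=fake_aux_call) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # would hang here if failed calls leaked their slots
    return state["peak"]
```

With `limit=2` the returned peak never exceeds 2, whether or not the simulated calls fail. The `fail=True` run finishing at all is the deadlock check: six failing workers can only complete if every exception path releases its slot.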

What platforms tested on

  • macOS on darwin-arm64 (local)

Fixes #23324

Changed files

  • agent/auxiliary_client.py (modified, +160/-0)
  • cli-config.yaml.example (modified, +19/-0)
  • tests/agent/test_auxiliary_concurrency.py (added, +288/-0)
  • website/docs/user-guide/configuration.md (modified, +30/-0)

RAW_BUFFER (original issue text)

Summary

Auxiliary LLM tasks such as title generation and context compression can amplify transient provider failures. Each active session can spawn background daemon work, and those calls may retry/fallback independently. During OpenAI/Codex instability this can create bursts of requests that trip downstream proxy breakers and affect main user-facing requests.

Evidence

  • agent/title_generator.py spawns background title generation after the first exchange.
  • agent/auxiliary_client.py centralizes auxiliary LLM calls and can retry/fallback across providers for payment, rate-limit, auth, and connection failures.
  • No global semaphore/queue was found for auxiliary LLM calls, so concurrent Discord/session activity can multiply retries.
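
To make the multiplication concrete, here is a back-of-the-envelope sketch. The function names and every input number are hypothetical; the issue does not give session counts, retry counts, or provider counts.

```python
def worst_case_requests(sessions, aux_tasks, attempts_per_provider, providers):
    """Upper bound on requests if every auxiliary call exhausts every retry
    on every provider in its fallback chain."""
    return sessions * aux_tasks * attempts_per_provider * providers


def worst_case_in_flight(sessions, aux_tasks, max_concurrency=None):
    """Upper bound on simultaneous auxiliary calls, with or without a per-task cap."""
    per_task = sessions if max_concurrency is None else min(sessions, max_concurrency)
    return aux_tasks * per_task


# Illustrative numbers only: 20 active sessions, 2 auxiliary tasks
# (title generation and compression), 3 attempts per provider, 2 providers.
total = worst_case_requests(20, 2, 3, 2)               # 240
uncapped = worst_case_in_flight(20, 2)                 # 40
capped = worst_case_in_flight(20, 2, max_concurrency=2)  # 4
```

Under these assumed numbers, a per-task cap of 2 would cut the simultaneous auxiliary calls from 40 to 4. The total number of attempts is unchanged, but they are spread over time instead of arriving as one burst.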

Proposed changes

  • Add global/per-task concurrency limits for auxiliary LLM calls, starting with title generation and context compression.
  • Add backoff/jitter and respect upstream Retry-After when present.
  • Allow auxiliary tasks to use a dedicated provider/model separate from main interactive agent traffic.
  • Add config to disable or fail-fast auxiliary tasks when providers are degraded.
  • Ensure auxiliary failures do not trigger broad fallback storms or user-visible main-loop failures.
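
The backoff/jitter item could be approached along these lines. This is a sketch of one common scheme (capped exponential backoff with full jitter), not something the issue or either PR specifies. The function name retry_delay and its default values are hypothetical, and only the delay-in-seconds form of Retry-After is handled; the HTTP-date form is ignored here.

```python
import random


def retry_delay(attempt, retry_after=None, base=0.5, cap=30.0, rng=random):
    """Seconds to wait before retry number `attempt` (0-based).

    Uses capped exponential backoff with full jitter. If the upstream sent a
    numeric Retry-After value, the result is never shorter than that hint.
    """
    backoff = min(cap, base * (2 ** attempt))
    jittered = rng.uniform(0, backoff)  # full jitter: anywhere in [0, backoff]
    if retry_after is None:
        return jittered
    try:
        hinted = float(retry_after)
    except (TypeError, ValueError):
        hinted = 0.0  # e.g. an HTTP-date value; not parsed in this sketch
    return max(hinted, jittered)
```

Randomizing the delay spreads out retries from many sessions that failed at the same moment, which addresses the synchronized burst the issue describes. Honoring Retry-After as a lower bound keeps the client from retrying before the provider has said it is ready.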

Acceptance criteria

  • Configurable concurrency limit for title generation and compression auxiliary calls.
  • Tests cover concurrent auxiliary calls and verify the limit is enforced.
  • Retry/fallback behavior respects Retry-After and avoids immediate retry storms.
  • Documentation explains how to isolate auxiliary providers or disable auxiliary calls during provider incidents.
