hermes: Kanban worker iteration-budget exhaustion creates permanent sticky block (no auto-recovery)



Summary

When a kanban worker hits the iteration budget limit (default: 90 tool calls), the agent loop calls kanban_block(), which emits a "blocked" event. The dispatch loop's _has_sticky_block() check sees this event and refuses to auto-recover the task, so it stays blocked until a human runs hermes kanban unblock.

Root Cause

Two interacting bugs in agent/conversation_loop.py and hermes_cli/kanban_db.py:

Bug 1: Budget exhaustion creates a "sticky" block (wrong event type)

conversation_loop.py lines 4306–4338: when a kanban worker exhausts its iteration budget, the code calls handle_function_call("kanban_block", ...) on behalf of the agent. The block_task() function in kanban_db.py sets status=blocked and emits a "blocked" event (line 3611).

_has_sticky_block() (kanban_db.py:2412–2447) reads the most recent "blocked" / "unblocked" event for the task. Since block_task() just emitted "blocked", the function returns True. recompute_ready() (line 2473) skips sticky-blocked tasks entirely — they never get promoted back to ready.

The design intent (documented at lines 2416–2428) distinguishes two types of blocked:

  • Worker/operator-initiated (kanban_block tool) → emits "blocked" event → sticky → human must unblock
  • Circuit-breaker (repeated crashes via _record_task_failure) → emits "gave_up" event → non-sticky → auto-recoverable

Budget exhaustion is NOT a deliberate handoff — it should follow the circuit-breaker path, not the kanban_block path.
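The sticky/non-sticky distinction described above can be illustrated with a small, self-contained sketch. This is not the code from kanban_db.py: the Event class and the has_sticky_block function are hypothetical stand-ins, and the only behavior modeled is the one stated in this issue (the most recent "blocked" / "unblocked" event decides stickiness, and "gave_up" is ignored).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Event:
    """Hypothetical stand-in for one row of the task event log."""
    task_id: str
    kind: str  # e.g. "blocked", "unblocked", "gave_up"


def has_sticky_block(events: List[Event], task_id: str) -> bool:
    """Return True if the task's latest blocked/unblocked event is "blocked".

    Events are assumed to be in chronological order. "gave_up" events are
    skipped, which is what keeps circuit-breaker blocks non-sticky.
    """
    for event in reversed(events):
        if event.task_id == task_id and event.kind in ("blocked", "unblocked"):
            return event.kind == "blocked"
    return False


log = [
    Event("t1", "blocked"),    # budget exhaustion routed through kanban_block
    Event("t2", "gave_up"),    # circuit breaker after repeated crashes
    Event("t3", "blocked"),
    Event("t3", "unblocked"),  # a human unblocked the task
]

sticky = {task: has_sticky_block(log, task) for task in ("t1", "t2", "t3")}
print(sticky)
```

In this model only t1 is sticky, which mirrors the problem reported here: a budget-exhausted task looks identical to a deliberate handoff.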

Bug 2: Auto-recovery resets consecutive_failures → infinite retry loop

recompute_ready() (lines 2491–2492) resets consecutive_failures = 0 when recovering a blocked task. Even if Bug 1 were fixed, a task that repeatedly exhausts its budget would loop forever: block → auto-recover (failures=0) → respawn → 90 iterations → block → ...

The kanban.failure_limit mechanism (default 2) can never trigger because the counter is reset on every recovery cycle.
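A toy simulation makes the interaction easier to see. It assumes each budget exhaustion increments the counter by one and that the breaker trips when the counter reaches the limit; both are assumptions made for illustration, and cycles_until_gave_up is a hypothetical helper, not part of hermes.

```python
from typing import Optional

FAILURE_LIMIT = 2  # mirrors the kanban.failure_limit default cited above


def cycles_until_gave_up(reset_on_recovery: bool, max_cycles: int = 10) -> Optional[int]:
    """Simulate block -> auto-recover -> respawn for a task that always
    exhausts its budget.

    Returns the cycle on which the circuit breaker trips, or None if it
    never trips within max_cycles.
    """
    consecutive_failures = 0
    for cycle in range(1, max_cycles + 1):
        consecutive_failures += 1  # the worker ran out of iterations again
        if consecutive_failures >= FAILURE_LIMIT:
            return cycle
        if reset_on_recovery:
            consecutive_failures = 0  # the reset described under Bug 2
    return None


with_reset = cycles_until_gave_up(reset_on_recovery=True)
without_reset = cycles_until_gave_up(reset_on_recovery=False)
print(with_reset, without_reset)
```

With the reset in place the counter never gets past 1, so the simulated breaker never trips; without it, the breaker trips on the second cycle.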

Affected Version

v0.15.1 (and likely earlier versions since the iteration-budget feature was introduced)

Code References

| File | Line(s) | Issue |
| --- | --- | --- |
| agent/conversation_loop.py | 4306–4338 | Budget exhaustion calls kanban_block() instead of circuit-breaker path |
| hermes_cli/kanban_db.py | 2412–2447 | _has_sticky_block() treats budget-exhaustion blocks as sticky |
| hermes_cli/kanban_db.py | 2450–2503 | recompute_ready() skips sticky blocks |
| hermes_cli/kanban_db.py | 2489–2492 | Auto-recovery resets consecutive_failures = 0 |
| hermes_cli/kanban_db.py | 3560–3612 | block_task() emits "blocked" event |

Expected Behavior

  1. Non-sticky block: When a kanban worker exhausts its iteration budget, the task should be blocked WITHOUT emitting a "blocked" event (so _has_sticky_block returns False and recompute_ready can auto-recover it).

  2. Circuit-breaker: After failure_limit consecutive budget-exhaustion failures, the task should be truly blocked via _record_task_failure with a "gave_up" event, preventing infinite retry cycles.

  3. Bonus: per-task max_iterations: Workers spawned for scraping/processing tasks would benefit from a configurable max_iterations override (e.g. via task metadata or config.yaml). A site with 103 articles needs more than 90 iterations; a site with 0 articles doesn't need 90.
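For item 3, one possible shape for the override is a lookup that prefers task metadata, then config, then the default. The "max_iterations" key in task metadata and config, and the resolve_max_iterations helper, are hypothetical; this issue only proposes that such an override should exist.

```python
from typing import Any, Mapping, Optional

DEFAULT_MAX_ITERATIONS = 90  # default budget cited in this issue


def resolve_max_iterations(
    task_metadata: Optional[Mapping[str, Any]] = None,
    config: Optional[Mapping[str, Any]] = None,
) -> int:
    """Pick the iteration budget: task metadata first, then config, then default.

    Values that are not positive integers are ignored rather than raising,
    so a malformed override falls through to the next source.
    """
    for source in (task_metadata, config):
        if not source:
            continue
        value = source.get("max_iterations")
        if isinstance(value, int) and not isinstance(value, bool) and value > 0:
            return value
    return DEFAULT_MAX_ITERATIONS


per_task = resolve_max_iterations({"max_iterations": 150}, {"max_iterations": 60})
from_config = resolve_max_iterations({}, {"max_iterations": 60})
fallback = resolve_max_iterations()
print(per_task, from_config, fallback)
```

A large site could then carry a higher per-task value while everything else keeps the shared default.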

Real-World Impact

In a music-review scraping pipeline with 48 sites and 20 Camoufox workers per day, two tasks (one with 0 results, one with 55 articles) hit 90/90 budget exhaustion and sat blocked for ~4 hours until manually force-completed. The 18 other workers that finished in under 90 iterations were fine.

Suggested Fix

In conversation_loop.py, replace the kanban_block call with a direct DB status update (status=blocked without emitting a "blocked" event), or emit a "gave_up" event instead. Failure counting should then go through _record_task_failure so the circuit breaker can eventually stop retrying.

# Current (broken):
_ra().handle_function_call("kanban_block", {
    "task_id": _kanban_task,
    "reason": f"Iteration budget exhausted ({api_call_count}/{agent.max_iterations})...",
})

# Suggested: use _record_task_failure or direct status update
# that does NOT emit a "blocked" event, so _has_sticky_block returns False
# and recompute_ready can auto-recover + failure counting works.
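The intended end state can be sketched against an in-memory model. Everything below is illustrative: Task, on_budget_exhausted, and recover_if_allowed are hypothetical names, and the choice to gate recovery on the preserved failure counter is one possible reading of the expected behavior, not the actual signature or logic of _record_task_failure or recompute_ready.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    """Hypothetical in-memory stand-in for a kanban task row."""
    task_id: str
    status: str = "running"
    consecutive_failures: int = 0
    events: List[str] = field(default_factory=list)


def on_budget_exhausted(task: Task, failure_limit: int = 2) -> None:
    """Treat budget exhaustion as a counted failure, not a deliberate handoff."""
    task.consecutive_failures += 1
    task.status = "blocked"
    if task.consecutive_failures >= failure_limit:
        # Circuit breaker: record that retries have been abandoned.
        task.events.append("gave_up")
    # No "blocked" event is emitted on either branch, so a sticky-block
    # check keyed on that event would not fire.


def recover_if_allowed(task: Task, failure_limit: int = 2) -> bool:
    """Promote a blocked task back to ready unless the breaker has tripped.

    Unlike the behavior described under Bug 2, the failure counter is kept.
    """
    if task.status != "blocked" or task.consecutive_failures >= failure_limit:
        return False
    task.status = "ready"
    return True


task = Task("t1")
on_budget_exhausted(task)
first_recovery = recover_if_allowed(task)   # one failure so far: retry
task.status = "running"                     # the dispatcher respawns the worker
on_budget_exhausted(task)
second_recovery = recover_if_allowed(task)  # limit reached: stay blocked
print(first_recovery, second_recovery, task.events)
```

In this model the task is retried once, then stays blocked with a single "gave_up" event and no "blocked" event.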
