hermes - ✅(Solved) Fix kanban dispatcher retries kanban_block with 'review-required:' reasons up to failure_limit, causing duplicate worker runs of finished tasks [1 pull requests, 1 participants]

GitHub stats
NousResearch/hermes-agent#29027 (fetched 2026-05-20 04:00:31)
Comments: 0 · Participants: 1 · Timeline events: 7 · Reactions: 0
Timeline (top): cross-referenced ×3, labeled ×3, renamed ×1

A kanban worker that finishes its work and calls kanban_block with a reason indicating "done, awaiting review" (e.g. "review-required: 20/20 tests pass, build passes") gets re-spawned by the dispatcher up to kanban.failure_limit times. The work is idempotent (git state already changed), so the same implementation runs over and over, each ending in kanban_block → promoted → respawn, until the failure budget is exhausted and the task finally settles into blocked.

This wastes coordinator tokens, pollutes the run log with N copies of "review-required" runs, and confuses orchestrator agents that read the run history (they see N "blocked" runs and conclude the card is genuinely stuck, when in fact every run succeeded).

The user-facing fix is to call kanban_complete --summary "..." instead — that's a true terminal state, the dispatcher leaves it alone, and the linked review task auto-promotes. But the bundled kanban-worker skill (the one auto-loaded into every dispatcher-spawned worker via --skills kanban-worker) doesn't currently make this distinction sharp enough, so well-intentioned workers fall into the trap.

Root Cause

The dispatcher's promote logic in hermes_cli/kanban_db.py (the gateway-embedded dispatch loop) treats a worker-initiated kanban_block the same as a worker crash or timeout: it counts toward failure_limit and re-spawns the task. There's no distinction between:

  • "I cannot make progress on this work — human intervention needed" (legitimate block; retry pointless)
  • "I finished this work — handing off to the next stage" (should be complete, not block)
  • "I crashed mid-run" (legitimate retry candidate)

All three end up in the same retry path, gated only by failure_limit. The first case suffers wasted retries; the second case suffers wasted retries AND duplicate implementations; only the third actually benefits from retry.
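The three cases can be separated with a small classification. The sketch below is illustrative only: RunOutcome and should_retry are hypothetical names, not part of hermes, and the real dispatcher may model outcomes differently.

```python
from enum import Enum


class RunOutcome(Enum):
    """Hypothetical labels for the three cases listed above."""
    BLOCKED_NEEDS_HUMAN = "blocked_needs_human"  # worker cannot make progress
    FINISHED_HANDOFF = "finished_handoff"        # work done, next stage takes over
    CRASHED = "crashed"                          # crash or timeout mid-run


def should_retry(outcome: RunOutcome) -> bool:
    """Only a crash or timeout benefits from re-spawning the same task."""
    return outcome is RunOutcome.CRASHED


for outcome in RunOutcome:
    print(outcome.value, "-> retry" if should_retry(outcome) else "-> no retry")
```

Under this split, only the crash case re-enters the retry path; the other two settle after a single run.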

Environment

  • Hermes Agent v0.14.0 (2026.5.16)
  • macOS Darwin 25.2.0, Python 3.11.15
  • kanban.dispatch_in_gateway: true, kanban.failure_limit: 2 (defaults)
  • Profile: orchestrator (gpt-5.5 / openai-codex) dispatches builder kanban workers (also gpt-5.5 / openai-codex) for code-changing tasks linked to a downstream reviewer card.

PR fix notes

PR #29064: fix: prevent kanban dispatcher from retrying handoff blocks (#29027)

Description (problem / solution / changelog)

Problem

When workers call kanban_block with handoff reasons like review-required:, the dispatcher treats it as a failure and retries the task up to failure_limit times. This causes:

  • Wasted coordinator tokens
  • Polluted run logs
  • Confused orchestrator agents

Root Cause

The dispatcher's failure counting logic doesn't distinguish between:

  • True failures: crashes, timeouts, genuine errors
  • Deliberate handoffs: completed work awaiting review (e.g., review-required:)

Both trigger consecutive_failures incrementing and retry behavior.

Solution

Add a handoff parameter to kanban_block to mark deliberate handoffs. The dispatcher now:

  1. Stores handoff flag in the database (0 = failure, 1 = handoff)
  2. Auto-detects handoff patterns from reason text:
    • review-required:
    • handoff:
    • needs-review:
    • awaiting review:
  3. Skips failure counting and retry logic for handoff blocks
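A minimal sketch of the auto-detection step, assuming the four prefixes above are matched at the start of the reason text and case-insensitively. Both of those choices, and the names HANDOFF_PATTERN and detect_handoff, are assumptions for illustration; the PR's actual matching code is not shown on this page.

```python
import re
from typing import Optional

# Prefixes taken from the PR description. Anchoring at the start of the
# string and ignoring case are assumptions made for this sketch.
HANDOFF_PATTERN = re.compile(
    r"^\s*(review-required:|handoff:|needs-review:|awaiting review:)",
    re.IGNORECASE,
)


def detect_handoff(reason: str, handoff: Optional[bool] = None) -> bool:
    """Return True when a block should be treated as a deliberate handoff.

    An explicit handoff flag wins; otherwise fall back to the reason text.
    """
    if handoff is not None:
        return handoff
    return bool(HANDOFF_PATTERN.match(reason or ""))


print(detect_handoff("review-required: 20/20 tests pass, build passes"))
print(detect_handoff("missing credentials for deploy target"))
```

Letting the explicit flag override the text match keeps a worker able to force either behavior when its reason string happens to collide with a prefix.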

Changes

  • Database: Added handoff INTEGER NOT NULL DEFAULT 0 column to tasks table with migration
  • Core: Updated block_task() to accept handoff parameter
  • Dispatcher: Modified _record_task_failure() to skip handoff blocks
  • Tools: Updated _handle_block() to parse and auto-detect handoff parameter
  • Schema: Added handoff parameter to kanban_block tool registration
  • Documentation: Updated kanban-worker skill with handoff usage guidance
  • Tests: Added comprehensive tests for handoff functionality
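The database change could look roughly like the following, assuming a SQLite backend. Only the column definition (handoff INTEGER NOT NULL DEFAULT 0) comes from the PR notes; the other column names, the function bodies, and the SQLite choice are hypothetical.

```python
import sqlite3


def ensure_handoff_column(conn: sqlite3.Connection) -> None:
    """Idempotent migration: add tasks.handoff only if it is missing."""
    columns = {row[1] for row in conn.execute("PRAGMA table_info(tasks)")}
    if "handoff" not in columns:
        conn.execute(
            "ALTER TABLE tasks ADD COLUMN handoff INTEGER NOT NULL DEFAULT 0"
        )
        conn.commit()


def block_task(conn: sqlite3.Connection, task_id: str, reason: str,
               handoff: bool = False) -> None:
    """Mark a task blocked and store the handoff flag (0 = failure, 1 = handoff).

    The status and block_reason columns are illustrative, not the real schema.
    """
    conn.execute(
        "UPDATE tasks SET status = 'blocked', block_reason = ?, handoff = ? "
        "WHERE id = ?",
        (reason, int(handoff), task_id),
    )
    conn.commit()
```

Because the column defaults to 0, existing rows and callers that never pass handoff keep the old "counts as a failure" behavior.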

Testing

All tests pass:

  • test_handoff_block_prevents_retry(): Verifies handoff blocks don't increment failure counter
  • test_regular_block_counts_as_failure(): Ensures regular blocks still count as failures
  • test_auto_detect_handoff_from_reason(): Tests pattern-based auto-detection

Impact

  • Backward compatible: Existing code continues to work (handoff defaults to 0)
  • User-friendly: Auto-detection means workers don't need to explicitly set handoff=True
  • Efficient: Prevents wasteful retry loops for legitimate handoffs

Fixes #29027

Changed files

  • hermes_cli/kanban_db.py (modified, +33/-44)
  • tests/hermes_cli/test_kanban_handoff.py (added, +162/-0)
  • tools/kanban_tools.py (modified, +12/-1)

Repro

  1. Set up a build/review task pair: t_build (assignee: builder) with t_review (assignee: reviewer) linked as child.
  2. Either configure the builder skill to call kanban_block "review-required: <details>" when work is done, or let an agent reach that decision organically — the bundled kanban-worker skill phrasing doesn't strongly discourage it.
  3. Watch the event stream after the first builder run completes:
    [21:19] [run 98] blocked {'reason': 'review-required: 19/19 tests pass...'}
    [21:20] promoted
    [21:20] [run 99] claimed
    [21:25] [run 99] blocked {'reason': 'review-required: 19/19 tests pass...'}
    [21:25] promoted
    [21:26] [run 100] claimed
    [21:29] [run 100] blocked {'reason': 'review-required: 20/20 tests pass...'}
  4. The result is three identical implementations of t_build and three "review-required" blocks before failure_limit=2 is exhausted. t_review never auto-promotes because t_build never reaches done.
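The loop in the event stream can be reproduced with a toy model. This is a simulation under stated assumptions, not the dispatcher's code: it assumes every worker-initiated block increments a failure counter and that the task settles once the counter exceeds failure_limit, which is one reading consistent with the three runs in the log above. The simulate function and the honor_handoff switch are hypothetical.

```python
def simulate(failure_limit: int, honor_handoff: bool) -> list:
    """Toy dispatcher loop for a worker that always blocks with a handoff reason."""
    events = []
    failures = 0
    run = 0
    while True:
        run += 1
        events.append(f"run {run} claimed")
        reason = "review-required: tests pass"
        events.append(f"run {run} blocked")
        # With handoff awareness, the block is terminal and nothing is counted.
        if honor_handoff and reason.startswith("review-required:"):
            break
        failures += 1
        if failures > failure_limit:
            break  # failure budget exhausted, task settles into blocked
        events.append("promoted")
    return events


for event in simulate(failure_limit=2, honor_handoff=False):
    print(event)
```

With honor_handoff=True the same worker produces a single claimed/blocked pair and no promotion.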

Suggested fix

Two options, either alone is sufficient:

  1. Dispatcher-side semantic. Treat any kanban_block reason matching a configurable handoff prefix (e.g. ^review-required:|^handoff:|^needs-review:) as a terminal handoff rather than a retry candidate. Or — cleaner — add a handoff: bool parameter to the kanban_block tool that the dispatcher honors. The bundled kanban-worker skill can then use kanban_block(..., handoff=True) for the "done, needs review" case without conflating it with true blockers.

  2. Skill-side guidance. Update the bundled kanban-worker skill to explicitly state: "If implementation succeeded and verification passed, call kanban_complete --summary "...". Reserve kanban_block for true blockers (auth fail, missing creds, ambiguous scope)." Include a worked example of each. Cheaper and ships without code change, but doesn't help users who write custom worker skills.

Option 1 is the more defensive choice — it makes the dispatcher correct by construction. Option 2 is a 30-line patch to the bundled prompt and avoids the loop for any worker that follows the upstream guidance.

Workaround for users hitting this today

In your custom worker skill, drop the kanban_block "review-required: ..." pattern entirely. After verification passes, call:

kanban_complete(
    task_id=...,
    summary="...",
    metadata={"tests_passed": N, "tests_total": N, "build": "pass", "changed_files": [...]}
)

The linked reviewer task (the natural human/agent review checkpoint) will auto-promote to ready. Reserve kanban_block for situations where the same work cannot succeed on retry.
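One way to encode that rule in a custom worker is a small decision helper. The sketch below only returns a description of which tool to call and does not invoke any hermes tool itself. Verification and finish_action are hypothetical names; the metadata keys mirror the kanban_complete call above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Verification:
    """Hypothetical summary of a worker's final checks."""
    tests_passed: int
    tests_total: int
    build_ok: bool
    changed_files: List[str] = field(default_factory=list)
    blocker: Optional[str] = None  # e.g. "auth fail", "missing creds"


def finish_action(v: Verification) -> dict:
    """Pick the closing tool call: complete on success, block only on a true blocker."""
    if v.blocker is not None:
        return {"tool": "kanban_block", "reason": v.blocker}
    if v.build_ok and v.tests_passed == v.tests_total:
        return {
            "tool": "kanban_complete",
            "summary": f"{v.tests_passed}/{v.tests_total} tests pass, build passes",
            "metadata": {
                "tests_passed": v.tests_passed,
                "tests_total": v.tests_total,
                "build": "pass",
                "changed_files": v.changed_files,
            },
        }
    # Verification failed but nothing is blocking: the worker keeps working.
    return {"tool": None, "reason": "verification incomplete"}


print(finish_action(Verification(20, 20, True, ["hermes_cli/kanban_db.py"])))
```

The point of the helper is that "done, needs review" never maps to kanban_block, so the dispatcher sees a terminal state and the reviewer card can promote.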

Why this matters

The "Hermes coordinates, downstream lane reviews" pattern is exactly the multi-stage flow the kanban subsystem is designed for. The current dispatcher behavior makes the most natural way to express that pattern (a worker emitting kanban_block "review-required:") silently expensive. Users only notice when they audit run logs or see Telegram-Hermes report N "blocked" runs that confuse the orchestrator about whether anything is actually broken.

Related: #29015 (separate issue, same setup session — claude CLI auth from worker subprocesses).
