hermes - ✅(Solved) Fix kanban dispatcher retries kanban_block with 'review-required:' reasons up to failure_limit, causing duplicate worker runs of finished tasks [1 pull requests, 1 participants]

lukelandis3-sketch · 2026-05-20T01:39:54Z




NousResearch/hermes-agent#29027 • Fetched 2026-05-20 04:00:31


Timeline (top)

cross-referenced ×3, labeled ×3, renamed ×1

A kanban worker that finishes its work and calls kanban_block with a reason indicating "done, awaiting review" (e.g. "review-required: 20/20 tests pass, build passes") gets re-spawned by the dispatcher up to kanban.failure_limit times. The work is idempotent (git state already changed), so the same implementation runs over and over, each ending in kanban_block → promoted → respawn, until the failure budget is exhausted and the task finally settles into blocked.

This wastes coordinator tokens, pollutes the run log with N copies of "review-required" runs, and confuses orchestrator agents that read the run history (they see N "blocked" runs and conclude the card is genuinely stuck, when in fact every run succeeded).

The user-facing fix is to call kanban_complete --summary "..." instead — that's a true terminal state, the dispatcher leaves it alone, and the linked review task auto-promotes. But the bundled kanban-worker skill (the one auto-loaded into every dispatcher-spawned worker via --skills kanban-worker) doesn't currently make this distinction sharp enough, so well-intentioned workers fall into the trap.

Root Cause

The dispatcher's promote logic in hermes_cli/kanban_db.py (the gateway-embedded dispatch loop) treats a worker-initiated kanban_block the same as a worker crash or timeout: it counts toward failure_limit and re-spawns the task. There's no distinction between:

"I cannot make progress on this work — human intervention needed" (legitimate block; retry pointless)
"I finished this work — handing off to the next stage" (should be complete, not block)
"I crashed mid-run" (legitimate retry candidate)

All three end up in the same retry path, gated only by failure_limit. The first case suffers wasted retries; the second case suffers wasted retries AND duplicate implementations; only the third actually benefits from retry.
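The three cases can be made explicit in a small sketch. The names below (RunOutcome, should_retry, next_status) are hypothetical and are not taken from hermes_cli/kanban_db.py; they only illustrate the distinction the issue argues for, using the status names that appear in this report (done, blocked, ready).

```python
from enum import Enum


class RunOutcome(Enum):
    """Hypothetical labels for the three ways a worker run can end."""
    TRUE_BLOCKER = "true_blocker"  # cannot make progress; needs a human
    HANDOFF = "handoff"            # work finished; next stage should pick it up
    CRASH = "crash"                # died mid-run; a retry may succeed


def should_retry(outcome: RunOutcome) -> bool:
    """Only a crash benefits from re-spawning the worker."""
    return outcome is RunOutcome.CRASH


def next_status(outcome: RunOutcome) -> str:
    """Illustrative status each outcome should settle into."""
    if outcome is RunOutcome.HANDOFF:
        return "done"
    if outcome is RunOutcome.TRUE_BLOCKER:
        return "blocked"
    return "ready"  # crash: back in the queue for another attempt
```

Under this reading, the reported bug is that all three outcomes currently behave as if should_retry returned True.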


PR fix notes

PR #29064: fix: prevent kanban dispatcher from retrying handoff blocks (#29027)

Repository: NousResearch/hermes-agent
Author: AllynSheep
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/29064

Description (problem / solution / changelog)

Problem

When workers call kanban_block with handoff reasons like review-required:, the dispatcher treats it as a failure and retries the task up to failure_limit times. This causes:

Wasted coordinator tokens
Polluted run logs
Confused orchestrator agents

Root Cause

The dispatcher's failure counting logic doesn't distinguish between:

True failures: crashes, timeouts, genuine errors
Deliberate handoffs: completed work awaiting review (e.g., review-required:)

Both trigger consecutive_failures incrementing and retry behavior.

Solution

Add a handoff parameter to kanban_block to mark deliberate handoffs. The dispatcher now:

1. Stores handoff flag in the database (0 = failure, 1 = handoff)
2. Auto-detects handoff patterns from reason text:
   - review-required:
   - handoff:
   - needs-review:
   - awaiting review:
3. Skips failure counting and retry logic for handoff blocks
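A minimal sketch of the behavior described above, assuming an in-memory task record rather than the real database row. The Task class, the looks_like_handoff helper, and the block_task signature shown here are illustrative assumptions; the actual block_task() in the PR may differ, and whether matching is case-insensitive is not stated in the description.

```python
from dataclasses import dataclass

# Prefixes listed in the PR description; the matching rules are an assumption.
HANDOFF_PREFIXES = ("review-required:", "handoff:", "needs-review:", "awaiting review:")


@dataclass
class Task:
    """Hypothetical in-memory stand-in for a row in the tasks table."""
    status: str = "running"
    handoff: int = 0  # 0 = failure, 1 = handoff
    consecutive_failures: int = 0


def looks_like_handoff(reason: str) -> bool:
    """Return True when the reason starts with one of the handoff prefixes."""
    return reason.strip().lower().startswith(HANDOFF_PREFIXES)


def block_task(task: Task, reason: str, handoff: bool = False) -> None:
    """Block a task, counting it as a failure only when it is not a handoff."""
    is_handoff = handoff or looks_like_handoff(reason)
    task.handoff = 1 if is_handoff else 0
    task.status = "blocked"
    if not is_handoff:
        task.consecutive_failures += 1
```

With this shape, a reason such as "review-required: 20/20 tests pass" leaves consecutive_failures untouched, while a reason such as "missing creds" still increments it.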

Changes

Database: Added handoff INTEGER NOT NULL DEFAULT 0 column to tasks table with migration
Core: Updated block_task() to accept handoff parameter
Dispatcher: Modified _record_task_failure() to skip handoff blocks
Tools: Updated _handle_block() to parse and auto-detect handoff parameter
Schema: Added handoff parameter to kanban_block tool registration
Documentation: Updated kanban-worker skill with handoff usage guidance
Tests: Added comprehensive tests for handoff functionality
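The database change can be pictured with the sketch below. It assumes the kanban store is SQLite and that the table is named tasks, neither of which is confirmed by the PR description beyond the column definition; add_handoff_column is a hypothetical helper, not the migration shipped in the PR.

```python
import sqlite3


def add_handoff_column(conn: sqlite3.Connection) -> bool:
    """Add tasks.handoff if it is missing. Returns True when the column was added."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = [row[1] for row in conn.execute("PRAGMA table_info(tasks)")]
    if "handoff" in columns:
        return False
    conn.execute("ALTER TABLE tasks ADD COLUMN handoff INTEGER NOT NULL DEFAULT 0")
    conn.commit()
    return True


# Illustrative run against an in-memory database with one pre-existing task.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (id TEXT PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO tasks (id, status) VALUES ('t_build', 'blocked')")
added_first = add_handoff_column(conn)
added_again = add_handoff_column(conn)
existing_value = conn.execute(
    "SELECT handoff FROM tasks WHERE id = 't_build'"
).fetchone()[0]
```

Because the column carries DEFAULT 0, rows written before the migration read back as ordinary failures, which is what makes the change backward compatible.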

Testing

All tests pass:

test_handoff_block_prevents_retry(): Verifies handoff blocks don't increment failure counter
test_regular_block_counts_as_failure(): Ensures regular blocks still count as failures
test_auto_detect_handoff_from_reason(): Tests pattern-based auto-detection

Impact

Backward compatible: Existing code continues to work (handoff defaults to 0)
User-friendly: Auto-detection means workers don't need to explicitly set handoff=True
Efficient: Prevents wasteful retry loops for legitimate handoffs

Fixes #29027

Changed files

hermes_cli/kanban_db.py (modified, +33/-44)
tests/hermes_cli/test_kanban_handoff.py (added, +162/-0)
tools/kanban_tools.py (modified, +12/-1)



Summary

Environment

Hermes Agent v0.14.0 (2026.5.16)
macOS Darwin 25.2.0, Python 3.11.15
kanban.dispatch_in_gateway: true, kanban.failure_limit: 2 (defaults)
Profile: orchestrator (gpt-5.5 / openai-codex) dispatches builder kanban workers (also gpt-5.5 / openai-codex) for code-changing tasks linked to a downstream reviewer card.

Repro

Set up a build/review task pair: t_build (assignee: builder) with t_review (assignee: reviewer) linked as child.
Either configure the builder skill to call kanban_block "review-required: <details>" when work is done, or let an agent reach that decision organically — the bundled kanban-worker skill phrasing doesn't strongly discourage it.

Watch the event stream after the first builder run completes:

[21:19] [run 98] blocked {'reason': 'review-required: 19/19 tests pass...'}
[21:20] promoted
[21:20] [run 99] claimed
[21:25] [run 99] blocked {'reason': 'review-required: 19/19 tests pass...'}
[21:25] promoted
[21:26] [run 100] claimed
[21:29] [run 100] blocked {'reason': 'review-required: 20/20 tests pass...'}

Three identical implementations of t_build, three "review-required" blocks, until failure_limit=2 is hit. t_review never auto-promotes because t_build never reaches done.
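The run count follows from the retry budget: one initial run plus up to failure_limit respawns. The simulation below is a sketch of that accounting as read from this report, not the dispatcher's actual loop; simulate_runs and its parameters are hypothetical.

```python
def simulate_runs(failure_limit: int, block_is_handoff: bool) -> list[str]:
    """Return the event stream for a worker that always finishes and then blocks."""
    events: list[str] = []
    respawns = 0
    run = 1
    while True:
        events.append(f"[run {run}] claimed")
        events.append(f"[run {run}] blocked")
        if block_is_handoff or respawns >= failure_limit:
            break  # the task settles into blocked
        events.append("promoted")
        respawns += 1
        run += 1
    return events
```

With failure_limit=2 and no handoff handling, this yields three blocked runs and two promotions, matching the event stream above. With the block treated as a handoff, the stream stops after the first run.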

Root cause

The dispatcher treats all three of these worker outcomes the same way:

"I cannot make progress on this work — human intervention needed" (legitimate block; retry pointless)
"I finished this work — handing off to the next stage" (should be complete, not block)
"I crashed mid-run" (legitimate retry candidate)

Suggested fix

Two options, either alone is sufficient:

1. Dispatcher-side semantic. Treat any kanban_block reason matching a configurable handoff prefix (e.g. ^review-required:|^handoff:|^needs-review:) as a terminal handoff rather than a retry candidate. Or, more cleanly, add a handoff: bool parameter to the kanban_block tool that the dispatcher honors. The bundled kanban-worker skill can then use kanban_block(..., handoff=True) for the "done, needs review" case without conflating it with true blockers.
2. Skill-side guidance. Update the bundled kanban-worker skill to explicitly state: "If implementation succeeded and verification passed, call kanban_complete --summary "...". Reserve kanban_block for true blockers (auth fail, missing creds, ambiguous scope)." Include a worked example of each. This is cheaper and ships without a code change, but doesn't help users who write custom worker skills.

Option 1 is the more defensive choice — it makes the dispatcher correct by construction. Option 2 is a 30-line patch to the bundled prompt and avoids the loop for any worker that follows the upstream guidance.
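For the configurable-prefix variant of Option 1, one possible shape is to compile the prefixes into a single anchored pattern. This is a sketch only: compile_handoff_pattern is a hypothetical helper, the issue does not name a config key for the prefixes, and case-insensitive matching is an assumption.

```python
import re


def compile_handoff_pattern(prefixes: list[str]) -> re.Pattern:
    """Build an anchored, case-insensitive pattern from literal prefixes."""
    alternatives = "|".join(re.escape(p) for p in prefixes)
    return re.compile(rf"^(?:{alternatives})", re.IGNORECASE)


# Prefixes taken from the regex in the issue text; where they would be
# configured is not specified, so they are passed in directly here.
pattern = compile_handoff_pattern(["review-required:", "handoff:", "needs-review:"])
```

Anchoring at the start of the string means a reason that merely mentions "review-required:" later in the text is still treated as a regular block.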

Workaround for users hitting this today

In your custom worker skill, drop the kanban_block "review-required: ..." pattern entirely. After verification passes, call:

kanban_complete(
    task_id=...,
    summary="...",
    metadata={"tests_passed": N, "tests_total": N, "build": "pass", "changed_files": [...]}
)

The linked reviewer task (the natural human/agent review checkpoint) will auto-promote to ready. Reserve kanban_block for situations where the same work cannot succeed on retry.
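The rule of thumb above can be written as a small decision helper. This is a sketch for a custom worker skill, not part of Hermes: choose_terminal_call and the dictionary it returns are hypothetical, and only the tool names kanban_complete and kanban_block come from this report.

```python
from typing import Optional


def choose_terminal_call(verification_passed: bool, blocker: Optional[str] = None) -> dict:
    """Pick which kanban tool a worker should call at the end of a run."""
    if blocker is not None:
        # The same work cannot succeed on retry: a true blocker.
        return {"tool": "kanban_block", "reason": blocker}
    if verification_passed:
        # Work is finished: use the terminal state so the review task promotes.
        return {"tool": "kanban_complete"}
    # Neither finished nor blocked: keep working instead of ending the run.
    return {"tool": None}
```

The point of the helper is that "done, awaiting review" never maps to kanban_block; only a named blocker does.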

Why this matters

The "Hermes coordinates, downstream lane reviews" pattern is exactly the multi-stage flow the kanban subsystem is designed for. The current dispatcher behavior makes the most natural way to express that pattern (a worker emitting kanban_block "review-required:") silently expensive. Users only notice when they audit run logs or see Telegram-Hermes report N "blocked" runs that confuse the orchestrator about whether anything is actually broken.

Related: #29015 (separate issue, same setup session — claude CLI auth from worker subprocesses).
