hermes - ✅ (Solved) Kanban workers fabricate card IDs in complete(): no verification that claimed cards exist [2 pull requests, 1 participant]

BowmanStephen · 2026-05-05T03:08:55Z

GitHub stats

NousResearch/hermes-agent#20017 • Fetched 2026-05-06 06:39:12

Author

BowmanStephen

Participants

BowmanStephen

Timeline (top)

labeled ×3, cross-referenced ×2, referenced ×2, closed ×1

Kanban workers completing verification tasks routinely claim to have created remediation cards that do not exist in the database. Workers call kanban_complete() naming card IDs (e.g., "created remediation cards t_X, t_Y, t_Z") that were never actually created. The kernel accepts kanban_complete() with no validation that the claimed IDs exist.

Error Message

None is raised. The kernel accepts kanban_complete() even when the claimed card IDs do not exist, so the failure is silent.

Root Cause

kanban_complete() has no validation layer. A worker can say "created cards t_A, t_B, t_C" in their summary/metadata and the kernel accepts it unconditionally. There's no cross-check against task_links or the task table itself.

Evidence

In a single 2026-05-03 dispatcher session, 5 of 6 kanban verification workers reported creating remediation cards with 11 specific task IDs. Zero of those IDs existed in the database. The kanban_create() calls either:

Silently failed (tool layer error, credential issue, etc.)
Were hallucinated by the worker (LLM invented plausible-looking IDs)
Were described in prose but never actually executed

PR fix notes

PR #20022: fix(kanban): validate created card handoffs

Repository: NousResearch/hermes-agent
Author: LeonSGP43
State: closed | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/20022

Description (problem / solution / changelog)

Fixes #20017

Summary

validate optional metadata.created_cards before completing a Kanban task
reject claimed card IDs that do not exist, or that are neither linked children nor created by the completing worker profile
document the structured created_cards field in the kanban_complete tool schema

Scope

This only validates explicit structured claims in metadata.created_cards. Existing summaries/results without that field continue to complete normally.

Verification

scripts/run_tests.sh tests/hermes_cli/test_kanban_core_functionality.py::test_complete_rejects_missing_created_cards_metadata tests/hermes_cli/test_kanban_core_functionality.py::test_complete_accepts_created_cards_linked_to_parent tests/hermes_cli/test_kanban_core_functionality.py::test_complete_rejects_unrelated_created_cards_metadata tests/hermes_cli/test_kanban_core_functionality.py::test_complete_accepts_created_cards_created_by_worker tests/hermes_cli/test_kanban_core_functionality.py::test_completed_event_payload_carries_summary -> 5 passed
scripts/run_tests.sh tests/tools/test_kanban_tools.py tests/hermes_cli/test_kanban_core_functionality.py::test_complete_rejects_missing_created_cards_metadata tests/hermes_cli/test_kanban_core_functionality.py::test_complete_accepts_created_cards_linked_to_parent tests/hermes_cli/test_kanban_core_functionality.py::test_complete_rejects_unrelated_created_cards_metadata tests/hermes_cli/test_kanban_core_functionality.py::test_complete_accepts_created_cards_created_by_worker -> 40 passed
git diff --check

Changed files

hermes_cli/kanban_db.py (modified, +63/-0)
tests/hermes_cli/test_kanban_core_functionality.py (modified, +93/-0)
tools/kanban_tools.py (modified, +4/-2)

PR #20232: feat(kanban): hallucination gate + recovery UX for worker-created-card claims (closes #20017)

Repository: NousResearch/hermes-agent
Author: teknium1
State: closed | merged: True
Link: https://github.com/NousResearch/hermes-agent/pull/20232

Description (problem / solution / changelog)

Workers claiming cards they created on kanban_complete now have those claims verified by the kernel, so phantom/hallucinated ids can't leak into downstream automation. Dashboard surfaces flagged tasks with a ⚠ badge + attention strip, and every task drawer gains a Recovery section with Reclaim / Reassign / Change-profile-model actions so operators can act on stuck workers without waiting for TTL.

Closes #20017.

What changed

Kernel gate (hermes_cli/kanban_db.py):

New created_cards: list[str] param on complete_task. Each id must resolve to a task whose tasks.created_by matches the completing task's assignee profile.
Phantom id → raise HallucinatedCardsError (subclass of ValueError so existing tool-error handlers treat it as recoverable), and emit a completion_blocked_hallucination event so the rejected attempt is auditable. Task stays in its prior state.
All-verified → record the manifest as verified_cards on the completed event payload.
Prose scan: after successful completion, regex-scan summary/result for t_<hex> patterns; any id that doesn't resolve emits a non-blocking suspected_hallucinated_references event.
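The kernel gate described above can be sketched roughly as follows. This is a minimal illustration, not the merged implementation: the SQLite schema (tasks and events tables, their columns), the 'done' status value, the phantom_ids payload key, and the make_demo_db helper are all assumptions made for the example. Only the names complete_task, created_cards, HallucinatedCardsError, completion_blocked_hallucination, and verified_cards come from the PR description.

```python
import json
import sqlite3


class HallucinatedCardsError(ValueError):
    """Raised when a completion claims card ids that cannot be verified."""

    def __init__(self, phantom_ids):
        self.phantom_ids = list(phantom_ids)
        super().__init__("unverified created_cards: " + ", ".join(self.phantom_ids))


def complete_task(conn, task_id, summary, created_cards=None):
    """Complete task_id only if every claimed card was created by its assignee."""
    row = conn.execute("SELECT assignee FROM tasks WHERE id = ?", (task_id,)).fetchone()
    if row is None:
        raise ValueError(f"unknown task: {task_id}")
    assignee = row[0]
    claimed = list(created_cards or [])

    # A claim is phantom if the id is missing or was created by another profile.
    phantom = []
    for card_id in claimed:
        hit = conn.execute(
            "SELECT 1 FROM tasks WHERE id = ? AND created_by = ?", (card_id, assignee)
        ).fetchone()
        if hit is None:
            phantom.append(card_id)

    if phantom:
        # Record the rejected attempt for audit; the task status is not touched.
        conn.execute(
            "INSERT INTO events (task_id, kind, payload) VALUES (?, ?, ?)",
            (task_id, "completion_blocked_hallucination",
             json.dumps({"phantom_ids": phantom})),
        )
        conn.commit()
        raise HallucinatedCardsError(phantom)

    conn.execute("UPDATE tasks SET status = 'done' WHERE id = ?", (task_id,))
    conn.execute(
        "INSERT INTO events (task_id, kind, payload) VALUES (?, ?, ?)",
        (task_id, "completed",
         json.dumps({"summary": summary, "verified_cards": claimed})),
    )
    conn.commit()


def make_demo_db():
    """Hypothetical in-memory schema, only for trying the sketch."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tasks (id TEXT PRIMARY KEY, status TEXT, assignee TEXT, created_by TEXT)"
    )
    conn.execute("CREATE TABLE events (task_id TEXT, kind TEXT, payload TEXT)")
    conn.executemany(
        "INSERT INTO tasks VALUES (?, ?, ?, ?)",
        [
            ("t_parent", "running", "verifier", "orchestrator"),
            ("t_real", "ready", None, "verifier"),
            ("t_other", "ready", None, "someone_else"),
        ],
    )
    conn.commit()
    return conn
```

Subclassing ValueError mirrors the PR's stated reason: callers that already treat ValueError as recoverable need no changes.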

Recovery helpers (hermes_cli/kanban_db.py):

reclaim_task(conn, task_id, reason=...) — release active claim immediately (unlike release_stale_claims which only acts post-TTL). Emits reclaimed event with manual: True payload.
reassign_task(conn, task_id, profile, reclaim_first=..., reason=...) — switch a task's profile, optionally releasing a live claim in the same op.
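A rough sketch of what a manual reclaim might look like, based only on the behaviour listed in this PR (a running task flips to ready and a reclaimed event with manual: True is emitted). The table layout, the claimed_by column, the boolean return value, and the make_demo_db helper are hypothetical; the real reclaim_task in hermes_cli/kanban_db.py may differ.

```python
import json
import sqlite3


def reclaim_task(conn, task_id, reason=None):
    """Release a live claim immediately; return True if a running task was reclaimed."""
    cur = conn.execute(
        "UPDATE tasks SET status = 'ready', claimed_by = NULL "
        "WHERE id = ? AND status = 'running'",
        (task_id,),
    )
    if cur.rowcount == 0:
        # Nothing to release: unknown task or no active claim.
        return False
    conn.execute(
        "INSERT INTO events (task_id, kind, payload) VALUES (?, ?, ?)",
        (task_id, "reclaimed", json.dumps({"manual": True, "reason": reason})),
    )
    conn.commit()
    return True


def make_demo_db():
    """Hypothetical in-memory schema, only for trying the sketch."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tasks (id TEXT PRIMARY KEY, status TEXT, claimed_by TEXT)")
    conn.execute("CREATE TABLE events (task_id TEXT, kind TEXT, payload TEXT)")
    conn.execute("INSERT INTO tasks VALUES ('t_stuck', 'running', 'worker-1')")
    conn.commit()
    return conn
```

Guarding the UPDATE with status = 'running' keeps the helper idempotent: a second call on the same task changes nothing and emits no extra event.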

Tool (tools/kanban_tools.py):

kanban_complete accepts created_cards; tool-layer catches HallucinatedCardsError and returns a structured tool_error with phantom ids listed so the worker can retry with a correct list.
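One way the tool layer could translate the kernel rejection into something a worker can act on is sketched below. The JSON field names (ok, tool_error, phantom_ids, hint) and the injected complete_fn callable are assumptions for illustration; the PR only states that a structured tool_error listing the phantom ids is returned.

```python
import json


class HallucinatedCardsError(ValueError):
    """Stand-in for the kernel exception; carries the unverifiable ids."""

    def __init__(self, phantom_ids):
        self.phantom_ids = list(phantom_ids)
        super().__init__("unverified created_cards: " + ", ".join(self.phantom_ids))


def kanban_complete(complete_fn, task_id, summary, created_cards=None):
    """Tool wrapper: turn a kernel rejection into a structured, retryable error."""
    try:
        complete_fn(task_id, summary, created_cards=created_cards)
    except HallucinatedCardsError as exc:
        return json.dumps({
            "ok": False,
            "tool_error": "hallucinated_cards",
            "phantom_ids": exc.phantom_ids,
            "hint": "Retry with only the ids returned by kanban_create.",
        })
    return json.dumps({"ok": True, "task_id": task_id})
```

Returning the phantom ids explicitly lets the worker drop just those entries and retry, rather than guessing which part of the claim was rejected.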

CLI (hermes_cli/kanban.py):

hermes kanban reclaim <task_id> [--reason ...]
hermes kanban reassign <task_id> <profile> [--reclaim] [--reason ...]

API (plugins/kanban/dashboard/plugin_api.py):

POST /api/plugins/kanban/tasks/{id}/reclaim (body: {reason})
POST /api/plugins/kanban/tasks/{id}/reassign (body: {profile, reclaim_first, reason})
_compute_warnings_for_tasks() helper reused by /board (all tasks) and /tasks/{id} (single). Returns {count, kinds, latest_at} per task.
Active-vs-stale rule: a completed or edited event AFTER the hallucination event clears the warning. Events persist for audit, but the badge doesn't permanently stigmatise.
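The active-vs-stale rule can be illustrated with a small function over one task's event list. This is a sketch under assumptions: events are modelled as (kind, ISO-8601 timestamp) pairs, both hallucination event kinds named in this PR are treated as warnings, and compute_warnings is a hypothetical stand-in rather than the actual _compute_warnings_for_tasks() helper.

```python
WARNING_KINDS = {
    "completion_blocked_hallucination",
    "suspected_hallucinated_references",
}
CLEARING_KINDS = {"completed", "edited"}


def compute_warnings(events):
    """Summarise active warnings for one task.

    events: iterable of (kind, created_at) pairs, created_at as an ISO-8601
    string in a single timezone so that string order equals time order.
    """
    active = []
    for kind, created_at in sorted(events, key=lambda e: e[1]):
        if kind in CLEARING_KINDS:
            # A later completion or edit makes earlier warnings stale.
            active = []
        elif kind in WARNING_KINDS:
            active.append((kind, created_at))
    return {
        "count": len(active),
        "kinds": sorted({kind for kind, _ in active}),
        "latest_at": active[-1][1] if active else None,
    }
```

The events themselves are never deleted here; only the derived summary changes, which matches the stated intent that audit history persists while the badge clears.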

Dashboard UI (plugins/kanban/dashboard/dist/index.js + style.css):

⚠ badge on cards with active warnings.
AttentionStrip — dismissible strip at top of board listing flagged tasks with Open buttons.
Events-tab callout — hallucination events render with red left border, amber icon, phantom ids as styled red chips.
RecoverySection in task drawer — keyed by task id so state doesn't leak between drawers. Auto-opens when the task has warnings. Three actions: Reclaim (button + reason input), Reassign (profile picker + reclaim-first checkbox), Change profile model (copy-to-clipboard CLI hint since profile config lives on disk).

Skills:

skills/devops/kanban-worker/SKILL.md — new "Claiming cards you actually created" section with good/bad examples.
skills/devops/kanban-orchestrator/SKILL.md — new "Recovering stuck workers" section.

Validation

	Result
Full kanban test suite	359/359 pass (`test_kanban_{db,cli,boards,core_functionality}` + dashboard + tools)
E2E: phantom id blocked, audit event lands, task stays in prior state	✓
E2E: prose scan fires on phantom prose refs, doesn't block	✓
E2E: cross-worker card rejection (card exists but different `created_by`)	✓
Live dashboard: attention strip, card badges, events callout render	✓ (screenshots in PR comments)
Live dashboard: Reassign action flips assignee + refreshes drawer	✓
Live dashboard: Reclaim on running task flips status to ready + emits manual reclaimed event	✓
Recovery section doesn't leak state between task drawers	✓ (keyed by task id)
Warnings cleared after clean re-completion, events retained for audit	✓

Honest caveat on scope

This ships the behavioral contract + visible recovery path. It does NOT include:

Fleet-health roll-up per-profile (can come later if manual intervention is slow in practice).
Circuit breaker that auto-disables a profile after N hallucination-blocks (deliberately deferred — the blast radius risk of auto-disabling outweighs the benefit until we have data).

Both are straightforward follow-ons once we see how often this fires in the wild.

Changed files

hermes_cli/kanban.py (modified, +70/-0)
hermes_cli/kanban_db.py (modified, +280/-6)
plugins/kanban/dashboard/dist/index.js (modified, +376/-5)
plugins/kanban/dashboard/dist/style.css (modified, +253/-0)
plugins/kanban/dashboard/plugin_api.py (modified, +162/-1)
skills/devops/kanban-orchestrator/SKILL.md (modified, +10/-0)
skills/devops/kanban-worker/SKILL.md (modified, +26/-0)
tests/hermes_cli/test_kanban_cli.py (modified, +78/-0)
tests/hermes_cli/test_kanban_core_functionality.py (modified, +266/-0)
tests/plugins/test_kanban_dashboard_plugin.py (modified, +218/-0)
tools/kanban_tools.py (modified, +52/-5)

Impact

Downstream orchestrators and humans see completion summaries claiming remediation work that never happened. Tasks stay broken while looking like they were addressed. This undermines the multi-agent reliability model — you can't trust worker handoffs.

Suggested Fixes (in order of preference)

1. Add optional created_cards field on kanban_complete() metadata. If a worker claims "created_cards": ["t_A", "t_B"], the kernel verifies each ID exists in the DB and has the completing worker as parent/creator before accepting the completion. Reject with an error message naming the missing IDs.
2. Post-completion validation hook. After completing, if the summary or metadata contains task ID patterns (regex: t_[a-f0-9]+), verify those IDs exist. If any are missing, auto-comment on the task warning the human.
3. Tighten the kanban-worker skill (skills/devops/kanban-worker/SKILL.md): require workers to verify kanban_create() return values before naming card IDs in completion. Add a rule: "Never claim you created a card unless you have a successful return value containing the ID."

Option 1 is the strongest — structural verification beats prose promises.
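The post-completion scan in the second suggestion could look something like the sketch below. It uses the regex quoted in the issue, with word boundaries added as an assumption to avoid matching inside longer tokens. The function name and the known_ids set (standing in for a database lookup) are hypothetical.

```python
import re

# Pattern from the issue text; the \b anchors are an added assumption.
TASK_ID_RE = re.compile(r"\bt_[a-f0-9]+\b")


def find_unresolved_ids(text, known_ids):
    """Return task-id-like tokens in text that are not in known_ids, in first-seen order."""
    unresolved = []
    for candidate in TASK_ID_RE.findall(text):
        if candidate not in known_ids and candidate not in unresolved:
            unresolved.append(candidate)
    return unresolved
```

For example, find_unresolved_ids("Created t_ab12 and t_ff00.", {"t_ff00"}) returns ["t_ab12"]. Because prose can mention ids for many reasons, a result from this scan is better treated as a warning than as grounds to reject the completion.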

Environment

Hermes Agent v0.11.0 (a7fb79efb)

extent analysis

TL;DR

Implementing validation for kanban_complete() calls to verify the existence of claimed remediation card IDs in the database is the most likely fix.

Guidance

Add an optional created_cards field to kanban_complete() metadata to verify each claimed ID exists in the database before accepting completion.
Consider implementing a post-completion validation hook to check for task ID patterns in the summary or metadata and verify their existence.
Review and tighten the kanban-worker skill to require verification of kanban_create() return values before claiming card creation.

Example

{
  "metadata": {
    "created_cards": ["t_A", "t_B"]
  }
}

This example shows how the created_cards field could be added to the kanban_complete() metadata to enable verification of claimed card IDs.

Notes

The suggested fixes assume that the kanban_create() function returns a unique ID for each created card, and that this ID can be used to verify the card's existence in the database.
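On the worker side, that assumption suggests building the created_cards list only from successful return values. The sketch below is illustrative: it assumes kanban_create returns a dict with an "id" key and accepts keyword arguments such as title, neither of which is confirmed by the issue, and create_cards is a hypothetical helper.

```python
def create_cards(kanban_create, specs):
    """Call kanban_create for each spec and collect only ids that were actually returned.

    Returns (created_ids, failures), where failures is a list of (title, reason).
    """
    created, failures = [], []
    for spec in specs:
        try:
            result = kanban_create(**spec)
        except Exception as exc:  # a failed call must never yield a claimed id
            failures.append((spec.get("title"), str(exc)))
            continue
        card_id = result.get("id") if isinstance(result, dict) else None
        if card_id:
            created.append(card_id)
        else:
            failures.append((spec.get("title"), "no id in return value"))
    return created, failures
```

The worker would then pass created as created_cards and describe failures in its summary, instead of naming ids it never received.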

Recommendation

Implement the first suggested fix: add an optional created_cards field on kanban_complete() so the kernel verifies that each claimed ID exists in the database before accepting the completion. Structural verification is the strongest of the suggested fixes, and merged PR #20232 takes this approach.
