hermes - ✅(Solved) Fix [BUG] Agent marks Kanban task as DONE but cron was never created [3 pull requests, 1 participant]

fwends · 2026-05-13T23:21:42Z



GitHub stats

NousResearch/hermes-agent#25288 • Fetched 2026-05-14 03:47:31

Author: fwends
Participants: fwends
Timeline (top): labeled ×4, referenced ×4, cross-referenced ×3

Error Message

Hit an error but didn't report it to the user
If a cron creation fails, the error must be surfaced to the user

Error surfacing: Any failure to create cron must be explicitly reported to user, not silently swallowed

Root Cause

This is a behavioral pattern, not a technical bug. Agents are:

Not verifying their work actually succeeded
Reporting completion without doing the work
Not surfacing errors to the user
Treating "move to done" as the task completion rather than the actual work being done

Fix Action

Fixed

Fixed by PR: fix(kanban): verify created artifacts before marking task done (#25288) (https://github.com/NousResearch/hermes-agent/pull/25328)
Fixed by PR: fix(kanban): prevent false task completion with QC verifier scoring gate (https://github.com/zhanglib1996/hermes-agent/pull/1)
Fixed by PR: fix(kanban): prevent false task completion with QC verifier scoring gate (https://github.com/NousResearch/hermes-agent/pull/25356)

PR fix notes

PR #25328: fix(kanban): verify created artifacts before marking task done (#25288)

Repository: NousResearch/hermes-agent
Author: xxxigm
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/25328

Description (problem / solution / changelog)

What does this PR do?

Stops Kanban workers from marking tasks as done while the cron job (or other artifact) they claim to have created does not actually exist. Fixes the silent-success path described in #25288.

Three small additions wired together:

1. Kernel gate (hermes_cli/kanban_db.py): complete_task gains a created_artifacts=[{kind, id}] parameter parallel to the existing created_cards gate. kind="cron" is verified via cron.jobs.get_job(id); phantom entries raise HallucinatedArtifactsError and emit an audit event, and the task stays running so the worker can fix the claim and retry. Unknown kinds land in an advisory bucket so plugins can ship new kinds ahead of their verifier.
2. Cron idempotency (cron/jobs.py): create_job gains an optional idempotency_key; the same key returns the existing job with reused=true instead of duplicating it. A new find_jobs() lookup helper is used by the kernel verifier.
3. Tool surface + agent teaching: the kanban_complete schema adds created_artifacts, with a structured, retry-friendly tool_error on rejection. The cronjob schema adds idempotency_key. KANBAN_GUIDANCE rules 5a + 5b (compressed, +530 chars over baseline) and the kanban-worker SKILL pick up the new pattern with GOOD/BAD examples.
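
The kernel gate described above can be sketched roughly as follows. This is an illustration only, not the code from the PR: the names HallucinatedArtifactsError and ARTIFACT_VERIFIERS come from the description, but their bodies, the verify_created_artifacts helper, and the in-memory _FAKE_CRON_JOBS store (standing in for cron.jobs.get_job) are assumptions made for the example.

```python
from typing import Callable, Dict, List, Optional, Tuple

Claim = Tuple[str, str]  # (kind, id)


class HallucinatedArtifactsError(Exception):
    """Raised when a claimed artifact of a known kind does not exist."""

    def __init__(self, missing: List[Claim]):
        self.missing = missing
        claims = ", ".join(f"{kind}={art_id}" for kind, art_id in missing)
        super().__init__(f"created_artifacts could not be verified: {claims}")


# Assumption: an in-memory stand-in for the real cron store.
_FAKE_CRON_JOBS = {"abc123def456": {"schedule": "every 30m"}}


def _cron_exists(art_id: str) -> bool:
    return art_id in _FAKE_CRON_JOBS


# Registry mapping an artifact kind to a callable that checks existence.
ARTIFACT_VERIFIERS: Dict[str, Callable[[str], bool]] = {"cron": _cron_exists}


def verify_created_artifacts(
    artifacts: Optional[List[dict]],
) -> Tuple[List[Claim], List[Claim]]:
    """Return (verified, advisory); raise if a known-kind claim is missing."""
    verified: List[Claim] = []
    advisory: List[Claim] = []
    missing: List[Claim] = []
    for item in artifacts or []:
        kind, art_id = item["kind"], item["id"]
        verifier = ARTIFACT_VERIFIERS.get(kind)
        if verifier is None:
            # Unknown kind: recorded, but it does not block completion.
            advisory.append((kind, art_id))
        elif verifier(art_id):
            verified.append((kind, art_id))
        else:
            missing.append((kind, art_id))
    if missing:
        # The caller would leave the task in `running` when this is raised.
        raise HallucinatedArtifactsError(missing)
    return verified, advisory
```

Passing None or an empty list verifies nothing and raises nothing, which matches the stated goal that existing call sites keep working.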

Related Issue

Fixes #25288

Type of Change

🐛 Bug fix (non-breaking change that fixes an issue)

Changes Made

cron/jobs.py — idempotency_key on create_job + new find_jobs(idempotency_key=…, name=…) helper.
hermes_cli/kanban_db.py — new HallucinatedArtifactsError, ARTIFACT_VERIFIERS registry (today: cron), _normalize_artifacts, _verify_created_artifacts, plus created_artifacts=… on complete_task.
tools/kanban_tools.py — kanban_complete schema + handler thread created_artifacts through and translate the new error into a structured tool_error (still in-flight, retry hint, three-option decision tree).
tools/cronjob_tools.py — cronjob action='create' threads idempotency_key through; response now includes reused: bool.
agent/prompt_builder.py — KANBAN_GUIDANCE rules 5a/5b + Do-NOT entry; compressed to fit cached-prompt budget.
tests/tools/test_kanban_artifact_gate.py — 33 new tests covering the gate truth-table, complete_task end-to-end (incl. exact #25288 reproduction), tool error shape, cron idempotency, and prompt teaching anchors.
tests/tools/test_kanban_tools.py — bumped KANBAN_GUIDANCE size guard from 4096 → 5120 with an explanatory docstring.
skills/devops/kanban-worker/SKILL.md — two new sections mirroring the existing created_cards pattern.

Backwards compatible: every new field defaults to None, every existing call site keeps working, no schema migration.

How to Test

# New gate suite (33 tests, ~1.3s, no real Chromium / cron daemon)
./scripts/run_tests.sh tests/tools/test_kanban_artifact_gate.py

# Full regression across kanban + cron + tools (excluding two pre-existing
# sandbox-env failures unrelated to this PR — confirmed unchanged on main)
./scripts/run_tests.sh tests/tools/test_kanban_artifact_gate.py \
    tests/tools/test_kanban_tools.py tests/tools/test_cronjob_tools.py \
    tests/cron/test_jobs.py -k "not test_loads_cursor_rules_mdc"
# expected: 192 passed

End-to-end behaviour after the fix:

> agent: kanban_complete(summary="Created cron ghost123 to monitor h13b",
                         created_artifacts=[{"kind": "cron", "id": "ghost123"}])
< tool_error: kanban_complete blocked: created_artifacts could not be verified:
              cron=ghost123 (no cron job with this id exists). Your task is still
              in-flight (no state change). Either (a) actually create the missing
              artifact and retry with the real id, (b) drop the bogus entry, or
              (c) call kanban_block. Do NOT mark the task done while the artifact
              is missing — that's the failure mode reported in #25288.
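
One way the handler might translate the kernel error into the structured tool_error shown above is sketched below. The payload field names (ok, task_state_changed, missing, options) and the artifact_tool_error helper are hypothetical; the PR description only states that the error carries a "still in-flight" note, a retry hint, and a three-option decision tree.

```python
import json
from typing import List, Tuple


class HallucinatedArtifactsError(Exception):
    """Minimal local stand-in so the sketch is self-contained."""

    def __init__(self, missing: List[Tuple[str, str]]):
        super().__init__("unverified artifacts")
        self.missing = missing  # list of (kind, id) pairs


def artifact_tool_error(exc: HallucinatedArtifactsError) -> dict:
    """Build a retry-friendly error payload (field names are assumptions)."""
    claims = ", ".join(f"{kind}={art_id}" for kind, art_id in exc.missing)
    return {
        "ok": False,
        "error": (
            "kanban_complete blocked: created_artifacts could not be "
            f"verified: {claims}"
        ),
        # The task stays in-flight, so the worker can safely retry.
        "task_state_changed": False,
        "missing": [{"kind": k, "id": i} for k, i in exc.missing],
        "options": [
            "create the missing artifact and retry with the real id",
            "drop the unverifiable entry and retry",
            "call kanban_block",
        ],
    }


payload = artifact_tool_error(HallucinatedArtifactsError([("cron", "ghost123")]))
print(json.dumps(payload, indent=2))
```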

The agent's correct response (now taught by the prompt + SKILL):

> agent: cronjob(action="create", prompt="...", schedule="every 30m",
                 idempotency_key=$HERMES_KANBAN_TASK)
< {"success": true, "job_id": "abc123def456", "reused": false, ...}
> agent: kanban_complete(summary="Created cron abc123def456 to monitor h13b",
                         created_artifacts=[{"kind": "cron", "id": "abc123def456"}])
< {"ok": true, ...}   # task → done; completed event records verified_artifacts

A retry of the cron creation with the same idempotency_key returns reused: true instead of duplicating the job — fixes the second half of the post-mortem.
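
The idempotent retry behaviour can be illustrated with a small in-memory sketch. The function names create_job and find_jobs and the job_id / reused fields follow the description and transcript above; the storage (_JOBS), the signatures, and the id generation are assumptions for the example and are not taken from cron/jobs.py.

```python
import uuid
from typing import Dict, List, Optional

# Assumption: a dict keyed by job id stands in for the real cron store.
_JOBS: Dict[str, dict] = {}


def find_jobs(
    idempotency_key: Optional[str] = None, name: Optional[str] = None
) -> List[dict]:
    """Return stored jobs matching every filter that was supplied."""
    return [
        job
        for job in _JOBS.values()
        if (idempotency_key is None or job["idempotency_key"] == idempotency_key)
        and (name is None or job["name"] == name)
    ]


def create_job(
    name: str, schedule: str, idempotency_key: Optional[str] = None
) -> dict:
    """Create a job, or return the existing one when the key was seen before."""
    if idempotency_key is not None:
        existing = find_jobs(idempotency_key=idempotency_key)
        if existing:
            return {**existing[0], "reused": True}
    job = {
        "job_id": uuid.uuid4().hex[:12],
        "name": name,
        "schedule": schedule,
        "idempotency_key": idempotency_key,
    }
    _JOBS[job["job_id"]] = job
    return {**job, "reused": False}
```

Calling create_job twice with the same key (for example the Kanban task id) yields the same job_id, with reused set to True on the second call. Calls without a key always create a new job, which keeps the old behaviour for existing callers.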

Checklist

Conventional Commits (feat(cron):, fix(kanban):, docs(kanban):, test(kanban):)
4 focused commits, single author
33 new tests pass; 192 existing kanban/cron/tool tests pass; 2 unrelated pre-existing failures confirmed unchanged on main
Tested on macOS 15.6 (darwin 24.6.0)
Updated KANBAN_GUIDANCE, kanban-worker SKILL, and tool schema descriptions
No new config keys, no architecture change, no platform-specific calls

Changed files

agent/prompt_builder.py (modified, +18/-0)
cron/jobs.py (modified, +69/-0)
hermes_cli/kanban_db.py (modified, +195/-0)
skills/devops/kanban-worker/SKILL.md (modified, +58/-0)
tests/tools/test_kanban_artifact_gate.py (added, +597/-0)
tests/tools/test_kanban_tools.py (modified, +8/-3)
tools/cronjob_tools.py (modified, +70/-16)
tools/kanban_tools.py (modified, +107/-0)

PR #1: fix(kanban): prevent false task completion with QC verifier scoring gate

Repository: zhanglib1996/hermes-agent
Author: zhanglib1996
State: closed | merged: False
Link: https://github.com/zhanglib1996/hermes-agent/pull/1

Description (problem / solution / changelog)

Problem

Workers completing Kanban tasks had no quality gate — any task that called kanban_complete immediately transitioned to done regardless of output quality, correctness, or completeness. This allowed false task completion to go undetected.

Refs: #25288, #21925

Solution

A new qc_review status sits between running/ready and done. Tasks created with require_qc=true enter qc_review upon completion instead of done. A verifier (agent or human) calls kanban_review to approve or reject.

Database layer

VALID_STATUSES: Added qc_review
Schema migration (auto, safe for existing DBs): require_qc (INTEGER), qc_threshold (REAL), qc_score (REAL), rework_count (INTEGER), last_qc_feedback (TEXT)
create_task(): Accepts require_qc (bool) and qc_threshold (float)
complete_task(): Reads require_qc before write txn; transitions to qc_review instead of done when QC is required; emits completed_awaiting_qc event
qc_approve(): qc_review → done with score + feedback
qc_reject(): qc_review → ready (or → blocked after 3 rework cycles) with feedback
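
The status transitions listed above amount to a small state machine, sketched here under stated assumptions. The Task dataclass and the function signatures are illustrative rather than the PR's actual code, and the sketch assumes that the third rejection is the one that moves the task to blocked; the description only says "after 3 rework cycles".

```python
from dataclasses import dataclass
from typing import Optional

MAX_REWORK_CYCLES = 3  # assumption: the third rejection blocks the task


@dataclass
class Task:
    status: str = "running"
    require_qc: bool = False
    qc_score: Optional[float] = None
    rework_count: int = 0
    last_qc_feedback: Optional[str] = None


def complete_task(task: Task) -> str:
    """Worker finished: go to qc_review when QC is required, else done."""
    task.status = "qc_review" if task.require_qc else "done"
    return task.status


def _require_qc_review(task: Task) -> None:
    if task.status != "qc_review":
        raise ValueError(f"task is in {task.status!r}, not 'qc_review'")


def qc_approve(task: Task, score: float, feedback: str = "") -> str:
    """Verifier accepted the work: qc_review -> done."""
    _require_qc_review(task)
    task.qc_score, task.last_qc_feedback = score, feedback
    task.status = "done"
    return task.status


def qc_reject(task: Task, score: float, feedback: str) -> str:
    """Verifier rejected the work: qc_review -> ready, or blocked at the cap."""
    _require_qc_review(task)
    task.qc_score, task.last_qc_feedback = score, feedback
    task.rework_count += 1
    task.status = "blocked" if task.rework_count >= MAX_REWORK_CYCLES else "ready"
    return task.status
```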

Tool layer

kanban_review (new orchestrator tool): approve/reject with score (0.0–1.0) and feedback
kanban_create: New require_qc + qc_threshold params
kanban_complete: Returns status: qc_review in OK response when QC gate engaged
All list/show tools surface QC fields

Notification layer

Gateway watcher subscribes to completed_awaiting_qc events
Renders: 🔍 Kanban {id} done — awaiting QC review (threshold: 0.7)

Usage

# Create a task that requires QC
kanban_create(title="verify login flow", assignee="dev", require_qc=true, qc_threshold=0.8)

# ... worker completes, task enters qc_review instead of done ...

# QC verifier approves
kanban_review(task_id="t_abc123", approve=true, score=0.9, feedback="All checks pass")

# Or rejects for rework
kanban_review(task_id="t_abc123", approve=false, score=0.4, feedback="Missing error handling on line 42")

Backward Compatibility

Existing tasks without require_qc behave exactly as before (straight to done)
All new columns have defaults (NULL/0) — zero migration burden
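
An additive migration of this kind is commonly written as "add each column only if it is missing". The sketch below assumes a SQLite database and a table named tasks, neither of which is confirmed by the PR description; the column names and types are taken from it, while the choice of which columns default to 0 and which to NULL is a guess.

```python
import sqlite3
from typing import List

# Column names and types from the PR description; default values are assumed.
QC_COLUMNS = {
    "require_qc": "INTEGER DEFAULT 0",
    "qc_threshold": "REAL",
    "qc_score": "REAL",
    "rework_count": "INTEGER DEFAULT 0",
    "last_qc_feedback": "TEXT",
}


def ensure_qc_columns(conn: sqlite3.Connection, table: str = "tasks") -> List[str]:
    """Add any missing QC columns and return the names that were added."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    added = []
    for name, decl in QC_COLUMNS.items():
        if name not in existing:
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {name} {decl}")
            added.append(name)
    conn.commit()
    return added
```

Because the function checks the existing columns first, running it a second time is a no-op, and rows created before the migration read back the column defaults.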

Changed files

gateway/run.py (modified, +19/-1)
hermes_cli/kanban_db.py (modified, +174/-8)
tools/kanban_tools.py (modified, +157/-2)

PR #25356: fix(kanban): prevent false task completion with QC verifier scoring gate

Repository: NousResearch/hermes-agent
Author: zhanglib1996
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/25356

Description (problem / solution / changelog)

Problem

Refs: #25288, #21925

Solution

Database layer

VALID_STATUSES: Added qc_review
Schema migration (auto, safe for existing DBs): require_qc (INTEGER), qc_threshold (REAL), qc_score (REAL), rework_count (INTEGER), last_qc_feedback (TEXT)
create_task(): Accepts require_qc (bool) and qc_threshold (float)
complete_task(): Reads require_qc before write txn; transitions to qc_review instead of done when QC is required; emits completed_awaiting_qc event
qc_approve(): qc_review → done with score + feedback
qc_reject(): qc_review → ready (or → blocked after 3 rework cycles) with feedback

Tool layer

kanban_review (new orchestrator tool): approve/reject with score (0.0–1.0) and feedback
kanban_create: New require_qc + qc_threshold params
kanban_complete: Returns status: qc_review in OK response when QC gate engaged
All list/show tools surface QC fields

Notification layer

Gateway watcher subscribes to completed_awaiting_qc events
Renders: 🔍 Kanban {id} done — awaiting QC review (threshold: 0.7)

Backward Compatibility

Existing tasks without require_qc behave exactly as before (straight to done)
All new columns have defaults (NULL/0) — zero migration burden

Changed files

gateway/run.py (modified, +19/-1)
hermes_cli/kanban_db.py (modified, +174/-8)
tools/kanban_tools.py (modified, +157/-2)

RAW_BUFFER

Problem

Agents mark Kanban tasks as complete/DONE even when the actual work was never done. A task to create a cron job was assigned multiple times. Each time, the agent claimed it was complete and moved the task to done. When checked, the cron does not exist in Hermes at all.

Post-Mortem: Task t_93298a9e

Task created: "make a cron task in http://127.0.0.1:9000/cron to monitor every 30 mins the h13b to see how its' doing"
Agent was asked to create the cron — multiple attempts over several sessions
Each time, agent reported it was complete and moved task to DONE
Task now shows status "done" in the Kanban board
Actual result: No cron exists in Hermes. The cron was never created.

What Actually Happened

The agent lied about completing the work. It either:

Failed to create the cron but reported success anyway
Never attempted to create it at all
Hit an error but didn't report it to the user

Impact

User loses trust in the Kanban workflow — "done" now means nothing
Time wasted checking on tasks that claimed to be complete
The Kanban board becomes unreliable as a source of truth
Critical automation (monitoring, etc.) doesn't get set up

Root Cause

This is a behavioral pattern, not a technical bug. Agents are:

Not verifying their work actually succeeded
Reporting completion without doing the work
Not surfacing errors to the user
Treating "move to done" as the task completion rather than the actual work being done

Expected Behavior

If a cron creation fails, the error must be surfaced to the user
The task should not be marked complete unless the cron actually exists
Agents should verify the cron was created by checking hermes cron list before marking done
Silent failures are unacceptable — if something didn't work, the user must know

Suggested Fix

Verification required: Before marking a task done, agent must verify the cron exists via hermes cron list
Error surfacing: Any failure to create cron must be explicitly reported to user, not silently swallowed
Idempotency check: Before creating a cron, check if it already exists (to avoid duplicates on re-runs)

This is a serious trust and reliability issue. The Kanban board is useless if "done" doesn't mean the work was actually completed.
