hermes - ✅(Solved) Fix [BUG] Agent marks Kanban task as DONE but cron was never created [3 pull requests, 1 participant]

GitHub stats

NousResearch/hermes-agent#25288
Fetched 2026-05-14 03:47:31

  • Comments: 0
  • Participants: 1
  • Timeline: 11
  • Reactions: 0
  • Timeline (top): labeled ×4, referenced ×4, cross-referenced ×3

Error Message

  • Hit an error but didn't report it to the user
  • If a cron creation fails, the error must be surfaced to the user
  • Error surfacing: Any failure to create cron must be explicitly reported to user, not silently swallowed

Root Cause

This is a behavioral pattern, not a technical bug. Agents are:

  1. Not verifying their work actually succeeded
  2. Reporting completion without doing the work
  3. Not surfacing errors to the user
  4. Treating "move to done" as the task completion rather than the actual work being done

Fix Action

Fixed

PR fix notes

PR #25328: fix(kanban): verify created artifacts before marking task done (#25288)

Description (problem / solution / changelog)

What does this PR do?

Stops Kanban workers from marking tasks as done while the cron job (or other artifact) they claim to have created does not actually exist. Fixes the silent-success path described in #25288.

Three small additions wired together:

  1. Kernel gate (hermes_cli/kanban_db.py): complete_task gains a created_artifacts=[{kind, id}] parameter parallel to the existing created_cards gate. kind="cron" is verified via cron.jobs.get_job(id); phantom entries raise HallucinatedArtifactsError and record an audit event, and the task stays running so the worker can fix the claim and retry. Unknown kinds land in an advisory bucket so plugins can ship new kinds ahead of their verifier.

  2. Cron idempotency (cron/jobs.py): create_job gains an optional idempotency_key; the same key returns the existing job with reused=true instead of duplicating it. A new find_jobs() lookup helper is used by the kernel verifier.

  3. Tool surface + agent teaching: the kanban_complete schema adds created_artifacts and returns a structured, retry-friendly tool_error on rejection. The cronjob schema adds idempotency_key. KANBAN_GUIDANCE rules 5a + 5b (compressed, +530 chars over baseline) and the kanban-worker SKILL pick up the new pattern with GOOD/BAD examples.
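
The kernel gate described in item 1 can be illustrated with a minimal sketch. This is not the PR's code: the in-memory CRON_JOBS and TASKS stores, the verify_created_artifacts helper, and the shape of the return value are assumptions made for illustration. Only the names complete_task, created_artifacts, ARTIFACT_VERIFIERS, and HallucinatedArtifactsError come from the PR description.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical in-memory stand-ins for the cron store and the task table.
CRON_JOBS: Dict[str, dict] = {}
TASKS: Dict[str, str] = {"t_1": "running"}


class HallucinatedArtifactsError(Exception):
    """Raised when a claimed artifact has a verifier and fails verification."""

    def __init__(self, phantom: List[dict]):
        self.phantom = phantom
        claims = ", ".join(f"{a['kind']}={a['id']}" for a in phantom)
        super().__init__(f"created_artifacts could not be verified: {claims}")


# Maps an artifact kind to a function returning True when the id exists.
# Only "cron" has a verifier in this sketch.
ARTIFACT_VERIFIERS: Dict[str, Callable[[str], bool]] = {
    "cron": lambda job_id: job_id in CRON_JOBS,
}


def verify_created_artifacts(
    artifacts: List[dict],
) -> Tuple[List[dict], List[dict], List[dict]]:
    """Split claims into (verified, phantom, advisory)."""
    verified, phantom, advisory = [], [], []
    for art in artifacts:
        verifier = ARTIFACT_VERIFIERS.get(art["kind"])
        if verifier is None:
            advisory.append(art)   # unknown kind: accepted, but not verified
        elif verifier(art["id"]):
            verified.append(art)
        else:
            phantom.append(art)    # claimed, but does not exist
    return verified, phantom, advisory


def complete_task(task_id: str, created_artifacts: Optional[List[dict]] = None) -> dict:
    """Mark a task done only if every verifiable claim checks out."""
    verified, phantom, advisory = verify_created_artifacts(created_artifacts or [])
    if phantom:
        # The task state is left untouched so the worker can fix and retry.
        raise HallucinatedArtifactsError(phantom)
    TASKS[task_id] = "done"
    return {"ok": True, "verified_artifacts": verified, "advisory_artifacts": advisory}
```

Calling complete_task("t_1", [{"kind": "cron", "id": "ghost123"}]) against an empty CRON_JOBS raises the error and leaves the task in running, which is the behaviour the PR describes for phantom entries.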

Related Issue

Fixes #25288

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)

Changes Made

  • cron/jobs.py: idempotency_key on create_job + new find_jobs(idempotency_key=…, name=…) helper.
  • hermes_cli/kanban_db.py: new HallucinatedArtifactsError, ARTIFACT_VERIFIERS registry (today: cron), _normalize_artifacts, _verify_created_artifacts, plus created_artifacts=… on complete_task.
  • tools/kanban_tools.py: kanban_complete schema + handler thread created_artifacts through and translate the new error into a structured tool_error (still in-flight, retry hint, three-option decision tree).
  • tools/cronjob_tools.py: cronjob action='create' threads idempotency_key through; response now includes reused: bool.
  • agent/prompt_builder.py: KANBAN_GUIDANCE rules 5a/5b + Do-NOT entry; compressed to fit cached-prompt budget.
  • tests/tools/test_kanban_artifact_gate.py: 33 new tests covering the gate truth-table, complete_task end-to-end (incl. exact #25288 reproduction), tool error shape, cron idempotency, and prompt teaching anchors.
  • tests/tools/test_kanban_tools.py: bumped KANBAN_GUIDANCE size guard from 4096 → 5120 with an explanatory docstring.
  • skills/devops/kanban-worker/SKILL.md: two new sections mirroring the existing created_cards pattern.

Backwards compatible: every new field defaults to None, every existing call site keeps working, no schema migration.
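
The translation of the kernel error into a structured tool_error, listed under tools/kanban_tools.py, might look roughly like the sketch below. The handler name handle_kanban_complete, the injected complete_task callable, and the JSON field names (unverified, task_state, retry, options) are hypothetical; the PR description only states that the error carries an in-flight note, a retry hint, and a three-option decision tree.

```python
import json
from typing import Callable, List, Optional


class HallucinatedArtifactsError(Exception):
    """Minimal stand-in for the kernel error named in the PR."""

    def __init__(self, phantom: List[dict]):
        super().__init__("created_artifacts could not be verified")
        self.phantom = phantom


def handle_kanban_complete(
    complete_task: Callable[..., dict],
    task_id: str,
    summary: str,
    created_artifacts: Optional[List[dict]] = None,
) -> str:
    """Call the kernel and turn a rejection into a retry-friendly payload."""
    try:
        result = complete_task(
            task_id, summary=summary, created_artifacts=created_artifacts
        )
    except HallucinatedArtifactsError as exc:
        return json.dumps({
            "tool_error": "kanban_complete blocked: created_artifacts could not be verified",
            "unverified": exc.phantom,
            "task_state": "in-flight",  # no state change happened
            "retry": True,
            "options": [
                "create the missing artifact and retry with the real id",
                "drop the bogus entry",
                "call kanban_block",
            ],
        })
    return json.dumps(result)
```

Returning the rejection as data rather than letting the exception escape gives the agent a concrete next step, which matches the intent of the error text shown in the end-to-end transcript.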

How to Test

# New gate suite (33 tests, ~1.3s, no real Chromium / cron daemon)
./scripts/run_tests.sh tests/tools/test_kanban_artifact_gate.py

# Full regression across kanban + cron + tools (excluding two pre-existing
# sandbox-env failures unrelated to this PR — confirmed unchanged on main)
./scripts/run_tests.sh tests/tools/test_kanban_artifact_gate.py \
    tests/tools/test_kanban_tools.py tests/tools/test_cronjob_tools.py \
    tests/cron/test_jobs.py -k "not test_loads_cursor_rules_mdc"
# expected: 192 passed

End-to-end behaviour after the fix:

> agent: kanban_complete(summary="Created cron ghost123 to monitor h13b",
                         created_artifacts=[{"kind": "cron", "id": "ghost123"}])
< tool_error: kanban_complete blocked: created_artifacts could not be verified:
              cron=ghost123 (no cron job with this id exists). Your task is still
              in-flight (no state change). Either (a) actually create the missing
              artifact and retry with the real id, (b) drop the bogus entry, or
              (c) call kanban_block. Do NOT mark the task done while the artifact
              is missing — that's the failure mode reported in #25288.

The agent's correct response (now taught by the prompt + SKILL):

> agent: cronjob(action="create", prompt="...", schedule="every 30m",
                 idempotency_key=$HERMES_KANBAN_TASK)
< {"success": true, "job_id": "abc123def456", "reused": false, ...}
> agent: kanban_complete(summary="Created cron abc123def456 to monitor h13b",
                         created_artifacts=[{"kind": "cron", "id": "abc123def456"}])
< {"ok": true, ...}   # task → done; completed event records verified_artifacts

A retry of the cron creation with the same idempotency_key returns reused: true instead of duplicating the job — fixes the second half of the post-mortem.

Checklist

  • Conventional Commits (feat(cron):, fix(kanban):, docs(kanban):, test(kanban):)
  • 4 focused commits, single author
  • 33 new tests pass; 192 existing kanban/cron/tool tests pass; 2 unrelated pre-existing failures confirmed unchanged on main
  • Tested on macOS 15.6 (darwin 24.6.0)
  • Updated KANBAN_GUIDANCE, kanban-worker SKILL, and tool schema descriptions
  • No new config keys, no architecture change, no platform-specific calls

Changed files

  • agent/prompt_builder.py (modified, +18/-0)
  • cron/jobs.py (modified, +69/-0)
  • hermes_cli/kanban_db.py (modified, +195/-0)
  • skills/devops/kanban-worker/SKILL.md (modified, +58/-0)
  • tests/tools/test_kanban_artifact_gate.py (added, +597/-0)
  • tests/tools/test_kanban_tools.py (modified, +8/-3)
  • tools/cronjob_tools.py (modified, +70/-16)
  • tools/kanban_tools.py (modified, +107/-0)

PR #1: fix(kanban): prevent false task completion with QC verifier scoring gate

Description (problem / solution / changelog)

Problem

Workers completing Kanban tasks had no quality gate — any task that called kanban_complete immediately transitioned to done regardless of output quality, correctness, or completeness. This allowed false task completion to go undetected.

Refs: #25288, #21925

Solution

A new qc_review status sits between running/ready and done. Tasks created with require_qc=true enter qc_review upon completion instead of done. A verifier (agent or human) calls kanban_review to approve or reject.

Database layer

  • VALID_STATUSES: Added qc_review
  • Schema migration (auto, safe for existing DBs): require_qc (INTEGER), qc_threshold (REAL), qc_score (REAL), rework_count (INTEGER), last_qc_feedback (TEXT)
  • create_task(): Accepts require_qc (bool) and qc_threshold (float)
  • complete_task(): Reads require_qc before write txn; transitions to qc_review instead of done when QC is required; emits completed_awaiting_qc event
  • qc_approve(): qc_review → done with score + feedback
  • qc_reject(): qc_review → ready (or → blocked after 3 rework cycles) with feedback
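
The status transitions listed above can be sketched as a small state machine. The Task dataclass and the function signatures are hypothetical; the field names, the statuses, and the limit of 3 rework cycles come from the PR description. The sketch assumes the third rejection is the one that moves the task to blocked, which the description does not spell out.

```python
from dataclasses import dataclass
from typing import Optional

MAX_REWORK_CYCLES = 3  # assumption: the third rejection blocks the task


@dataclass
class Task:
    id: str
    status: str = "running"
    require_qc: bool = False
    qc_threshold: Optional[float] = None
    qc_score: Optional[float] = None
    rework_count: int = 0
    last_qc_feedback: Optional[str] = None


def complete_task(task: Task) -> str:
    """Route completion to qc_review when QC is required, else to done."""
    task.status = "qc_review" if task.require_qc else "done"
    return task.status


def qc_approve(task: Task, score: float, feedback: str) -> str:
    """qc_review -> done, recording score and feedback."""
    if task.status != "qc_review":
        raise ValueError(f"task {task.id} is not awaiting QC")
    task.qc_score, task.last_qc_feedback = score, feedback
    task.status = "done"
    return task.status


def qc_reject(task: Task, score: float, feedback: str) -> str:
    """qc_review -> ready, or -> blocked once the rework limit is reached."""
    if task.status != "qc_review":
        raise ValueError(f"task {task.id} is not awaiting QC")
    task.qc_score, task.last_qc_feedback = score, feedback
    task.rework_count += 1
    task.status = "blocked" if task.rework_count >= MAX_REWORK_CYCLES else "ready"
    return task.status
```

Capping rework cycles keeps a task that repeatedly fails review from bouncing between ready and qc_review indefinitely.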

Tool layer

  • kanban_review (new orchestrator tool): approve/reject with score (0.0–1.0) and feedback
  • kanban_create: New require_qc + qc_threshold params
  • kanban_complete: Returns status: qc_review in OK response when QC gate engaged
  • All list/show tools surface QC fields

Notification layer

  • Gateway watcher subscribes to completed_awaiting_qc events
  • Renders: 🔍 Kanban {id} done — awaiting QC review (threshold: 0.7)

Usage

# Create a task that requires QC
kanban_create(title="verify login flow", assignee="dev", require_qc=true, qc_threshold=0.8)

# ... worker completes, task enters qc_review instead of done ...

# QC verifier approves
kanban_review(task_id="t_abc123", approve=true, score=0.9, feedback="All checks pass")

# Or rejects for rework
kanban_review(task_id="t_abc123", approve=false, score=0.4, feedback="Missing error handling on line 42")

Backward Compatibility

  • Existing tasks without require_qc behave exactly as before (straight to done)
  • All new columns have defaults (NULL/0) — zero migration burden
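
An automatic, re-runnable migration of this kind is commonly written against SQLite as shown below. This is a sketch under assumptions: the table name tasks, the helper name migrate_qc_columns, and the choice of which columns default to 0 versus NULL are hypothetical. The column names and types are the ones listed in the PR description.

```python
import sqlite3
from typing import List

# Column name -> declaration. The defaults are an assumption; the PR only
# says the new columns default to NULL or 0.
QC_COLUMNS = {
    "require_qc": "INTEGER DEFAULT 0",
    "qc_threshold": "REAL",
    "qc_score": "REAL",
    "rework_count": "INTEGER DEFAULT 0",
    "last_qc_feedback": "TEXT",
}


def migrate_qc_columns(conn: sqlite3.Connection, table: str = "tasks") -> List[str]:
    """Add any missing QC columns and return the names that were added."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    added = []
    for name, declaration in QC_COLUMNS.items():
        if name not in existing:
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {name} {declaration}")
            added.append(name)
    conn.commit()
    return added
```

Because the function checks the existing columns first, running it on every startup is harmless: a second call finds nothing to add, and rows written before the migration read back with the default values.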

Changed files

  • gateway/run.py (modified, +19/-1)
  • hermes_cli/kanban_db.py (modified, +174/-8)
  • tools/kanban_tools.py (modified, +157/-2)

PR #25356: fix(kanban): prevent false task completion with QC verifier scoring gate

Description (problem / solution / changelog)

The description and changed files are identical to those of PR #1 above, except that this PR's description omits the Usage section.

Original issue text

Problem

Agents mark Kanban tasks as complete/DONE even when the actual work was never done. A task to create a cron job was assigned multiple times. Each time, the agent claimed it was complete and moved the task to done. When checked, the cron does not exist in Hermes at all.

Post-Mortem: Task t_93298a9e

  1. Task created: "make a cron task in http://127.0.0.1:9000/cron to monitor every 30 mins the h13b to see how its' doing"
  2. Agent was asked to create the cron — multiple attempts over several sessions
  3. Each time, agent reported it was complete and moved task to DONE
  4. Task now shows status "done" in the Kanban board
  5. Actual result: No cron exists in Hermes. The cron was never created.

What Actually Happened

The agent lied about completing the work. It either:

  • Failed to create the cron but reported success anyway
  • Never attempted to create it at all
  • Hit an error but didn't report it to the user

Impact

  • User loses trust in the Kanban workflow — "done" now means nothing
  • Time wasted checking on tasks that claimed to be complete
  • The Kanban board becomes unreliable as a source of truth
  • Critical automation (monitoring, etc.) doesn't get set up

Root Cause

This is a behavioral pattern, not a technical bug. Agents are:

  1. Not verifying their work actually succeeded
  2. Reporting completion without doing the work
  3. Not surfacing errors to the user
  4. Treating "move to done" as the task completion rather than the actual work being done

Expected Behavior

  • If a cron creation fails, the error must be surfaced to the user
  • The task should not be marked complete unless the cron actually exists
  • Agents should verify the cron was created by checking hermes cron list before marking done
  • Silent failures are unacceptable — if something didn't work, the user must know

Suggested Fix

  1. Verification required: Before marking a task done, agent must verify the cron exists via hermes cron list
  2. Error surfacing: Any failure to create cron must be explicitly reported to user, not silently swallowed
  3. Idempotency check: Before creating a cron, check if it already exists (to avoid duplicates on re-runs)
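
The three suggestions can be combined into one verify-before-done routine, sketched below. Everything here is hypothetical: the function and exception names are invented for illustration, and list_cron_names, create_cron, and mark_done are injected callables standing in for whatever actually lists crons (for example, a wrapper around hermes cron list), creates one, and moves the Kanban task.

```python
from typing import Callable, Iterable


class CronNotCreatedError(RuntimeError):
    """Raised so a failed or missing cron is reported instead of swallowed."""


def ensure_cron_then_complete(
    job_name: str,
    list_cron_names: Callable[[], Iterable[str]],
    create_cron: Callable[[str], None],
    mark_done: Callable[[], None],
) -> str:
    """Create the cron if needed, verify it exists, and only then mark done."""
    outcome = "reused"
    # Idempotency check: skip creation when the cron is already listed.
    if job_name not in set(list_cron_names()):
        try:
            create_cron(job_name)
        except Exception as exc:
            # Error surfacing: re-raise with context instead of hiding it.
            raise CronNotCreatedError(
                f"creating cron {job_name!r} failed: {exc}"
            ) from exc
        outcome = "created"
    # Verification: trust the listing, not the create call's return.
    if job_name not in set(list_cron_names()):
        raise CronNotCreatedError(
            f"cron {job_name!r} is not listed after creation"
        )
    mark_done()
    return outcome
```

The important property is ordering: mark_done is unreachable unless the cron shows up in the listing, so "done" cannot be recorded for work that did not happen.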

This is a serious trust and reliability issue. The Kanban board is useless if "done" doesn't mean the work was actually completed.
