hermes - [Bug]: Kanban parent-child handoff: scratch workspace GC destroys artifacts before child can read them



Summary

When a Kanban task A (with workspace_kind=scratch) completes and has a child task B linked via parents=[A], the child task B starts and finds that A's scratch workspace has already been garbage-collected — all files A wrote as handoff artifacts are gone.

The dependency chain (parents=[A]) should guarantee that B can read A's output, but the immediate _cleanup_workspace() on A's kanban_complete() destroys the workspace before the dispatcher promotes B to ready and spawns it.

Impact

  • Child workers crash with FileNotFoundError on the workspace directory, or $HERMES_KANBAN_WORKSPACE points to a deleted path
  • Workers silently fall back to re-extracting data from source URLs, duplicating work
  • In multi-step pipelines, each step wastes tokens re-fetching/re-deriving parent output
  • Hard to diagnose: the log shows "cannot access '/path/to/workspace/': No such file or directory" with no explicit mention of GC

Root Cause

kanban_complete() in hermes_cli/kanban_db.py calls _cleanup_workspace() synchronously for scratch workspaces:

# complete_task(...)
_cleanup_workspace(conn, task_id)

The cleanup function removes the entire workspace directory immediately on completion:

if kind != "scratch" or not path:
    return

wp = Path(path)
if wp.is_dir():
    shutil.rmtree(wp, ignore_errors=True)

There is no check for "does this task have children that still need this workspace". The dependency engine handles task status promotion (parents done → child ready), but there is no workspace lifecycle that extends beyond the completing parent.

Steps to Reproduce

  1. Create a Kanban task A with workspace_kind=scratch (default)
  2. Worker A writes files into the workspace (e.g., summary.txt, guide.md)
  3. Worker A calls kanban_complete(summary=..., metadata={"artifacts": [...]})
  4. Create Kanban task B with parents=[A]
  5. Dispatcher promotes B to ready after A is done
  6. Worker B starts and tries to read $HERMES_KANBAN_WORKSPACE — it's gone
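
The steps above can be simulated without a running Hermes instance. This is a minimal sketch: `cleanup_scratch` is a hypothetical stand-in that copies the logic quoted under Root Cause (it is not the real `_cleanup_workspace()`, which takes `conn` and `task_id`), and the temporary directory stands in for A's scratch workspace.

```python
import shutil
import tempfile
from pathlib import Path


def cleanup_scratch(kind: str, path: str) -> None:
    """Hypothetical stand-in mirroring the cleanup logic quoted under Root Cause."""
    if kind != "scratch" or not path:
        return
    wp = Path(path)
    if wp.is_dir():
        shutil.rmtree(wp, ignore_errors=True)


def simulate_handoff() -> bool:
    """Return True if the child could still see the parent's artifact."""
    workspace = Path(tempfile.mkdtemp(prefix="kanban-scratch-"))
    # Steps 1-2: worker A writes a handoff artifact into its scratch workspace.
    (workspace / "summary.txt").write_text("output of task A\n", encoding="utf-8")
    # Step 3: A completes and cleanup runs immediately.
    cleanup_scratch("scratch", str(workspace))
    # Steps 5-6: B is only promoted and spawned after this point.
    return (workspace / "summary.txt").exists()


if __name__ == "__main__":
    print("child can read artifact:", simulate_handoff())
```

Because cleanup runs inside the completion call, nothing the dispatcher does later can bring the files back.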

Evidence

Observed worker output when the parent scratch workspace is already GC'd:

The parent task's workspace has been cleaned up (it's a scratch workspace that gets GC'd). But the parent task's handoff tells me the artifacts were created. Since the files are gone (scratch GC'd), I need to re-extract from the source URL and get the content directly.

Proposed Fix — 3 Options

Option A — Deferred cleanup for linked parents: _cleanup_workspace() should check via the task_links table whether the completing task has children still pending/ready/running. If so, defer the shutil.rmtree() until all children are also complete or archived. A background janitor can sweep orphaned scratch workspaces.
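
A minimal sketch of the check Option A describes, assuming a hypothetical schema with `task_links(parent_id, child_id)` and `tasks(id, status)`. Only the `task_links` table name and the status words come from this report; the column names and the helper `has_active_children` are illustrative and may not match the real database.

```python
import sqlite3

# Statuses in which a child may still need the parent's files, taken from the
# wording of Option A. The real status names may differ.
ACTIVE_STATUSES = ("pending", "ready", "running")


def has_active_children(conn: sqlite3.Connection, task_id: str) -> bool:
    """True if any child linked to task_id has not finished yet.

    Assumes hypothetical columns task_links(parent_id, child_id) and
    tasks(id, status).
    """
    placeholders = ",".join("?" for _ in ACTIVE_STATUSES)
    row = conn.execute(
        f"""
        SELECT 1
        FROM task_links AS l
        JOIN tasks AS t ON t.id = l.child_id
        WHERE l.parent_id = ? AND t.status IN ({placeholders})
        LIMIT 1
        """,
        (task_id, *ACTIVE_STATUSES),
    ).fetchone()
    return row is not None


def demo() -> tuple:
    """Build a tiny in-memory database and run the check before and after B finishes."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE tasks (id TEXT PRIMARY KEY, status TEXT NOT NULL);
        CREATE TABLE task_links (parent_id TEXT NOT NULL, child_id TEXT NOT NULL);
        INSERT INTO tasks VALUES ('A', 'done'), ('B', 'pending');
        INSERT INTO task_links VALUES ('A', 'B');
        """
    )
    before = has_active_children(conn, "A")  # B is pending: defer cleanup
    conn.execute("UPDATE tasks SET status = 'done' WHERE id = 'B'")
    after = has_active_children(conn, "A")  # B finished: cleanup can proceed
    conn.close()
    return before, after
```

A cleanup routine would skip the `shutil.rmtree()` call while this returns True. If B is only created after A completes, as the reproduction steps are ordered, a check at completion time finds no children, so the janitor mentioned in Option A would likely need a grace period to cover that case.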

Option B — Inherited scratch workspace for linked tasks: When a task is created with parents=[...], inherit the parent's scratch workspace path. Children read/write from the same physical directory. Cleanup happens only when the last linked task in the chain completes.
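
Option B can be sketched with an in-memory model. Everything here is hypothetical: the `Task` class, the `/tmp/kanban/...` path layout, and the choice to inherit from the first parent when there are several (the report does not say how multiple parents should be handled).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Task:
    task_id: str
    workspace: str
    parents: List["Task"] = field(default_factory=list)
    done: bool = False


def new_task(task_id: str, parents: Optional[List[Task]] = None) -> Task:
    """Create a task, reusing the first parent's workspace path if there is one."""
    parents = parents or []
    # Assumption: with several parents, the first one's workspace wins.
    workspace = parents[0].workspace if parents else f"/tmp/kanban/{task_id}"
    return Task(task_id, workspace, parents)


def workspace_releasable(tasks: List[Task], workspace: str) -> bool:
    """True once every task sharing this workspace is done."""
    return all(t.done for t in tasks if t.workspace == workspace)
```

In this model, cleanup is driven by the workspace rather than by a single task: the directory is removed only when `workspace_releasable` returns True for it.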

Option C — Document the constraint: Update the kanban-worker SKILL.md to explicitly state that filesystem handoffs between parent/child scratch tasks are NOT supported. Artifacts must be passed via kanban_complete(metadata={...}) (inline data) or workers must use dir:<path> workspaces for persistent file sharing.
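
Under Option C, a worker would inline small artifacts into the metadata before completing. The sketch below is an assumption-laden illustration: the helper `build_handoff_metadata`, the size limit, and the entry shape `{"name": ..., "content": ...}` are hypothetical. Only the `metadata={"artifacts": [...]}` form appears in this report.

```python
import tempfile
from pathlib import Path
from typing import List


def build_handoff_metadata(workspace: Path, names: List[str], max_bytes: int = 64_000) -> dict:
    """Read small UTF-8 text artifacts and embed their contents in a metadata dict.

    The entry shape {"name": ..., "content": ...} is an assumption for illustration.
    """
    artifacts = []
    for name in names:
        data = (workspace / name).read_bytes()
        if len(data) > max_bytes:
            raise ValueError(f"{name} is too large to inline ({len(data)} bytes)")
        artifacts.append({"name": name, "content": data.decode("utf-8")})
    return {"artifacts": artifacts}


def demo() -> dict:
    """Inline one artifact, then let the directory disappear."""
    with tempfile.TemporaryDirectory() as tmp:
        ws = Path(tmp)
        (ws / "summary.txt").write_bytes(b"output of task A\n")
        metadata = build_handoff_metadata(ws, ["summary.txt"])
    # The directory is gone here, but the content survives in the metadata.
    # A worker would then pass it on, e.g. kanban_complete(summary=..., metadata=metadata).
    return metadata
```

This only suits small text outputs. Larger or binary artifacts would still need a dir:<path> workspace, as Option C notes.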

Related Issues

  • #28818 — Kanban scratch workspace cleanup can delete real source directories (same _cleanup_workspace() code path, different trigger)
  • PR #31708 — Fixed the data-loss vector (deleting non-Hermes directories) but did NOT address the premature-cleanup vector (deleting the managed scratch dir while children still need it)
