hermes - ✅(Solved) Fix [Bug]: harness kanban CLI invoked from agent session ignores active-board pin, races current file with concurrent boards switch [2 pull requests, 3 comments, 3 participants]

GitHub stats
NousResearch/hermes-agent#20074 (fetched 2026-05-06 06:38:54)
Comments: 3
Participants: 3
Timeline: 11
Reactions: 0
Timeline (top): labeled ×4, commented ×3, cross-referenced ×2, closed ×1

Root Cause

kanban_db.connect() resolves the board via HERMES_KANBAN_BOARD env var → <root>/kanban/current file → default. The two surfaces resolve differently:

  • kanban_* tools run inside the agent process, where HERMES_KANBAN_BOARD is set (either by the dispatcher when spawning a worker, or by the user's shell when launching harness -p <profile> chat). They reliably hit the right board.
  • harness kanban … shelled from within an agent session is a fresh subprocess. It inherits the parent shell's env. If HERMES_KANBAN_BOARD wasn't set in that env (common — most users don't export it; they use harness kanban boards switch <slug> which just writes to the current file), the CLI falls back to the current file.
  • The current file is global state. Any other concurrent harness session can flip it via harness kanban boards switch …. When that happens, the orchestrator session's tool calls keep targeting the original board (env-pinned), but the orchestrator's harness kanban … shell calls suddenly target the new board.
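
The lookup order described above can be sketched as follows. This is a minimal illustration, not the project's code: `resolve_board` and its `env` parameter are hypothetical names, and only the order (env var, then the `current` file, then the default board) comes from the issue.

```python
import os
from pathlib import Path

ENV_VAR = "HERMES_KANBAN_BOARD"


def resolve_board(root: Path, env=None) -> str:
    """Hypothetical stand-in for the lookup inside kanban_db.connect()."""
    env = os.environ if env is None else env
    # 1. An explicit env pin wins.
    pinned = env.get(ENV_VAR, "").strip()
    if pinned:
        return pinned
    # 2. Otherwise read the shared, mutable <root>/kanban/current file.
    try:
        slug = (root / "kanban" / "current").read_text().strip()
    except OSError:
        slug = ""
    # 3. Fall back to the default board.
    return slug or "default"
```

Only step 2 depends on state that another session can change, which is why a process without the env pin is the one that drifts.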

Fix Action

Fix / Workaround

Worker sessions don't hit this because the dispatcher sets HERMES_KANBAN_BOARD in the spawned child's env directly (kanban_db.py:2593-2623), so even shell calls inherit the right pin.

PR fix notes

PR #20094: fix(cli): pin HERMES_KANBAN_BOARD at chat boot to stop subprocess board drift

Description (problem / solution / changelog)

What does this PR do?

In cmd_chat, pin the resolved kanban board into HERMES_KANBAN_BOARD once at boot when the env var isn't already set. Without this, in-process kanban_* tools and shelled-out hermes kanban … subprocesses resolve the board on different paths — env-var pin if present, otherwise the global <root>/kanban/current file. A concurrent hermes kanban boards switch from another session can flip that file mid-turn, so the same chat ends up routing tool calls to board A while its shell calls hit board B, surfacing as phantom "no such task" errors when an orchestrator immediately tries to harness kanban show <id> after kanban_create.

This mirrors the behaviour the dispatcher already has for spawned workers (hermes_cli/kanban_db.py:2622-2623) — pin the board into the child env so every code path resolves the same DB. Idempotent and a no-op when the caller has already exported the env (e.g. dispatcher-spawned worker, ad-hoc shell pin).

Related Issue

Fixes #20074

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 🔒 Security fix
  • 📝 Documentation update
  • ✅ Tests (adding or improving test coverage)
  • ♻️ Refactor (no behavior change)
  • 🎯 New skill (bundled or hub)

Changes Made

  • hermes_cli/main.py: add _pin_kanban_board_env() helper and call it from cmd_chat after the existing env-flag block, before forking to TUI vs CLI.
  • tests/hermes_cli/test_pin_kanban_board_env.py: cover the three branches — pin when unset, no-op when already set, swallow get_current_board() failures (boot must never crash because the kanban dir is missing or unreadable).
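
A rough sketch of what such a helper might look like, covering the three branches listed above. The real `_pin_kanban_board_env()` is not reproduced on this page, so the name `pin_kanban_board_env` and the choice to pass the board lookup in as a callable are assumptions made to keep the example self-contained.

```python
import os
from typing import Callable

ENV_VAR = "HERMES_KANBAN_BOARD"


def pin_kanban_board_env(get_current_board: Callable[[], str]) -> None:
    """Pin the resolved board into os.environ once; never raise."""
    if os.environ.get(ENV_VAR):
        return  # already pinned by the dispatcher or the user's shell
    try:
        board = get_current_board()
    except Exception:
        return  # a missing or unreadable kanban dir must not break chat boot
    if board:
        os.environ[ENV_VAR] = board
```

Calling it a second time is a no-op, which matches the idempotence the PR describes.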

How to Test

  1. harness kanban boards switch space (Terminal 1)
  2. harness -p space-orchestrator chat (Terminal 1) — leave it sitting at the prompt.
  3. harness kanban boards switch harness-facet (Terminal 2, concurrent).
  4. Back in Terminal 1's chat, ask the orchestrator to kanban_create a task and then immediately shell harness kanban show <id>. Before the fix, the shell-out lands on harness-facet and reports "no such task". After the fix, both calls resolve to space because HERMES_KANBAN_BOARD=space was pinned at chat boot.
  5. Unit tests: pytest tests/hermes_cli/test_pin_kanban_board_env.py -v (3/3 pass).

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix/feature (no unrelated commits)
  • I've run pytest tests/ -q and all tests pass (15 pre-existing failures in gateway/web_server/bedrock unrelated to this change; full hermes_cli suite green except those)
  • I've added tests for my changes (required for bug fixes, strongly encouraged for features)
  • I've tested on my platform: macOS 14.2 (Darwin 23.2.0)

Documentation & Housekeeping

  • I've updated relevant documentation (README, docs/, docstrings) — N/A (helper docstring covers the rationale; no user-facing surface added)
  • I've updated cli-config.yaml.example if I added/changed config keys — N/A
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — N/A
  • I've considered cross-platform impact (Windows, macOS) per the compatibility guide — env-pin is platform-agnostic; mirrors existing dispatcher behaviour.
  • I've updated tool descriptions/schemas if I changed tool behavior — N/A

Changed files

  • hermes_cli/main.py (modified, +22/-0)
  • tests/hermes_cli/test_pin_kanban_board_env.py (added, +54/-0)

PR #20186: fix(cli): pin HERMES_KANBAN_BOARD at chat boot to stop subprocess board drift (salvages #20094)

Description (problem / solution / changelog)

Chat sessions now pin HERMES_KANBAN_BOARD into the process env at boot, so in-process kanban_* tools and shelled-out hermes kanban … subprocesses always resolve to the same board even if a concurrent session runs hermes kanban boards switch mid-turn.

Salvaged from #20094 (@0xDevNinja).

Root cause: kanban_db.connect() resolves the active board via a fallback chain: the HERMES_KANBAN_BOARD env var, then the global <root>/kanban/current file, then the default board. Agent tool calls run in the chat process, where the env var may be set; shell-outs spawn fresh subprocesses that inherit the parent's env, so when HERMES_KANBAN_BOARD is not exported there they fall through to the current file. The current file is global mutable state (another session's boards switch flips it), so mid-turn the same chat routes tool calls to board A while its shell calls hit board B. Symptom: kanban_create returns a task id, immediately followed by hermes kanban show <id> reporting "no such task."

Mirrors the pin the dispatcher already does for spawned workers (kanban_db.py:2622-2623), so every code path resolves the same DB consistently.

Changes

  • hermes_cli/main.py: new _pin_kanban_board_env() helper, called from cmd_chat after the existing env-flag block, before forking to TUI vs CLI. Idempotent — no-op if HERMES_KANBAN_BOARD is already set; swallows get_current_board() failures so chat boot never crashes because the kanban dir is unreadable.
  • tests/hermes_cli/test_pin_kanban_board_env.py: three tests covering pin-when-unset, no-op-when-set, and swallow-on-failure. Autouse fixture snapshots and restores HERMES_KANBAN_BOARD around each test (the helper writes to os.environ directly, bypassing monkeypatch tracking — without the fixture the mutation leaked into TestSharedBoardPaths in the same suite).
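
The snapshot/restore idea behind that fixture can be sketched with the standard library alone. The context manager below is a hypothetical stand-in (the name `preserved_board_env` is invented here); in the test suite the same body would presumably sit inside an autouse pytest fixture.

```python
import os
from contextlib import contextmanager

ENV_VAR = "HERMES_KANBAN_BOARD"


@contextmanager
def preserved_board_env():
    """Snapshot ENV_VAR, yield, then restore it exactly as it was."""
    missing = object()  # sentinel: distinguishes "unset" from an empty string
    saved = os.environ.get(ENV_VAR, missing)
    try:
        yield
    finally:
        if saved is missing:
            os.environ.pop(ENV_VAR, None)
        else:
            os.environ[ENV_VAR] = saved
```

Restoring in a `finally` block means a direct `os.environ[...]` write made by the code under test cannot leak into later tests, even when the test body raises.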

Validation

  • Concurrent boards switch mid-turn: before, kanban_create and hermes kanban show <id> hit different boards → "no such task"; after, both resolve to the chat's boot-time board.
  • HERMES_KANBAN_BOARD already set: before, unchanged; after, unchanged (idempotent no-op).
  • Kanban dir missing/unreadable: before, could crash chat boot; after, swallowed, chat proceeds.
  • Targeted tests: 3/3 in new test file, 339/339 across full kanban suite.

Includes a small follow-up commit fixing env-var leakage in the test file — the original test used monkeypatch.delenv but the helper writes os.environ[...] directly, so the mutation survived teardown and broke 9 unrelated tests in the same suite. Added an autouse snapshot/restore fixture.

Closes #20074

Co-authored-by: 0xDevNinja

Changed files

  • hermes_cli/main.py (modified, +22/-0)
  • tests/hermes_cli/test_pin_kanban_board_env.py (added, +75/-0)

Code Example

> kanban_create(title="...", assignee="space-pipeline", ...)
  → returns {"task_id": "t_abc123", "status": "ok"}

> harness kanban show t_abc123
  no such task: t_abc123

---

# Terminal 1
harness kanban boards switch space
harness -p space-orchestrator chat
# (in chat) /goal Drive the space board: ... [orchestrator tool-creates a task]

# Terminal 2 (concurrent — e.g. another session, a script, a teammate)
harness kanban boards switch harness-facet

# Back in Terminal 1's orchestrator session, in the same goal turn:
# Tool: kanban_create returns t_xyz successfully (HERMES_KANBAN_BOARD env was
# set when the chat process spawned, persists for tool calls)
# Shell: harness kanban show t_xyz → "no such task: t_xyz"
#         (no env, reads current file, sees harness-facet, looks in wrong DB)

Problem

When an agent session uses both the kanban_* tools AND shells out to harness kanban … CLI in the same turn, tasks created via tools can become invisible to subsequent CLI invocations. Symptom from a real orchestrator session:

> kanban_create(title="...", assignee="space-pipeline", ...)
  → returns {"task_id": "t_abc123", "status": "ok"}

> harness kanban show t_abc123
  no such task: t_abc123

The task DOES exist — direct SQLite probe of the right per-board DB confirms it. The CLI is reading a different board's DB.

Root cause

kanban_db.connect() resolves the board via HERMES_KANBAN_BOARD env var → <root>/kanban/current file → default. The two surfaces resolve differently:

  • kanban_* tools run inside the agent process, where HERMES_KANBAN_BOARD is set (either by the dispatcher when spawning a worker, or by the user's shell when launching harness -p <profile> chat). They reliably hit the right board.
  • harness kanban … shelled from within an agent session is a fresh subprocess. It inherits the parent shell's env. If HERMES_KANBAN_BOARD wasn't set in that env (common — most users don't export it; they use harness kanban boards switch <slug> which just writes to the current file), the CLI falls back to the current file.
  • The current file is global state. Any other concurrent harness session can flip it via harness kanban boards switch …. When that happens, the orchestrator session's tool calls keep targeting the original board (env-pinned), but the orchestrator's harness kanban … shell calls suddenly target the new board.

Concrete reproducer

# Terminal 1
harness kanban boards switch space
harness -p space-orchestrator chat
# (in chat) /goal Drive the space board: ... [orchestrator tool-creates a task]

# Terminal 2 (concurrent — e.g. another session, a script, a teammate)
harness kanban boards switch harness-facet

# Back in Terminal 1's orchestrator session, in the same goal turn:
# Tool: kanban_create returns t_xyz successfully (HERMES_KANBAN_BOARD env was
# set when the chat process spawned, persists for tool calls)
# Shell: harness kanban show t_xyz → "no such task: t_xyz"
#         (no env, reads current file, sees harness-facet, looks in wrong DB)
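
The same sequence can be replayed in-process as a toy simulation. Everything below is illustrative: `resolve` is a simplified two-step lookup, a temporary file stands in for the shared current file, and a plain dict stands in for the session's environment.

```python
import tempfile
from pathlib import Path


def resolve(env: dict, current_file: Path) -> str:
    """Simplified lookup: env pin first, then the shared current file."""
    return env.get("HERMES_KANBAN_BOARD") or current_file.read_text().strip()


def simulate_drift(pin_at_boot: bool) -> tuple:
    """Return (board seen by the tool call, board seen by the shell-out)."""
    with tempfile.TemporaryDirectory() as tmp:
        current = Path(tmp) / "current"
        current.write_text("space")              # Terminal 1: boards switch space
        session_env = {}
        if pin_at_boot:
            session_env["HERMES_KANBAN_BOARD"] = resolve(session_env, current)
        tool_board = resolve(session_env, current)   # kanban_create runs here
        current.write_text("harness-facet")      # Terminal 2: concurrent switch
        shell_board = resolve(session_env, current)  # later CLI shell-out
        return tool_board, shell_board


print(simulate_drift(pin_at_boot=False))  # ('space', 'harness-facet')
print(simulate_drift(pin_at_boot=True))   # ('space', 'space')
```

Without the pin, the two calls straddle the switch and disagree; with the pin taken at boot, both see the board that was current when the session started.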

Why it bites orchestrators specifically

Orchestrator personas often need to combine:

  • Tool-call ops for atomic mutations (kanban_create, kanban_complete)
  • CLI-shell ops for enumeration verbs not yet exposed as tools (harness kanban list, harness kanban runs, harness kanban archive) — see #20048

So orchestrator sessions are the workload most likely to mix both surfaces in the same turn, which makes them the workload most likely to trip on this divergence.

Worker sessions don't hit this because the dispatcher sets HERMES_KANBAN_BOARD in the spawned child's env directly (kanban_db.py:2593-2623), so even shell calls inherit the right pin.

Proposed fix

When a chat session activates a profile that has kanban in its toolsets, set HERMES_KANBAN_BOARD in the child shell environment to the resolved board at chat-start time. Three implementation options:

  1. At chat boot: cli.py (or wherever HermesCLI.__init__ finalizes the profile env) reads the active board via kanban_db.get_current_board() once and exports HERMES_KANBAN_BOARD into os.environ for the rest of the session. Subsequent shell-outs inherit it. ~5 LOC.

  2. At kanban-toolset registration: When the kanban toolset is enabled (via _check_kanban_mode()), pin the board to env. Same effect, narrower trigger.

  3. In the terminal tool's env-passthrough path: When HERMES_KANBAN_BOARD is set in the agent process env, propagate it to spawned subprocess. (May already happen — needs verification.)

I'd lean toward (1): one-time pin at session start, before any tool registers. Idempotent, easy to test.
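
Options 1 and 3 both rest on a child process seeing the variable that the parent holds. A small probe, assuming nothing about the agent's terminal tool, can check that for a plain `subprocess.run`; the function name `board_seen_by_child` is hypothetical.

```python
import os
import subprocess
import sys

ENV_VAR = "HERMES_KANBAN_BOARD"


def board_seen_by_child(parent_env: dict) -> str:
    """Spawn a child Python process and report the board it sees."""
    probe = f"import os; print(os.environ.get('{ENV_VAR}', '<unset>'))"
    result = subprocess.run(
        [sys.executable, "-c", probe],
        env=parent_env,          # the child gets exactly this environment
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Start from the real environment, minus any existing pin.
base = {k: v for k, v in os.environ.items() if k != ENV_VAR}
print(board_seen_by_child(base))                      # <unset>
print(board_seen_by_child({**base, ENV_VAR: "space"}))  # space
```

Whether the agent's own terminal tool forwards the environment unchanged is exactly what option 3 flags as needing verification; this probe only shows the default behaviour of the standard library.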

Why "test" / "guess this might be the cache invalidation" was wrong

The orchestrator that originally surfaced this called it a "DB-handle caching thing." After investigation, it has nothing to do with DB handles or caches: two code paths resolve the board differently, and the current file is mutable global state that one of them respects and the other ignores.

Workaround in the meantime

Always pass --board <slug> explicitly to harness kanban invocations from inside an orchestrator session. This is what we now do in the space-orchestrator SOUL.md addendum and the space-kanban-workflow skill. Verbose but reliable.
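
For scripted callers, the workaround can be wrapped in a small argv builder so the board is never forgotten. This is a sketch only: `kanban_argv` is a hypothetical helper, and placing `--board` directly after the `kanban` subcommand is an assumption, since the issue only says to pass the flag explicitly.

```python
import shlex


def kanban_argv(board: str, *args: str, cli: str = "harness") -> list:
    """Build argv for a kanban CLI call with an explicit board."""
    if not board:
        raise ValueError("board slug is required")
    # Assumption: --board is accepted right after the 'kanban' subcommand.
    return [cli, "kanban", "--board", board, *args]


print(shlex.join(kanban_argv("space", "show", "t_abc123")))
# harness kanban --board space show t_abc123
```

Refusing an empty slug keeps a caller from silently falling back to the shared current file.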

Discovery context

Hit this while running an autonomous orchestrator /goal on the v0.12.0 release with multiple boards (space, harness-facet, surface, default) on the same install. The orchestrator successfully created task t_04086c86 via kanban_create tool, then immediately tried to harness kanban show t_04086c86 and got "no such task" because the active board had drifted to harness-facet between calls (a different chat session was running on Daniel's other monitor).

Workaround proven working: prepend --board space to every CLI call.

Affected component

CLI / agent-CLI boundary

Severity

P2 — the workaround (always-explicit --board) is documentable and reliable, but the gap is subtle and the failure mode is "looks like a phantom data loss bug" which is hard to diagnose for users without filesystem access.

extent analysis

TL;DR

Set the HERMES_KANBAN_BOARD environment variable to the resolved board at chat-start time, so that in-process tool calls and CLI shell-outs (which inherit the session's environment) target the same board.

Guidance

  • Identify the active board via kanban_db.get_current_board() at chat boot and export HERMES_KANBAN_BOARD into os.environ for the rest of the session.
  • Alternatively, pin the board to env when the kanban toolset is enabled or propagate HERMES_KANBAN_BOARD from the agent process env to spawned subprocesses.
  • As a temporary workaround, pass --board <slug> explicitly to harness kanban invocations from inside an orchestrator session.

Example

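# At chat boot, pin HERMES_KANBAN_BOARD once (no-op if it is already set)
import os
from hermes_cli.kanban_db import get_current_board  # module path as cited in the PR notes

if not os.environ.get("HERMES_KANBAN_BOARD"):
    try:
        os.environ["HERMES_KANBAN_BOARD"] = get_current_board()
    except Exception:
        pass  # boot must not crash if the kanban dir is missing or unreadable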

Notes

The fix relies on shell-outs inheriting the chat process's environment, so a value written to os.environ at chat start reaches every later harness kanban … subprocess. Because the pin is taken once at boot, a boards switch issued afterwards from another session does not retarget the running chat.

Recommendation

On builds that predate the chat-boot pin (PR #20186), pass --board <slug> explicitly to harness kanban invocations from inside an orchestrator session; the issue reports this as verbose but reliable.
