hermes - ✅ (Solved) Fix Kanban dispatcher should validate assignee profile readiness before spawning workers [4 pull requests, 1 participant]

hermes · 2026-05-05 04:15:15

GitHub stats

NousResearch/hermes-agent#20054 • Fetched 2026-05-06 06:39:01

Author: steezkelly
Participants: steezkelly
Timeline (top): cross-referenced ×4, labeled ×3, closed ×1

Error Message

Kanban is a multi-profile orchestration primitive. A stale or half-created profile currently creates an opaque failure mode: the board says a task is assigned and dispatchable, but the worker cannot actually start correctly. Failing fast with a precise profile-readiness error would save debugging time and prevent dispatcher churn.

Fix / Workaround

Kanban task assignment treats assignee as a Hermes profile slug. If the profile directory exists but is incomplete or not runnable, the dispatcher can still claim the task and spawn hermes -p <profile> chat -q .... The failure then surfaces later as confusing worker/provider/auth behavior instead of a clear Kanban/profile-readiness diagnostic.

A Kanban task assigned to debugger-v1 was still eligible for dispatch. The resulting worker startup path failed indirectly through provider/auth fallback behavior rather than reporting something like:

assignee profile debugger-v1 is not runnable: missing config.yaml / credentials

The current dispatch path appears to normalize the assignee but not validate profile readiness before claiming or spawning.

PR fix notes

PR #20065: fix(kanban): validate worker profile before spawn

Repository: NousResearch/hermes-agent
Author: steezkelly
State: closed | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/20065

Description (problem / solution / changelog)

Summary

Add a cheap pre-spawn readiness guard for Kanban worker profiles
Fail fast when a non-default assignee profile does not exist or lacks config.yaml
Prevent _default_spawn from launching hermes -p <profile> for half-created profile directories
Update spawn-env tests to model runnable profiles explicitly

Fixes #20054

Scope

This intentionally checks deterministic local readiness only:

profile exists
profile has config.yaml

It does not attempt provider-specific credential validation; bad/expired credentials can still fail during worker startup and are reported through the existing spawn-failure path.
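A minimal sketch of the guard described in the Scope above, assuming profiles live under `<HERMES_HOME>/profiles/<name>` as in the issue's `~/.hermes/profiles/debugger-v1/` path. The names `ProfileNotRunnable`, `check_profile_ready`, and `guarded_spawn` are hypothetical and are not taken from the PR diff; the spawn callable stands in for whatever launches the worker.

```python
import tempfile
from pathlib import Path


class ProfileNotRunnable(RuntimeError):
    """Raised when an assignee profile fails the local readiness check."""


def check_profile_ready(hermes_home: Path, profile: str) -> None:
    """Check deterministic local readiness only: directory and config.yaml."""
    if profile == "default":
        return  # assumption: the default profile is HERMES_HOME itself
    profile_dir = hermes_home / "profiles" / profile
    if not profile_dir.is_dir():
        raise ProfileNotRunnable(f"Profile {profile} does not exist: {profile_dir}")
    config = profile_dir / "config.yaml"
    if not config.is_file():
        raise ProfileNotRunnable(f"Profile {profile} is not runnable: missing {config}")


def guarded_spawn(hermes_home: Path, profile: str, spawn):
    """Run the readiness check first so `spawn` is never called for a bad profile."""
    check_profile_ready(hermes_home, profile)
    return spawn(profile)


# Demonstration against a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    home = Path(tmp)
    half_created = home / "profiles" / "debugger-v1"
    half_created.mkdir(parents=True)
    (half_created / "SOUL.md").write_text("persona notes")

    launched = []  # records every profile the fake spawn was asked to start
    try:
        guarded_spawn(home, "debugger-v1", launched.append)
        guard_error = None
    except ProfileNotRunnable as exc:
        guard_error = str(exc)

    # Once config.yaml exists, the same call reaches the spawn callable.
    (half_created / "config.yaml").write_text("model: example\n")
    guarded_spawn(home, "debugger-v1", launched.append)
```

Credential validity is deliberately not checked here, matching the stated scope: a profile with expired credentials would pass this guard and fail later during worker startup.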

Test Plan

venv/bin/python -m pytest tests/hermes_cli/test_kanban_boards.py::TestWorkerSpawnEnv::test_default_spawn_rejects_half_created_profile -q -o 'addopts=' — watched fail before implementation, then pass
venv/bin/python -m pytest tests/hermes_cli/test_kanban_boards.py::TestWorkerSpawnEnv -q -o 'addopts=' → 3 passed
venv/bin/python -m pytest tests/hermes_cli/test_kanban_boards.py tests/hermes_cli/test_kanban_db.py tests/hermes_cli/test_kanban_cli.py -q -o 'addopts=' → 124 passed
venv/bin/python -m pytest tests/tools/test_kanban_tools.py tests/plugins/test_kanban_dashboard_plugin.py -q -o 'addopts=' → 89 passed, 2 unrelated deprecation warnings
venv/bin/python -m py_compile hermes_cli/kanban_db.py tests/hermes_cli/test_kanban_boards.py tests/hermes_cli/test_kanban_db.py → passed

Changed files

hermes_cli/kanban_db.py (modified, +30/-0)
tests/hermes_cli/test_kanban_boards.py (modified, +39/-0)
tests/hermes_cli/test_kanban_db.py (modified, +6/-0)

PR #20067: fix(cli): validate assignee profile readiness before kanban dispatch

Repository: NousResearch/hermes-agent
Author: konsisumer
State: closed | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/20067

Description (problem / solution / changelog)

Validate that the assignee profile is runnable (directory exists and contains config.yaml) before the Kanban dispatcher claims a task and spawns hermes -p <profile> chat -q .... Half-created profile directories now fail fast with a precise diagnostic instead of cascading into opaque provider/auth errors.

What changed and why

Add profile_readiness_error(name) in hermes_cli/profiles.py that returns None for runnable profiles or a diagnostic string identifying what's missing (invalid name / dir missing / config.yaml missing). The default profile is always runnable since it IS HERMES_HOME.
In hermes_cli/kanban_db.py::dispatch_once, validate the assignee profile before claiming for the default-spawn path. Unrunnable profiles are claimed and auto-blocked immediately via the existing _record_spawn_failure circuit breaker with failure_limit=1 — readiness errors don't fix themselves between ticks, so retrying N times before blocking is wasted churn.
The gate is skipped when a custom spawn_fn is passed (tests, simulators, alternate worker hosts) so existing assignee semantics are preserved.
Updated test_workspace_resolution_failure_also_counts to seed a runnable worker profile so the new readiness gate doesn't pre-empt the workspace-resolution test path.
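The diagnostic helper described above could look roughly like the following. This is a sketch under assumptions, not the PR's code: the extra `hermes_home` parameter, the slug pattern, and the `spawn_error_for` wrapper are hypothetical, and the real `validate_profile_name()` rules may differ.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional

# Assumption: a conservative slug pattern for illustration only.
_SLUG = re.compile(r"^[a-z0-9][a-z0-9_-]*$")


def profile_readiness_error(name: str, hermes_home: Path) -> Optional[str]:
    """Return None when the profile looks runnable, else a short diagnostic."""
    if name == "default":
        return None  # the default profile is HERMES_HOME itself
    if not _SLUG.match(name):
        return f"invalid profile name: {name!r}"
    profile_dir = hermes_home / "profiles" / name
    if not profile_dir.is_dir():
        return f"profile directory missing: {profile_dir}"
    if not (profile_dir / "config.yaml").is_file():
        return f"config.yaml missing in {profile_dir}"
    return None


def spawn_error_for(name: str, hermes_home: Path) -> Optional[str]:
    """Prefix the reason the way the PR's tests describe last_spawn_error."""
    reason = profile_readiness_error(name, hermes_home)
    return None if reason is None else f"profile_not_runnable: {reason}"


# Demonstration: one runnable profile, one half-created, one absent.
with tempfile.TemporaryDirectory() as tmp:
    home = Path(tmp)
    (home / "profiles" / "half").mkdir(parents=True)
    (home / "profiles" / "ready").mkdir(parents=True)
    (home / "profiles" / "ready" / "config.yaml").write_text("model: example\n")
    names = ("default", "Bad Name", "ghost", "half", "ready")
    results = {n: spawn_error_for(n, home) for n in names}
```

Because a readiness error cannot resolve itself between ticks, a dispatcher could treat any non-None result as grounds for blocking the task immediately, which is the behaviour the PR describes with failure_limit=1.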

How to test

pytest tests/hermes_cli/test_profiles.py::TestProfileReadinessError -q
pytest tests/hermes_cli/test_kanban_db.py -q -k dispatch
pytest tests/hermes_cli/test_kanban_core_functionality.py -q
New test_dispatch_blocks_unrunnable_assignee_profile reproduces the issue scenario (profile dir with SOUL.md but no config.yaml) and asserts the task ends in blocked with a profile_not_runnable: ... reason in last_spawn_error.
New test_dispatch_allows_runnable_assignee_profile confirms a profile with config.yaml passes the gate and reaches the spawn path.

What platforms tested on

macOS on darwin-arm64 (local) — full tests/hermes_cli/test_kanban*.py and tests/hermes_cli/test_profiles.py pass (347 tests).

Fixes #20054

Changed files

hermes_cli/kanban_db.py (modified, +25/-0)
hermes_cli/profiles.py (modified, +34/-0)
tests/hermes_cli/test_kanban_core_functionality.py (modified, +7/-0)
tests/hermes_cli/test_kanban_db.py (modified, +58/-0)
tests/hermes_cli/test_profiles.py (modified, +46/-0)

PR #20105: fix(kanban): dispatcher skips ready tasks whose assignee is not a real profile

Repository: NousResearch/hermes-agent
Author: Brecht-H
State: closed | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/20105

Description (problem / solution / changelog)

Summary

The kanban dispatcher's _default_spawn invokes hermes -p <task.assignee> chat -q .... When assignee names a control-plane lane (e.g. an interactive Claude Code terminal like orion-cc / orion-research) instead of a real Hermes profile, the subprocess fails on startup with Profile 'X' does not exist, gets reaped as a zombie, the TTL/crash detector reclaims the task back to ready, and the next tick re-spawns the same crashing worker.

Result: a permanent crash loop emitting spawned=N reclaimed=0 crashed=N in the gateway log every minute, two zombie processes per affected task, and CPU burn until someone notices.

Reproduce

# 1. Create a kanban task whose assignee names a non-profile.
hermes kanban create --assignee orion-cc --status ready \
    --title "Review PR #N" --body "..."
# 2. Start the gateway with the embedded dispatcher.
hermes gateway run

# gateway.log emits every minute:
#   kanban dispatcher: tick spawned=1 reclaimed=0 crashed=1 ...
# Per-task log /home/<u>/.hermes/<profile>/kanban/logs/<task_id>.log:
#   Error: Profile 'orion-cc' does not exist. Create it with:
#       hermes profile create orion-cc
# ps -ef | grep '[h]ermes.*defunct' — zombies pile up until reaped.

Fix

dispatch_once() now pre-checks hermes_cli.profiles.profile_exists(assignee) before claiming. If the profile does NOT exist, the row is appended to skipped_unassigned (semantically: it's unassigned to an executable profile) and the dispatcher moves on without claiming, spawning, or counting a crash.

The import is locally scoped + try/except wrapped, so if profile_exists is missing or fails to import (test isolation, future module restructure) the original behaviour is preserved unchanged.
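The locally scoped, try/except-wrapped import can be sketched as below. The helper names `load_profile_exists` and `should_dispatch` are hypothetical, and the stand-in module registered at the end exists only so the example runs without Hermes installed.

```python
import importlib
import sys
import types
from typing import Callable, Optional


def load_profile_exists(module_name: str = "hermes_cli.profiles") -> Optional[Callable[[str], bool]]:
    """Import profile_exists lazily; return None if it cannot be loaded."""
    try:
        module = importlib.import_module(module_name)
        return getattr(module, "profile_exists")
    except Exception:
        return None


def should_dispatch(assignee: str, module_name: str = "hermes_cli.profiles") -> bool:
    """Skip only when the check is available and reports the profile as absent."""
    profile_exists = load_profile_exists(module_name)
    if profile_exists is None:
        return True  # check unavailable: fall through to the original behaviour
    try:
        return bool(profile_exists(assignee))
    except Exception:
        return True  # a failing check must not stop dispatch either


# Demonstration with a stand-in module registered under a made-up name.
fake = types.ModuleType("fake_profiles_demo")
fake.profile_exists = lambda name: name == "daily"
sys.modules["fake_profiles_demo"] = fake

real_profile = should_dispatch("daily", "fake_profiles_demo")
terminal_lane = should_dispatch("orion-cc", "fake_profiles_demo")
module_missing = should_dispatch("orion-cc", "module_that_does_not_exist_demo")
```

The trade-off in this design is that a broken import silently restores the crash-loop behaviour, so it favours not breaking dispatch over always enforcing the check.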

Why profile-existence over a config flag

The kanban task body (t_2bab06e3 on Brecht-H's local kanban) hinted at gating behind a config flag like assignee=hermes|auto. Profile-existence is a strictly tighter check:

Self-documenting — the operator already knows whether they have an orion-cc profile; no allowlist to maintain.
Forward-compatible — the moment a new lane gets a real hermes profile create <name>, it auto-qualifies for spawn.
No new config surface — zero new keys in config.yaml.

Operators who want the "config flag" semantics can still opt in via creating an empty placeholder profile.

Validated live (Orion machine)

Two orion-research-assigned tasks (t_a14dc1d5 Bug-C investigation, t_646c96f2 provider-routing validation) had been crash-looping since 2026-05-05 06:58 UTC after Mac switched the lane workflow to kanban-pull-by-terminal. Pre-patch:

2026-05-05 07:30:05 INFO gateway.run: kanban dispatcher: tick spawned=2 reclaimed=0 crashed=2 timed_out=0 promoted=0 auto_blocked=0
2026-05-05 07:31:05 INFO gateway.run: kanban dispatcher: tick spawned=2 reclaimed=0 crashed=2 timed_out=0 promoted=0 auto_blocked=0
... (every minute, 2 hours+)

Post-patch (gateway restart at 07:41:39):

2026-05-05 07:41:39 INFO gateway.run: kanban dispatcher: embedded in gateway (interval=60.0s)
( silent — spawn_any=False on every tick, log line guarded behind `if res.spawned` )

Live state:

Stale running claims auto-reclaimed to ready on the first post-patch tick.
Tasks now sit at status=ready, claim_lock=None, worker_pid=None, spawn_failures=0 — clean, ready for terminal pull.
Dashboard / telegram / freqtrade / committee_listener all unaffected (only the dispatcher path changed).

Test plan

Live verification on Orion: 2-hour crash loop terminated, dispatcher silent, no defuncts pile up
Tasks reclaim cleanly to ready post-restart
Existing well-behaved tasks (assignee=daily) still spawn (counterfactual: profile_exists("daily") = True confirmed via Python REPL)
Defensive import — if hermes_cli.profiles ever moves, fall-through to original behaviour

🤖 Generated with Claude Code

Changed files

hermes_cli/kanban_db.py (modified, +17/-0)

PR #20165: fix(kanban): skip dispatch for tasks assigned to non-profile lanes (salvages #20105, #20134)

Repository: NousResearch/hermes-agent
Author: teknium1
State: closed | merged: True
Link: https://github.com/NousResearch/hermes-agent/pull/20165

Description (problem / solution / changelog)

Kanban dispatcher no longer crash-loops on tasks assigned to names that aren't real Hermes profiles, and the stuck-queue warning only fires when there's genuine spawnable work sitting idle.

Root cause: dispatch_once() claimed any ready+assigned task and shelled out hermes -p <assignee> chat -q .... When <assignee> named a control-plane terminal lane (e.g. orion-cc) rather than a profile on disk, the subprocess died with "Profile 'X' does not exist", was reaped as a zombie, the TTL detector released the claim back to ready, and the next tick re-spawned the same failing worker — forever.

Salvaged from #20105 + #20134 (@Brecht-H).

Changes

hermes_cli/kanban_db.py: dispatch_once() pre-checks profile_exists(assignee) before claiming; non-matches route into a new DispatchResult.skipped_nonspawnable bucket (separate from skipped_unassigned).
hermes_cli/kanban_db.py: new has_spawnable_ready(conn) helper returns True only if ≥1 ready+assigned+unclaimed task has an assignee that resolves to a real profile.
gateway/run.py + hermes_cli/kanban.py: both dispatchers swap their ready_nonempty probe to has_spawnable_ready, so "dispatcher stuck" WARN no longer fires on multi-lane hosts where the queue is healthy but none of the ready tasks target a spawnable profile.
tests/hermes_cli/conftest.py: new all_assignees_spawnable fixture monkeypatches profile_exists → True for tests that use synthetic assignees. Threaded through 8 dispatcher tests that the profile-exists guard would otherwise have silently broken.

Defensive import: both profile_exists lookups fall back to legacy "any ready+assigned" behavior if hermes_cli.profiles is unimportable, so degraded installs still surface the original warn.
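The bucket split and the stuck-queue probe described in the Changes list can be illustrated with an in-memory model. This is a sketch only: the real functions operate on a database connection, whereas this version takes a plain list of tasks, and the `Task` fields and the `profile_exists` / `spawn` parameters are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Task:
    id: str
    status: str
    assignee: Optional[str] = None
    claimed: bool = False


@dataclass
class DispatchResult:
    spawned: List[str] = field(default_factory=list)
    skipped_unassigned: List[str] = field(default_factory=list)
    skipped_nonspawnable: List[str] = field(default_factory=list)


def dispatch_once(tasks, profile_exists, spawn) -> DispatchResult:
    """Claim and spawn only tasks whose assignee resolves to a profile."""
    result = DispatchResult()
    for task in tasks:
        if task.status != "ready" or task.claimed:
            continue
        if not task.assignee:
            result.skipped_unassigned.append(task.id)
        elif not profile_exists(task.assignee):
            # Checked before claiming, so no claim, spawn, or crash is recorded.
            result.skipped_nonspawnable.append(task.id)
        else:
            task.claimed = True
            task.status = "running"
            spawn(task)
            result.spawned.append(task.id)
    return result


def has_spawnable_ready(tasks, profile_exists) -> bool:
    """True only if some ready, assigned, unclaimed task targets a real profile."""
    return any(
        t.status == "ready" and not t.claimed and t.assignee and profile_exists(t.assignee)
        for t in tasks
    )


known_profiles = {"daily"}
tasks = [
    Task("t1", "ready", "daily"),
    Task("t2", "ready", "orion-cc"),  # terminal lane, not a profile
    Task("t3", "ready", None),
]
before = has_spawnable_ready(tasks, known_profiles.__contains__)
res = dispatch_once(tasks, known_profiles.__contains__, lambda task: None)
after = has_spawnable_ready(tasks, known_profiles.__contains__)
```

After the tick, the terminal-lane task is still ready and unclaimed, and the probe returns False, so a warning keyed on it would stay quiet even though the ready queue is not empty.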

Validation

| Scenario | Before | After |
| --- | --- | --- |
| Task assigned to `orion-cc` (not a profile) | permanent crash loop, 2 zombies/tick, `spawned=1 crashed=1` every minute | silent skip, `skipped_nonspawnable=1`, no claim, no zombie |
| Multi-lane queue full of terminal-lane assignees | `dispatcher stuck` WARN every 5 min | silent: `has_spawnable_ready=False` |
| Real profile missing PATH/venv/creds | `dispatcher stuck` WARN still fires after 6 ticks | unchanged (safety net intact) |
| Targeted tests | n/a | 246/246 pass (`test_kanban_{db,cli,boards,core_functionality}`) |

Live-verified by @Brecht-H on his Orion multi-lane host: 2-hour crash loop on t_a14dc1d5 + t_646c96f2 terminated on gateway restart; dispatcher silent on every subsequent tick; stale running claims reclaimed cleanly to ready.

Closes #20054
Closes #20105
Closes #20134
Supersedes #20065 (readiness check lives at a tighter call site: before claim, not before spawn)

Co-authored-by: Brecht-H

Changed files

gateway/run.py (modified, +12/-7)
hermes_cli/kanban.py (modified, +16/-8)
hermes_cli/kanban_db.py (modified, +64/-0)
tests/hermes_cli/conftest.py (added, +19/-0)
tests/hermes_cli/test_kanban_core_functionality.py (modified, +6/-6)
tests/hermes_cli/test_kanban_db.py (modified, +51/-3)

Bug Description

Observed Behavior

While recovering a local multi-agent Kanban setup, I found a profile directory:

~/.hermes/profiles/debugger-v1/

The directory existed and contained profile material such as skills / SOUL.md, but was missing the runnable profile prerequisites:

config.yaml
.env
auth.json

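A Kanban task assigned to debugger-v1 was still eligible for dispatch. The resulting worker startup path failed indirectly through provider/auth fallback behavior rather than reporting something like:

assignee profile debugger-v1 is not runnable: missing config.yaml / credentials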

After creating a proper profile config and syncing credential state, the same assignee worked and completed a smoke-test Kanban task.

Code Path / Evidence

Current dispatch path appears to normalize the assignee but not validate profile readiness before claiming/spawning:

hermes_cli/kanban_db.py::_canonical_assignee() only calls normalize_profile_name(assignee).
hermes_cli/kanban_db.py::dispatch_once() claims ready assigned tasks before spawn.
hermes_cli/kanban_db.py::_default_spawn() builds:

cmd = ["hermes", "-p", profile_arg, "--skills", "kanban-worker", "chat", "-q", prompt]

but does not appear to check that profile_arg is a runnable profile before spawning.

There is profile-name validation support in hermes_cli/profiles.py, including:

normalize_profile_name()
validate_profile_name()
profile_exists()

However, profile_exists() only checks that the profile directory exists. A half-created profile directory can therefore pass the rough existence condition while still being unrunnable.

The gateway dispatcher has aggregate stuck-queue telemetry:

kanban dispatcher stuck: ready queue non-empty ... Check profile health (venv, PATH, credentials)

but the per-task failure could be diagnosed earlier and more precisely.

Expected Behavior

Before claiming/spawning a Kanban worker, the dispatcher should validate that the assignee profile is runnable. For example:

profile name is valid and normalized
profile exists, unless default
profile has a readable config.yaml, or documented/default inheritance applies
provider/model can be resolved from the profile config
required credential state is present or explicitly inherited
optional: a cheap non-interactive profile startup/config check passes

If validation fails, Kanban should not spawn the worker. It should either:

leave the task unclaimed and comment with the readiness failure, or
auto-block the task with a clear reason, e.g.:

Profile debugger-v1 is not runnable: missing ~/.hermes/profiles/debugger-v1/config.yaml
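One way to picture the ordered checks and the auto-block outcome is sketched below. It assumes only a subset of the checks listed above: provider/model and credential resolution are left out because their rules are not described in the issue. The `CHECKS` table, `first_failure`, `gate_task`, and the `block_reason` field are hypothetical names for illustration.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable, List, Optional, Tuple

Check = Tuple[str, Callable[[Path], bool]]

# Ordered from cheapest to most specific; the first failing check wins.
CHECKS: List[Check] = [
    ("profile directory does not exist", lambda d: d.is_dir()),
    ("missing {dir}/config.yaml", lambda d: (d / "config.yaml").is_file()),
    ("unreadable {dir}/config.yaml", lambda d: os.access(d / "config.yaml", os.R_OK)),
]


def first_failure(profile: str, profile_dir: Path) -> Optional[str]:
    """Return a precise reason for the first failed check, or None."""
    for message, passes in CHECKS:
        if not passes(profile_dir):
            return f"Profile {profile} is not runnable: " + message.format(dir=profile_dir)
    return None


def gate_task(task: dict, profile_dir: Path) -> dict:
    """Auto-block the task with a reason instead of spawning a worker."""
    reason = first_failure(task["assignee"], profile_dir)
    if reason is not None:
        task["status"] = "blocked"
        task["block_reason"] = reason
    return task


with tempfile.TemporaryDirectory() as tmp:
    profile_dir = Path(tmp) / "profiles" / "debugger-v1"
    profile_dir.mkdir(parents=True)
    blocked = gate_task({"id": "t1", "assignee": "debugger-v1", "status": "ready"}, profile_dir)

    (profile_dir / "config.yaml").write_text("model: example\n")
    passed = gate_task({"id": "t2", "assignee": "debugger-v1", "status": "ready"}, profile_dir)
```

The alternative outcome from the list above, leaving the task unclaimed and adding a comment, would only change what `gate_task` does with the reason; the check ordering stays the same.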

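Why This Matters

Kanban is a multi-profile orchestration primitive. A stale or half-created profile currently creates an opaque failure mode: the board says a task is assigned and dispatchable, but the worker cannot actually start correctly. Failing fast with a precise profile-readiness error would save debugging time and prevent dispatcher churn.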

Related Issues / Distinction

This is related to Kanban multi-profile robustness, but distinct from:

#18442 — Kanban DB profile-scoping / shared board visibility
#18498 — case sensitivity in assignee/profile validation

This report is specifically about validating whether an assignee profile is runnable before worker spawn.

Environment

Hermes Agent v0.12.0 (2026.4.30)
Source commit: b816fd4e2
OS: Linux desktop 6.17.0-22-generic x86_64
Python: 3.11.15

extent analysis

TL;DR

Validate the assignee profile's readiness before claiming and spawning a Kanban worker to prevent opaque failure modes.

Guidance

Modify the dispatch_once() function in hermes_cli/kanban_db.py to call a new validate_profile_readiness() function that checks for the existence of required files like config.yaml, .env, and auth.json in the profile directory.
Use existing functions like normalize_profile_name() and profile_exists() from hermes_cli/profiles.py as a starting point for the new validation function.
Consider adding a cheap non-interactive profile startup/config check to the validation function to ensure the profile is runnable.
If validation fails, leave the task unclaimed and comment with the readiness failure or auto-block the task with a clear reason.

Example

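import os

def validate_profile_readiness(profile_name):
    # Expand "~" explicitly; os.path.exists() and os.path.isfile() do not.
    profile_dir = os.path.expanduser(os.path.join("~/.hermes/profiles", profile_name))
    # Assumed prerequisites: the files the issue reports as missing.
    required_files = ["config.yaml", ".env", "auth.json"]
    for file in required_files:
        if not os.path.isfile(os.path.join(profile_dir, file)):
            return False
    return True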

Notes

The proposed solution assumes that the required files are necessary for a profile to be considered runnable. Additional validation may be necessary depending on the specific requirements of the Hermes Agent and Kanban setup.

Recommendation

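PR #20165 (merged) makes dispatch_once() skip ready tasks whose assignee does not resolve to an existing profile, so a build that includes it no longer crash-loops on non-profile assignees. As the issue notes, profile_exists() only checks that the profile directory exists, so a half-created profile such as debugger-v1 may still pass that check. For that case, validate readiness (at minimum, the presence of config.yaml) before claiming and spawning a Kanban worker, and report a precise error when validation fails.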
