hermes - 💡(How to fix) Fix [Feature]: Use -z (oneshot) mode for kanban worker spawn instead of chat -q [1 participants]

Fix Action

Fix / Workaround

Kanban workers dispatched via _default_spawn() in kanban_db.py use chat -q mode under the hood — a subprocess spawned with stdin=subprocess.DEVNULL, stdout/stderr to a log file, no TTY available.

PTY allocation in the subprocess. The kanban dispatcher could allocate a PTY for each worker. This would solve the TTY issue but adds significant complexity (PTY lifecycle management, signal forwarding) and doesn't address the approval-hang root cause.

Subprocess cleanup: The kanban dispatcher detects crashes via the returned PID (reaped with WNOHANG on the next tick) and the claim TTL. Neither chat -q's extra signal handlers nor -z's absence of them changes the cleanup path.

Code Example

flowchart TD
    A[argparse: cmd_chat]
    A --> B[cli.py:main]
    B --> C[HermesCLI.__init__ — SessionDB]
    C --> D[_ensure_runtime_credentials]
    D --> E[_init_agent → AIAgent]
    E --> F[agent.run_conversation]
    F --> G[print response to stdout]
    F --> H[session_id to stderr]
    F --> I[sys.exit]

---

flowchart TD
    A[argparse: run_oneshot]
    A --> B[logging.disable CRITICAL]
    B --> C[HERMES_YOLO_MODE=1 + HERMES_ACCEPT_HOOKS=1]
    C --> D[redirect stdout + stderr to /dev/null]
    D --> E[_run_agent]
    E --> F[load_config + resolve runtime]
    F --> G[_create_session_db_for_oneshot]
    G --> H[AIAgent with _oneshot_clarify_callback]
    H --> I[agent.chat]
    I --> J[real_stdout: final response only]
    I --> K[return 0]

Problem or Use Case

This works in most cases, but has a latent risk: chat -q keeps interactive approval callbacks active (dangerous-command approval, shell-hook first-use, sudo password, clarify prompts). When one of these fires in a headless subprocess with DEVNULL stdin, the worker hangs indefinitely — it's waiting for input that will never arrive. The task eventually gets auto-blocked after failure_limit consecutive timeout deaths.

PR #23851 proposed swapping to -z (oneshot) mode, which auto-bypasses approvals via HERMES_YOLO_MODE=1 and HERMES_ACCEPT_HOOKS=1, replacing interactive callbacks with synthetic "pick a default" responders.

That PR was closed with the following reasoning from @teknium1:

The change from chat -q to -z is tiny but behavior-changing in a way that needs a maintainer decision: -z and -q have different lifecycle semantics for worker session handling. Main still uses chat -q deliberately. If you want to revive this, please open an issue first laying out which subprocess-cleanup or output-buffering behavior -z would buy us that -q doesn't.

This issue provides that analysis.

Proposed Solution

Replace chat -q with -z (oneshot) in _default_spawn(), based on evidence that:

1. Both modes have equivalent session lifecycle. Both create a SessionDB, wire it into AIAgent.__init__, and persist the conversation. The -z path's _create_session_db_for_oneshot() is lightweight but functionally identical for the kanban use case.

2. Both modes work without a TTY. Neither crashes with hermes-tui: no TTY — that check only fires when --tui is explicitly used. The chat -q path never enters the Ink TUI; it uses the prompt_toolkit REPL only in interactive run() mode, not single-query mode.

3. -z is strictly better in headless contexts:

Auto-bypasses approvals (no hang risk)
2-3x faster startup (skips HermesCLI banner, tool enumeration, exit summary)
63% smaller log output (silences stdlib loggers, no session footer)
Session persistence is identical

Alternatives Considered

Keep chat -q but add HERMES_YOLO_MODE=1 to the spawn env. This is a lighter-touch fix. Discussed in the PR thread. It addresses the hang risk but doesn't capture the startup speed or log clarity benefits. Either approach is acceptable — the key is closing the hang vector.

Detect non-TTY stdin in chat -q and auto-enable HERMES_YOLO_MODE. This would make chat -q self-scoping, but it's a behavioral change to a widely-used flag that could surprise existing users.

Testing Methodology

Reproducible Docker test. Container: python:3.11-slim + Hermes Agent v0.14.0. Mock server: OpenAI-compatible HTTP echo on 127.0.0.1:9999. Both modes were spawned as subprocesses with stdin=subprocess.DEVNULL, stdout/stderr to log file — identical to _default_spawn().

Evidence

chat -q code path (current _default_spawn):

flowchart TD
    A[argparse: cmd_chat]
    A --> B[cli.py:main]
    B --> C[HermesCLI.__init__ — SessionDB]
    C --> D[_ensure_runtime_credentials]
    D --> E[_init_agent → AIAgent]
    E --> F[agent.run_conversation]
    F --> G[print response to stdout]
    F --> H[session_id to stderr]
    F --> I[sys.exit]

-z / oneshot code path (proposed):

flowchart TD
    A[argparse: run_oneshot]
    A --> B[logging.disable CRITICAL]
    B --> C[HERMES_YOLO_MODE=1 + HERMES_ACCEPT_HOOKS=1]
    C --> D[redirect stdout + stderr to /dev/null]
    D --> E[_run_agent]
    E --> F[load_config + resolve runtime]
    F --> G[_create_session_db_for_oneshot]
    G --> H[AIAgent with _oneshot_clarify_callback]
    H --> I[agent.chat]
    I --> J[real_stdout: final response only]
    I --> K[return 0]

Key differences between the two paths:

Aspect	chat -q	-z
Approval handling	Interactive callbacks active — can hang on DEVNULL stdin	`YOLO_MODE=1`, `ACCEPT_HOOKS=1`, synthetic clarify — no hang
Startup overhead	Full HermesCLI init (banner, tool enumeration, session DB)	Minimal — logging silenced, no banner, no TUI init
Output	Response + banner + session footer to stdout/stderr	Only final response to stdout, everything else to /dev/null
Stderr	Session ID, warnings, progress lines	Silenced entirely
Signal handlers	SIGTERM/SIGHUP → agent.interrupt()	None (kanban uses PID detection + claim TTL)
Provider failure	Graceful — catches exception, prints message, exits 0	RuntimeError crash, exits 1

Docker test results

Metric	chat -q	-z (oneshot)
Exit code	0	0
No TTY crash	✓	✓
Session persisted	✓	✓
Elapsed (basic)	5.61s	2.03s
Log size (kanban)	1408b	520b
Approval hang risk	⚠ Yes	None
Provider error handling	Graceful	RuntimeError
Traceback on failure	No	Yes

Addressing the lifecycle concern

The concern was: do -z and -q have different lifecycle semantics for worker session handling?

Session lifecycle is identical. Both paths call the same AgentDB / SessionDB constructor and pass the session to AIAgent.__init__. The -z path's session initialization is in _create_session_db_for_oneshot() (oneshot.py:202-215) and chat -q's is in HermesCLI.__init__ (cli.py:2780-2792). Both use hermes_state.SessionDB().

Output buffering differs between modes, but in the kanban context all stdout/stderr goes to a log file (stdout=log_f, stderr=subprocess.STDOUT). The -z mode silences stderr and loggers, producing cleaner logs. This is a benefit, not a regression.

The real lifecycle difference is approval handling, and it's a reason to switch: -z is safe for headless subprocesses, chat -q is not.

Feature Type

CLI improvement
Performance / reliability

Scope

Small (single file, ~2 lines changed in kanban_db.py + test assertion updates)

Contribution

I'd like to implement this myself and submit a PR.

Filed by Jasper (AI agent on behalf of Magnus Hedemark). Analysis — including Docker reproduction methodology, code path tracing, and side-by-side testing — was performed in collaboration with Magnus. Reproducible test suite: https://gist.github.com/magnus919/e3972d3cd2bcb2eed460a83c0ec3f9dc

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

hermes - 💡(How to fix) Fix [Feature]: Use -z (oneshot) mode for kanban worker spawn instead of chat -q [1 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Root Cause

Fix Action

Fix / Workaround

Code Example

Problem or Use Case

Proposed Solution

Alternatives Considered

Testing Methodology

Evidence

Docker test results

Addressing the lifecycle concern

Feature Type

Scope

Contribution

Still need to ship something?

TRENDING

hermes - 💡(How to fix) Fix [Feature]: Use -z (oneshot) mode for kanban worker spawn instead of chat -q [1 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Root Cause

Fix Action

Fix / Workaround

Code Example

Problem or Use Case

Proposed Solution

Alternatives Considered

Testing Methodology

Evidence

Docker test results

Addressing the lifecycle concern

Feature Type

Scope

Contribution

Still need to ship something?

RELATED_DISCOVERY

TRENDING