hermes: Public termination endpoint for kanban runs (POST /runs/{run_id}/terminate)

Motivation

_terminate_reclaimed_worker in hermes_cli/kanban_db.py (~line 3057) already implements SIGTERM → grace period → SIGKILL, but no HTTP route exposes it. The only user-facing termination path today is POST /tasks/{task_id}/reclaim, which is a recovery action for stuck/dead workers — not a clean "stop this running task" API. Operators who need to cancel a live, well-behaved worker have no dashboard or API surface to do so without SSHing into the host.

Adjacent evidence: issue #22176 (CLI interrupt /stop not working) shows user demand for a stop primitive; a public terminate endpoint would satisfy the same need for tasks already claimed and running.

Design options

Option A: Open endpoint

POST /runs/{run_id}/terminate sends SIGTERM immediately, escalating to SIGKILL after the grace period. Any authenticated dashboard caller can terminate any run. This is simple and matches the "no RBAC layer" reality of all other dashboard routes today. Downside: no audit trail, and no signal to the dispatcher that the task was deliberately cancelled rather than crashed.
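The SIGTERM → grace period → SIGKILL escalation that this option would trigger can be sketched roughly as follows. This is a minimal illustration, not the actual _terminate_reclaimed_worker code: the function name terminate_with_grace, its return values, and the use of a subprocess.Popen handle (rather than whatever the real helper receives) are all assumptions made for the example.

```python
import subprocess
import sys


def terminate_with_grace(proc: subprocess.Popen, grace_seconds: float = 5.0) -> str:
    """Hypothetical helper: ask a worker to stop, then force it if it does not.

    Returns "already_exited", "terminated", or "killed".
    """
    if proc.poll() is not None:
        # Nothing to do: the process finished before we were called.
        return "already_exited"

    proc.terminate()  # SIGTERM on POSIX
    try:
        # Give the worker a chance to shut down cleanly.
        proc.wait(timeout=grace_seconds)
        return "terminated"
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX
        proc.wait()  # reap the process so it does not linger as a zombie
        return "killed"
```

A caller could distinguish a clean stop from a forced one by the return value, which is the kind of signal Option A on its own would not record anywhere.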

Option B: Soft-cancel flag (proposed default)

POST /runs/{run_id}/terminate returns 202 immediately and sets a runs.cancel_requested = 1 flag. The dispatcher's next tick reads the flag, sends SIGTERM, waits for the grace period, sends SIGKILL if needed, and closes the run with outcome=cancelled. ?force=true skips the flag and sends SIGKILL directly. Advantages: dispatcher-mediated semantics match how reclaim/claim work elsewhere; ?force documents destructive intent explicitly; the flag survives a dashboard restart.
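To make the soft-cancel flow concrete, here is a small sketch using an in-memory SQLite database. Only runs.cancel_requested and outcome=cancelled come from the proposal; the rest of the schema, the function names (make_db, request_cancel, dispatcher_tick), the 404 response for unknown or finished runs, and the stop_worker callback standing in for the SIGTERM/SIGKILL step are assumptions for illustration. The ?force=true path is left out.

```python
import sqlite3
from typing import Callable, List


def make_db() -> sqlite3.Connection:
    # Hypothetical minimal schema; the real runs table has more columns.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE runs ("
        " id INTEGER PRIMARY KEY,"
        " cancel_requested INTEGER NOT NULL DEFAULT 0,"
        " outcome TEXT)"
    )
    return conn


def request_cancel(conn: sqlite3.Connection, run_id: int) -> int:
    """What the HTTP handler would do: set the flag and return a status code."""
    cur = conn.execute(
        "UPDATE runs SET cancel_requested = 1 WHERE id = ? AND outcome IS NULL",
        (run_id,),
    )
    conn.commit()
    # 202 = accepted for later processing; 404 is an assumed choice here.
    return 202 if cur.rowcount == 1 else 404


def dispatcher_tick(
    conn: sqlite3.Connection, stop_worker: Callable[[int], None]
) -> List[int]:
    """One dispatcher pass: stop flagged runs and close them as cancelled."""
    rows = conn.execute(
        "SELECT id FROM runs"
        " WHERE cancel_requested = 1 AND outcome IS NULL ORDER BY id"
    ).fetchall()
    cancelled = []
    for (run_id,) in rows:
        stop_worker(run_id)  # stands in for SIGTERM, grace period, SIGKILL
        conn.execute(
            "UPDATE runs SET outcome = 'cancelled' WHERE id = ?", (run_id,)
        )
        cancelled.append(run_id)
    conn.commit()
    return cancelled
```

Because the flag lives in the database rather than in the dashboard process, a request made just before a dashboard restart is still picked up on the dispatcher's next tick.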

Option C: Scoped admin token

Destructive ops (terminate, kill) require a separate HERMES_ADMIN_TOKEN env var, distinct from the dashboard read token. Safer for shared deployments; adds operational overhead for solo installs.
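A token gate along these lines might look like the sketch below. HERMES_ADMIN_TOKEN is the env var named in the proposal; everything else is assumed for illustration, including the function name is_admin_request, the use of an "Authorization: Bearer ..." header, and the choice to fail closed when the variable is unset.

```python
import hmac
import os
from typing import Mapping, Optional


def is_admin_request(
    headers: Mapping[str, str], env: Optional[Mapping[str, str]] = None
) -> bool:
    """Hypothetical check run before any destructive route handler."""
    env = os.environ if env is None else env
    expected = env.get("HERMES_ADMIN_TOKEN", "")
    if not expected:
        # Fail closed: with no admin token configured, nothing is allowed.
        return False

    supplied = headers.get("Authorization", "")
    prefix = "Bearer "
    if not supplied.startswith(prefix):
        return False

    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(
        supplied[len(prefix):].encode(), expected.encode()
    )
```

Failing closed is what creates the extra setup step for solo installs mentioned above: until the variable is set, the terminate route would simply reject every caller.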

Proposed default

Option B. Soft-cancel + ?force escape hatch is the right trade-off: it preserves dispatcher-mediated semantics (everything goes through the loop), gives the worker a clean shutdown path, and the ?force flag makes SIGKILL an explicit opt-in rather than the default. Option C can layer on top later if multi-user RBAC becomes a requirement.

Next steps

Will follow up with a PR implementing Option B after design preference is confirmed in this thread. Read-only sibling endpoints (GET /workers/active, GET /runs/{run_id}, GET /runs/{run_id}/inspect) land in the companion PR (link to be added once opened).
