hermes - Fix Workflow plugin: fire-and-forget asyncio task causes node execution to stall when triggered from agent tool



Bug Report

Symptom

The switch-smoke-test workflow stays in the running state indefinitely, with zero node_completed events, when triggered via the workflow_run agent tool from a conversation. No error is produced; the run never progresses past node_started.

Root Cause

engine.start_run() schedules the run with asyncio.create_task(_execute(...)) and returns immediately (fire-and-forget). The agent's conversation loop is synchronous: once the tool handler returns the run dict, nothing drives the event loop any further. The scheduled task is orphaned, so the await on the bash subprocess never resumes and the node never completes.

Execution chain breakdown:

workflow_run tool → engine.start_run()
  → asyncio.create_task(_execute(...))   ← fire-and-forget
  → returns run dict to agent            ← agent gets response
  → Agent event loop stops pumping       ← BACKGROUND TASK ORPHANED
    → _execute() never gets CPU time
      → create_subprocess_exec().communicate() never completes
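
The chain above can be reproduced in isolation. This is a minimal sketch, assuming the tool handler drives the event loop only until start_run() returns; the real handler may differ. The bodies of _execute and start_run are hypothetical stand-ins, with a short sleep in place of the bash subprocess.

```python
import asyncio

events = []

async def _execute(run_id):
    # Hypothetical stand-in for node execution.
    events.append("node_started")
    await asyncio.sleep(0.01)  # any await that needs the loop to keep running
    events.append("node_completed")

async def start_run(run_id):
    # Fire-and-forget: schedule the work, then return the run dict right away.
    task = asyncio.create_task(_execute(run_id))
    await asyncio.sleep(0)  # yield once so the task reaches its first await
    return {"run_id": run_id, "status": "running"}, task

loop = asyncio.new_event_loop()

# Synchronous caller: the loop runs only until start_run() has returned.
run, task = loop.run_until_complete(start_run("demo"))
events_after_return = list(events)   # only node_started has been recorded
done_after_return = task.done()      # the background task is still pending

# The task finishes only if something drives the loop again.
loop.run_until_complete(task)
loop.close()
```

After the first run_until_complete call the task has started but cannot make progress, which matches the node_started-without-node_completed symptom. Driving the loop a second time lets it finish.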

Evidence

  • Dashboard-triggered runs (via plugin_api.py): Complete in ~20ms — the long-lived FastAPI/Uvicorn event loop keeps the async task alive
  • Agent-triggered runs (via workflow_run tool): Produce node_started events but never node_completed — the synchronous conversation loop orphans the task
  • Gateway health check completed successfully because it was triggered from the dashboard, not the agent tool

Affected Files

  • plugins/workflow-engine/runner.py — _execute() background task
  • plugins/workflow-engine/engine.py — start_run() fire-and-forget pattern
  • Tool handler (wherever workflow_run is registered) — needs to bridge sync/async correctly

Suggested Fixes

Option A — Block until completion (simple):

  • Await the _execute() coroutine directly instead of scheduling it as a fire-and-forget task
  • Return the final result, not just the start acknowledgement
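
Option A might look like the sketch below. It assumes the tool handler is plain synchronous code with no event loop already running in its thread, since asyncio.run() raises in that case. The names start_run_blocking and workflow_run_tool and the run dict layout are hypothetical, and the child process is the Python interpreter rather than bash so that the example is portable.

```python
import asyncio
import sys

async def _execute(run):
    # Hypothetical stand-in for the engine's node execution.
    run["events"].append("node_started")
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", "print('ok')",
        stdout=asyncio.subprocess.PIPE,
    )
    stdout, _ = await proc.communicate()
    run["events"].append("node_completed")
    run["output"] = stdout.decode().strip()
    run["status"] = "completed" if proc.returncode == 0 else "failed"
    return run

async def start_run_blocking(workflow_id):
    run = {"workflow_id": workflow_id, "status": "running", "events": []}
    # Await the coroutine directly: no create_task, so nothing is left pending.
    return await _execute(run)

def workflow_run_tool(workflow_id):
    # Synchronous handler: asyncio.run drives the loop until the run finishes.
    return asyncio.run(start_run_blocking(workflow_id))

result = workflow_run_tool("switch-smoke-test")
```

The trade-off is that the agent is blocked for the full duration of the workflow, which is acceptable only when runs are short.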

Option B — Dedicated background worker thread:

  • Submit execution to a worker thread/queue with its own event loop
  • Independent of the agent conversation cycle
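
One way to sketch Option B is a daemon thread that owns a long-lived event loop, with work handed over through asyncio.run_coroutine_threadsafe. The BackgroundLoop class, the runs registry, and the handler signature are hypothetical, and _execute uses a sleep in place of the real node execution.

```python
import asyncio
import threading

class BackgroundLoop:
    """Owns an event loop that runs forever on a daemon thread."""

    def __init__(self):
        self.loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        asyncio.set_event_loop(self.loop)
        self.loop.run_forever()

    def submit(self, coro):
        # Thread-safe hand-off; returns a concurrent.futures.Future.
        return asyncio.run_coroutine_threadsafe(coro, self.loop)

    def stop(self):
        self.loop.call_soon_threadsafe(self.loop.stop)
        self._thread.join()
        self.loop.close()

runs = {}  # hypothetical in-memory run registry

async def _execute(run_id):
    # Hypothetical stand-in for node execution.
    runs[run_id]["events"].append("node_started")
    await asyncio.sleep(0.05)
    runs[run_id]["events"].append("node_completed")
    runs[run_id]["status"] = "completed"

worker = BackgroundLoop()

def workflow_run_tool(run_id):
    runs[run_id] = {"status": "running", "events": []}
    future = worker.submit(_execute(run_id))
    # Return at once; the worker loop keeps running after the handler exits.
    return {"run_id": run_id, "status": "running"}, future

ack, future = workflow_run_tool("run-1")
future.result(timeout=5)  # demo only: wait so the final state can be inspected
final_status = runs["run-1"]["status"]
worker.stop()
```

Because the loop lives on its own thread, the task keeps getting scheduled whether or not the agent's conversation loop is active.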

Option C — Hybrid (recommended):

  • Short workflows: await inline
  • Long workflows: delegate to daemon process, return run_id for polling
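
The hybrid can be sketched by waiting on the background future for a bounded time and falling back to a run_id when the budget is exceeded. Note that this sketch substitutes a daemon thread for the daemon process named above. The inline_budget parameter, the workflow_status helper, and the runs registry are hypothetical.

```python
import asyncio
import concurrent.futures
import threading
import time

# Long-lived loop on a daemon thread (stand-in for the daemon process).
_loop = asyncio.new_event_loop()
threading.Thread(target=_loop.run_forever, daemon=True).start()

runs = {}  # hypothetical in-memory run registry

async def _execute(run_id, seconds):
    # Hypothetical stand-in: a workflow that takes `seconds` to finish.
    await asyncio.sleep(seconds)
    runs[run_id]["status"] = "completed"
    return runs[run_id]

def workflow_run_tool(run_id, seconds, inline_budget=0.5):
    runs[run_id] = {"run_id": run_id, "status": "running"}
    future = asyncio.run_coroutine_threadsafe(_execute(run_id, seconds), _loop)
    try:
        # Short workflow: the final result arrives within the budget.
        return dict(future.result(timeout=inline_budget))
    except concurrent.futures.TimeoutError:
        # Long workflow: the task keeps running; the caller polls by run_id.
        return {"run_id": run_id, "status": "running"}

def workflow_status(run_id):
    return dict(runs[run_id])

short = workflow_run_tool("short", seconds=0.01)
long_ack = workflow_run_tool("long", seconds=0.3, inline_budget=0.05)

# Poll until the long run completes (bounded so the demo cannot hang).
deadline = time.monotonic() + 5
while workflow_status("long")["status"] != "completed" and time.monotonic() < deadline:
    time.sleep(0.05)
long_final = workflow_status("long")

_loop.call_soon_threadsafe(_loop.stop)
```

A timeout on future.result() does not cancel the underlying task, so the long run continues on the background loop while the agent receives its acknowledgement.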

Reproduction

  1. Start a Hermes Agent session
  2. Call workflow_run with id=switch-smoke-test
  3. Observe: run status shows running, zero node_completed events, and the run never completes
  4. Compare: trigger same workflow from dashboard API — completes in ~20ms

Environment

  • Plugin: workflow-engine
  • OS: macOS
