openclaw - ✅(Solved) Fix [Bug] exec-approval-followup tasks stuck in 'running' forever, blocking channel reload and saturating event loop [2 pull requests, 1 comment, 2 participants]

GitHub stats
openclaw/openclaw#76162 (fetched 2026-05-03 04:41:36)
Comments: 1 · Participants: 2 · Timeline events: 11 · Reactions: 2
Timeline (top): referenced ×5, cross-referenced ×4, commented ×1, unsubscribed ×1

When exec approval follow-up tasks fail (e.g. network error during LLM call), the task runs remain in running status indefinitely. They are not reconciled by the task maintenance sweeper because hasBackingSession always returns true for cli runtime tasks whose childSessionKey points to a persistent session (e.g. agent:main:main).

This is the same bug pattern as #75307 (cron tasks stuck in running), but the fix for that issue only covered the cron code path. The cli runtime path in hasBackingSession still has the defect.

Root Cause

In src/tasks/task-registry.maintenance.ts, the hasBackingSession function:

function hasBackingSession(task) {
  if (task.runtime === "cron") { ... } // Fixed in #75307
  if (task.runtime === "cli" && hasActiveCliRun(task)) return true;
  const childSessionKey = task.childSessionKey?.trim();
  if (!childSessionKey) return true;
  // ...
  if (task.runtime === "subagent" || task.runtime === "cli") {
    // For cli: checks findTaskSessionEntry, which returns true
    // if the session key exists in the store.
    // agent:main:main is a persistent session; it always exists.
    return Boolean(entry);
  }
}

For exec-approval-followup tasks:

  • runtime = cli
  • childSessionKey = agent:main:main (the main session)
  • hasActiveCliRun returns false (the embedded run has ended)
  • findTaskSessionEntry returns true (main session always exists in store)
  • hasBackingSession → true
  • shouldMarkLost → false
  • Task is never reconciled
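
The chain above can be reproduced with a small self-contained model. This is only a sketch: the TaskRecord type, the activeCliRuns set, and the sessionStore map are hypothetical stand-ins for the real registry internals, and the cron and subagent branches are left out.

```typescript
// Simplified model of the pre-fix liveness check.
// All types and stores below are hypothetical stand-ins, not OpenClaw code.
type Runtime = "cron" | "cli" | "subagent";

interface TaskRecord {
  taskId: string;
  runtime: Runtime;
  childSessionKey?: string;
}

// Assumed in-memory stand-ins for the active-run tracker and the session store.
const activeCliRuns = new Set<string>();
const sessionStore = new Map<string, object>([["agent:main:main", {}]]);

function hasActiveCliRun(task: TaskRecord): boolean {
  return activeCliRuns.has(task.taskId);
}

function hasBackingSessionBeforeFix(task: TaskRecord): boolean {
  if (task.runtime === "cli" && hasActiveCliRun(task)) return true;
  const childSessionKey = task.childSessionKey?.trim();
  if (!childSessionKey) return true;
  // Fall-through: a persistent session key is always found in the store.
  return sessionStore.has(childSessionKey);
}

const followup: TaskRecord = {
  taskId: "followup-1",
  runtime: "cli",
  childSessionKey: "agent:main:main",
};

// The embedded run has ended (activeCliRuns is empty), yet the task still
// counts as backed, so a sweeper relying on this check never marks it lost.
const stillBacked = hasBackingSessionBeforeFix(followup); // true
```

In this model the only thing keeping the task alive is the existence of the main session, which matches the behavior described in the list above.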

Fix Action

Fixed

PR fix notes

PR #76199: fix(tasks): mark exec-approval-followup cli tasks lost when run ends (#76162)

Description (problem / solution / changelog)

Problem

exec-approval-followup tasks (and any cli task that uses childSessionKey="agent:main:main") get stuck in running forever after the embedded agent run ends.

Root cause: hasBackingSession() in task-registry.maintenance.ts had:

if (task.runtime === "cli" && hasActiveCliRun(task)) {
  return true;
}
// falls through to session-existence check when hasActiveCliRun is false

When the run ends, hasActiveCliRun returns false, so it falls through to the session-existence check. exec-approval-followup tasks use childSessionKey="agent:main:main" — the persistent main session, which always exists. So hasBackingSession returns true indefinitely and the task is never marked lost.

Fix

Return hasActiveCliRun(task) immediately for all cli tasks — same pattern already used for cron tasks:

if (task.runtime === "cli") {
  // CLI task liveness is determined solely by whether the embedded agent run
  // is still active. Falling through to session-existence checks is wrong:
  // exec-approval-followup tasks use childSessionKey="agent:main:main" which
  // is a persistent session — it always exists, so the session-existence path
  // would never mark the task lost (#76162). Same pattern as cron above.
  return hasActiveCliRun(task);
}

Also removes the now-dead resolveSessionChatType() helper and sessionChatTypesByKey from the lookup context.
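
A minimal sketch of how the changed check behaves, under the same kind of simplified model. TaskRecord, activeCliRuns, and sessionStore are hypothetical stand-ins, and the cron branch is omitted; only the early return for cli tasks mirrors the snippet above.

```typescript
// Simplified model of the post-fix liveness check.
// All types and stores below are hypothetical stand-ins, not OpenClaw code.
type Runtime = "cron" | "cli" | "subagent";

interface TaskRecord {
  taskId: string;
  runtime: Runtime;
  childSessionKey?: string;
}

// Assumed state: one cli run is still active, the main session always exists.
const activeCliRuns = new Set<string>(["followup-live"]);
const sessionStore = new Map<string, object>([["agent:main:main", {}]]);

function hasActiveCliRun(task: TaskRecord): boolean {
  return activeCliRuns.has(task.taskId);
}

function hasBackingSessionAfterFix(task: TaskRecord): boolean {
  if (task.runtime === "cli") {
    // Liveness of a cli task depends only on its embedded run.
    return hasActiveCliRun(task);
  }
  const childSessionKey = task.childSessionKey?.trim();
  if (!childSessionKey) return true;
  return sessionStore.has(childSessionKey);
}

const ended: TaskRecord = {
  taskId: "followup-ended",
  runtime: "cli",
  childSessionKey: "agent:main:main",
};
const live: TaskRecord = {
  taskId: "followup-live",
  runtime: "cli",
  childSessionKey: "agent:main:main",
};

const endedBacked = hasBackingSessionAfterFix(ended); // false
const liveBacked = hasBackingSessionAfterFix(live); // true
```

With the early return, the persistent main session no longer influences the result for cli tasks, so an ended run becomes eligible for the lost transition.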

Tests

  • Updated existing test to remove dead caching assertion (the removed resolveSessionChatType was the only caller)
  • Added regression test: exec-approval-followup cli task with childSessionKey="agent:main:main" and a live session store entry gets marked lost when run ends
  • 302/302 tests pass in src/tasks/

Fixes #76162

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • docs/automation/tasks.md (modified, +5/-6)
  • src/commands/tasks.test.ts (modified, +1/-1)
  • src/tasks/task-registry.maintenance.issue-60299.test.ts (modified, +25/-4)
  • src/tasks/task-registry.maintenance.ts (modified, +10/-34)

PR #76216: fix(status): guard resolveSessionModelRef against non-string model fields (#76206)

Description (problem / solution / changelog)

Problem

openclaw status crashes with TypeError: runtimeModel?.trim is not a function when any session entry in ~/.openclaw/agents/<agent>/sessions/sessions.json has a non-string value for model, modelProvider, providerOverride, or modelOverride.

Root cause: readSessionStoreReadOnly parses session JSON with z.record(z.string(), z.unknown()) — no field normalization. The four model fields reach resolvePersistedSelectedModelRef typed as string | undefined but holding arbitrary JSON values. The internal .trim() calls crash on objects or numbers.

The loadSessionStore path does normalize via normalizeSessionRuntimeModelFields, but openclaw status uses readSessionStoreReadOnly for its read-only scan.

Fix

Wrap the four fields with normalizeOptionalString() at the resolveSessionModelRef call site in status.summary.runtime.ts. normalizeOptionalString accepts unknown and returns string | undefined, so non-string values are discarded and the resolver falls back to the configured default.
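
To illustrate the guard, here is a hedged sketch. normalizeOptionalStringSketch is a hypothetical stand-in written for this example; the real normalizeOptionalString is only known from the description above to accept unknown and return string | undefined, so the trimming and empty-string handling shown here are assumptions.

```typescript
// Hypothetical stand-in; the real normalizeOptionalString may differ in detail.
function normalizeOptionalStringSketch(value: unknown): string | undefined {
  if (typeof value !== "string") return undefined; // discard objects, numbers, etc.
  const trimmed = value.trim(); // assumption: surrounding whitespace is dropped
  return trimmed.length > 0 ? trimmed : undefined; // assumption: empty means unset
}

// A session entry as it might look after parsing with a schema that leaves
// every field typed as unknown. The values are made up for illustration.
const rawEntry: Record<string, unknown> = {
  model: { id: "some-model" }, // object instead of string
  providerOverride: 42, // number instead of string
  modelOverride: "  custom-model  ",
};

const model = normalizeOptionalStringSketch(rawEntry["model"]); // undefined
const providerOverride = normalizeOptionalStringSketch(rawEntry["providerOverride"]); // undefined
const modelOverride = normalizeOptionalStringSketch(rawEntry["modelOverride"]); // "custom-model"
```

Calling .trim() directly on rawEntry["model"] would throw the TypeError from the problem statement; the typeof check is what makes the resolver fall back instead.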

Tests

Two regression tests added to status.summary.runtime.test.ts:

  • object model field → does not throw, falls back to configured default
  • object modelOverride + number providerOverride → does not throw

Fixes #76206

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • src/commands/status.summary.runtime.test.ts (modified, +17/-0)
  • src/commands/status.summary.runtime.ts (modified, +4/-4)

Repro

  1. Trigger exec approvals that generate exec-approval-followup tasks (e.g. batch approve exec commands)
  2. Let the follow-up embedded runs fail (network error, LLM timeout, etc.)
  3. After the runs fail, check tasks/runs.json — tasks remain in running status
  4. Restart the gateway — tasks are reloaded from disk, still running
  5. Channel reload is permanently deferred: channel reload still deferred after Xms with 60 task run(s) active
  6. Event loop saturates, all API calls time out, and the gateway is killed and restarted by launchd in a loop
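
For step 3, a rough way to see how many runs are stuck is to count running records per runtime once the file has been parsed. The RunRecord shape below is an assumption made for this sketch; the actual layout of tasks/runs.json is not shown in this issue and may differ.

```typescript
// Assumed record shape; the real layout of tasks/runs.json may differ.
interface RunRecord {
  taskId: string;
  runtime: string;
  status: string;
}

// Count records that are still in "running", grouped by runtime.
function countRunningByRuntime(records: RunRecord[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const record of records) {
    if (record.status !== "running") continue;
    counts.set(record.runtime, (counts.get(record.runtime) ?? 0) + 1);
  }
  return counts;
}

// Made-up sample data for illustration only.
const sample: RunRecord[] = [
  { taskId: "a", runtime: "cli", status: "running" },
  { taskId: "b", runtime: "cli", status: "running" },
  { taskId: "c", runtime: "cron", status: "running" },
  { taskId: "d", runtime: "cli", status: "succeeded" },
];

const running = countRunningByRuntime(sample);
// For this sample: running.get("cli") is 2 and running.get("cron") is 1.
```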

Observed in production

  • OpenClaw 2026.4.29 (a448042)
  • 65 exec-approval-followup tasks stuck in running since 2026-03-30 (33 days)
  • 2 cron email-watcher tasks also stuck
  • Channel reload deferred for 33,911,115ms (~9.4 hours)
  • Gateway forked 202 times by launchd due to health check failures
  • All API calls (Telegram, Feishu, QQ Bot) timing out due to event loop blockage

Expected behavior

After the exec-approval-followup embedded run completes (success or failure), the task run should be marked as succeeded or failed within a reasonable time (e.g. the existing 5-minute grace period). It should not remain running forever just because the parent session still exists.

Suggested fix

The cli runtime path in hasBackingSession should not consider a task as backed merely because its childSessionKey session exists. A session existing does not mean the specific run is still active. Possible approaches:

  1. After hasActiveCliRun returns false for a cli task, return false immediately instead of falling through to the session existence check
  2. Or: track the specific sessionId (not just sessionKey) when creating the follow-up task, and check if that specific session instance is still active
  3. Or: add a max age for running tasks, after which they are automatically marked lost regardless of session state
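
Approach 3 could look roughly like the sketch below. The RunningTask shape, the startedAtMs field, and the 24-hour limit are all hypothetical choices for illustration; the issue does not specify a field name or a threshold.

```typescript
// Hypothetical task shape; startedAtMs is an assumed field name.
interface RunningTask {
  taskId: string;
  status: "running" | "succeeded" | "failed" | "lost";
  startedAtMs: number;
}

// Assumed limit for illustration; a real value would need to exceed the
// longest legitimate run.
const MAX_RUNNING_AGE_MS = 24 * 60 * 60 * 1000;

// True when a task has been in "running" longer than the allowed age,
// regardless of whether its session still exists.
function exceedsMaxRunningAge(
  task: RunningTask,
  nowMs: number,
  maxAgeMs: number = MAX_RUNNING_AGE_MS,
): boolean {
  return task.status === "running" && nowMs - task.startedAtMs > maxAgeMs;
}

// A task that started on 2026-03-30 and is checked on 2026-05-02 (UTC).
const stuck: RunningTask = {
  taskId: "followup-1",
  status: "running",
  startedAtMs: Date.UTC(2026, 2, 30),
};
const nowMs = Date.UTC(2026, 4, 2);

const ageDays = (nowMs - stuck.startedAtMs) / (24 * 60 * 60 * 1000); // 33
const shouldExpire = exceedsMaxRunningAge(stuck, nowMs); // true
```

A max age acts as a backstop rather than a precise liveness signal, so it would complement approach 1 rather than replace it.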

Impact

  • 67 zombie tasks blocking the entire task queue
  • Channel reload permanently deferred
  • Event loop saturated (P99 delay > 40s)
  • All messaging channels (Telegram, Feishu, QQ Bot) non-functional
  • Gateway crash loop (202 restarts)
  • Requires manual intervention to clean up tasks/runs.json

Related

  • #75307 — same bug for cron runtime (fixed in v2026.4.29)
  • #59349 — exec follow-up leaking into new session after /new
  • #72143 — exec follow-up fallback retry/prefix issues

extent analysis

TL;DR

The most likely fix is to modify the hasBackingSession function to correctly handle cli runtime tasks by not considering a task as backed merely because its childSessionKey session exists.

Guidance

  • Review the hasBackingSession function in src/tasks/task-registry.maintenance.ts to understand the current logic and identify the need for a fix.
  • Consider implementing one of the suggested approaches:
    • Return false immediately after hasActiveCliRun returns false for a cli task.
    • Track the specific sessionId when creating the follow-up task and check if that session instance is still active.
    • Add a max age for running tasks, after which they are automatically marked lost regardless of session state.
  • Verify the fix by triggering exec approvals, letting the follow-up embedded runs fail, and checking if the tasks are correctly marked as succeeded or failed within a reasonable time.

Example

function hasBackingSession(task) {
  if (task.runtime === "cli" && !hasActiveCliRun(task)) {
    return false; // The cli run has ended, so the task is no longer backed
  }
  // ... rest of the function remains the same
}

Notes

The suggested fix aims to address the issue with cli runtime tasks, but it may not cover all possible scenarios. Additional testing and verification are necessary to ensure the fix works as expected.

Recommendation

Modify hasBackingSession so that cli runtime tasks are judged by whether their run is still active rather than by whether the session exists. This is the most direct way to prevent tasks from remaining in running indefinitely, and it is the approach PR #76199 takes.
