openclaw - 💡(How to fix) Fix [Bug]: ACP oneshot sessions leave orphaned processes — session reset does not clean up child ACP session keys [1 participants]

GitHub issue openclaw/openclaw#68916 (fetched 2026-04-19 15:06:25). Comments: 0, participants: 1, timeline events: 0, reactions: 0.

Bug type

Behavior bug (incorrect output/state without crash)

Summary

When sessions_spawn creates ACP sessions with mode: "run", the spawned claude-agent-acp + claude processes remain alive indefinitely after the run completes. Additionally, when the parent session is reset via /new or /reset, the child ACP sessions are not cleaned up — the gateway only calls closeAcpRuntimeForSession() for the parent session key, not for child session keys spawned from it.

This results in progressive memory exhaustion. On a 16 GB machine, 11 orphaned pairs accumulated ~3.5 GB within 2 hours of normal use.

Related issues

This is closely related to #52708, #61895, and #35886 but identifies a specific additional gap: the parent session reset path does not enumerate and close child ACP sessions.

Steps to reproduce

  1. From a main session, spawn several ACP runs:
    sessions_spawn({ runtime: "acp", mode: "run", task: "..." })
  2. Wait for all runs to complete (or not — the behavior is the same).
  3. Reset the parent session via /new or /reset.
  4. Check OS processes: all claude-agent-acp + claude pairs are still alive, reparented to systemd.

Root cause analysis

Two independent gaps in the cleanup chain:

Gap 1: Oneshot close failure is silently swallowed

In the ACP session manager's runTurn(), the oneshot cleanup runs in a finally block:

```typescript
if (meta.mode === "oneshot") {
  try {
    await runtime.close({ handle, reason: "oneshot-complete" });
  } catch (error) {
    logVerbose(`acp-manager: ACP oneshot close failed...`);
  }
}
```

If runtime.close() fails (e.g., stdin already closed, connection lost, timeout), the error is logged but no PID-level fallback kill is attempted. The process remains alive.
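One way to close Gap 1 is to bound runtime.close() with a timeout and, when it throws or never settles, hand the session to a forced kill instead of only logging. A close() that hangs would otherwise also stall the finally block. This is a minimal sketch under assumed types: `AcpRuntimeLike`, `closeOneshotWithFallback`, and the `forceKill` callback are hypothetical names, not OpenClaw APIs, and the 5 s default timeout is an arbitrary choice.

```typescript
// Hypothetical minimal runtime shape; the real OpenClaw type may differ.
interface AcpRuntimeLike {
  close(args: { handle: unknown; reason: string }): Promise<void>;
}

type CloseOutcome = "closed" | "force-killed";

// Rejects if `promise` does not settle within `ms` milliseconds.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    promise.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

// Graceful close first; on error or timeout, run forceKill (for example a PID
// or process-group kill) instead of leaving the process alive.
async function closeOneshotWithFallback(
  runtime: AcpRuntimeLike,
  handle: unknown,
  forceKill: () => Promise<void> | void,
  log: (msg: string) => void = console.warn,
  timeoutMs = 5_000,
): Promise<CloseOutcome> {
  try {
    await withTimeout(runtime.close({ handle, reason: "oneshot-complete" }), timeoutMs);
    return "closed";
  } catch (error) {
    log(`acp-manager: oneshot close failed (${String(error)}); forcing kill`);
    await forceKill();
    return "force-killed";
  }
}
```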

Gap 2: Parent session reset does not clean up child ACP sessions

performGatewaySessionReset() calls cleanupSessionBeforeMutation(), which in turn calls closeAcpRuntimeForSession(), but only for the parent session key. It does not:

  • Query the store for child sessions spawned by this parent
  • Iterate child ACP session keys to close their runtimes
  • Kill processes associated with child ACP sessions

stopSubagentsForRequester() is called inside ensureSessionRuntimeCleanup(), but it only handles subagent-type sessions, not ACP sessions. The child ACP session entries have spawnedBy / parentSessionKey set, but these are never queried during parent cleanup.

Suggested fix

Fix 1: PID-level fallback in terminateAgentProcess()

After the close() → SIGTERM → SIGKILL sequence, if the process still has not exited, store its PID in the session record and, as a last resort, signal it directly with process.kill(pid, "SIGKILL"):

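```typescript
async terminateAgentProcess(child) {
  // ... existing close() → SIGTERM → SIGKILL logic ...
  if (isChildProcessRunning(child) && child.pid !== undefined) {
    // Last resort: signal the PID directly.
    try {
      process.kill(child.pid, "SIGKILL");
    } catch {
      // Ignore errors such as ESRCH (process exited between the check and the kill).
    }
  }
}
```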
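A caveat on Fix 1: the evidence shows claude-agent-acp → claude pairs reparented to systemd after intermediate processes exited. If the gateway's direct child is such an intermediate process, signalling only its PID may never reach the pair. A swapped-in technique, not the one the issue proposes, is to spawn the agent as the leader of its own process group with Node's `detached: true` and signal the whole group with a negative PID (POSIX only). The sketch below assumes that approach; `spawnAgentGroup`, `groupAlive`, and `terminateGroup` are hypothetical names. One trade-off is that a detached group no longer receives signals aimed at the gateway's own group, so explicit cleanup (and the startup reaper in Fix 3) becomes mandatory.

```typescript
import { spawn, ChildProcess } from "node:child_process";

function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(() => resolve(), ms));
}

// Spawns `command` as the leader of a new process group (POSIX only).
// Descendants stay in that group unless they explicitly move to another one.
function spawnAgentGroup(command: string, args: string[]): ChildProcess {
  return spawn(command, args, { detached: true, stdio: "ignore" });
}

// Signal 0 performs only the existence/permission check. Zombies that have not
// been reaped yet still count as present, which can stretch the wait below.
function groupAlive(pgid: number): boolean {
  try {
    process.kill(-pgid, 0);
    return true;
  } catch {
    return false; // ESRCH: nothing left in the group (or EPERM: not ours)
  }
}

// SIGTERM the whole group, wait up to graceMs for it to empty, then SIGKILL the rest.
async function terminateGroup(child: ChildProcess, graceMs = 2_000): Promise<void> {
  const pgid = child.pid;
  if (pgid === undefined) return; // spawn failed: there is no group to signal
  const signalGroup = (sig: NodeJS.Signals): void => {
    try {
      process.kill(-pgid, sig); // a negative PID addresses the whole process group
    } catch {
      // Group already gone.
    }
  };
  signalGroup("SIGTERM");
  const deadline = Date.now() + graceMs;
  while (groupAlive(pgid) && Date.now() < deadline) await sleep(50);
  if (groupAlive(pgid)) signalGroup("SIGKILL");
}
```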

Fix 2: Enumerate and close child ACP sessions on parent reset

In cleanupSessionBeforeMutation(), after handling the parent session, query the store for child ACP sessions:

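```typescript
// After closing the parent ACP runtime:
const childAcpEntries = listAcpSessionEntries({ cfg }).filter(
  (e) => e.entry?.parentSessionKey === canonicalKey || e.entry?.spawnedBy === canonicalKey,
);

for (const child of childAcpEntries) {
  try {
    await closeAcpRuntimeForSession({
      cfg, sessionKey: child.key, entry: child.entry, reason: params.reason,
    });
  } catch (error) {
    // One failed close must not stop the remaining children from being closed.
    logVerbose(`acp-manager: failed to close child ACP session ${child.key}: ${String(error)}`);
  }
}
```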
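Fix 2 only looks one level down. If ACP child sessions can themselves spawn sessions (the issue does not say whether they can), a transitive walk over the same spawnedBy / parentSessionKey fields would also catch grandchildren, and Promise.allSettled keeps one rejected close from skipping the rest. In this sketch only the two field names come from the issue; `SessionRow`, `collectDescendantKeys`, and `closeAll` are hypothetical.

```typescript
// Hypothetical row shape; only spawnedBy / parentSessionKey are taken from the issue.
interface SessionRow {
  key: string;
  entry?: { spawnedBy?: string; parentSessionKey?: string };
}

// Breadth-first walk from rootKey over both parent links. The `seen` set guards
// against cycles and duplicates. Returns descendants only, parents before children.
function collectDescendantKeys(rows: SessionRow[], rootKey: string): string[] {
  const childrenOf = new Map<string, string[]>();
  for (const row of rows) {
    // A Set avoids registering a row twice when both fields name the same parent.
    for (const parent of new Set([row.entry?.parentSessionKey, row.entry?.spawnedBy])) {
      if (!parent) continue;
      const list = childrenOf.get(parent) ?? [];
      list.push(row.key);
      childrenOf.set(parent, list);
    }
  }
  const seen = new Set<string>([rootKey]);
  const order: string[] = [];
  const queue = [rootKey];
  while (queue.length > 0) {
    const current = queue.shift()!;
    for (const child of childrenOf.get(current) ?? []) {
      if (seen.has(child)) continue;
      seen.add(child);
      order.push(child);
      queue.push(child);
    }
  }
  return order;
}

// Closes every key concurrently, even if some closes reject; returns the keys that failed.
async function closeAll(keys: string[], close: (key: string) => Promise<void>): Promise<string[]> {
  const results = await Promise.allSettled(keys.map((key) => close(key)));
  return keys.filter((_, i) => results[i].status === "rejected");
}
```

If children must be closed strictly before their parents, iterating the returned order in reverse and awaiting each close sequentially would be the safer variant.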

Fix 3: Gateway startup orphan reaper

On gateway startup, scan for claude-agent-acp processes whose PIDs do not belong to any session still recorded in the store, and terminate them:

```typescript
// In gateway startup sequence:
const activePids = getActiveAcpSessionPids(); // Set<number> built from session records
const osPids = findProcessesByName("claude-agent-acp");
for (const pid of osPids) {
  if (!activePids.has(pid)) {
    try {
      process.kill(pid, "SIGTERM");
    } catch {
      // ESRCH (already gone) or EPERM (not owned by this user): skip it.
    }
  }
}
```
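findProcessesByName() in the Fix 3 snippet is hypothetical. On Linux, one possible implementation scans /proc/<pid>/cmdline, since the evidence shows claude-agent-acp running as `node claude-agent-acp`, so its process name is node rather than claude-agent-acp. In this sketch, `findProcessesByCmdline` is an illustrative name, and `procRoot` is a parameter only so the function can be exercised against a fake directory. It is Linux-specific and does not check process ownership or which gateway instance started a process; a real reaper should verify both before killing anything.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Returns PIDs whose command line has an argument containing `pattern`.
// Linux only: /proc/<pid>/cmdline holds the arguments separated by NUL bytes.
function findProcessesByCmdline(pattern: string, procRoot = "/proc"): number[] {
  const pids: number[] = [];
  for (const name of fs.readdirSync(procRoot)) {
    if (!/^\d+$/.test(name)) continue; // skip non-process entries such as "self"
    let cmdline: string;
    try {
      cmdline = fs.readFileSync(path.join(procRoot, name, "cmdline"), "utf8");
    } catch {
      continue; // process exited between readdir and read, or access was denied
    }
    const args = cmdline.split("\0").filter((arg) => arg.length > 0);
    if (args.some((arg) => arg.includes(pattern))) pids.push(Number(name));
  }
  return pids.sort((a, b) => a - b);
}
```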

Expected behavior

  1. After a mode: "run" ACP session completes, all child processes should be terminated.
  2. When the parent session is reset, all child ACP sessions should be closed and their processes terminated.
  3. On gateway startup, orphaned ACP processes from previous runs should be reaped.

Actual behavior

Child processes accumulate indefinitely. Each orphaned pair consumes ~300 MB RSS. Running subagents list shows 0 entries because the subagent registry does not track ACP sessions, which makes the leak invisible to the orchestrating agent.

OpenClaw version

2026.4.14

Operating system

Linux (arm64)

Logs, screenshots, and evidence

```shell
$ ps -eo pid,ppid,lstart,rss,args | grep claude-agent-acp
# 11 pairs, all with PPID=systemd (orphaned), started at regular intervals
# Total RSS: ~3.5 GB

$ # All pairs have identical parent structure:
# node claude-agent-acp → claude
# PPID is systemd --user (reparented after intermediate processes exited)
```
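To turn the ps check above into a repeatable measurement, unfiltered output of `ps -eo pid,ppid,rss,args` (header line included, no grep) can be parsed and summarized. This sketch assumes procps ps on Linux, which reports rss in KiB. It treats a claude-agent-acp process as orphaned when the gateway PID is nowhere on its parent chain, because normally an intermediate process sits between the gateway and claude-agent-acp. The lstart column is left out because it contains spaces. `summarizeAcpProcesses` and `gatewayPid` are illustrative names.

```typescript
interface PsRow { pid: number; ppid: number; rssKiB: number; args: string }

// Parses `ps -eo pid,ppid,rss,args` output; the first line is the header.
function parsePs(output: string): PsRow[] {
  const rows: PsRow[] = [];
  for (const line of output.split("\n").slice(1)) {
    const m = line.trim().match(/^(\d+)\s+(\d+)\s+(\d+)\s+(.*)$/);
    if (m) rows.push({ pid: Number(m[1]), ppid: Number(m[2]), rssKiB: Number(m[3]), args: m[4] });
  }
  return rows;
}

// True if `ancestor` appears anywhere on the PPID chain above `pid`.
function hasAncestor(byPid: Map<number, PsRow>, pid: number, ancestor: number): boolean {
  const visited = new Set<number>();
  let current = byPid.get(pid);
  while (current && !visited.has(current.pid)) {
    if (current.ppid === ancestor) return true;
    visited.add(current.pid);
    current = byPid.get(current.ppid);
  }
  return false;
}

// Counts claude-agent-acp processes not descended from the gateway and sums the
// RSS of those processes plus their descendants (e.g. the `claude` child).
function summarizeAcpProcesses(output: string, gatewayPid: number) {
  const rows = parsePs(output);
  const byPid = new Map(rows.map((r) => [r.pid, r] as const));
  const roots = rows
    .filter((r) => r.args.includes("claude-agent-acp") && !hasAncestor(byPid, r.pid, gatewayPid))
    .map((r) => r.pid);
  const leaked = rows.filter(
    (r) => roots.includes(r.pid) || roots.some((root) => hasAncestor(byPid, r.pid, root)),
  );
  const totalRssKiB = leaked.reduce((sum, r) => sum + r.rssKiB, 0);
  return { orphanCount: roots.length, totalRssMiB: totalRssKiB / 1024 };
}
```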

Impact and severity

On memory-constrained machines (≤16 GB), this can exhaust available RAM within a few hours of normal use with ACP agents, causing OOM or severe swap thrashing.
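The reported figures are consistent with each other and give a rough growth rate. Taking ~300 MB per pair and 11 pairs in 2 hours at face value, and assuming ACP runs keep being spawned at the same regular interval:

```latex
11 \times 300\,\text{MB} \approx 3.3\,\text{GB}\ (\text{reported: } {\sim}3.5\,\text{GB}),
\qquad
\frac{3.5\,\text{GB}}{2\,\text{h}} \approx 1.75\,\text{GB/h},
\qquad
\frac{16\,\text{GB}}{1.75\,\text{GB/h}} \approx 9\,\text{h}.
```

Nine hours is an upper bound: memory already used by the OS, the gateway, and other applications shortens it, and a higher spawn rate shortens it further.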

Extended analysis

TL;DR

Implementing a PID-level fallback kill in terminateAgentProcess() and enumerating child ACP sessions on parent reset can help mitigate the memory exhaustion issue caused by orphaned claude-agent-acp and claude processes.

Guidance

  • Modify the terminateAgentProcess() function to include a PID-level fallback kill using process.kill(pid, "SIGKILL") as a last resort after the close() → SIGTERM → SIGKILL sequence.
  • Update cleanupSessionBeforeMutation() to query the store for child ACP sessions spawned by the parent session and close their runtimes using closeAcpRuntimeForSession().
  • Consider implementing a gateway startup orphan reaper to scan for and kill claude-agent-acp processes whose session keys no longer exist in the store.
  • Verify the fix by checking the OS processes after a mode: "run" ACP session completes and after the parent session is reset, ensuring that all child processes are terminated.
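The last guidance item can be automated. Termination is asynchronous, so a check that runs immediately after a run or reset may still see processes that are about to exit; polling with a deadline makes the check less flaky. `waitForNoProcesses` and the injected `listPids` function are hypothetical; on Linux, `listPids` could wrap `pgrep -f claude-agent-acp`.

```typescript
// Polls listPids() until it returns no PIDs or timeoutMs elapses.
// Resolves with the PIDs still present at the end (an empty array means success).
async function waitForNoProcesses(
  listPids: () => number[] | Promise<number[]>,
  timeoutMs = 10_000,
  intervalMs = 250,
): Promise<number[]> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const pids = await listPids();
    if (pids.length === 0 || Date.now() >= deadline) return pids;
    await new Promise<void>((resolve) => setTimeout(() => resolve(), intervalMs));
  }
}
```

In a regression test, a non-empty result after a parent reset would be reported as the list of leaked PIDs rather than as a bare failure.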

Notes

The suggested fixes assume that terminateAgentProcess() and cleanupSessionBeforeMutation() otherwise behave as described in the issue, and that closeAcpRuntimeForSession() actually terminates the ACP runtime for the session key it is given. Additional logging and error handling may be needed to confirm that the fixes work as expected.

Recommendation

Apply the suggested fixes: the PID-level fallback kill addresses Gap 1, child ACP session enumeration on parent reset addresses Gap 2, and the startup orphan reaper covers processes left over from earlier gateway runs. Together they close both identified gaps in the cleanup chain and should stop the progressive memory growth.
