openclaw - 💡(How to fix) Fix [Bug]: ACP `mode: "run"` sessions leave orphan processes and lose completion events [1 comment, 2 participants]

GitHub stats
openclaw/openclaw#52708 (fetched 2026-04-08 01:20:01)
Comments: 1 · Participants: 2 · Timeline events: 4 · Reactions: 0
Timeline (top): labeled ×2, commented ×1, cross-referenced ×1


Bug type

Behavior bug (incorrect output/state without crash)

Summary

When using sessions_spawn with runtime: "acp" and mode: "run", two issues occur after the agent finishes its task:

  1. Orphan processes: The claude-agent-acp and claude child processes are never terminated, becoming orphaned under systemd --user and leaking memory indefinitely.
  2. Missing completion event: The parent session never receives the run completion system event, so the orchestrating agent has no way to know the task finished.

Both issues reproduce consistently (observed on 3 separate mode: "run" sessions over 2 days).

Steps to reproduce

Environment

  • OpenClaw: 2026.3.11 (29dc654)
  • acpx extension: 2026.3.11
  • acpx CLI: 0.1.16
  • Agent harness: claude (Claude Code via claude-agent-acp)
  • OS: Linux 6.14.0-1017-azure (Ubuntu 24.04, x86_64)
  • Node: v25.8.1
  • VM memory: 3.8 GB

Steps to Reproduce

  1. From the main agent session, spawn an ACP run:
    sessions_spawn({
      agentId: "claude",
      runtime: "acp",
      mode: "run",
      task: "<any multi-minute coding task>",
      runTimeoutSeconds: 1800
    })
  2. Wait for the agent to complete its work (it will git commit, etc.).
  3. Check subagents list — returns 0 active, 0 recent.
  4. Check OS processes — claude-agent-acp and claude are still alive.

Expected behavior

  1. After the ACP run completes, all child processes (claude-agent-acp, claude, and any intermediaries) should be terminated.
  2. The parent session should receive a completion system event so the orchestrating agent can proceed with verification and cleanup.

Actual behavior

Bug 1: Orphan Processes

The intermediate process chain (acpx → queue-owner → npm exec → sh) exits after the run completes but does not send SIGTERM to its child process tree first. The claude-agent-acp processes are then reparented to the nearest subreaper, which is the user's systemd instance (systemd --user, PID 920 in the tree below). They remain alive indefinitely.

Process tree after run completion:

Before (normal):                          After (broken):
openclaw-gateway                          systemd --user (PID 920)
  └─ acpx                                   ├─ claude-agent-acp (orphan, Sl, ep_poll)
      └─ queue-owner                         │   └─ claude (orphan, Sl, ep_poll)
          └─ npm exec                        └─ claude-agent-acp (orphan, Sl, ep_poll)
              └─ sh                              └─ claude (orphan, Sl, ep_poll)
                  ├─ claude-agent-acp
                  │   └─ claude
                  └─ claude-agent-acp
                      └─ claude
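The reparenting in the "After" tree is standard Unix behavior and is not specific to acpx. When a parent exits without signaling its children, the children keep running and the kernel hands them to the nearest subreaper (here systemd --user) or to PID 1. The following is a minimal, Linux-only sketch that reproduces the mechanism. It uses sh and sleep as stand-ins for the acpx chain and claude-agent-acp, and none of it is OpenClaw code:

```typescript
import { spawn } from "node:child_process";
import { readFileSync } from "node:fs";

// Linux-only: read a process's parent PID from /proc/<pid>/stat.
// The command name (field 2) may contain spaces, so parse after the last ')'.
function parentPid(pid: number): number {
  const stat = readFileSync(`/proc/${pid}/stat`, "utf8");
  const fields = stat.slice(stat.lastIndexOf(")") + 2).split(" ");
  return Number(fields[1]); // fields[0] is the state, fields[1] is the ppid
}

// Stand-in for the intermediate chain: `sh` starts a long-lived background
// child (playing claude-agent-acp), prints its PID, and exits WITHOUT
// signaling it. The child's stdio goes to /dev/null so it holds no pipes.
function spawnAndOrphan(): Promise<{ shPid: number; orphanPid: number }> {
  return new Promise((resolve, reject) => {
    const sh = spawn("sh", ["-c", "sleep 30 >/dev/null 2>&1 & echo $!"], {
      stdio: ["ignore", "pipe", "ignore"],
    });
    let out = "";
    sh.stdout.on("data", (chunk) => (out += chunk));
    sh.on("error", reject);
    sh.on("close", () => resolve({ shPid: sh.pid!, orphanPid: Number(out.trim()) }));
  });
}
```

After `sh` exits, the sleep process is still alive and its parent PID no longer points at `sh`. This matches the claude-agent-acp processes in the diagnostic output below.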

Diagnostic data from a real session:

$ ps -eo pid,ppid,stat,etime,rss,cmd | grep claude-agent-acp
564895  920 Sl  01:06:35  80144 node .../claude-agent-acp
564906  564895 Sl  01:06:35  201520 claude
564989  920 Sl  01:06:31  85392 node .../claude-agent-acp
565045  564989 Sl  01:06:30  267596 claude

$ # All intermediate processes are DEAD:
$ ps -p 564952,564965,564976,564988
# (no output - all exited)

$ # Processes have zero CPU activity and no network connections:
$ ss -tnp | grep -E "564895|564906|564989|565045"
# (no output)

Each orphan pair consumes ~300 MB RSS. On a 3.8 GB VM, 2-3 forgotten runs can OOM the machine.
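You can check the per-pair figure directly against the RSS column above, since ps reports RSS in KiB. This small sketch groups each orphaned claude-agent-acp with its claude child. The row type and helper name are illustrative, and the numbers are copied from the diagnostic output. RSS counts shared pages in every process that maps them, so treat the sums as an upper bound.

```typescript
// RSS values (KiB, as reported by `ps -o rss`) from the diagnostic output above.
interface PsRow {
  pid: number;
  ppid: number;
  rssKiB: number;
}

const rows: PsRow[] = [
  { pid: 564895, ppid: 920, rssKiB: 80144 }, // claude-agent-acp
  { pid: 564906, ppid: 564895, rssKiB: 201520 }, // claude
  { pid: 564989, ppid: 920, rssKiB: 85392 }, // claude-agent-acp
  { pid: 565045, ppid: 564989, rssKiB: 267596 }, // claude
];

// Sum each orphan root (a direct child of the subreaper) with its direct children.
function rssPerOrphanPair(procs: PsRow[], subreaperPid: number): Map<number, number> {
  const totals = new Map<number, number>();
  for (const root of procs.filter((p) => p.ppid === subreaperPid)) {
    const kids = procs.filter((p) => p.ppid === root.pid);
    totals.set(root.pid, root.rssKiB + kids.reduce((sum, k) => sum + k.rssKiB, 0));
  }
  return totals;
}

const pairs = rssPerOrphanPair(rows, 920);
for (const [pid, kib] of pairs) {
  console.log(`orphan ${pid}: ${(kib / 1024).toFixed(1)} MiB`);
}
const totalKiB = [...pairs.values()].reduce((a, b) => a + b, 0);
console.log(`total: ${(totalKiB / 1024).toFixed(1)} MiB`);
```

With this data the script prints 275.1 MiB and 344.7 MiB for the two pairs, or 619.8 MiB for the single run shown.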

Bug 2: Missing Completion Event

The parent (main) session never receives the run completion event. Evidence:

  • subagents list returns { total: 0, active: [], recent: [] } — the gateway has no record of the run.
  • Searching the parent session transcript for completion/system events related to the child session key returns nothing.
  • The task's status had to be checked manually via git log and ps aux, which showed that it had completed ~46 minutes earlier.

This has occurred on every mode: "run" ACP session observed (3/3 attempts over 2 days, including tasks that completed successfully and committed code).


OpenClaw version

2026.3.11 (29dc654)

Operating system

Linux 6.14.0-1017-azure (Ubuntu 24.04, x86_64)

Install method

No response

Model

claude-opus-4.6

Provider / routing chain

openclaw --> acpx --> claude cli

Additional provider/model setup details

No response

Logs, screenshots, and evidence

Impact and severity

  • Memory leak: Each forgotten run leaks ~300-600 MB of RSS. On constrained VMs (3.8 GB), this leads to OOM within 2-3 runs.
  • Broken orchestration: Without completion events, the PM/orchestrator agent cannot automate verification and cleanup workflows. Users must manually check if tasks are done.
  • Workaround burden: Users must run pkill -f claude-agent-acp && pkill -f claude after every run, which is a sledgehammer that kills ALL claude processes (including any legitimately active ones).
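A narrower alternative to pkill -f is to kill only the claude-agent-acp processes whose parent is the subreaper, along with their descendants. Agents still attached to a live acpx chain are left alone. Below is a minimal sketch over `ps -eo pid,ppid,cmd` output. The helper names are illustrative. It also assumes that "parent is systemd --user" means "orphaned", which holds for the process trees shown in this issue but is an assumption.

```typescript
// Illustrative helpers, not part of OpenClaw. Assumption: a claude-agent-acp
// whose parent is the subreaper (systemd --user) has lost its acpx chain.
interface Proc {
  pid: number;
  ppid: number;
  cmd: string;
}

// Parse `ps -eo pid,ppid,cmd` output; the header line does not match and is dropped.
function parsePs(output: string): Proc[] {
  return output
    .split("\n")
    .map((line) => line.trim().match(/^(\d+)\s+(\d+)\s+(.*)$/))
    .filter((m): m is RegExpMatchArray => m !== null)
    .map((m) => ({ pid: Number(m[1]), ppid: Number(m[2]), cmd: m[3] }));
}

// Return the PIDs of every orphaned claude-agent-acp and its descendants,
// descendants first, so they can be signaled one at a time.
function orphanedAgentTrees(procs: Proc[], subreaperPid: number): number[] {
  const children = new Map<number, Proc[]>();
  for (const p of procs) {
    const list = children.get(p.ppid) ?? [];
    list.push(p);
    children.set(p.ppid, list);
  }
  const result: number[] = [];
  const collect = (pid: number): void => {
    for (const c of children.get(pid) ?? []) collect(c.pid);
    result.push(pid); // descendants were pushed first
  };
  for (const p of procs) {
    if (p.ppid === subreaperPid && p.cmd.includes("claude-agent-acp")) collect(p.pid);
  }
  return result;
}
```

To use it, feed it real output, for example from `execFileSync("ps", ["-eo", "pid,ppid,cmd"], { encoding: "utf8" })`, and call `process.kill(pid, "SIGTERM")` on each result. This spares claude processes that still belong to an active run. The subreaper PID (920 in this issue) varies per host.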

Additional information

Root Cause Analysis

Orphan processes

In extensions/acpx/src/runtime-internals/process.ts, spawnWithResolvedCommand() spawns the child process without detached: true, so the child inherits the parent's process group. However, when the intermediate acpx/queue-owner processes exit, they don't call child.kill() or process.kill(-pgid) before terminating. The OS reparents the children to init/systemd.

The spawnAndCollect() function does have abort-based cleanup (SIGTERM → SIGKILL), but this only triggers on explicit AbortSignal — it's not wired to run-completion or session-close lifecycle hooks.
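spawnAndCollect() already cleans up on an AbortSignal. One way to close the gap is to give each run a single AbortController and abort it from every terminal lifecycle path. Below is a minimal sketch built around a hypothetical RunLifecycle wrapper. The class, its methods, and the EndReason values are illustrative and are not OpenClaw APIs.

```typescript
// Hypothetical wrapper, not an OpenClaw API. One AbortController per run,
// aborted by every terminal path, so AbortSignal-based cleanup always fires.
type EndReason = "completed" | "timeout" | "error" | "session-closed";

class RunLifecycle {
  private readonly controller = new AbortController();

  // Hand this to the spawn helper that already honors AbortSignal.
  get signal(): AbortSignal {
    return this.controller.signal;
  }

  // Register cleanup (e.g. process-group termination). Each registration runs
  // once, immediately if the run has already ended.
  onEnd(cleanup: (reason: EndReason) => void): void {
    const { signal } = this.controller;
    if (signal.aborted) {
      cleanup(signal.reason as EndReason);
      return;
    }
    signal.addEventListener("abort", () => cleanup(signal.reason as EndReason), { once: true });
  }

  // Run completion, timeout, error, and session close all funnel here.
  // abort() is a no-op after the first call, so the first reason wins.
  end(reason: EndReason): void {
    this.controller.abort(reason);
  }
}
```

Routing every exit through one `end()` call means that adding a new terminal path cannot skip cleanup by accident.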

Missing completion event

The close() method in runtime.ts runs acpx sessions close <name>, which closes the ACP-level session. However, the completion event doesn't appear to propagate back to the gateway's session routing layer. The acpx sessions list shows "No sessions" after close, but the parent session was never notified.

Suggested Fix

For orphan processes

When the ACP run session completes (either successfully or via timeout/error), the acpx runtime should:

  1. Send SIGTERM to the spawned child process tree before the intermediate processes exit.
  2. Wait briefly (e.g., 250ms) for graceful shutdown, then SIGKILL if still alive.
  3. Consider spawning with detached: true + process.kill(-child.pid) for reliable process group cleanup.
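The three suggested steps map fairly directly onto Node's child_process API. Below is a minimal sketch that assumes the run tree is started with a plain spawn(). The helper names spawnRunTree, signalGroup, and terminateGroup are illustrative, and the real code in process.ts may be structured differently.

```typescript
import { spawn } from "node:child_process";

// detached: true makes the child the leader of a new process group (and, on
// POSIX, a new session), so its PGID equals child.pid. Descendants stay in
// that group unless they call setsid()/setpgid() themselves.
function spawnRunTree(command: string, args: string[]) {
  return spawn(command, args, { detached: true, stdio: "ignore" });
}

// Signal every process in the group (a negative PID targets the group).
// Returns false if no process in the group exists any more (ESRCH).
function signalGroup(pgid: number, signal: NodeJS.Signals | 0): boolean {
  try {
    process.kill(-pgid, signal);
    return true;
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code === "ESRCH") return false;
    throw err;
  }
}

// SIGTERM the whole tree, poll for up to graceMs, then SIGKILL stragglers.
async function terminateGroup(pgid: number, graceMs = 250): Promise<void> {
  if (!signalGroup(pgid, "SIGTERM")) return; // group already gone
  const deadline = Date.now() + graceMs;
  while (Date.now() < deadline) {
    await new Promise((resolve) => setTimeout(resolve, 25));
    // Signal 0 only checks existence (unreaped zombies still count).
    if (!signalGroup(pgid, 0)) return;
  }
  signalGroup(pgid, "SIGKILL");
}
```

The run-completion, timeout, and error paths would each `await terminateGroup(child.pid)` before the intermediate process exits. A descendant that moves itself into another group escapes this cleanup, so it is worth confirming that npm exec and claude-agent-acp stay in the group.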

For missing completion event

Ensure the close() or run-completion path emits a completion event back to the parent session via the gateway's session event bus before tearing down the ACP session state.
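A single idempotent finish() can guarantee the ordering the suggestion asks for (notify first, then tear down) and cover every terminal path. Both close() and the run-completion handler would call it. Below is a minimal sketch that uses Node's EventEmitter as a stand-in for the gateway's session event bus. AcpRunSession, the "run-completed" event name, and the payload fields are hypothetical.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical payload; the gateway's real event shape is not in the issue.
type RunOutcome = "completed" | "timeout" | "error";
interface CompletionEvent {
  childSessionKey: string;
  parentSessionKey: string;
  outcome: RunOutcome;
}

class AcpRunSession {
  private finished = false;
  private readonly bus: EventEmitter;
  private readonly childSessionKey: string;
  private readonly parentSessionKey: string;
  private readonly teardown: () => Promise<void>; // e.g. runs `acpx sessions close <name>`

  constructor(bus: EventEmitter, childKey: string, parentKey: string, teardown: () => Promise<void>) {
    this.bus = bus;
    this.childSessionKey = childKey;
    this.parentSessionKey = parentKey;
    this.teardown = teardown;
  }

  // Called from close() AND from the run-completion, timeout, and error paths.
  async finish(outcome: RunOutcome): Promise<void> {
    if (this.finished) return; // emit at most once per run
    this.finished = true;
    const event: CompletionEvent = {
      childSessionKey: this.childSessionKey,
      parentSessionKey: this.parentSessionKey,
      outcome,
    };
    try {
      // Notify the parent first, while routing state still exists...
      this.bus.emit("run-completed", event);
    } finally {
      // ...then tear down ACP session state, even if a listener threw.
      await this.teardown();
    }
  }
}
```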

Extended analysis

Fix Plan

To address the issues of orphan processes and missing completion events, the following steps can be taken:

For Orphan Processes

  1. Modify spawnWithResolvedCommand(): Set detached: true when spawning the child process. The child then leads a new process group instead of inheriting the parent's, so the whole tree can be signaled at once with process.kill(-pid).
  2. Implement Cleanup: Before the intermediate processes exit, send SIGTERM to the spawned child process tree, wait briefly for graceful shutdown, and then send SIGKILL if the processes are still alive.
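  3. Example Code:
    // Assumption: spawnWithResolvedCommand() forwards these options to
    // child_process.spawn(); its real signature is not shown in the issue.
    const childProcess = spawnWithResolvedCommand({
      // ... other options ...
      detached: true, // child leads a new process group (PGID === child.pid)
      stdio: 'pipe'
    });

    // Signal the whole group; ESRCH means every member has already exited.
    function signalGroup(pgid, signal) {
      try {
        process.kill(-pgid, signal);
        return true;
      } catch (err) {
        if (err.code === 'ESRCH') return false;
        throw err;
      }
    }

    // Listen for 'exit', not 'close': 'close' waits for stdio to close, and
    // orphans that inherited the pipes can hold them open indefinitely.
    childProcess.on('exit', () => {
      const pgid = childProcess.pid;
      // SIGTERM the rest of the tree; skip the follow-up if nothing is left.
      if (pgid === undefined || !signalGroup(pgid, 'SIGTERM')) return;
      // SIGKILL stragglers after a 250 ms grace period (no-op if gone).
      setTimeout(() => signalGroup(pgid, 'SIGKILL'), 250);
    });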

For Missing Completion Event

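  1. Emit Completion Event: Modify the close() method in runtime.ts (or the run-completion path) to emit a completion event back to the parent session via the gateway's session event bus before closing the ACP session.
  2. Example Code:
    // Hypothetical names: this.gateway, the 'session-completion' event, its
    // payload, and runAcpx() are placeholders for the real runtime.ts internals.
    async close() {
      // ... other cleanup code ...
      // Notify the parent first, while the gateway can still route the event.
      this.gateway.emit('session-completion', {
        sessionKey: this.sessionKey,
        parentSessionKey: this.parentSessionKey,
      });
      // Then close the ACP session (runs `acpx sessions close <name>`).
      await this.runAcpx(['sessions', 'close', this.sessionName]);
    }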

Verification

To verify that the fixes work:

  1. Run a test session with the modified code.
  2. Check for orphan processes after the session completes using ps -eo pid,ppid,stat,etime,rss,cmd | grep claude-agent-acp.
  3. Verify that the parent session receives the completion event by checking the session transcript or logs.
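For step 3, a check that waits for the completion event and fails after a deadline can replace reading the transcript by hand. Below is a minimal sketch using events.on() with AbortSignal.timeout(), both of which are real Node APIs. The "run-completed" event name and the payload shape are hypothetical stand-ins for whatever the gateway emits.

```typescript
import { EventEmitter, on } from "node:events";

// Hypothetical payload shape; substitute the gateway's real event.
interface CompletionEvent {
  childSessionKey: string;
  outcome: string;
}

// Resolve with the completion event for `childSessionKey`, or reject with an
// AbortError if none arrives within `timeoutMs` (the failure in this issue).
async function waitForCompletion(
  bus: EventEmitter,
  childSessionKey: string,
  timeoutMs: number,
): Promise<CompletionEvent> {
  const signal = AbortSignal.timeout(timeoutMs);
  // on() queues events, so nothing emitted between iterations is missed.
  for await (const [event] of on(bus, "run-completed", { signal })) {
    const e = event as CompletionEvent;
    if (e.childSessionKey === childSessionKey) return e; // also removes the listener
  }
  throw new Error("event stream ended without a completion event");
}
```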

Extra Tips

  • Regularly review and test process cleanup mechanisms to prevent regressions.
  • Consider implementing a periodic cleanup task to remove any remaining orphan processes.
  • Ensure that all error cases are handled properly to prevent unexpected behavior.
