openclaw - ✅(Solved) Fix [Bug]: MCP subprocesses not cleaned up after isolated agent session completes (memory leak) [1 pull requests, 1 participants]

jinlin-teck · 2026-04-18T15:51:03Z



GitHub stats

openclaw/openclaw#68623 • Fetched 2026-04-19 15:09:25


When using MCP servers (e.g., Lightpanda) configured via mcp.servers in OpenClaw, isolated agent sessions spawned via cron jobs do not properly terminate the MCP child process after the session completes. Each time a cron job runs, a new MCP subprocess is created, and the previous one is never cleaned up.

Result: every cron execution leaves one orphaned MCP process.

PR fix notes

PR #69276: fix(cron): dispose bundled-MCP subprocesses after isolated cron agent runs (#68623)

Repository: openclaw/openclaw
Author: stainlu
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/69276

Description (problem / solution / changelog)

Summary

Problem: When mcp.servers is configured and a cron job spawns an isolated agent session that uses an MCP tool, the MCP child subprocess (e.g. Lightpanda) is spawned fresh every run and never terminated. Each cron fire leaks one MCP process.
Why it matters: A daily cron job that touches an MCP tool accumulates one orphaned process per day indefinitely. Reporter's Lightpanda workload demonstrates the leak on v2026.4.9 and the gap persists on current main.
What changed: Pass cleanupBundleMcpOnRunEnd: true into the runEmbeddedPiAgent call in the isolated-cron executor. The embedded runner already has the disposal hook (disposeSessionMcpRuntime(params.sessionId)) wired at src/agents/pi-embedded-runner/run.ts:2047-2053, but it is opt-in via cleanupBundleMcpOnRunEnd, and the cron executor never opted in.
What did NOT change (scope boundary): The CLI-provider cron path at runCliAgent (src/agents/cli-runner.ts:123) already disposes its prepared backend in its own finally block — that path was not leaking. This PR only plugs the gap in the embedded-Pi cron path.

Change Type (select all)

Bug fix

Scope (select all touched areas)

Gateway / orchestration
Integrations

Linked Issue/PR

Closes #68623
This PR fixes a bug or regression

Root Cause

Root cause: src/cron/isolated-agent/run-executor.ts:144 calls runEmbeddedPiAgent({ ... }) without cleanupBundleMcpOnRunEnd. The dispose hook in the embedded runner is gated on that flag (default false), so for isolated cron turns the MCP runtime for that sessionId is never torn down. The subprocess spawned by bundled-MCP prep stays alive after the turn completes, and the next cron fire spawns a new one without disposing the previous.
Missing detection / guardrail: No regression test existed for the cron executor's MCP disposal wiring. The embedded-runner dispose path has coverage (src/agents/pi-embedded-runner.e2e.test.ts:486-535), but nothing asserted that the cron entrypoint opts into it.
Contributing context: The CLI-path cleanup in src/agents/cli-runner.ts:123 works differently (it runs the preparedBackend.cleanup in a finally), so manual testing of CLI-backed cron runs could give a false sense of coverage.
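
The opt-in gate described above can be sketched as follows. This is a simplified model, not the code in run.ts: only the names runEmbeddedPiAgent, disposeSessionMcpRuntime, cleanupBundleMcpOnRunEnd, and sessionId come from the PR; the Map of live runtimes, the pid counter, and simulateCronFires are hypothetical stand-ins for real subprocess handling.

```typescript
// Hypothetical, simplified model of the opt-in disposal gate.
// The real runEmbeddedPiAgent takes many more parameters than shown here.
interface EmbeddedRunParams {
  sessionId: string;
  cleanupBundleMcpOnRunEnd?: boolean; // opt-in; treated as false when omitted
}

// Stand-in for live MCP child processes, keyed by session id.
const liveMcpRuntimes = new Map<string, { pid: number }>();
let nextPid = 1000;

// Stand-in for the real hook, which would terminate the subprocess.
async function disposeSessionMcpRuntime(sessionId: string): Promise<void> {
  liveMcpRuntimes.delete(sessionId);
}

async function runEmbeddedPiAgent(params: EmbeddedRunParams): Promise<void> {
  // Bundled-MCP prep: one runtime (subprocess) per session.
  liveMcpRuntimes.set(params.sessionId, { pid: nextPid++ });
  try {
    // ... the agent turn would run here ...
  } finally {
    // The gate: without the flag, nothing is torn down.
    if (params.cleanupBundleMcpOnRunEnd) {
      await disposeSessionMcpRuntime(params.sessionId);
    }
  }
}

// Runs `fires` isolated cron turns and returns how many runtimes are still alive.
async function simulateCronFires(fires: number, cleanup: boolean): Promise<number> {
  liveMcpRuntimes.clear();
  for (let i = 0; i < fires; i++) {
    // Each isolated cron turn owns a fresh session id.
    await runEmbeddedPiAgent({ sessionId: `cron-${i}`, cleanupBundleMcpOnRunEnd: cleanup });
  }
  return liveMcpRuntimes.size;
}
```

In this model, three fires without the flag leave three runtimes behind, while three fires with the flag leave none, which mirrors the one-orphan-per-fire behavior in the report.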

Regression Test Plan

Coverage level that should have caught this:
- Unit test
Target test or file: new src/cron/isolated-agent/run.mcp-cleanup.test.ts via the existing run.test-harness.ts
Scenario the test should lock in: runCronIsolatedAgentTurn must pass cleanupBundleMcpOnRunEnd: true (and trigger: "cron") to runEmbeddedPiAgent. Captures the exact params object passed to the mocked embedded runner.
Why this is the smallest reliable guardrail: The bug is a single missing flag on a single call site. An assertion on the call args of runEmbeddedPiAgentMock is enough to catch any future regression that drops the flag.
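
A minimal sketch of that kind of call-args assertion, using a hand-rolled capturing mock. The actual test is described as going through run.test-harness.ts, whose API is not shown here, so the injected-runner signature of runCronIsolatedAgentTurn, createCapturingRunner, and checkCronOptsIntoCleanup are all assumptions made for illustration.

```typescript
// Simplified params shape; the real object carries many more fields.
interface EmbeddedRunParams {
  sessionId: string;
  trigger: string;
  cleanupBundleMcpOnRunEnd?: boolean;
}

type EmbeddedRunner = (params: EmbeddedRunParams) => Promise<void>;

// Hypothetical mock: records every params object it is called with.
function createCapturingRunner(): { runner: EmbeddedRunner; calls: EmbeddedRunParams[] } {
  const calls: EmbeddedRunParams[] = [];
  const runner: EmbeddedRunner = async (params) => {
    calls.push(params);
  };
  return { runner, calls };
}

// Stand-in for the cron entrypoint, with the runner injected so it can be mocked.
async function runCronIsolatedAgentTurn(runner: EmbeddedRunner, sessionId: string): Promise<void> {
  await runner({ sessionId, trigger: "cron", cleanupBundleMcpOnRunEnd: true });
}

// Returns true only if the entrypoint passed both expected fields exactly once.
async function checkCronOptsIntoCleanup(): Promise<boolean> {
  const { runner, calls } = createCapturingRunner();
  await runCronIsolatedAgentTurn(runner, "cron-session-1");
  const first = calls[0];
  return (
    calls.length === 1 &&
    first !== undefined &&
    first.trigger === "cron" &&
    first.cleanupBundleMcpOnRunEnd === true
  );
}
```

Dropping the flag from the stand-in entrypoint makes checkCronOptsIntoCleanup resolve to false, which is the regression the described test is meant to catch.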

User-visible / Behavior Changes

None user-visible. Removes a silent memory leak on cron + MCP workloads.

Diagram

Before:
isolated cron fire → runEmbeddedPiAgent({ ...no cleanupBundleMcpOnRunEnd })
                   → spawn MCP subprocess for sessionId
                   → run completes
                   → dispose hook gated on flag → no-op
                   → next cron fire → new MCP subprocess, previous one orphaned

After:
isolated cron fire → runEmbeddedPiAgent({ ..., cleanupBundleMcpOnRunEnd: true })
                   → spawn MCP subprocess for sessionId
                   → run completes
                   → disposeSessionMcpRuntime(sessionId) → subprocess terminated
                   → next cron fire → fresh session, clean slate

Security Impact

New permissions/capabilities? No
Secrets/tokens handling changed? No
New/changed network calls? No
Command/tool execution surface changed? No (subprocess lifecycle only)
Data access scope changed? No

Repro + Verification

Environment

OS: Linux (Debian-derived in reporter's case)
Runtime: OpenClaw gateway v2026.4.9 (bug persists on current main)
Model/provider: any embedded-Pi provider (non-CLI)
Integration: MCP server registered via mcp.servers.<name>.command
Relevant config: sessionTarget: "isolated" cron with payload kind: "agentTurn" that calls an MCP tool

Steps

1. Configure mcp.servers.lightpanda with a command pointing at a Lightpanda binary.
2. Register a cron job with sessionTarget: "isolated" whose payload calls a Lightpanda MCP tool.
3. Let the cron fire a few times.
4. Observe with ps or pgrep lightpanda that one Lightpanda process remains per cron fire, never cleaned up.

Expected

Each isolated cron turn disposes its MCP runtime on completion. pgrep lightpanda stays at 0 (or at the long-lived main-session count, unchanged across cron fires).

Actual (before fix)

One orphaned Lightpanda per cron fire, growing indefinitely until manual kill.

Evidence

Failing test/log before + passing after

New test run.mcp-cleanup.test.ts asserts runEmbeddedPiAgent receives cleanupBundleMcpOnRunEnd: true. Would have failed on current main.

Human Verification

Verified scenarios:
- pnpm tsgo clean on changed files
- pnpm test src/cron/isolated-agent/run.mcp-cleanup.test.ts — passes
- pnpm oxlint src/cron/isolated-agent/run-executor.ts src/cron/isolated-agent/run.mcp-cleanup.test.ts — 0 warnings, 0 errors
Edge cases checked:
- The CLI-provider cron path (earlier branch at line ~136 in the same file) already disposes via runCliAgent's own finally block; no change needed there.
- The dispose call swallows errors (.catch(log)), so a dispose failure never cascades into the agent turn result.
- cleanupBundleMcpOnRunEnd: true is the same flag the gateway openclaw agent command uses for local-only runs (src/commands/agent-via-gateway.ts:185), confirming the intended disposal semantic.
What I did NOT verify:
- Live Lightpanda-driven repro. The unit-level assertion + the traced dispose path is sufficient evidence; the hook is already exercised by existing e2e tests in src/agents/pi-embedded-runner.e2e.test.ts.
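
The error-swallowing edge case above (.catch(log)) can be illustrated with a small sketch. The names runTurnWithCleanup, failingDispose, and the in-memory logged array are hypothetical; the point is only that a rejected dispose promise is logged and does not change the turn's result.

```typescript
// Collected log lines, standing in for the real logger.
const logged: string[] = [];

function log(err: unknown): void {
  logged.push(err instanceof Error ? err.message : String(err));
}

// Hypothetical dispose that always fails, to exercise the error path.
async function failingDispose(_sessionId: string): Promise<void> {
  throw new Error("dispose failed");
}

// Runs a (trivial) turn and always attempts cleanup afterwards.
async function runTurnWithCleanup(
  sessionId: string,
  dispose: (id: string) => Promise<void>,
): Promise<string> {
  try {
    return `turn ${sessionId} ok`;
  } finally {
    // A rejected dispose is logged and swallowed, so the result above still stands.
    await dispose(sessionId).catch(log);
  }
}
```

Because the rejection is handled inside the finally block, the caller still receives the turn result and the failure shows up only in the log.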

Compatibility / Migration

Backward compatible? Yes
Config/env changes? No
Migration needed? No

No new config surface. A single run-time flag was flipped on an already-existing call path.

Risks and Mitigations

Risk: disposeSessionMcpRuntime could disrupt a shared MCP runtime that other concurrent sessions depend on.
- Mitigation: the dispose call is keyed on params.sessionId, which for isolated cron turns is uniquely owned by the turn's cron session entry (params.cronSession.sessionEntry.sessionId). It cannot collide with a main session's runtime. The same pattern is already used safely by the openclaw agent --local path.
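
To make the mitigation concrete, here is a sketch of a registry keyed by session id. Whether OpenClaw stores MCP runtimes this way is an assumption; McpRuntimeRegistry and the session id strings are invented for illustration, and only the per-sessionId keying comes from the PR text.

```typescript
// Illustrative registry: one runtime entry per session id.
class McpRuntimeRegistry {
  private readonly runtimes = new Map<string, { command: string }>();

  // Register a runtime for a session if it does not already have one.
  ensure(sessionId: string, command: string): void {
    if (!this.runtimes.has(sessionId)) {
      this.runtimes.set(sessionId, { command });
    }
  }

  // Remove only the runtime owned by this session; returns false if none existed.
  dispose(sessionId: string): boolean {
    return this.runtimes.delete(sessionId);
  }

  has(sessionId: string): boolean {
    return this.runtimes.has(sessionId);
  }
}

// Disposes the cron session's runtime and reports which sessions still have one.
function demoIsolatedDispose(): { mainAlive: boolean; cronAlive: boolean } {
  const registry = new McpRuntimeRegistry();
  registry.ensure("main-session", "lightpanda mcp");
  registry.ensure("cron-session-42", "lightpanda mcp");
  registry.dispose("cron-session-42");
  return {
    mainAlive: registry.has("main-session"),
    cronAlive: registry.has("cron-session-42"),
  };
}
```

Under this keying, disposing the cron turn's entry leaves the main session's entry untouched, as long as the two never share a session id.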

Changed files

src/cron/isolated-agent/run-executor.ts (modified, +6/-0)
src/cron/isolated-agent/run.mcp-cleanup.test.ts (added, +99/-0)

Code Example

{
  "mcp": {
    "servers": {
      "lightpanda": {
        "command": "/home/jialin/lightpanda",
        "args": ["mcp"]
      }
    }
  }
}

---


Bug type

Regression (worked before, now fails)

Beta release blocker

Summary

Result: every cron execution leaves one orphaned MCP process.

Steps to reproduce

Environment

OpenClaw version: v2026.4.9
OS: Linux (Debian-based)
Lightpanda version: 1.0.0-nightly.5661+24b86a84
Node: v24.14.1

Steps to reproduce

1. Configure an MCP server in openclaw.json, e.g.:

{
  "mcp": {
    "servers": {
      "lightpanda": {
        "command": "/home/jialin/lightpanda",
        "args": ["mcp"]
      }
    }
  }
}


2. Create a cron job that uses an MCP tool (e.g., lightpanda__goto, lightpanda__markdown):

{
  "schedule": { "kind": "cron", "expr": "35 8 * * *", "tz": "Asia/Shanghai" },
  "sessionTarget": "isolated",
  "payload": { "kind": "agentTurn", "message": "Use lightpanda to fetch a webpage..." }
}

3. Let the cron job execute 2–3 times.
4. Run ps aux | grep lightpanda. A new lightpanda mcp process appears each time, and none are cleaned up.
 

### Expected behavior

Only one lightpanda mcp subprocess should exist at any time (the one managed by the Gateway). After each isolated agent session completes, its MCP subprocess should be terminated.

### Actual behavior

$ ps aux | grep lightpanda
jialin 265023 ... /home/jialin/lightpanda mcp <- created at startup
jialin 265133 ... /home/jialin/lightpanda mcp <- after cron #1
jialin 265259 ... /home/jialin/lightpanda mcp <- after cron #2
jialin 265309 ... /home/jialin/lightpanda mcp <- after cron #3

### OpenClaw version

 v2026.4.15

### Operating system

Debian 13

### Install method

npm global

### Model

minimax/M2.7

### Provider / routing chain

openclaw --> minimax

### Additional provider/model setup details

_No response_

### Logs, screenshots, and evidence

Impact and severity

No response

Additional information

No response

extent analysis

TL;DR

The issue can be fixed by properly terminating the MCP child process after each isolated agent session completes, potentially by modifying the cron job or the OpenClaw configuration to handle process cleanup.

Guidance

Review the OpenClaw documentation to see if there's a built-in mechanism for handling subprocess termination, especially for isolated agent sessions.
Investigate the openclaw.json configuration file to determine if there are any settings related to process management that can be adjusted to prevent orphaned processes.
Consider modifying the cron job to include a step that explicitly terminates the MCP subprocess after the session completes, using a command like pkill or killall.
Check the Lightpanda version and OpenClaw version compatibility to ensure that the issue isn't due to a known version conflict.

Example

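No specific code example can be provided without more details on the internal workings of OpenClaw and Lightpanda, but a potential stopgap might involve adding a cleanup script to the cron job. Note that pkill -f matches every process whose command line contains the pattern, so the command below would also terminate the long-lived lightpanda mcp process managed by the Gateway: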

# After the agent session completes
pkill -f "/home/jialin/lightpanda mcp"

Notes

The solution may depend on the specific implementation details of OpenClaw and Lightpanda, which are not fully provided in the issue description. Additionally, ensuring that the subprocess termination does not interfere with the normal operation of OpenClaw and Lightpanda is crucial.

Recommendation

Apply a workaround by modifying the cron job to include a process cleanup step, as the root cause seems to be related to how subprocesses are managed during isolated agent sessions. This approach allows for a targeted fix without requiring an immediate upgrade or significant changes to the OpenClaw or Lightpanda configurations.
