openclaw - ✅(Solved) Fix [Bug]: MCP subprocesses not cleaned up after isolated agent session completes (memory leak) [1 pull request, 1 participant]

GitHub stats: openclaw/openclaw#68623, fetched 2026-04-19 15:09:25
Comments: 0 · Participants: 1 · Timeline events: 2 (labeled ×2) · Reactions: 0

When using MCP servers (e.g., Lightpanda) configured via mcp.servers in OpenClaw, isolated agent sessions spawned via cron jobs do not properly terminate the MCP child process after the session completes. Each time a cron job runs, a new MCP subprocess is created, and the previous one is never cleaned up.

Result: every cron execution leaves one orphaned MCP process.

PR fix notes

PR #69276: fix(cron): dispose bundled-MCP subprocesses after isolated cron agent runs (#68623)

Description (problem / solution / changelog)

Summary

  • Problem: When mcp.servers is configured and a cron job spawns an isolated agent session that uses an MCP tool, the MCP child subprocess (e.g. Lightpanda) is spawned fresh every run and never terminated. Each cron fire leaks one MCP process.
  • Why it matters: A daily cron job that touches an MCP tool accumulates one orphaned process per day indefinitely. Reporter's Lightpanda workload demonstrates the leak on v2026.4.9 and the gap persists on current main.
  • What changed: Pass cleanupBundleMcpOnRunEnd: true into the runEmbeddedPiAgent call in the isolated-cron executor. The embedded runner already has the disposal hook (disposeSessionMcpRuntime(params.sessionId)) wired at src/agents/pi-embedded-runner/run.ts:2047-2053, but it is opt-in via cleanupBundleMcpOnRunEnd, and the cron executor never opted in.
  • What did NOT change (scope boundary): The CLI-provider cron path at runCliAgent (src/agents/cli-runner.ts:123) already disposes its prepared backend in its own finally block — that path was not leaking. This PR only plugs the gap in the embedded-Pi cron path.

Change Type (select all)

  • Bug fix

Scope (select all touched areas)

  • Gateway / orchestration
  • Integrations

Linked Issue/PR

  • Closes #68623
  • This PR fixes a bug or regression

Root Cause

  • Root cause: src/cron/isolated-agent/run-executor.ts:144 calls runEmbeddedPiAgent({ ... }) without cleanupBundleMcpOnRunEnd. The dispose hook in the embedded runner is gated on that flag (default false), so for isolated cron turns the MCP runtime for that sessionId is never torn down. The subprocess spawned by bundled-MCP prep stays alive after the turn completes, and the next cron fire spawns a new one without disposing the previous.
  • Missing detection / guardrail: No regression test existed for the cron executor's MCP disposal wiring. The embedded-runner dispose path has coverage (src/agents/pi-embedded-runner.e2e.test.ts:486-535), but nothing asserted that the cron entrypoint opts into it.
  • Contributing context: The CLI-path cleanup in src/agents/cli-runner.ts:123 works differently (it runs the preparedBackend.cleanup in a finally), so manual testing of CLI-backed cron runs could give a false sense of coverage.

Regression Test Plan

  • Coverage level that should have caught this:
    • Unit test
  • Target test or file: new src/cron/isolated-agent/run.mcp-cleanup.test.ts via the existing run.test-harness.ts
  • Scenario the test should lock in: runCronIsolatedAgentTurn must pass cleanupBundleMcpOnRunEnd: true (and trigger: "cron") to runEmbeddedPiAgent. Captures the exact params object passed to the mocked embedded runner.
  • Why this is the smallest reliable guardrail: The bug is a single missing flag on a single call site. An assertion on the call args of runEmbeddedPiAgentMock is enough to catch any future regression that drops the flag.
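
The guardrail described above can be sketched as follows. This is a minimal, self-contained illustration, not the contents of run.mcp-cleanup.test.ts: the params shape, the hand-rolled mock, and the body of runCronIsolatedAgentTurn shown here are assumptions. Only the flag name, the trigger value, and the function names come from the PR description.

```typescript
// Hypothetical, trimmed-down params shape; the real type has many more fields.
interface EmbeddedRunParams {
  sessionId: string;
  trigger: string;
  cleanupBundleMcpOnRunEnd?: boolean;
}

// Hand-rolled mock that records every params object it receives.
const recordedCalls: EmbeddedRunParams[] = [];
function runEmbeddedPiAgentMock(params: EmbeddedRunParams): void {
  recordedCalls.push(params);
}

// Stand-in for the cron executor; the runner is injected so the test can observe it.
function runCronIsolatedAgentTurn(
  sessionId: string,
  runner: (params: EmbeddedRunParams) => void,
): void {
  runner({ sessionId, trigger: "cron", cleanupBundleMcpOnRunEnd: true });
}

// The assertion that locks in the fix: exactly one call, with the flag set.
function assertCronOptsIntoCleanup(): void {
  recordedCalls.length = 0;
  runCronIsolatedAgentTurn("cron-session-1", runEmbeddedPiAgentMock);
  if (recordedCalls.length !== 1) {
    throw new Error("expected exactly one call to the embedded runner");
  }
  const params = recordedCalls[0];
  if (!params || params.cleanupBundleMcpOnRunEnd !== true) {
    throw new Error("cron executor must pass cleanupBundleMcpOnRunEnd: true");
  }
  if (params.trigger !== "cron") {
    throw new Error('cron executor must pass trigger: "cron"');
  }
}

assertCronOptsIntoCleanup();
```

Dropping the flag from the stand-in executor makes the second check throw, which is the regression this style of test is meant to catch.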

User-visible / Behavior Changes

None user-visible. Removes a silent memory leak on cron + MCP workloads.

Diagram

Before:
isolated cron fire → runEmbeddedPiAgent({ ...no cleanupBundleMcpOnRunEnd })
                   → spawn MCP subprocess for sessionId
                   → run completes
                   → dispose hook gated on flag → no-op
                   → next cron fire → new MCP subprocess, previous one orphaned

After:
isolated cron fire → runEmbeddedPiAgent({ ..., cleanupBundleMcpOnRunEnd: true })
                   → spawn MCP subprocess for sessionId
                   → run completes
                   → disposeSessionMcpRuntime(sessionId) → subprocess terminated
                   → next cron fire → fresh session, clean slate
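
The before/after flow above can be modelled in a few lines. This is a simplified, synchronous sketch rather than the real runner: the Set standing in for live subprocesses, the try/finally placement, and the assumption that each cron fire uses a fresh session id are all illustrative. Only the names runEmbeddedPiAgent, disposeSessionMcpRuntime, and cleanupBundleMcpOnRunEnd come from the PR.

```typescript
// A Set of session ids stands in for the live MCP subprocesses.
const liveMcpSubprocesses = new Set<string>();

function disposeSessionMcpRuntime(sessionId: string): void {
  liveMcpSubprocesses.delete(sessionId);
}

interface RunParams {
  sessionId: string;
  cleanupBundleMcpOnRunEnd?: boolean; // opt-in; treated as false when omitted
}

function runEmbeddedPiAgent(params: RunParams): void {
  // Bundled-MCP prep spawns a subprocess for this session.
  liveMcpSubprocesses.add(params.sessionId);
  try {
    // ... the agent turn would run here ...
  } finally {
    // The dispose hook is gated on the flag, so callers must opt in.
    if (params.cleanupBundleMcpOnRunEnd) {
      disposeSessionMcpRuntime(params.sessionId);
    }
  }
}

// Fire the cron `fires` times and report how many subprocesses are still alive.
function simulateCronFires(fires: number, cleanup: boolean): number {
  liveMcpSubprocesses.clear();
  for (let i = 0; i < fires; i++) {
    runEmbeddedPiAgent({ sessionId: `cron-${i}`, cleanupBundleMcpOnRunEnd: cleanup });
  }
  return liveMcpSubprocesses.size;
}

console.log(simulateCronFires(3, false)); // 3: one orphan per fire
console.log(simulateCronFires(3, true)); // 0: every run cleans up after itself
```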

Security Impact

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No (subprocess lifecycle only)
  • Data access scope changed? No

Repro + Verification

Environment

  • OS: Linux (Debian-derived in reporter's case)
  • Runtime: OpenClaw gateway v2026.4.9 (bug persists on current main)
  • Model/provider: any embedded-Pi provider (non-CLI)
  • Integration: MCP server registered via mcp.servers.<name>.command
  • Relevant config: sessionTarget: "isolated" cron with payload kind: "agentTurn" that calls an MCP tool

Steps

  1. Configure mcp.servers.lightpanda with a command pointing at a Lightpanda binary.
  2. Register a cron job with sessionTarget: "isolated" whose payload calls a Lightpanda MCP tool.
  3. Let the cron fire a few times.
  4. Observe with ps or pgrep lightpanda that one Lightpanda process remains per cron fire, never cleaned up.

Expected

Each isolated cron turn disposes its MCP runtime on completion. pgrep lightpanda stays at 0 (or at the long-lived main-session count, unchanged across cron fires).

Actual (before fix)

One orphaned Lightpanda per cron fire, growing indefinitely until manual kill.
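
To track the leak across cron fires without counting by eye, the ps output can be parsed. The helper below is a hypothetical sketch (countMcpProcesses is not part of OpenClaw); it assumes ps aux-style lines in which the command appears as "<binary> mcp", as in the listing from the original report.

```typescript
// Count lines in `ps aux`-style output whose command runs `<binaryName> mcp`.
// Lines produced by the `grep` in `ps aux | grep ...` are skipped.
function countMcpProcesses(psOutput: string, binaryName: string): number {
  return psOutput
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .filter((line) => !line.includes("grep "))
    .filter((line) => line.includes(`${binaryName} mcp`))
    .length;
}

// Listing from the issue after three cron fires (PIDs as reported).
const issueListing = [
  "jialin 265023 ... /home/jialin/lightpanda mcp",
  "jialin 265133 ... /home/jialin/lightpanda mcp",
  "jialin 265259 ... /home/jialin/lightpanda mcp",
  "jialin 265309 ... /home/jialin/lightpanda mcp",
].join("\n");

console.log(countMcpProcesses(issueListing, "lightpanda")); // 4
```

Subtracting the long-lived main-session count (1 in the report) gives the number of orphans, which should stay at 0 once the fix is in place.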

Evidence

  • Failing test/log before + passing after

New test run.mcp-cleanup.test.ts asserts runEmbeddedPiAgent receives cleanupBundleMcpOnRunEnd: true. Would have failed on current main.

Human Verification

  • Verified scenarios:
    • pnpm tsgo clean on changed files
    • pnpm test src/cron/isolated-agent/run.mcp-cleanup.test.ts — passes
    • pnpm oxlint src/cron/isolated-agent/run-executor.ts src/cron/isolated-agent/run.mcp-cleanup.test.ts — 0 warnings, 0 errors
  • Edge cases checked:
    • The CLI-provider cron path (earlier branch at line ~136 in the same file) already disposes via runCliAgent's own finally block; no change needed there.
    • The dispose call swallows errors (.catch(log)), so a dispose failure never cascades into the agent turn result.
    • cleanupBundleMcpOnRunEnd: true is the same flag the gateway openclaw agent command uses for local-only runs (src/commands/agent-via-gateway.ts:185), confirming the intended disposal semantic.
  • What I did NOT verify:
    • Live Lightpanda-driven repro. The unit-level assertion + the traced dispose path is sufficient evidence; the hook is already exercised by existing e2e tests in src/agents/pi-embedded-runner.e2e.test.ts.
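
The error-swallowing edge case noted above can be illustrated as follows. This is a sketch under stated assumptions: the always-failing dispose function, the log collector, and runTurnWithCleanup are hypothetical stand-ins; only the .catch(log) pattern and the name disposeSessionMcpRuntime come from the PR notes.

```typescript
// Hypothetical dispose function that always fails, to exercise the error path.
async function disposeSessionMcpRuntime(sessionId: string): Promise<void> {
  throw new Error(`failed to dispose MCP runtime for ${sessionId}`);
}

// Stand-in logger that collects messages instead of printing them.
const disposeErrors: string[] = [];
function log(err: unknown): void {
  disposeErrors.push(err instanceof Error ? err.message : String(err));
}

async function runTurnWithCleanup(sessionId: string): Promise<string> {
  try {
    return "turn-ok"; // stand-in for the real agent turn result
  } finally {
    // Dispose errors are logged and swallowed, so they cannot replace the turn result.
    await disposeSessionMcpRuntime(sessionId).catch(log);
  }
}
```

Awaiting runTurnWithCleanup("cron-1") resolves to the turn result even though dispose rejected, and the failure ends up only in disposeErrors.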

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No
  • Migration needed? No

No new config surface. A single run-time flag was flipped on an already-existing call path.

Risks and Mitigations

  • Risk: disposeSessionMcpRuntime could disrupt a shared MCP runtime that other concurrent sessions depend on.
    • Mitigation: the dispose call is keyed on params.sessionId, which for isolated cron turns is uniquely owned by the turn's cron session entry (params.cronSession.sessionEntry.sessionId). It cannot collide with a main session's runtime. The same pattern is already used safely by the openclaw agent --local path.
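
The isolation argument in the mitigation can be sketched with a registry keyed by session id. The Map and prepareSessionMcpRuntime are hypothetical and exist only for illustration; the PIDs are borrowed from the issue report.

```typescript
// Hypothetical registry: one MCP runtime (represented by a PID) per session id.
const runtimes = new Map<string, number>();

function prepareSessionMcpRuntime(sessionId: string, pid: number): void {
  runtimes.set(sessionId, pid);
}

// Disposes only the runtime registered under this session id.
// Returns false when nothing was registered for it.
function disposeSessionMcpRuntime(sessionId: string): boolean {
  return runtimes.delete(sessionId);
}

prepareSessionMcpRuntime("main", 265023);
prepareSessionMcpRuntime("cron-1", 265133);
disposeSessionMcpRuntime("cron-1");

console.log(runtimes.has("main"), runtimes.has("cron-1")); // true false
```

Because the lookup key is the cron turn's own session id, disposing it cannot reach the entry owned by the main session.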

Changed files

  • src/cron/isolated-agent/run-executor.ts (modified, +6/-0)
  • src/cron/isolated-agent/run.mcp-cleanup.test.ts (added, +99/-0)

Code Example

{
  "mcp": {
    "servers": {
      "lightpanda": {
        "command": "/home/jialin/lightpanda",
        "args": ["mcp"]
      }
    }
  }
}

---
Bug type

Regression (worked before, now fails)

Beta release blocker

No

Summary

When using MCP servers (e.g., Lightpanda) configured via mcp.servers in OpenClaw, isolated agent sessions spawned via cron jobs do not properly terminate the MCP child process after the session completes. Each time a cron job runs, a new MCP subprocess is created, and the previous one is never cleaned up.

Result: every cron execution leaves one orphaned MCP process.

Steps to reproduce

Environment

  • OpenClaw version: v2026.4.9
  • OS: Linux (Debian-based)
  • Lightpanda version: 1.0.0-nightly.5661+24b86a84
  • Node: v24.14.1

Steps to reproduce

  1. Configure an MCP server in openclaw.json, e.g.:
    {
      "mcp": {
        "servers": {
          "lightpanda": {
            "command": "/home/jialin/lightpanda",
            "args": ["mcp"]
          }
        }
      }
    }
  2. Create a cron job that uses an MCP tool (e.g., lightpanda__goto, lightpanda__markdown):
    {
      "schedule": { "kind": "cron", "expr": "35 8 * * *", "tz": "Asia/Shanghai" },
      "sessionTarget": "isolated",
      "payload": { "kind": "agentTurn", "message": "Use lightpanda to fetch a webpage..." }
    }
  3. Let the cron job execute 2–3 times.
  4. Run ps aux | grep lightpanda. A new lightpanda mcp process appears after each run, and none are cleaned up.

### Expected behavior

Only one lightpanda mcp subprocess should exist at any time (the one managed by the Gateway). After each isolated agent session completes, its MCP subprocess should be terminated.

### Actual behavior

$ ps aux | grep lightpanda
jialin 265023 ... /home/jialin/lightpanda mcp <- created at startup
jialin 265133 ... /home/jialin/lightpanda mcp <- after cron #1
jialin 265259 ... /home/jialin/lightpanda mcp <- after cron #2
jialin 265309 ... /home/jialin/lightpanda mcp <- after cron #3

### OpenClaw version

 v2026.4.15

### Operating system

Debian 13

### Install method

npm global

### Model

minimax/M2.7

### Provider / routing chain

openclaw --> minimax

### Additional provider/model setup details

_No response_

### Logs, screenshots, and evidence

```shell
```

### Impact and severity

_No response_

### Additional information

_No response_

extent analysis

TL;DR

The leak is fixed in OpenClaw itself: PR #69276 passes cleanupBundleMcpOnRunEnd: true to runEmbeddedPiAgent in the isolated-cron executor, so each isolated session's MCP subprocess is disposed when the run ends. On builds without that fix, orphaned processes have to be cleaned up manually.

Guidance

  • Check whether your installed OpenClaw build includes PR #69276; the PR reports the leak on v2026.4.9 and on main before the fix.
  • No openclaw.json setting controls this. cleanupBundleMcpOnRunEnd is a run-time flag passed by the caller, and the PR adds no new config surface.
  • As a stopgap, orphaned MCP subprocesses can be terminated manually (for example with pkill), taking care not to kill the long-lived process owned by the main session.
  • The CLI-provider cron path is not affected, because runCliAgent already disposes its prepared backend in its own finally block.

Example

On builds without the fix, the orphaned processes can be killed manually as a stopgap. Note that the pattern below matches every lightpanda mcp process, including the long-lived one created at Gateway startup:

# Stopgap only: kills ALL matching processes, including the Gateway-managed one
pkill -f "/home/jialin/lightpanda mcp"

Notes

The solution may depend on the specific implementation details of OpenClaw and Lightpanda, which are not fully provided in the issue description. Additionally, ensuring that the subprocess termination does not interfere with the normal operation of OpenClaw and Lightpanda is crucial.

Recommendation

Upgrade to a build that includes PR #69276 rather than relying on a cleanup step in the cron job. The root cause is a missing flag at a single call site in the isolated-cron executor, so a pkill-based workaround treats only the symptom and risks killing the main session's MCP process.
