
openclaw - ✅(Solved) Fix MCP stdio servers accumulate across turns and are not cleaned up on config reload (memory leak) [1 pull requests, 1 participants]

xieyuanqing · 2026-04-04T03:04:06Z


openclaw/openclaw#60656 · Author: xieyuanqing · Participants: xieyuanqing



PR fix notes

PR #64316: fix(agents): release bundle MCP runtime on mid-run session reset

Repository: openclaw/openclaw
Author: xxxxxmax
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/64316

Description (problem / solution / changelog)

Summary

Problem: resetReplyRunSession rotated the active sessionId after auto-compaction failure, context overflow, or role-ordering conflicts, but it never released the previous session id's entry from the bundle MCP runtime cache.
Why it matters: runtimesBySessionId in src/agents/pi-bundle-mcp-runtime.ts holds Client -> Transport -> stdio ChildProcess references with no TTL/LRU, so old MCP workers like gate-mcp, bnbchain-mcp, and chrome-devtools-mcp stayed alive and accumulated until memory-constrained deployments OOM'd.
What changed: resetReplyRunSession now disposes the previous session id's bundle MCP runtime in the background with void ... .catch(...) after the replacement session is fully established, and logs disposal failures through the existing deps.error(...) seam without blocking the retry path.
What did NOT change (scope boundary): this does not alter how the new session is established, and it does not add broader cache eviction or runtime lifecycle changes outside the session-reset path.
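The fire-and-forget disposal described above can be sketched as follows. This is a minimal sketch under assumptions: `ResetDeps`, `newSessionId`, `persistSession`, `disposeSessionRuntime`, and `resetReplyRunSessionSketch` are hypothetical names. Only the `void ... .catch(...)` shape and the `deps.error(...)` logging seam come from the PR description.

```typescript
// Hypothetical dependency seam modeled on the PR description; the real
// openclaw types and names may differ.
interface ResetDeps {
  newSessionId(): string;
  persistSession(sessionId: string): void;
  disposeSessionRuntime(sessionId: string): Promise<void>;
  error(message: string, err: unknown): void;
}

// Rotates to a new session first, then releases the previous session's MCP
// runtime in the background so a slow or failing dispose never blocks the retry.
function resetReplyRunSessionSketch(previousId: string, deps: ResetDeps): string {
  const nextId = deps.newSessionId();
  deps.persistSession(nextId); // replacement session is fully established first

  // Promise.resolve().then(...) also converts a synchronous throw from
  // disposeSessionRuntime into a rejection, so .catch sees every failure.
  void Promise.resolve()
    .then(() => deps.disposeSessionRuntime(previousId))
    .catch((err: unknown) => {
      deps.error(`failed to dispose MCP runtime for session ${previousId}`, err);
    });

  return nextId;
}
```

Because the caller gets `nextId` back before disposal settles, the retry path never waits on MCP shutdown. The trade-off is that a dispose that hangs without ever rejecting produces no log line.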

Change Type (select all)

Bug fix

Scope (select all touched areas)

Gateway / orchestration
Skills / tool execution
Integrations

Linked Issue/PR

Closes #
Related #64169
Related #60656
Related #62026
Related #62731
This PR fixes a bug or regression

Root Cause (if applicable)

Root cause: mid-run session rotation persisted the new session metadata and optionally deleted the old transcript, but it never released the previous session id from the bundle MCP runtime cache.
Missing detection / guardrail: the reset-path tests did not assert MCP runtime disposal or cover disposal failures.
Contributing context (if known): the cache has no TTL/LRU and holds stdio-backed MCP client/process references, so each reset could strand another worker pool.
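The contributing context notes that the cache has no TTL/LRU. The PR deliberately leaves that unchanged, but a sketch of idle eviction shows what such a guard could look like. `IdleRuntimeCache`, `RuntimeHandle`, and the TTL policy are hypothetical and are not part of openclaw or of this PR. Timestamps are passed in explicitly to keep the example deterministic.

```typescript
// Hypothetical runtime handle: anything that owns stdio MCP children and can close them.
interface RuntimeHandle {
  dispose(): Promise<void>;
}

interface CacheEntry<T> {
  runtime: T;
  lastUsedMs: number;
}

// Per-session cache with idle eviction; a sketch, not openclaw's runtimesBySessionId.
class IdleRuntimeCache<T extends RuntimeHandle> {
  private readonly entries = new Map<string, CacheEntry<T>>();
  private readonly ttlMs: number;
  private readonly onError: (sessionId: string, err: unknown) => void;

  constructor(ttlMs: number, onError: (sessionId: string, err: unknown) => void) {
    this.ttlMs = ttlMs;
    this.onError = onError;
  }

  get size(): number {
    return this.entries.size;
  }

  getOrCreate(sessionId: string, create: () => T, nowMs: number): T {
    const existing = this.entries.get(sessionId);
    if (existing) {
      existing.lastUsedMs = nowMs; // each use pushes eviction further out
      return existing.runtime;
    }
    const runtime = create();
    this.entries.set(sessionId, { runtime, lastUsedMs: nowMs });
    return runtime;
  }

  // Evicts entries idle for longer than ttlMs. Entries leave the map before
  // dispose() runs, so a failed dispose cannot strand a stale entry.
  async sweep(nowMs: number): Promise<string[]> {
    const stale = Array.from(this.entries).filter(
      ([, entry]) => nowMs - entry.lastUsedMs > this.ttlMs,
    );
    for (const [id] of stale) this.entries.delete(id);
    const results = await Promise.allSettled(stale.map(([, entry]) => entry.runtime.dispose()));
    stale.forEach(([id], i) => {
      const result = results[i];
      if (result && result.status === "rejected") this.onError(id, result.reason);
    });
    return stale.map(([id]) => id);
  }
}
```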

Regression Test Plan (if applicable)

Coverage level that should have caught this:
- Unit test
Target test or file: src/auto-reply/reply/agent-runner-session-reset.test.ts
Scenario the test should lock in: resetting a reply run session disposes the previous session runtime, still succeeds when disposal throws, and still deletes the old transcript when cleanupTranscripts: true even if disposal fails.
Why this is the smallest reliable guardrail: the behavior is owned by resetReplyRunSession, and the relevant failure handling is exposed through that helper's dependency seam.
Existing test that already covers this (if any): none before this change.
If no new test is added, why not: N/A

User-visible / Behavior Changes

None.

Diagram (if applicable)

Before:
[reset trigger] -> [new session metadata persisted] -> [old MCP runtime remains cached/alive]

After:
[reset trigger] -> [new session metadata persisted] -> [background dispose of old MCP runtime] -> [retry continues]

Security Impact (required)

New permissions/capabilities? (No)
Secrets/tokens handling changed? (No)
New/changed network calls? (No)
Command/tool execution surface changed? (No)
Data access scope changed? (No)
If any Yes, explain risk + mitigation:

Repro + Verification

Environment

OS: Linux
Runtime/container: local repo checkout
Model/provider: N/A
Integration/channel (if any): MCP stdio runtime path
Relevant config (redacted): N/A

Steps

Trigger resetReplyRunSession from an auto-compaction failure, context overflow, or role-ordering conflict while a bundle MCP runtime is cached for the current session id.
Observe that the helper rotates sessionId, persists the replacement session metadata, and optionally deletes old transcript files.
Verify that the previous session id's MCP runtime is disposed in the background and that disposal failures are logged without preventing the reset from succeeding.

Expected

Old session MCP workers are released when the session resets.
Reset still returns success even if runtime disposal throws.
Transcript cleanup still occurs when requested.

Actual

Matches expected after this change.

Evidence

Failing test/log before + passing after
Trace/log snippets
Screenshot/recording
Perf numbers (if relevant)

Human Verification (required)

Verified scenarios: pnpm tsgo; pnpm test src/auto-reply/reply/agent-runner-session-reset.test.ts; pnpm lint src/auto-reply/reply/agent-runner-session-reset.ts src/auto-reply/reply/agent-runner-session-reset.test.ts; pnpm format on the touched files.
Edge cases checked: disposal success, disposal failure logging, and transcript cleanup when disposal fails.
What you did not verify: long-running production memory behavior outside the targeted reset/test surface.

Review Conversations

I replied to or resolved every bot review conversation I addressed in this PR.
I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

Backward compatible? (Yes)
Config/env changes? (No)
Migration needed? (No)
If yes, exact upgrade steps:

Risks and Mitigations

Risk: background disposal could fail or hang independently of the retry path.
- Mitigation: disposal is intentionally non-blocking, failures are logged through deps.error(...), and tests cover the failure path plus transcript cleanup continuity.
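Non-blocking disposal keeps the retry path fast, but a dispose that never settles would still leave the old worker alive and produce no log line. One possible complement, which is not part of the PR, is to bound the wait. `withTimeout` below is a hypothetical helper:

```typescript
// Races a disposal promise against a timer so a hung dispose is reported
// instead of silently pending forever. The timeout only stops *waiting*; it
// cannot cancel the underlying work, so the caller should escalate (for
// example, by force-killing the child process) when it fires.
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  // clearTimeout runs whichever side wins, so a fast dispose leaves no timer behind.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}
```

A caller might wrap the background dispose in `withTimeout(..., 5000, "MCP dispose")` before attaching its `.catch(...)`; the 5-second budget is an arbitrary example value.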

Tests

pnpm tsgo
pnpm test src/auto-reply/reply/agent-runner-session-reset.test.ts (4 passed)
pnpm lint src/auto-reply/reply/agent-runner-session-reset.ts src/auto-reply/reply/agent-runner-session-reset.test.ts (0 warnings / 0 errors)
pnpm format on touched files

Changed files

src/auto-reply/reply/agent-runner-session-reset.test.ts (modified, +76/-0)
src/auto-reply/reply/agent-runner-session-reset.ts (modified, +12/-0)


Original issue report (openclaw/openclaw#60656)

Summary

On OpenClaw 2026.4.2, MCP stdio servers appear to accumulate across turns and are not cleaned up properly.

Observed servers:

@modelcontextprotocol/server-sequential-thinking
@upstash/context7-mcp
mcp-deepwiki / mcp-instruct

This caused repeated memory growth on a 15 GiB VPS until restart.

Environment

OpenClaw version: 2026.4.2
Host OS: Linux x64
Session type: Telegram direct chat
Agents affected: not limited to one agent; global mcp.servers exposure made this reproducible across agents

What I observed

Multiple batches of the same MCP-related processes were spawned under openclaw-gateway over time.
They were created repeatedly across conversation turns, even when logs did not show explicit calls to all corresponding tools on each turn.
The processes did not get cleaned up after the turn finished.
Memory usage kept increasing until service restart.

At peak I observed roughly:

about 180 MCP-related processes
about 13.2 GiB (13471.2 MiB) RSS combined

Representative process families:

npm exec @modelcontextprotocol/server-sequential-thinking
node /root/.npm/_npx/.../mcp-server-sequential-thinking
npm exec @upstash/context7-mcp --api-key ...
node /root/.npm/_npx/.../context7-mcp --api-key ...
npm exec mcp-deepwiki
node /root/.npm/_npx/.../mcp-instruct

Important behavior

I removed these MCP servers from config by changing:

"mcp": {
  "servers": {}
}

However, after config apply / hot reload, old leaked MCP child processes were still present.

So there seem to be two related issues:

MCP server lifecycle leak / accumulation during normal turns
Config reload does not clean up already spawned MCP child processes that are no longer configured
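For the second issue, a reload handler needs to compare the servers it is running against the new `mcp.servers` map and stop anything that disappeared or changed. A minimal sketch of that diff, assuming each server is identified by its key under `mcp.servers`; the `McpServerConfig` fields and `planReload` are hypothetical, not openclaw's schema or API.

```typescript
// Hypothetical shape of one mcp.servers entry; the real openclaw schema may differ.
interface McpServerConfig {
  command: string;
  args?: string[];
  env?: Record<string, string>;
}

type ServerMap = Record<string, McpServerConfig>;

interface ReloadPlan {
  stop: string[]; // running servers that were removed or whose config changed
  start: string[]; // servers that are new or whose config changed
  keep: string[]; // unchanged servers that can keep running
}

// Own-property check, so names like "constructor" are not mistaken for entries.
const has = (obj: object, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(obj, key);

// JSON comparison is key-order sensitive; good enough for a sketch, but a real
// implementation would use a stable deep-equality check.
function sameConfig(a: McpServerConfig, b: McpServerConfig): boolean {
  return JSON.stringify(a) === JSON.stringify(b);
}

function planReload(running: ServerMap, next: ServerMap): ReloadPlan {
  const plan: ReloadPlan = { stop: [], start: [], keep: [] };
  for (const [name, current] of Object.entries(running)) {
    const wanted = has(next, name) ? next[name] : undefined;
    if (wanted === undefined) {
      plan.stop.push(name); // removed from config: stop and reap
    } else if (!sameConfig(current, wanted)) {
      plan.stop.push(name); // changed: restart with the new config
      plan.start.push(name);
    } else {
      plan.keep.push(name);
    }
  }
  for (const name of Object.keys(next)) {
    if (!has(running, name)) plan.start.push(name);
  }
  return plan;
}
```

With the report's `"servers": {}`, every running server lands in `stop`, which is the reaping the reporter expected from config apply.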

Evidence

Before mitigation

Representative counts from live inspection:

count=180 total_rss=13471.2 MiB

Later, after restart and more turns, more batches reappeared.
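The report does not say how the `count=... total_rss=...` figures were collected. A minimal Linux-only sketch of an equivalent check reads `/proc/<pid>/cmdline` and `/proc/<pid>/status`. The marker substrings are taken from the process families listed above, and the script itself is hypothetical. Summing RSS counts shared pages once per process, so the total is an upper bound on real memory use.

```typescript
import { readdirSync, readFileSync } from "node:fs";

// Substrings taken from the process families listed in the report.
const MCP_MARKERS = ["server-sequential-thinking", "context7-mcp", "mcp-deepwiki", "mcp-instruct"];

// /proc/<pid>/cmdline separates argv entries with NUL bytes.
function isMcpCmdline(cmdline: string): boolean {
  const argv = cmdline.split("\0").join(" ");
  return MCP_MARKERS.some((marker) => argv.includes(marker));
}

// Extracts VmRSS (reported in kB, i.e. KiB) from /proc/<pid>/status text.
function parseVmRssKiB(status: string): number {
  const match = /^VmRSS:\s+(\d+)\s+kB$/m.exec(status);
  return match ? Number(match[1]) : 0;
}

function summarize(): { count: number; totalRssMiB: number } {
  let count = 0;
  let totalKiB = 0;
  for (const entry of readdirSync("/proc")) {
    if (!/^\d+$/.test(entry)) continue; // only numeric directories are processes
    try {
      const cmdline = readFileSync(`/proc/${entry}/cmdline`, "utf8");
      if (!isMcpCmdline(cmdline)) continue;
      count += 1;
      totalKiB += parseVmRssKiB(readFileSync(`/proc/${entry}/status`, "utf8"));
    } catch {
      // The process may exit between readdir and read; skip it.
    }
  }
  return { count, totalRssMiB: totalKiB / 1024 };
}

if (process.platform === "linux") {
  const { count, totalRssMiB } = summarize();
  console.log(`count=${count} total_rss=${totalRssMiB.toFixed(1)} MiB`);
}
```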

After removing MCP config

Config correctly became:

"mcp": {
  "servers": {}
}

But old processes were still alive, for example:

server-sequential-thinking
context7-mcp
mcp-deepwiki
mcp-instruct

The MCP-related process count also stayed above zero, because reload did not reap the old instances.

Why this looks like an OpenClaw runtime issue

This does not look like a single third-party MCP server bug, because multiple different MCP servers accumulated in the same pattern under openclaw-gateway.

It looks more like OpenClaw's MCP process lifecycle management is:

spawning new stdio servers repeatedly
not reusing or reaping them correctly
and not fully cleaning removed servers on config reload
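The first two points describe spawning without reuse. Here is a sketch of get-or-create reuse keyed by the server's name under `mcp.servers`, assuming a hypothetical `start(name)` callback that spawns the stdio process and connects a client. `McpServerPool` and `ServerHandle` are illustrative names, not openclaw APIs.

```typescript
// Hypothetical handle for one running stdio MCP server.
interface ServerHandle {
  name: string;
  close(): Promise<void>;
}

// At most one live server per configured name. The pool stores the pending
// start promise, so callers that ask for the same server before it finishes
// starting share one spawn instead of each launching a new process.
class McpServerPool {
  private readonly servers = new Map<string, Promise<ServerHandle>>();
  private readonly start: (name: string) => Promise<ServerHandle>;

  constructor(start: (name: string) => Promise<ServerHandle>) {
    this.start = start;
  }

  get(name: string): Promise<ServerHandle> {
    const existing = this.servers.get(name);
    if (existing) return existing;
    const created = this.start(name);
    this.servers.set(name, created);
    // Drop failed starts so a later turn retries instead of reusing a rejection.
    created.catch(() => {
      if (this.servers.get(name) === created) this.servers.delete(name);
    });
    return created;
  }

  // Closes every server this pool started, e.g. on shutdown or config reload.
  async closeAll(): Promise<void> {
    const pending = Array.from(this.servers.values());
    this.servers.clear();
    await Promise.allSettled(pending.map(async (p) => (await p).close()));
  }
}
```

A production pool would also evict an entry when its process exits unexpectedly, so the next caller gets a fresh spawn rather than a dead handle.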

Expected behavior

MCP stdio servers should either be reused safely or terminated after use
completed turns should not leave stale MCP child processes behind indefinitely
removing mcp.servers from config should stop and reap previously managed MCP child processes

Actual behavior

MCP child processes accumulate over time
memory usage keeps growing
hot reload/config apply does not fully clean old MCP child processes
only stronger restart/cleanup restores memory

Suggested investigation areas

MCP stdio server lifecycle ownership under openclaw-gateway
per-turn tool/mcp bootstrap path spawning behavior
child process cleanup on turn completion
child process cleanup on config reload / server removal
whether MCP tool listing/bootstrap is causing eager respawn each turn
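For the "child process cleanup on turn completion" item, one option is to register per-turn resources with a scope that releases them in a `finally` block, whether the turn succeeds or throws. `withTurnScope` and `Cleanup` are hypothetical. This covers the "terminated after use" half of the expected behavior; servers meant to be reused across turns would need a longer-lived owner instead.

```typescript
type Cleanup = () => Promise<void> | void;

// Runs one turn and releases everything registered through onTurnEnd when the
// turn finishes, whether it resolved or threw. Cleanup failures are reported
// through onError so they never mask the turn's own result or error.
async function withTurnScope<T>(
  work: (onTurnEnd: (cleanup: Cleanup) => void) => Promise<T>,
  onError: (err: unknown) => void,
): Promise<T> {
  const cleanups: Cleanup[] = [];
  try {
    return await work((cleanup) => {
      cleanups.push(cleanup);
    });
  } finally {
    // Release in reverse registration order (last acquired, first released).
    for (const cleanup of cleanups.reverse()) {
      try {
        await cleanup();
      } catch (err) {
        onError(err);
      }
    }
  }
}
```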

If helpful, I can also provide a more detailed timestamped process timeline from live inspection.

extent analysis

TL;DR

The most likely fix involves modifying OpenClaw's MCP process lifecycle management to properly reuse or terminate stdio servers after use and ensure that completed turns do not leave stale child processes behind.

Guidance

Investigate MCP stdio server lifecycle ownership: Review the code responsible for managing the lifecycle of MCP stdio servers under openclaw-gateway to identify why servers are not being properly cleaned up.
Examine per-turn tool/mcp bootstrap path spawning behavior: Analyze how MCP tools are being spawned on each turn to determine if there's an issue with eager respawn or if the spawning mechanism is not properly reusing existing servers.
Implement child process cleanup on turn completion and config reload: Develop a mechanism to ensure that child processes are terminated after each turn and when their corresponding configuration is removed or reloaded.
Review MCP tool listing/bootstrap to prevent eager respawn: Investigate if the MCP tool listing or bootstrap process is causing the eager respawn of servers on each turn and adjust the logic to prevent unnecessary spawning.

Example

No specific code snippet can be provided without access to the OpenClaw codebase, but process cleanup might be implemented with ChildProcess.kill() (or process.kill(pid, signal)): send SIGTERM first, escalate to SIGKILL if the process has not exited within a timeout, and handle errors at each step so that child processes are reliably terminated after use.
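Here is a minimal sketch of that termination approach using Node's `ChildProcess.kill()`. It sends SIGTERM, waits a grace period, then escalates to SIGKILL, and resolves only once the `exit` event confirms the process is gone. `terminateChild` and the 2-second default are assumptions for illustration.

```typescript
import { spawn, type ChildProcess } from "node:child_process";

// Sends SIGTERM, waits up to graceMs for the child to exit, then sends SIGKILL.
// Resolves once the child has actually exited, so callers know it was reaped.
function terminateChild(child: ChildProcess, graceMs = 2000): Promise<void> {
  return new Promise((resolve) => {
    if (child.exitCode !== null || child.signalCode !== null) {
      resolve(); // already exited
      return;
    }
    const timer = setTimeout(() => child.kill("SIGKILL"), graceMs);
    child.once("exit", () => {
      clearTimeout(timer);
      resolve();
    });
    child.kill("SIGTERM");
  });
}

// Usage: start a long-running child and shut it down with a 500 ms grace period.
const demo = spawn(process.execPath, ["-e", "setInterval(() => {}, 1000)"], { stdio: "ignore" });
void terminateChild(demo, 500);
```

For `npm exec` launches like those in the report, the npm process and the `node .../_npx/...` server are separate processes, so signalling only the direct child may leave the server running. On POSIX systems, spawning with `detached: true` (which makes the child the leader of a new process group) and signalling the group with `process.kill(-child.pid, "SIGTERM")` is one common way to reach both.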

Notes

The provided information suggests that the issue is related to OpenClaw's management of MCP stdio servers, but without direct access to the code, the exact solution will depend on the specifics of the implementation. It's also important to consider potential edge cases, such as handling server crashes or network issues that might affect the cleanup process.

Recommendation

Apply a workaround by implementing a custom script or modifying the existing code to manually clean up MCP child processes after each turn and on config reload, until a permanent fix can be integrated into the OpenClaw codebase. This approach will help mitigate the memory growth issue while a more comprehensive solution is developed.
