openclaw - 💡(How to fix) Fix Gateway memory leak: RPC becomes unresponsive after ~24 hours [1 comments, 2 participants]

xiaotong2026 · 2026-04-29T03:39:11Z



GitHub stats

openclaw/openclaw#74043•Fetched 2026-04-30 06:29:27


Author

xiaotong2026

Participants

clawsweeper[bot]

xiaotong2026

Timeline (top)

mentioned ×4, subscribed ×4, closed ×1, commented ×1

Fix Action

Workaround

Currently using a custom heal script that checks both process liveness and RPC responsiveness. If the process is alive but unresponsive, the script kills it with `kill -9` and waits for auto-restart.

Code Example

2026-04-28 10:58:01 Gateway down, starting...
2026-04-28 10:58:10 Gateway started OK
2026-04-29 11:16:09 [HEAL] Gateway zombie: process alive but RPC unresponsive (likely memory leak)
2026-04-29 11:16:09 [HEAL] Killing zombie PID: 417212
2026-04-29 11:16:19 [HEAL] Gateway started OK


Bug Description

The Gateway process remains alive but RPC becomes completely unresponsive after running for approximately 24-25 hours. The process must be killed with `kill -9` and restarted by a heal script.

Environment

OS: WSL2 (Linux 6.6.87.2-microsoft-standard-WSL2, x64)
OpenClaw Version: v2026.4.12
Node.js: v22.22.2
Plugins: QQBot (@tencent-connect/openclaw-qqbot v1.7.0), Feishu, multiple cron jobs
Memory: 32GB RAM, RTX 2060 6GB VRAM

Reproduction Steps

1. Start Gateway with `openclaw gateway run`
2. Let it run continuously with normal QQ/Feishu channel traffic and periodic cron jobs
3. After approximately 24-25 hours, the Gateway process is still running but RPC calls time out
4. Memory usage grows from ~700MB (after restart) to ~3.4GB (before becoming unresponsive)
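
The figures in the steps above imply an average growth rate that can be estimated with simple arithmetic. The sketch below assumes roughly linear growth and an uptime of about 24.3 hours (taken from the heal log timestamps in this issue); both are assumptions, and the function names are illustrative only.

```javascript
// Figures reported in the issue; treat them as approximate.
const startMB = 700;        // memory shortly after restart
const endMB = 3400;         // memory just before RPC stopped responding
const observedHours = 24.3; // assumed uptime, from the heal log timestamps

// Average growth in MB per hour, assuming roughly linear growth.
function growthRateMBPerHour(fromMB, toMB, hours) {
  if (hours <= 0) throw new RangeError("hours must be positive");
  return (toMB - fromMB) / hours;
}

// Hours until `limitMB` is reached from `currentMB` at a constant rate.
function hoursUntil(currentMB, limitMB, rateMBPerHour) {
  if (rateMBPerHour <= 0) return Infinity;
  return Math.max(0, (limitMB - currentMB) / rateMBPerHour);
}

const rate = growthRateMBPerHour(startMB, endMB, observedHours);
console.log(`~${rate.toFixed(0)} MB/hour`); // ~111 MB/hour
```

A rate on this order would suggest steady accumulation rather than one large allocation, which is consistent with the "slow memory leak" description, though it does not identify the source.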

Observed Behavior

Process remains alive (`pgrep` returns a PID)
RPC health check fails (times out after 8 seconds)
Memory usage steadily increases over time (~24 hours to reach 3.4GB)
No crash logs — process simply stops responding to RPC
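
The two checks described above (process exists, RPC answers within 8 seconds) can be sketched in Node.js as follows. This is not the author's heal script: `probeRpc` is a hypothetical caller-supplied function that performs whatever RPC health call the deployment uses, since the issue does not show the actual call.

```javascript
// Returns true if a process with this PID exists. Signal 0 performs the
// existence and permission check without delivering a signal.
function isProcessAlive(pid) {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM means the process exists but we are not allowed to signal it.
    return err.code === "EPERM";
  }
}

// Rejects if `promise` does not settle within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// `probeRpc` is hypothetical: any function returning a promise that resolves
// when the Gateway answers. The 8000 ms default mirrors the issue's timeout.
async function isRpcResponsive(probeRpc, timeoutMs = 8000) {
  try {
    await withTimeout(probeRpc(), timeoutMs);
    return true;
  } catch {
    return false;
  }
}
```

Keeping the two checks separate matters here because the failure mode in this issue passes the first check and fails only the second.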

What Actually Happened

2026-04-28 10:58:01 Gateway down, starting...
2026-04-28 10:58:10 Gateway started OK
2026-04-29 11:16:09 [HEAL] Gateway zombie: process alive but RPC unresponsive (likely memory leak)
2026-04-29 11:16:09 [HEAL] Killing zombie PID: 417212
2026-04-29 11:16:19 [HEAL] Gateway started OK

After about 24.3 hours (10:58:10 on 2026-04-28 to 11:16:09 on 2026-04-29), the heal script flagged the Gateway process (PID 417212) as a "zombie": alive but unresponsive to RPC. The script killed it and the restart succeeded.
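
The uptime can be derived directly from the two log timestamps. The sketch below treats the timestamps as UTC, which is an assumption (the log does not state a time zone); it only affects the result if a clock change falls inside the interval.

```javascript
// Parse "YYYY-MM-DD HH:MM:SS" as UTC (assumed; the log gives no time zone).
function parseLogTime(stamp) {
  const ms = Date.parse(stamp.replace(" ", "T") + "Z");
  if (Number.isNaN(ms)) throw new Error(`unparseable timestamp: ${stamp}`);
  return ms;
}

// Elapsed hours between two log timestamps.
function uptimeHours(startedAt, killedAt) {
  return (parseLogTime(killedAt) - parseLogTime(startedAt)) / 3_600_000;
}

const uptime = uptimeHours("2026-04-28 10:58:10", "2026-04-29 11:16:09");
console.log(uptime.toFixed(2)); // 24.30
```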

Workaround

Currently using a custom heal script that checks both process liveness and RPC responsiveness. If the process is alive but unresponsive, the script kills it with `kill -9` and waits for auto-restart.
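
The decision logic of such a heal script might look like the sketch below. The author's script is not shown in the issue, so everything here is an illustration: `checkAlive`, `checkRpc`, `startGateway`, and `kill` are hypothetical hooks supplied by the caller, and calling `startGateway` after a kill is an assumption (the issue says the script "waits for auto-restart").

```javascript
// Map the two independent checks to an action.
function decideHealAction(processAlive, rpcResponsive) {
  if (!processAlive) return "start";             // nothing is running
  if (!rpcResponsive) return "kill-and-restart"; // alive but hung
  return "none";                                 // healthy
}

// One heal pass. All hooks are hypothetical and injected by the caller.
async function healOnce({
  pid,
  checkAlive,
  checkRpc,
  startGateway,
  kill = (p) => process.kill(p, "SIGKILL"), // same effect as `kill -9`
  log = console.log,
}) {
  const alive = pid != null && checkAlive(pid);
  const responsive = alive ? await checkRpc() : false;
  const action = decideHealAction(alive, responsive);
  if (action === "kill-and-restart") {
    log(`[HEAL] Killing unresponsive PID: ${pid}`);
    kill(pid);
  }
  if (action !== "none") await startGateway();
  return action;
}
```

Injecting `kill` and `startGateway` keeps the decision logic testable without touching real processes.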

Impact

Without external monitoring, Gateway becomes a zombie (offline but process running)
Requires daily forced restarts to maintain stability
All channel connections (QQ, Feishu) drop silently

Additional Context

This is a slow memory leak, not a sudden crash
v2026.4.26 release notes do not mention a fix for this specific issue
The issue reproduces consistently: every ~24 hours of continuous operation

extent analysis

TL;DR

The Gateway process likely suffers from a memory leak, causing it to become unresponsive to RPC calls after approximately 24-25 hours, requiring a forced restart.

Guidance

Investigate the memory usage pattern of the Gateway process to identify potential memory leak sources, focusing on the plugins (e.g., QQBot, Feishu) and cron jobs.
Review the code for any unclosed resources, such as database connections, file handles, or network sockets, that could contribute to the memory leak.
Consider adding more detailed logging to track memory allocation and deallocation patterns, helping to pinpoint the leak's origin.
Evaluate the feasibility of implementing a periodic restart mechanism, similar to the existing heal script, to mitigate the issue until a permanent fix is found.
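
One way to act on the guidance above is to capture a heap snapshot once memory crosses a threshold, so the retained objects can be inspected afterwards in a heap profiler. The sketch below uses Node.js's `v8.writeHeapSnapshot()`; the threshold and interval are illustrative values, and the code would have to run inside the Gateway process, which the issue does not describe how to do.

```javascript
const v8 = require("node:v8");

// Pure decision: snapshot once, the first time RSS reaches the threshold.
function shouldSnapshot(rssMB, thresholdMB, alreadyTaken) {
  return !alreadyTaken && rssMB >= thresholdMB;
}

// Start a watcher and return a function that stops it.
// thresholdMB and intervalMs are illustrative defaults, not recommendations.
function watchForLeak({ thresholdMB = 2000, intervalMs = 60_000 } = {}) {
  let taken = false;
  const timer = setInterval(() => {
    const rssMB = process.memoryUsage().rss / 1024 / 1024;
    if (shouldSnapshot(rssMB, thresholdMB, taken)) {
      taken = true;
      // Synchronous: blocks the event loop while the snapshot is written.
      const file = v8.writeHeapSnapshot();
      console.log(`heap snapshot written to ${file}`);
    }
  }, intervalMs);
  timer.unref(); // do not keep the process alive just for this timer
  return () => clearInterval(timer);
}
```

Writing a snapshot is expensive in both time and memory on a large heap, so a threshold well below the point of failure is probably safer than one close to it.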

Example

Without access to the Gateway codebase, no targeted fix can be given. A reasonable first step is to log the output of Node.js's built-in `process.memoryUsage()` at regular intervals and track how RSS and heap statistics change over the ~24 hours before the hang.
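
A minimal sketch of such a logger, assuming it can be loaded inside the Gateway process (the issue does not say how). It combines `process.memoryUsage()` with `v8.getHeapStatistics()` so heap usage can be compared against V8's heap size limit; the field names in the returned object are this example's own.

```javascript
const v8 = require("node:v8");

const toMB = (bytes) => Math.round(bytes / 1024 / 1024);

// Collect one memory sample in megabytes.
function snapshotMemory() {
  const { rss, heapTotal, heapUsed, external, arrayBuffers } = process.memoryUsage();
  const { heap_size_limit } = v8.getHeapStatistics();
  return {
    time: new Date().toISOString(),
    rssMB: toMB(rss),
    heapUsedMB: toMB(heapUsed),
    heapTotalMB: toMB(heapTotal),
    externalMB: toMB(external),
    arrayBuffersMB: toMB(arrayBuffers),
    heapLimitMB: toMB(heap_size_limit),
  };
}

// Log one JSON line per interval; returns a function that stops the logger.
function startMemoryLogger(intervalMs = 60_000, log = console.log) {
  const timer = setInterval(() => log(JSON.stringify(snapshotMemory())), intervalMs);
  timer.unref(); // do not keep the process alive just for logging
  return () => clearInterval(timer);
}
```

If `heapUsedMB` climbs toward `heapLimitMB`, the leak is likely in JavaScript objects, and heavy garbage collection near the limit could plausibly explain the unresponsive RPC. If `rssMB` grows while `heapUsedMB` stays flat, buffers or native allocations are the more likely place to look. Both readings are hypotheses, not conclusions from the issue.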

Notes

The provided information suggests a consistent, reproducible issue, but without access to the Gateway process's codebase or more detailed logs, it's challenging to provide a definitive solution. The presence of a custom heal script indicates that the issue is currently being mitigated, but a more permanent fix would be desirable.

Recommendation

Apply workaround: Continue using the custom heal script to monitor the Gateway process and restart it when necessary, while concurrently investigating the root cause of the memory leak to implement a permanent fix.
