openclaw - 💡(How to fix) Fix Gateway memory leak: RPC becomes unresponsive after ~24 hours [1 comment, 2 participants]

GitHub stats

openclaw/openclaw#74043 (fetched 2026-04-30 06:29:27)
Comments: 1 · Participants: 2 · Timeline events: 11 · Reactions: 0
Timeline (top): mentioned ×4, subscribed ×4, closed ×1, commented ×1

Fix Action

Workaround

A custom heal script currently checks both process liveness and RPC responsiveness. If the Gateway is alive but unresponsive, the script kills it with kill -9 and waits for the automatic restart.

Code Example

2026-04-28 10:58:01 Gateway down, starting...
2026-04-28 10:58:10 Gateway started OK
2026-04-29 11:16:09 [HEAL] Gateway zombie: process alive but RPC unresponsive (likely memory leak)
2026-04-29 11:16:09 [HEAL] Killing zombie PID: 417212
2026-04-29 11:16:19 [HEAL] Gateway started OK

Bug Description

The Gateway process remains alive, but RPC becomes completely unresponsive after approximately 24-25 hours of uptime. The process must then be killed with kill -9 and restarted by a heal script.

Environment

  • OS: WSL2 (Linux 6.6.87.2-microsoft-standard-WSL2, x64)
  • OpenClaw Version: v2026.4.12
  • Node.js: v22.22.2
  • Plugins: QQBot (@tencent-connect/openclaw-qqbot v1.7.0), Feishu, multiple cron jobs
  • Memory: 32GB RAM, RTX 2060 6GB VRAM

Reproduction Steps

  1. Start Gateway with openclaw gateway run
  2. Let it run continuously with normal QQ/Feishu channel traffic and periodic cron jobs
  3. After approximately 24-25 hours, Gateway process is still running but RPC calls time out
  4. Memory usage grows from ~700MB (after restart) to ~3.4GB (before becoming unresponsive)
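
The figures in step 4 imply an average growth rate. The sketch below works it out, assuming roughly linear growth and about 24.3 hours of uptime (the interval between the start and kill timestamps in the heal log). The function name is illustrative and the inputs are the report's approximate values, not measurements taken here.

```javascript
// Average memory growth implied by the reported figures.
// Assumes roughly linear growth between the two reported readings.
function growthRateMbPerHour(startMb, endMb, hours) {
  if (hours <= 0) throw new RangeError("hours must be positive");
  return (endMb - startMb) / hours;
}

// ~700 MB after restart, ~3.4 GB before the hang, ~24.3 hours apart.
const rate = growthRateMbPerHour(700, 3400, 24.3);
console.log(`${rate.toFixed(0)} MB/hour`); // prints "111 MB/hour"
```

At that pace, a memory reading taken a few hours after a restart should already show a clear upward trend, well before the process stops answering.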

Observed Behavior

  • Process remains alive (pgrep returns PID)
  • RPC health check fails (times out after 8 seconds)
  • Memory usage steadily increases over time (~24 hours to reach 3.4GB)
  • No crash logs: the process simply stops responding to RPC
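
A health check that "times out after 8 seconds" can be sketched as a promise raced against a timer. This is a minimal illustration, not the reporter's script: `probe` is a hypothetical caller-supplied function that performs one RPC call, because the report does not say which command or endpoint the real check uses.

```javascript
// Reject if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  // Clear the timer either way so it does not keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Resolves to true only if the probe answers in time.
// `probe` is hypothetical: any function returning a promise for one RPC call.
async function rpcResponsive(probe, timeoutMs = 8000) {
  try {
    await withTimeout(probe(), timeoutMs);
    return true;
  } catch {
    return false; // timeout, connection error, or probe failure
  }
}
```

The check has to run outside the Gateway process: a timer inside a hung process may never fire.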

What Actually Happened

2026-04-28 10:58:01 Gateway down, starting...
2026-04-28 10:58:10 Gateway started OK
2026-04-29 11:16:09 [HEAL] Gateway zombie: process alive but RPC unresponsive (likely memory leak)
2026-04-29 11:16:09 [HEAL] Killing zombie PID: 417212
2026-04-29 11:16:19 [HEAL] Gateway started OK

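After about 24 hours 18 minutes of uptime (started 2026-04-28 10:58:10, detected 2026-04-29 11:16:09), the heal script flagged the Gateway process (PID 417212) as a "zombie": alive but unresponsive to RPC. This is not a zombie in the Unix sense of a defunct process; the process was still running. The heal script killed it and the restart succeeded.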

Workaround

A custom heal script currently checks both process liveness and RPC responsiveness. If the Gateway is alive but unresponsive, the script kills it with kill -9 and waits for the automatic restart.
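
The reporter's heal script is not included in the issue. The following is a minimal sketch of the decision logic it describes, with hypothetical function names. The RPC result is passed in as a boolean because the actual health-check command is not given.

```javascript
// True if a process with this PID exists. Signal 0 runs the existence and
// permission checks without delivering a signal.
function isAlive(pid) {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM: the process exists but belongs to another user.
    return err.code === "EPERM";
  }
}

// Decide what one check cycle should do.
function healAction(alive, rpcOk) {
  if (!alive) return "start"; // process gone: start it
  if (!rpcOk) return "kill";  // alive but unresponsive: force-kill it
  return "none";              // healthy
}

// Apply the decision. Starting the Gateway is left to the caller or supervisor.
function heal(pid, rpcOk) {
  const action = healAction(isAlive(pid), rpcOk);
  if (action === "kill") process.kill(pid, "SIGKILL"); // equivalent to kill -9
  return action;
}
```

Keeping the decision in a pure function makes the three cases easy to test without killing anything.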

Impact

  • Without external monitoring, Gateway becomes a zombie (offline but process running)
  • Requires daily forced restarts to maintain stability
  • All channel connections (QQ, Feishu) drop silently

Additional Context

  • This appears to be a slow memory leak, not a sudden crash
  • v2026.4.26 release notes do not mention a fix for this specific issue
  • The issue reproduces consistently: every ~24 hours of continuous operation

extent analysis

TL;DR

The Gateway process likely suffers from a memory leak, causing it to become unresponsive to RPC calls after approximately 24-25 hours, requiring a forced restart.

Guidance

  • Investigate the memory usage pattern of the Gateway process to identify potential memory leak sources, focusing on the plugins (e.g., QQBot, Feishu) and cron jobs.
  • Review the code for any unclosed resources, such as database connections, file handles, or network sockets, that could contribute to the memory leak.
  • Consider adding more detailed logging to track memory allocation and deallocation patterns, helping to pinpoint the leak's origin.
  • Evaluate the feasibility of implementing a periodic restart mechanism, similar to the existing heal script, to mitigate the issue until a permanent fix is found.
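
To tell steady growth from ordinary fluctuation when investigating the memory usage pattern, one option is to fit a straight line through periodic memory readings and look at the slope. The sketch below uses an ordinary least-squares fit. The sample shape (`hours`, `mb`) and the function name are assumptions made for illustration; nothing here comes from the Gateway codebase.

```javascript
// Least-squares slope of memory readings, in MB per hour.
// `samples` is an array of { hours, mb } points taken at different times.
function slopeMbPerHour(samples) {
  const n = samples.length;
  if (n < 2) throw new RangeError("need at least two samples");
  const meanX = samples.reduce((sum, p) => sum + p.hours, 0) / n;
  const meanY = samples.reduce((sum, p) => sum + p.mb, 0) / n;
  let num = 0;
  let den = 0;
  for (const p of samples) {
    num += (p.hours - meanX) * (p.mb - meanY);
    den += (p.hours - meanX) ** 2;
  }
  if (den === 0) throw new RangeError("samples must span more than one point in time");
  return num / den;
}
```

A slope that stays clearly positive across several hours of readings, including after garbage collection has had time to run, is consistent with a leak. A slope near zero points elsewhere.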

Example

One way to track memory usage is Node.js's built-in process.memoryUsage() function, which reports resident set size and heap statistics for the running process.
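
A minimal sketch of such monitoring follows. `process.memoryUsage()` is a standard Node.js API; the helper names, the one-minute interval, and the log format are assumptions. The code would have to run inside the Gateway process (for example from a plugin), and the report does not say whether a suitable hook exists.

```javascript
// Snapshot of the current process's memory, converted to megabytes.
function sampleMemory() {
  const { rss, heapTotal, heapUsed, external } = process.memoryUsage();
  const toMb = (bytes) => Math.round(bytes / 1024 / 1024);
  return {
    rssMb: toMb(rss),
    heapTotalMb: toMb(heapTotal),
    heapUsedMb: toMb(heapUsed),
    externalMb: toMb(external),
  };
}

// Log one timestamped sample per interval.
// unref() keeps the timer from holding the process open on shutdown.
function startMemoryLogger(intervalMs = 60_000, log = console.log) {
  const timer = setInterval(() => {
    log(new Date().toISOString(), JSON.stringify(sampleMemory()));
  }, intervalMs);
  timer.unref();
  return timer;
}
```

As a rough guide, steadily rising heapUsedMb suggests retained JavaScript objects, while rising rssMb with a flat heap suggests growth outside the JavaScript heap, such as buffers or native modules.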

Notes

The report describes a consistent, reproducible issue, but without access to the Gateway codebase or more detailed logs the root cause cannot be determined. The custom heal script mitigates the symptom; a permanent fix is still needed.

Recommendation

Apply workaround: Continue using the custom heal script to monitor the Gateway process and restart it when necessary, while concurrently investigating the root cause of the memory leak to implement a permanent fix.
