openclaw - ✅(Solved) Fix Post-upgrade stability regressions — v2026.4.5 (3e72c03) [1 pull requests, 1 comments, 1 participants]

openclaw · 2026-04-06 19:36:54


GitHub stats

openclaw/openclaw#62095 · Fetched 2026-04-08 03:09:04

Author: jmnickels

Participants: jmnickels

Timeline (top): cross-referenced ×2, commented ×1, subscribed ×1

Error Message

openclaw node run --host 192.168.x.x --port 18789 fails with SECURITY ERROR: Cannot connect over plaintext ws://. Separately, the gateway grows to 1.5GB RAM and 47% CPU within a few hours of running. Both are described in full as items 3 and 5 of the original issue.

Fix Action

Fix / Workaround

2. Subagent announce timeout defaults to 120s, causing gateway hangs. When the main session is busy processing a turn, subagent completion announcements block for 120s per attempt, then retry 4x. That's ~8 minutes of gateway pressure per failed announce. There were 119 announce timeouts in one day, which caused iMessage to stop responding and webchat to disconnect. Workaround: set agents.defaults.subagents.announceTimeoutMs: 15000. Suggest a much lower default (15-30s): if the main session is busy, fail fast.
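The ~8-minute figure is the per-attempt timeout multiplied by the number of attempts. A minimal TypeScript sketch of that arithmetic, plus the workaround applied to an already-parsed config object, follows. Assumptions: the ~8 minutes corresponds to four 120s attempts (if "retry 4x" means four retries after an initial attempt, that is five attempts, about 10 minutes); the key is set in openclaw.json at the dotted path given in the workaround; worstCaseBlockMs and setDotted are hypothetical helpers, not OpenClaw APIs.

```typescript
// Hypothetical helpers for illustration; not part of OpenClaw.

// Worst-case time one failed announce can hold the gateway:
// every attempt waits for the full timeout before giving up.
function worstCaseBlockMs(timeoutMs: number, attempts: number): number {
  return timeoutMs * attempts;
}

type Json = { [key: string]: unknown };

// Set `value` at a dotted path such as "a.b.c", creating (or replacing
// non-object) intermediate values along the way. Mutates and returns `config`.
function setDotted(config: Json, path: string, value: unknown): Json {
  const keys = path.split(".");
  let node: Json = config;
  for (const key of keys.slice(0, -1)) {
    const next = node[key];
    if (typeof next !== "object" || next === null || Array.isArray(next)) {
      node[key] = {};
    }
    node = node[key] as Json;
  }
  node[keys[keys.length - 1]] = value;
  return config;
}

// Assumption: four attempts in total, matching the report's ~8 minutes.
const ATTEMPTS = 4;
console.log(worstCaseBlockMs(120_000, ATTEMPTS) / 60_000); // 8 (minutes, 120s default)
console.log(worstCaseBlockMs(15_000, ATTEMPTS) / 60_000); // 1 (minute, 15s workaround)

// Apply the workaround to an already-parsed openclaw.json object.
const config: Json = JSON.parse('{"agents":{"defaults":{}}}');
setDotted(config, "agents.defaults.subagents.announceTimeoutMs", 15_000);
console.log(JSON.stringify(config));
// {"agents":{"defaults":{"subagents":{"announceTimeoutMs":15000}}}}
```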

4. Slack health-monitor stale-socket reconnection loop. Both Slack accounts reconnected every ~35 minutes due to health-monitor: restarting (reason: stale-socket). This ran continuously all day, adding gateway churn, and each reconnect also triggered channel resolve failed: missing_scope errors. Workaround: disabled Slack entirely.

PR fix notes

PR #8: fix(evals): pin openclaw 2026.4.2, restore log-based detection, add group/cross-convo evals

Repository: chughtapan/moltzap
Author: chughtapan
State: closed | merged: True
Link: https://github.com/chughtapan/moltzap/pull/8

Description (problem / solution / changelog)

Summary

Fix eval infrastructure hanging caused by two issues:

OpenClaw v2026.4.5 has a confirmed memory leak and 12+ open regressions (Issue #62095). Pinned to v2026.4.2, the last stable release.
Container readiness detection was rewritten to poll file logs and send probe DMs requiring LLM inference. Reverted to the original stream-based docker logs -f approach, which works correctly.
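A stream-based readiness check of the kind the PR restores can be sketched as follows: read the container's log stream line by line (for example, the stdout of docker logs -f) and resolve as soon as a line contains a readiness marker, failing if the stream ends or a timeout elapses. This is an illustrative sketch, not the moltzap code; the name waitForReady, the default marker "connected as" (borrowed from the PR's pattern fix), and the 60s timeout are assumptions.

```typescript
import { Readable } from "node:stream";
import { createInterface } from "node:readline";

// Resolve with the first log line containing `marker`; reject if the stream
// ends first or `timeoutMs` elapses. Hypothetical helper, not moltzap code.
function waitForReady(
  logs: Readable,
  marker = "connected as",
  timeoutMs = 60_000,
): Promise<string> {
  return new Promise((resolve, reject) => {
    const lines = createInterface({ input: logs });
    let settled = false;
    let streamClosed = false;

    // Settle exactly once, then stop the timer and the line reader.
    const finish = (err: Error | null, line?: string) => {
      if (settled) return;
      settled = true;
      clearTimeout(timer);
      if (err) reject(err);
      else resolve(line as string);
      if (!streamClosed) lines.close();
    };

    const timer = setTimeout(
      () => finish(new Error(`no "${marker}" line within ${timeoutMs}ms`)),
      timeoutMs,
    );
    lines.on("line", (line: string) => {
      if (line.includes(marker)) finish(null, line);
    });
    lines.on("close", () => {
      streamClosed = true;
      finish(new Error(`log stream ended before a "${marker}" line`));
    });
  });
}

// Usage sketch (needs Docker; <container> is a placeholder):
//   const child = spawn("docker", ["logs", "-f", "<container>"]); // node:child_process
//   await waitForReady(child.stdout).finally(() => child.kill());
// docker logs writes the container's stderr to its own stderr, so a real
// check may need to watch child.stderr as well.
```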

New eval capabilities:

Group conversations with bystander agents (EVAL-006 updated, EVAL-010, EVAL-011 new)
Cross-conversation information leak probes (EVAL-008 updated with real cross-convo probe)
Deterministic pass/fail checks (skip LLM judge for obvious results)
Full transcript tracking for multi-turn and cross-conversation scenarios
SessionKey fix: uses actual chat type (group/dm) instead of hardcoding dm

Code quality:

Client cleanup moved to finally block (prevents resource leaks on early exit)
Bystander registration parallelized via Promise.all
Removed unused crossConversationExpected type field
Fixed "connected" pattern to "connected as" to avoid false-matching on "disconnected"
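The first two code-quality changes follow a common pattern: start independent registrations concurrently with Promise.all, and release clients in a finally block so they are closed even when a scenario exits early. The TypeScript sketch below is not the PR's code; EvalClient, registerBystander, runScenario, and closedClients are hypothetical names used only to show the shape.

```typescript
// Hypothetical client shape; the PR notes do not describe the real eval clients.
interface EvalClient {
  name: string;
  close(): Promise<void>;
}

// Records closed clients so the cleanup behaviour can be observed.
const closedClients: string[] = [];

// Stand-in for registering one bystander agent (a network call in practice).
async function registerBystander(name: string): Promise<EvalClient> {
  await new Promise((resolve) => setTimeout(resolve, 10));
  return {
    name,
    close: async () => {
      closedClients.push(name);
    },
  };
}

// Register bystanders concurrently, run the scenario, and close every client
// in `finally`, so an early return or thrown error cannot leak them.
async function runScenario(
  names: string[],
  body: (clients: EvalClient[]) => Promise<void>,
): Promise<void> {
  // Concurrent registration: the wait is about as long as the slowest call.
  const clients = await Promise.all(names.map((n) => registerBystander(n)));
  try {
    await body(clients);
  } finally {
    await Promise.all(clients.map((c) => c.close()));
  }
}

// Usage: runScenario(["bystander-a", "bystander-b"], async () => { throw new Error("early exit"); })
// rejects with "early exit", and both clients still end up in closedClients.
```

One gap in this sketch: if any registration rejects, Promise.all rejects before the try block is reached, so clients that did register are never closed. Collecting results with Promise.allSettled and closing the fulfilled ones would cover that case.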

Test plan

Build passes (all 6 packages)
193 unit tests pass across 5 packages (protocol, cli, server-core, openclaw-channel)
Lint: 0 errors, 0 warnings
E2E evals verified: 7/11 pass (4 failures are agent behavior, not infra)

🤖 Generated with Claude Code

Changed files

packages/evals/Dockerfile.eval-agent (modified, +1/-1)
packages/evals/scripts/build-eval-agent.sh (modified, +1/-1)
packages/evals/src/e2e-infra/llm-judge.ts (modified, +53/-6)
packages/evals/src/e2e-infra/model-config.ts (modified, +6/-0)
packages/evals/src/e2e-infra/runner.ts (modified, +161/-10)
packages/evals/src/e2e-infra/scenarios.ts (modified, +63/-11)
packages/evals/src/e2e-infra/types.ts (modified, +20/-0)
packages/openclaw-channel/src/openclaw-entry.inbound-contract.test.ts (modified, +1/-0)
packages/openclaw-channel/src/openclaw-entry.ts (modified, +1/-1)
packages/openclaw-channel/src/test-utils/container-core.ts (modified, +2/-2)

Original issue

Post-upgrade stability issues — v2026.4.5 (3e72c03)

Upgraded to 2026.4.5 this morning. System was stable before the upgrade. Experienced 10 gateway restarts in ~8 hours due to several issues. Environment: Mac Studio M3 Ultra, BlueBubbles iMessage, local loopback gateway.

1. doctor --fix doesn't fix its own warnings. doctor --fix reports legacy config keys (channels.slack.channels.<id>.allow, messages.tts.<provider>, plugins.entries.voice-call.config.tts.<provider>) and tells you to run doctor --fix, but running it doesn't actually fix them. Had to manually edit openclaw.json to rename allow → enabled and nest the TTS provider configs under providers. The --fix flag should handle these migrations automatically.
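Until doctor --fix handles these keys, the manual edit can be scripted. The TypeScript sketch below applies the two migrations the report describes to a parsed openclaw.json object: rename allow to enabled under each channels.slack.channels.<id>, and move TTS provider blocks under a providers key at both TTS paths the report lists. Assumptions: the report does not name the TTS providers, so the caller passes them in; when both old and new keys exist, the new one is kept; the function names are hypothetical and not part of OpenClaw.

```typescript
// Loose JSON type for a parsed config file.
type Json = { [key: string]: any };

// Rename `allow` to `enabled` for every channels.slack.channels.<id> entry.
function migrateSlackAllow(config: Json): void {
  const channels = config.channels?.slack?.channels;
  if (!channels || typeof channels !== "object") return;
  for (const entry of Object.values(channels) as Json[]) {
    if (entry && typeof entry === "object" && "allow" in entry) {
      // Assumption: an existing `enabled` value wins over the legacy key.
      if (!("enabled" in entry)) entry.enabled = entry.allow;
      delete entry.allow;
    }
  }
}

// Move tts.<provider> blocks to tts.providers.<provider>.
// `providerNames` comes from the caller; the report does not list them.
function nestTtsProviders(tts: Json | undefined, providerNames: string[]): void {
  if (!tts || typeof tts !== "object") return;
  for (const name of providerNames) {
    if (name in tts) {
      tts.providers ??= {};
      // Keep an already-nested provider block if one exists.
      tts.providers[name] ??= tts[name];
      delete tts[name];
    }
  }
}

// Apply both migrations in place and return the same object.
function migrateLegacyKeys(config: Json, providerNames: string[]): Json {
  migrateSlackAllow(config);
  nestTtsProviders(config.messages?.tts, providerNames);
  nestTtsProviders(config.plugins?.entries?.["voice-call"]?.config?.tts, providerNames);
  return config;
}
```

Back up openclaw.json before writing the result. If the file is not strict JSON (for example, it contains comments), JSON.parse will reject it and a hand edit is simpler.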

3. Node refuses plaintext WS to private LAN IPs (new security check). openclaw node run --host 192.168.x.x --port 18789 now fails with SECURITY ERROR: Cannot connect over plaintext ws://. This new check broke existing setups where the gateway binds to loopback but the node was configured with the LAN IP. The node crash-looped until the plist was manually edited to use 127.0.0.1. A clearer migration note, or auto-detection when both processes are on the same machine, would help.
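The auto-detection the report asks for could look roughly like this in Node.js: if the configured host is one of this machine's own interface addresses, connect to 127.0.0.1 instead, keeping the connection on loopback. This is a hypothetical helper sketched in TypeScript, not OpenClaw's implementation of the security check.

```typescript
import { networkInterfaces } from "node:os";
import { isIP } from "node:net";

// True for "localhost", ::1, or any IPv4 address in 127.0.0.0/8.
function isLoopback(host: string): boolean {
  if (host === "localhost" || host === "::1") return true;
  return isIP(host) === 4 && host.startsWith("127.");
}

// True if `host` is an address assigned to one of this machine's interfaces.
function isOwnAddress(host: string): boolean {
  return Object.values(networkInterfaces())
    .flat()
    .some((iface) => iface?.address === host);
}

// Rewrite a same-machine LAN address to loopback; leave other hosts unchanged.
function resolveGatewayHost(host: string): string {
  if (isLoopback(host)) return host;
  return isOwnAddress(host) ? "127.0.0.1" : host;
}
```

This only helps when the gateway listens on loopback, as in the report; if the gateway binds only to the LAN address, 127.0.0.1 will not reach it.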

5. Gateway memory growth: 1.5GB within hours. The gateway reached 1.5GB RAM and 47% CPU within a few hours of running. Contributing factors: 379 accumulated session files (187MB), a 167MB error log, and zombie WebSocket connections. This may be a connection/session cleanup regression: sessions and connections don't seem to be released properly.
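For scale, 379 files totaling 187MB is roughly 0.5MB per session file. Cleaning them up starts with measuring them; below is a minimal Node.js sketch in TypeScript that counts the files in a directory, totals their size, and lists those not modified within a cutoff. Assumptions: the report does not give the sessions directory, so the path is a placeholder; the sketch only lists candidates and deletes nothing, and whether session files are safe to remove while the gateway is running is not covered by the report.

```typescript
import { readdirSync, statSync } from "node:fs";
import { join } from "node:path";

interface FileEntry {
  path: string;
  size: number; // bytes
  mtimeMs: number; // last modification time, ms since epoch
}

interface DirReport {
  fileCount: number;
  totalBytes: number;
  staleFiles: string[]; // not modified within maxAgeMs
}

// Pure summary step, kept separate from the filesystem so it is easy to test.
function summarize(entries: FileEntry[], maxAgeMs: number, now: number): DirReport {
  const staleFiles = entries.filter((e) => now - e.mtimeMs > maxAgeMs).map((e) => e.path);
  const totalBytes = entries.reduce((sum, e) => sum + e.size, 0);
  return { fileCount: entries.length, totalBytes, staleFiles };
}

// Regular files directly inside `dir` (not recursive; symlinks are followed).
function listFiles(dir: string): FileEntry[] {
  return readdirSync(dir)
    .map((name) => join(dir, name))
    .map((path) => ({ path, st: statSync(path) }))
    .filter(({ st }) => st.isFile())
    .map(({ path, st }) => ({ path, size: st.size, mtimeMs: st.mtimeMs }));
}

function auditDir(dir: string, maxAgeMs: number, now = Date.now()): DirReport {
  return summarize(listFiles(dir), maxAgeMs, now);
}

// Usage (placeholder path; the report does not say where sessions live):
//   const r = auditDir("/path/to/sessions", 7 * 24 * 60 * 60 * 1000);
//   console.log(r.fileCount, (r.totalBytes / 1024 / 1024).toFixed(1), "MB", r.staleFiles.length);
```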

extent analysis

TL;DR

Downgrade to v2026.4.2 (the release the linked PR pins as the last stable version), or apply workarounds for the specific issues, such as lowering the subagent announce timeout and manually migrating the legacy config keys in openclaw.json.

Guidance

Manually edit openclaw.json to migrate the legacy config keys (rename allow to enabled, and nest TTS provider configs under providers), since doctor --fix does not perform these migrations.
Set agents.defaults.subagents.announceTimeoutMs to a lower value, such as 15000 (the report suggests 15-30s), so that a busy main session does not hang the gateway on subagent announce timeouts.
Point the node at 127.0.0.1 instead of the LAN IP when it runs on the same machine as a loopback-bound gateway; the new security check refuses plaintext ws:// connections to private LAN IPs.
Disable Slack, or investigate the health-monitor stale-socket reconnection loop, to reduce gateway churn.
Monitor and clean up accumulated session files and error logs to limit gateway memory growth; this may not address an underlying session/connection cleanup regression.

Example

The fixes here are configuration edits and workarounds rather than code changes.

Notes

The provided workarounds may not fix all issues, and a more thorough investigation may be required to address the underlying causes. Additionally, downgrading to a previous version may not be feasible or desirable in all cases.

Recommendation

Apply the workarounds. Downgrading may not be a viable long-term solution, while the workarounds mitigate the specific issues in the report.
