openclaw: [Bug] 2026.4.29 gateway instability: Active Memory timeouts, embedded-run prep latency, event-loop pressure [1 comment, 2 participants]

openclaw/openclaw#76212 · Fetched 2026-05-03 04:40:40
Comments: 1 · Participants: 2 · Timeline events: 2 (closed ×1, commented ×1) · Reactions: 2

After upgrading to OpenClaw 2026.4.29, the gateway became noticeably unstable and slow. The main symptoms are high per-turn latency, Active Memory timeouts, repeated embedded-run startup/prep overhead, event-loop delay warnings, and slow control/session RPCs. Temporarily disabling Active Memory did not resolve the overall instability, so Active Memory appears to be affected by the broader embedded-run/gateway performance regression rather than being the only root cause.

This looks related to the current 2026.4.29 regression cluster, especially #76123, #76166, #76047, #76048, and the superseded Active Memory report #76043.

Bug type

Regression (worked better before, now unstable/slow)

Environment

  • OpenClaw: 2026.4.29 (a448042)
  • Install/update channel: stable / npm latest
  • OS: Ubuntu Linux x64
  • Node: v22.22.2
  • Gateway: systemd service, local loopback gateway
  • Primary model route: openai-codex/gpt-5.5
  • Active Memory model route: zai/glm-5.1
  • Telegram direct channel is enabled and generally reachable

Observed behavior

  1. Normal direct-message turns are slow or appear stuck.
  2. Active Memory runs time out during prompt preparation / hidden embedded runs.
  3. Gateway logs show large embedded-run startup/prep costs before model output begins.
  4. Gateway logs show event-loop/liveness warnings under normal use.
  5. sessions.list and node.list calls are repeatedly slow enough to show up in logs and likely contribute to gateway pressure.
  6. Disabling Active Memory alone did not fix the broader instability.
  7. Telegram health reports OK, but there are transient outbound Bot API failures such as sendChatAction network failures.

Local evidence / log excerpts

Representative gateway log excerpts, sanitized:

[plugins] active-memory: agent=main session=<direct-session> start timeoutMs=30000 model=zai/glm-5.1
[plugins] [hooks] before_prompt_build handler from active-memory failed: timed out after 60000ms
[diagnostic] lane task error: lane=main durationMs=60719 error="CommandLaneTaskTimeoutError: Command lane \"main\" task timed out after 60000ms"
[plugins] active-memory: agent=main session=<direct-session> done status=timeout elapsedMs=60825 summaryChars=0
[agent/embedded] embedded run failover decision: stage=assistant decision=surface_error reason=timeout from=zai/glm-5.1
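
The excerpt above starts with a configured timeoutMs=30000, yet the hook fails only after 60000ms and the run reports elapsedMs=60825. A minimal sketch for quantifying that gap from the two active-memory lines follows. The helper names parse_fields and overrun_ms are hypothetical, and the parser assumes simple key=value tokens as they appear in the sanitized lines; real log lines may differ.

```python
import re

START = ("[plugins] active-memory: agent=main session=<direct-session> "
         "start timeoutMs=30000 model=zai/glm-5.1")
DONE = ("[plugins] active-memory: agent=main session=<direct-session> "
        "done status=timeout elapsedMs=60825 summaryChars=0")


def parse_fields(line):
    """Collect unquoted key=value tokens from one log line (hypothetical helper)."""
    return dict(re.findall(r'(\w+)=([^\s"]+)', line))


def overrun_ms(start_line, done_line):
    """Return how far the observed elapsed time exceeded the configured timeout."""
    configured = int(parse_fields(start_line)["timeoutMs"])
    elapsed = int(parse_fields(done_line)["elapsedMs"])
    return elapsed - configured


print(overrun_ms(START, DONE))  # 30825
```

For these two lines the run overshot its configured timeout by 30825 ms, roughly double the intended bound.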

Embedded-run startup/prep examples:

[trace:embedded-run] startup stages: phase=attempt-dispatch totalMs=16889 stages=runtime-plugins:9907ms,model-resolution:1763ms,auth:2627ms,attempt-dispatch:2591ms
[trace:embedded-run] prep stages: phase=stream-ready totalMs=20049 stages=core-plugin-tools:9575ms,bundle-tools:877ms,system-prompt:4323ms,session-resource-loader:894ms,stream-setup:4309ms
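
To see where the time goes, the stage lists in these two trace lines can be broken down per stage. The sketch below is an illustration only: parse_stages and slowest_stage are hypothetical helpers, and the parsing assumes the name:NNNms,name:NNNms layout shown above. In this sample, runtime-plugins accounts for about 59% of startup and core-plugin-tools for about 48% of prep, and the listed stages sum to 16888 ms and 19978 ms, slightly under the reported totals.

```python
import re

STARTUP = ("[trace:embedded-run] startup stages: phase=attempt-dispatch totalMs=16889 "
           "stages=runtime-plugins:9907ms,model-resolution:1763ms,auth:2627ms,"
           "attempt-dispatch:2591ms")
PREP = ("[trace:embedded-run] prep stages: phase=stream-ready totalMs=20049 "
        "stages=core-plugin-tools:9575ms,bundle-tools:877ms,system-prompt:4323ms,"
        "session-resource-loader:894ms,stream-setup:4309ms")


def parse_stages(line):
    """Return (totalMs, {stage name: ms}) for one trace line (hypothetical helper)."""
    total = int(re.search(r"totalMs=(\d+)", line).group(1))
    stage_list = re.search(r"stages=(\S+)", line).group(1)
    stages = {}
    for item in stage_list.split(","):
        name, ms = item.rsplit(":", 1)
        stages[name] = int(ms.removesuffix("ms"))
    return total, stages


def slowest_stage(line):
    """Return the (name, ms) pair of the most expensive stage."""
    _, stages = parse_stages(line)
    name = max(stages, key=stages.get)
    return name, stages[name]


for trace in (STARTUP, PREP):
    total, _ = parse_stages(trace)
    name, ms = slowest_stage(trace)
    print(f"{name}: {ms} ms ({100 * ms / total:.1f}% of {total} ms)")
```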

Event-loop/liveness example:

[diagnostic] liveness warning: reasons=event_loop_delay,event_loop_utilization,cpu eventLoopDelayP99Ms=9412 eventLoopUtilization=0.995 cpuCoreRatio=1.088
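
For readers unfamiliar with the metric, event-loop delay is the amount by which a scheduled wake-up fires late because the loop was busy with something else. The sketch below is an analogy in Python's asyncio, not OpenClaw's implementation (the gateway runs on Node): a monitor task sleeps in short intervals and records the worst lateness while another task blocks the loop synchronously. All names and durations here are illustrative assumptions.

```python
import asyncio
import time


async def measure_max_lag(interval_s, duration_s):
    """Sleep in short intervals and record the worst late wake-up, in seconds."""
    loop = asyncio.get_running_loop()
    max_lag = 0.0
    deadline = loop.time() + duration_s
    while loop.time() < deadline:
        expected = loop.time() + interval_s
        await asyncio.sleep(interval_s)
        max_lag = max(max_lag, loop.time() - expected)
    return max_lag


async def blocking_work(block_s):
    """Simulate a task that hogs the loop with synchronous work."""
    await asyncio.sleep(0.05)
    time.sleep(block_s)  # synchronous sleep: stalls every other task on the loop


async def demo(block_s=0.3):
    monitor = asyncio.create_task(measure_max_lag(0.01, 0.6))
    await blocking_work(block_s)
    return await monitor


max_lag = asyncio.run(demo())
print(f"worst wake-up lag: {max_lag * 1000:.0f} ms")
```

By the same reading, eventLoopDelayP99Ms=9412 means that the slowest 1% of sampled wake-ups were more than nine seconds late, which is consistent with turns appearing stuck.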

Slow RPC examples:

[ws] res sessions.list ~2180ms-2210ms
[ws] res node.list ~2180ms-2640ms

Telegram transient outbound failure example:

[telegram] sendChatAction failed: Network request for 'sendChatAction' failed

Active Memory config details

Active Memory is enabled only for the main agent and direct chats. The local config is already reduced/conservative:

{
  "enabled": true,
  "agents": ["main"],
  "allowedChatTypes": ["direct"],
  "model": "zai/glm-5.1",
  "modelFallback": "zai/glm-5.1",
  "queryMode": "message",
  "promptStyle": "preference-only",
  "timeoutMs": 30000,
  "maxSummaryChars": 120,
  "circuitBreakerMaxTimeouts": 1,
  "circuitBreakerCooldownMs": 600000,
  "recentUserTurns": 0,
  "recentAssistantTurns": 0,
  "recentUserChars": 300,
  "recentAssistantChars": 40,
  "cacheTtlMs": 120000
}
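
The report does not document how circuitBreakerMaxTimeouts and circuitBreakerCooldownMs behave. Assuming they mean what the names suggest (stop attempting recall after N timeouts, then retry once the cooldown has passed), the intended behavior could be sketched as follows. TimeoutCircuitBreaker is a hypothetical class written for illustration, not OpenClaw code.

```python
class TimeoutCircuitBreaker:
    """Hypothetical breaker: opens after max_timeouts timeouts, closes after cooldown_ms."""

    def __init__(self, max_timeouts, cooldown_ms):
        self.max_timeouts = max_timeouts
        self.cooldown_ms = cooldown_ms
        self.timeouts = 0
        self.open_until_ms = None

    def allow(self, now_ms):
        """Return True if a recall attempt may run at time now_ms."""
        if self.open_until_ms is not None:
            if now_ms < self.open_until_ms:
                return False
            # Cooldown has elapsed: close the breaker and start counting afresh.
            self.open_until_ms = None
            self.timeouts = 0
        return True

    def record_timeout(self, now_ms):
        """Count a timeout and open the breaker once the limit is reached."""
        self.timeouts += 1
        if self.timeouts >= self.max_timeouts:
            self.open_until_ms = now_ms + self.cooldown_ms

    def record_success(self):
        """A successful recall resets the timeout count."""
        self.timeouts = 0


# Values taken from the config above: one timeout opens the breaker for 10 minutes.
breaker = TimeoutCircuitBreaker(max_timeouts=1, cooldown_ms=600000)
breaker.record_timeout(now_ms=0)
print(breaker.allow(now_ms=1), breaker.allow(now_ms=600000))  # False True
```

Under that assumption, a single timeout with this config should suppress Active Memory for ten minutes, so repeated ~60 s stalls within that window would suggest the breaker is not taking effect as expected.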

Expected behavior

  • Simple direct-message turns should not spend tens of seconds in embedded-run prep before reaching the model.
  • Active Memory should either return bounded recall quickly or degrade gracefully without blocking the user-visible turn for ~60s.
  • sessions.list / node.list should not materially contribute to gateway event-loop pressure during normal use.
  • Transient Telegram typing/send failures should not leave the system feeling wedged.
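
The second expectation, bounded recall with graceful degradation, can be illustrated with a deadline wrapper that falls back to an empty summary instead of failing the turn. This is a generic asyncio sketch, not the gateway's hook code; recall_with_deadline, slow_recall, and fast_recall are hypothetical names.

```python
import asyncio


async def recall_with_deadline(recall, timeout_s, fallback=""):
    """Await recall() for at most timeout_s seconds, else return the fallback."""
    try:
        return await asyncio.wait_for(recall(), timeout=timeout_s)
    except asyncio.TimeoutError:
        # Degrade gracefully: the turn proceeds without a memory summary.
        return fallback


async def slow_recall():
    await asyncio.sleep(1.0)
    return "never returned in time"


async def fast_recall():
    return "prefers concise replies"


async def main():
    slow = await recall_with_deadline(slow_recall, timeout_s=0.05)
    fast = await recall_with_deadline(fast_recall, timeout_s=0.05)
    return slow, fast


results = asyncio.run(main())
print(results)
```

A deadline like this can only fire when the event loop gets a turn, so under heavy loop pressure the observed wait can exceed the configured value.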

Related issues / possible overlap

  • #76123 — 2026.4.29 performance regression: latency, stuck sessions, event-loop blocking
  • #76166 — Control UI repeatedly calls slow sessions.list
  • #76047 — event-loop saturation involving temp-file pressure / node.list
  • #76048 — ZAI GLM-5 reasoning output routed to hidden thinking instead of visible text
  • #76043 — Active Memory embedded-run startup overhead, superseded by canonical embedded-run prep tracker
  • #76174 / #76176 — embedded-run/provider/Telegram hang symptom family

Notes

This report is intentionally sanitized and omits hostnames, domains, usernames, chat identifiers, local filesystem paths, and tokens.

Extended analysis

TL;DR

Downgrade OpenClaw to a version prior to 2026.4.29 or wait for a patch release that addresses the performance regression and instability issues.

Guidance

  • Review the related issues (#76123, #76166, #76047, #76048, #76043) to gauge the scope of the regression and to find workarounds reported there.
  • Temporarily disabling Active Memory may avoid the ~60s hook timeouts, but the reporter found that it did not resolve the broader instability, so treat it as a partial mitigation at best.
  • Monitor the gateway logs for liveness warnings (event_loop_delay, event_loop_utilization, cpu) and for slow sessions.list / node.list responses to identify bottlenecks.
  • Verify that the Telegram direct channel is configured and reachable to rule out external factors. In the report, Telegram health was OK apart from transient sendChatAction failures.
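
As a starting point for the log monitoring suggested above, a small scanner can flag the two signatures seen in the report: liveness warnings with a high event-loop delay, and hook timeouts. The patterns are based only on the sanitized excerpts in the report, and the 1000 ms threshold and the scan function are assumptions chosen for illustration.

```python
import re

# Patterns derived from the sanitized excerpts; real log lines may vary.
LIVENESS = re.compile(r"\[diagnostic\] liveness warning: .*?eventLoopDelayP99Ms=(\d+)")
HOOK_TIMEOUT = re.compile(r"handler from (\S+) failed: timed out after (\d+)ms")

SAMPLE = [
    "[plugins] [hooks] before_prompt_build handler from active-memory failed: "
    "timed out after 60000ms",
    "[diagnostic] liveness warning: reasons=event_loop_delay,event_loop_utilization,cpu "
    "eventLoopDelayP99Ms=9412 eventLoopUtilization=0.995 cpuCoreRatio=1.088",
    "[telegram] sendChatAction failed: Network request for 'sendChatAction' failed",
]


def scan(lines, delay_threshold_ms=1000):
    """Return (line number, finding kind, milliseconds) for each flagged line."""
    findings = []
    for number, line in enumerate(lines, start=1):
        match = LIVENESS.search(line)
        if match and int(match.group(1)) >= delay_threshold_ms:
            findings.append((number, "event-loop-delay", int(match.group(1))))
        match = HOOK_TIMEOUT.search(line)
        if match:
            findings.append((number, f"hook-timeout:{match.group(1)}", int(match.group(2))))
    return findings


for finding in scan(SAMPLE):
    print(finding)
```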

Notes

The issue is likely part of the 2026.4.29 regression cluster, so downgrading or waiting for a patch release is probably the most effective fix. Adjusting or disabling Active Memory may reduce the hook timeouts, but in the report it did not resolve the broader instability.

Recommendation

Apply workaround: Downgrade OpenClaw to a version prior to 2026.4.29 to avoid the performance regression and instability issues until a patch release is available.
