openclaw - ✅ (Solved) Fix active-memory embedded sub-agent WebSocket handshake timeout (all models affected) [1 pull request, 3 comments, 3 participants]

GitHub stats

openclaw/openclaw#74292 (fetched 2026-04-30 06:26:03)

  • Comments: 3
  • Participants: 3
  • Timeline: 11
  • Reactions: 2
  • Timeline (top): cross-referenced ×5, commented ×3, closed ×1, mentioned ×1

Root Cause

The active-memory plugin embedded sub-agent consistently times out (~17-45s) regardless of which model is used. The root cause appears to be a WebSocket handshake failure when the embedded sub-agent tries to connect back to the Gateway to execute memory_search / memory_get tools.

Fix Action

Workaround: Disabling active-memory eliminates the error. Main agent functionality works normally without it.

PR fix notes

PR #74480: fix(active-memory): preserve setup grace for embedded recall

Description (problem / solution / changelog)

Summary

Describe the problem and fix in 2–5 bullets:

  • Problem: Active Memory could time out at the raw configured recall timeout because the embedded recall runner did not receive the setup-grace timeout budget.
  • Why it matters: This made provider-agnostic Active Memory runs fail around 15s with empty output even though the plugin wrapper already intended to allow setup grace.
  • What changed: Active Memory now uses one effective embedded recall timeout equal to timeoutMs + setupGraceTimeoutMs for the embedded runner, outer watchdog, and hook timeout.
  • What did NOT change (scope boundary): This does not change memory search behavior, provider routing, circuit breaker semantics, or the configured timeoutMs clamp itself.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #73306
  • Related #74292
  • This PR fixes a bug or regression

Root Cause (if applicable)

  • Root cause: Active Memory computed a setup-grace-aware timeout for the outer watchdog and hook, but passed only config.timeoutMs into runEmbeddedPiAgent, allowing the embedded runner to self-timeout before the grace window could help.
  • Missing detection / guardrail: The setup-grace regression test did not assert that the embedded runner itself received the grace-extended timeout.
  • Contributing context (if known): Recent fixes added memory_recall, circuit breaker behavior, and setup-grace handling, but the timeout budget was not applied consistently across the embedded runner boundary.
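
The timeout handoff described in the root cause can be sketched as follows. This is a minimal illustration, not the plugin's actual code: the `RecallTimeoutConfig` and `RecallTimeouts` shapes and both function names are hypothetical, and only `timeoutMs` and `setupGraceTimeoutMs` come from the PR notes. The grace value used in the usage note is illustrative because the source does not state it.

```typescript
// Hypothetical config shape; only the two field names appear in the PR notes.
interface RecallTimeoutConfig {
  timeoutMs: number;
  setupGraceTimeoutMs: number;
}

// Hypothetical view of the three places that consume a timeout budget.
interface RecallTimeouts {
  embeddedRunnerMs: number;
  watchdogMs: number;
  hookMs: number;
}

// Before the fix (as described): the watchdog and hook were grace-aware,
// but the embedded runner received only the raw configured timeout.
function timeoutsBeforeFix(config: RecallTimeoutConfig): RecallTimeouts {
  const effectiveMs = config.timeoutMs + config.setupGraceTimeoutMs;
  return {
    embeddedRunnerMs: config.timeoutMs,
    watchdogMs: effectiveMs,
    hookMs: effectiveMs,
  };
}

// After the fix (as described): one effective timeout is used everywhere.
function timeoutsAfterFix(config: RecallTimeoutConfig): RecallTimeouts {
  const effectiveMs = config.timeoutMs + config.setupGraceTimeoutMs;
  return {
    embeddedRunnerMs: effectiveMs,
    watchdogMs: effectiveMs,
    hookMs: effectiveMs,
  };
}
```

With `timeoutMs` set to 15000 and an assumed grace of 30000, the first function hands the runner 15000 while the second hands it 45000, which is the mismatch the PR removes.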

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: extensions/active-memory/index.test.ts
  • Scenario the test should lock in: Active Memory setup delay longer than raw timeoutMs but within setup grace should still succeed, and runEmbeddedPiAgent should receive timeoutMs + setupGraceTimeoutMs.
  • Why this is the smallest reliable guardrail: The bug is at the plugin-to-embedded-runner timeout handoff, so a focused plugin unit test catches it without requiring a live provider.
  • Existing test that already covers this (if any): Existing setup-grace test covered the outer behavior but not the embedded runner timeout argument.
  • If no new test is added, why not: N/A
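
The scenario the regression test should lock in might look roughly like the sketch below. It does not reproduce `extensions/active-memory/index.test.ts`: `runRecall`, `makeRecordingRunner`, and the `EmbeddedRunner` signature are hypothetical stand-ins, and the stub decides success deterministically instead of using real timers.

```typescript
// Hypothetical runner signature; the real runEmbeddedPiAgent arguments
// are not shown in the PR notes.
type EmbeddedRunner = (args: { timeoutMs: number }) => Promise<string>;

interface RecallConfig {
  timeoutMs: number;
  setupGraceTimeoutMs: number;
}

// Stand-in for the plugin's recall path after the fix: the runner is
// handed the grace-extended timeout rather than the raw one.
function runRecall(config: RecallConfig, runner: EmbeddedRunner): Promise<string> {
  const effectiveTimeoutMs = config.timeoutMs + config.setupGraceTimeoutMs;
  return runner({ timeoutMs: effectiveTimeoutMs });
}

// Stub runner: records the timeout it was given and rejects when the
// simulated setup delay would exceed that timeout.
function makeRecordingRunner(setupDelayMs: number): {
  runner: EmbeddedRunner;
  seenTimeouts: number[];
} {
  const seenTimeouts: number[] = [];
  const runner: EmbeddedRunner = ({ timeoutMs }) => {
    seenTimeouts.push(timeoutMs);
    return setupDelayMs > timeoutMs
      ? Promise.reject(new Error("embedded runner timed out during setup"))
      : Promise.resolve("recall summary");
  };
  return { runner, seenTimeouts };
}
```

A test built on this would use a setup delay that is longer than `timeoutMs` but shorter than `timeoutMs + setupGraceTimeoutMs`, then assert both that recall succeeds and that the recorded timeout equals the sum.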

User-visible / Behavior Changes

Active Memory recall gets the intended setup-grace budget before timing out, reducing false 15s timeout failures during embedded runner startup/setup.

Diagram (if applicable)

Before:
Active Memory timeoutMs=15000 -> embedded runner timeoutMs=15000 -> setup can consume budget -> timeout

After:
Active Memory timeoutMs=15000 + setup grace -> embedded runner gets effective timeout -> recall can complete within grace window

Security Impact (required)

  • New permissions/capabilities? (No)
  • Secrets/tokens handling changed? (No)
  • New/changed network calls? (No)
  • Command/tool execution surface changed? (No)
  • Data access scope changed? (No)
  • If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

  • OS: Linux
  • Runtime/container: Node 22.14.0, pnpm
  • Model/provider: Not required for unit verification
  • Integration/channel (if any): Active Memory plugin
  • Relevant config (redacted): active-memory.timeoutMs with setup grace enabled

Steps

  1. Configure Active Memory with a small timeoutMs.
  2. Simulate embedded recall setup taking longer than raw timeoutMs but less than timeoutMs + setupGraceTimeoutMs.
  3. Run the Active Memory plugin tests.

Expected

  • Embedded recall receives the setup-grace-extended timeout.
  • Recall can succeed when setup delay fits inside the grace window.

Actual

  • Before this PR, the embedded runner received only raw timeoutMs.
  • After this PR, the embedded runner receives timeoutMs + setupGraceTimeoutMs.

Evidence

Attach at least one:

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Compatibility / Migration

  • Backward compatible? (Yes)
  • Config/env changes? (No)
  • Migration needed? (No)
  • If yes, exact upgrade steps: N/A

Risks and Mitigations

  • Risk: Active Memory embedded recall may run longer than operators expect if they interpreted timeoutMs as a hard total wall-clock limit.
  • Mitigation: This matches the existing setup-grace design already used by the outer watchdog and hook timeout; the configured timeout clamp and circuit breaker remain in place.

Verification commands:

```bash
pnpm test:fast extensions/active-memory/index.test.ts
pnpm test:fast extensions/active-memory/config.test.ts
pnpm exec oxlint extensions/active-memory/index.ts extensions/active-memory/index.test.ts
git diff --check
```

## Changed files

- `extensions/active-memory/index.test.ts` (modified, +13/-7)
- `extensions/active-memory/index.ts` (modified, +9/-3)

RAW_BUFFER

Environment:

  • OpenClaw: 2026.4.26 (be8c246)
  • OS: macOS 15.2 (arm64, Mac mini M2)
  • Node: v24.14.0
  • Config: 6 agents, active-memory enabled for all, models tested: GLM-5.1 / GLM-5-Turbo / DeepSeek V4 Flash

Description:

The active-memory plugin embedded sub-agent consistently times out (~17-45s) regardless of which model is used. The root cause appears to be a WebSocket handshake failure when the embedded sub-agent tries to connect back to the Gateway to execute memory_search / memory_get tools.

Evidence from gateway logs:

active-memory: agent=duoduo session=agent:duoduo:feishu:direct:ou_... activeProvider=deepseek activeModel=deepseek-v4-flash done status=timeout elapsedMs=27059 summaryChars=0

handshake timeout + closed before connect (repeated)
conn=... peer=127.0.0.1:...->127.0.0.1:18789 ... code=1006 reason=n/a

Key observations:

  1. Direct API calls to the same models respond in <1s (tested with curl, DeepSeek V4 Flash: 0.6s)
  2. All 3 tested models (GLM-5.1, GLM-5-Turbo, DeepSeek V4 Flash) produce the same timeout
  3. summaryChars=0 on every attempt — the sub-agent never completes its tool calls
  4. Embedding index is healthy: 110 files, 940 chunks, openclaw memory status --deep shows no errors
  5. The WS closed before connect / code=1006 entries appear immediately after each active-memory timeout
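
To check observations like these across a whole gateway log, the `key=value` pairs in the `active-memory:` line can be extracted mechanically. The sketch below assumes the line format shown in the evidence above; `parseActiveMemoryLine` and `isEmptyTimeout` are hypothetical helper names, not part of OpenClaw.

```typescript
// Parse an "active-memory:" log line into its key=value fields.
// Returns null for lines that are not active-memory entries.
function parseActiveMemoryLine(line: string): Record<string, string> | null {
  if (!line.startsWith("active-memory:")) return null;
  const fields: Record<string, string> = {};
  const pattern = /(\w+)=(\S+)/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(line)) !== null) {
    fields[match[1]] = match[2];
  }
  return fields;
}

// True when the entry is a timeout that produced no summary at all,
// which is the pattern reported in this issue.
function isEmptyTimeout(fields: Record<string, string>): boolean {
  return fields.status === "timeout" && Number(fields.summaryChars) === 0;
}
```

Applied to the log line quoted in the evidence, this yields `status` of `timeout`, `elapsedMs` of `27059`, and `summaryChars` of `0`, so `isEmptyTimeout` returns true for that entry.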

Config:

{
  "active-memory": {
    "enabled": true,
    "config": {
      "agents": ["duoduo", "miumiu", "jelly", "sugarbaby", "der", "zhumi"],
      "model": "deepseek/deepseek-v4-flash",
      "modelFallback": "glm/GLM-5-Turbo",
      "queryMode": "recent",
      "promptStyle": "balanced",
      "allowedChatTypes": ["direct"],
      "timeoutMs": 15000,
      "maxSummaryChars": 220,
      "logging": true
    }
  }
}

Expected behavior: Active memory sub-agent connects to Gateway via WebSocket, executes memory_search / memory_get, and returns a summary within the timeout window.

Actual behavior: WebSocket handshake fails consistently, causing the sub-agent to time out without producing any summary.

Workaround: Disabling active-memory eliminates the error. Main agent functionality works normally without it.

Additional context:

  • Gateway bind: 127.0.0.1:18789
  • LaunchAgent: gui/501/ai.openclaw.gateway
  • Embedding provider: GLM embedding-3 (remote, working)
  • Memory index: SQLite backend, all files indexed

extent analysis

TL;DR

The WebSocket handshake failure between the active-memory sub-agent and the Gateway is likely causing the timeouts, and adjusting the timeoutMs configuration or checking the Gateway's WebSocket connection settings may help.

Guidance

  • Verify the Gateway's WebSocket connection settings to ensure it is properly configured to handle connections from the active-memory sub-agent.
  • Consider increasing the timeoutMs value in the active-memory configuration to allow more time for the WebSocket handshake to complete.
  • Check the system's firewall or network settings to ensure that the connection between the sub-agent and the Gateway is not being blocked.
  • Test the WebSocket connection independently using a tool like wscat to isolate the issue.

Example

No code snippet is provided as the issue seems to be related to configuration and network settings rather than code.
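
That said, the independent WebSocket test suggested in the guidance could be scripted instead of run through wscat. The following is a minimal sketch under several assumptions: `probeHandshake`, `ProbeResult`, and `MinimalSocket` are hypothetical names, the socket is injected so the helper can be exercised without a live Gateway, and nothing here accounts for any authentication or subprotocol the Gateway may require during the handshake.

```typescript
type ProbeResult =
  | { ok: true; elapsedMs: number }
  | { ok: false; reason: string; elapsedMs: number };

// The small subset of the WebSocket interface that the probe relies on.
interface MinimalSocket {
  addEventListener(type: "open" | "error" | "close", listener: () => void): void;
  close(): void;
}

// Open a socket and report whether the handshake completed within timeoutMs.
function probeHandshake(
  url: string,
  timeoutMs: number,
  createSocket: (url: string) => MinimalSocket,
): Promise<ProbeResult> {
  const startedAt = Date.now();
  return new Promise((resolve) => {
    const socket = createSocket(url);
    let settled = false;
    const finish = (outcome: { ok: true } | { ok: false; reason: string }) => {
      if (settled) return; // close() below can fire a second event
      settled = true;
      clearTimeout(timer);
      socket.close();
      resolve({ ...outcome, elapsedMs: Date.now() - startedAt });
    };
    const timer = setTimeout(
      () => finish({ ok: false, reason: "handshake timeout" }),
      timeoutMs,
    );
    socket.addEventListener("open", () => finish({ ok: true }));
    socket.addEventListener("error", () => finish({ ok: false, reason: "error" }));
    socket.addEventListener("close", () =>
      finish({ ok: false, reason: "closed before connect" }),
    );
  });
}

// Possible usage against the bind address from the report, assuming a runtime
// with a global WebSocket (for example recent Node.js releases):
//   probeHandshake("ws://127.0.0.1:18789", 5000, (u) => new WebSocket(u))
//     .then((result) => console.log(result));
```

Injecting `createSocket` keeps the probe independent of any particular WebSocket library. A fast `ok: true` result here while active-memory still times out would point away from basic connectivity and toward the timeout budget addressed by the PR.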

Notes

The issue may be related to the specific network environment or the Gateway's configuration, so further investigation and testing may be needed to determine the root cause.

Recommendation

Apply workaround: Adjust the timeoutMs value or check the Gateway's WebSocket connection settings, as this may help resolve the issue without requiring a full fix.
