OpenClaw Active-Memory Bug Report: Embedded Run Startup Overhead

Summary

The active-memory plugin has a 0% success rate because every run hits the 45-second lane timeout. Investigation shows the bottleneck is not the model or the memory search, but the startup overhead of OpenClaw's embedded run framework.

Impact

Active-memory has never returned context in production (multiple gateways, consistent across sessions)
Every message incurs 45s+ of added latency while the run times out
No circuit breaker is in place, so failures cascade

Root Cause

Every embedded run (main session + sub-agents) incurs ~44-51 seconds of initialization before the LLM is even called:

Startup Phase (12-16s)

runtime-plugins: ~3.2s (3160ms in the trace under Evidence; no range recorded)
model-resolution: 3.0-3.5s (local config parsing)
auth: 4.5-5.1s (credential resolution from env/profile)
attempt-dispatch: 4.4-4.9s (internal routing)

Prep Phase (29-37s)

core-plugin-tools: 9.0-10.4s (loading all 9 plugin tools)
system-prompt: 7.5-11.3s (building prompt from bootstrap files)
session-resource-loader: 3.8-5.6s (loading session state)
bundle-tools: 1.6-1.8s (bundling tool definitions)
stream-setup: 7.3-8.6s (initializing API stream)

Total: 41-53 seconds (12-16s startup plus 29-37s prep) before any LLM inference happens.

For comparison, the actual network round-trip to the Anthropic API is about 150ms (TLS handshake 43ms, TTFB 145ms).
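The phase totals can be cross-checked by summing the per-stage figures listed above. The sketch below is only arithmetic over numbers from this report. The dictionary and function names are illustrative, and the runtime-plugins entry uses the single 3160ms sample from the trace under Evidence, since no range was recorded for it.

```python
# Per-stage ranges in seconds, copied from this report.
STARTUP = {
    "runtime-plugins": (3.16, 3.16),  # single trace sample; no range reported
    "model-resolution": (3.0, 3.5),
    "auth": (4.5, 5.1),
    "attempt-dispatch": (4.4, 4.9),
}
PREP = {
    "core-plugin-tools": (9.0, 10.4),
    "system-prompt": (7.5, 11.3),
    "session-resource-loader": (3.8, 5.6),
    "bundle-tools": (1.6, 1.8),
    "stream-setup": (7.3, 8.6),
}


def span(stages):
    """Return (low, high) totals for a phase, rounded to 0.1s."""
    low = sum(lo for lo, _ in stages.values())
    high = sum(hi for _, hi in stages.values())
    return round(low, 1), round(high, 1)


startup = span(STARTUP)
prep = span(PREP)
total = (round(startup[0] + prep[0], 1), round(startup[1] + prep[1], 1))

print("startup:", startup)
print("prep:", prep)
print("total:", total)
```

With these inputs the sums come to roughly 15-17s for startup and 29-38s for prep. The upper bound assumes every stage hits its worst case in the same run, so it overstates a typical run. The 49.6s total from the trace under Evidence falls inside the summed range.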

Evidence

Test Environment

OpenClaw 2026.4.29
Anthropic claude-opus-4-6 (main model)
9 plugins active (active-memory, discord, slack, telegram, memory-core, memory-wiki, google-meet, diagnostics-prometheus, acpx)
73 skill directories

Timings (journalctl PID 410217, May 2 08:00-08:18 UTC)

Active-memory runs (all time out at the 45s lane deadline):

08:00:23 start → 08:01:17 timeout (54s elapsed, 0 chars returned)
08:07:53 start → 08:08:40 timeout (47s elapsed, 0 chars returned)
08:16:43 start → 08:17:31 timeout (48s elapsed, 0 chars returned)
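The elapsed figures can be re-derived from the timestamps. This is a small sketch over the three runs above; the 45s constant is the lane deadline stated in this report, and the variable names are illustrative.

```python
from datetime import datetime

LANE_TIMEOUT_S = 45  # lane deadline stated in this report

# (start, timeout) wall-clock pairs from the journal excerpt, UTC.
RUNS = [
    ("08:00:23", "08:01:17"),
    ("08:07:53", "08:08:40"),
    ("08:16:43", "08:17:31"),
]


def elapsed_seconds(start, end):
    """Seconds between two HH:MM:SS timestamps on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


elapsed = [elapsed_seconds(start, end) for start, end in RUNS]
for (start, _), secs in zip(RUNS, elapsed):
    print(f"{start}: {secs}s elapsed, {secs - LANE_TIMEOUT_S}s past the lane deadline")
```

Each timeout is logged a few seconds after the 45s deadline rather than exactly at it. The journal excerpt does not say where that extra time goes.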

Trace breakdown (typical run):

startup: 15873ms (runtime-plugins 3160ms, model-resolution 3076ms, auth 4740ms, attempt-dispatch 4895ms)
prep: 33758ms (workspace-sandbox 16ms, core-plugin-tools 10129ms, system-prompt 7933ms, stream-setup 7389ms)
TOTAL: 49631ms — exceeds 45s lane timeout, LLM never runs
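The trace can be checked mechanically. The sketch below parses the two phase lines above, sums them, and reports how much of each phase is not attributed to a named stage. The regular expressions assume lines shaped exactly like this excerpt; they are not a documented OpenClaw log format.

```python
import re

LANE_TIMEOUT_MS = 45_000  # lane deadline stated in this report

TRACE = """\
startup: 15873ms (runtime-plugins 3160ms, model-resolution 3076ms, auth 4740ms, attempt-dispatch 4895ms)
prep: 33758ms (workspace-sandbox 16ms, core-plugin-tools 10129ms, system-prompt 7933ms, stream-setup 7389ms)
"""


def parse_trace(text):
    """Map each phase name to (phase_total_ms, {stage: ms})."""
    phases = {}
    for line in text.splitlines():
        match = re.match(r"(\w+): (\d+)ms \((.*)\)", line)
        if not match:
            continue
        name, total, body = match.groups()
        stages = {stage: int(ms) for stage, ms in re.findall(r"([\w-]+) (\d+)ms", body)}
        phases[name] = (int(total), stages)
    return phases


phases = parse_trace(TRACE)
total_ms = sum(total for total, _ in phases.values())
unattributed = {name: total - sum(stages.values()) for name, (total, stages) in phases.items()}

print("total:", total_ms, "ms; over deadline by", total_ms - LANE_TIMEOUT_MS, "ms")
print("unattributed per phase:", unattributed)
```

The run exceeds the lane deadline by about 4.6s. About 8.3s of prep is not attributed to any stage named in this excerpt. The session-resource-loader and bundle-tools stages listed under Root Cause are plausible contributors, but the excerpt does not confirm that.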

Comparison: Old Gateway (PID 397720)

Identical timings:

startup: 12-16s
prep: 29-37s
Same 44-51s total on every embedded run

Non-active-memory embedded runs (main Telegram session):

Also 41-53s startup+prep, but no hard 45s timeout, so they complete
(but still show terrible latency: ~3.5 minutes from inbound message to reply)

Configuration Notes

Active-memory config as deployed:

timeoutMs: 15000 (15s), but the plugin timeout is ignored and the 45s lane timeout applies
queryMode: "recent" (reasonable for business context)
model unset, so it inherits claude-opus-4-6 (expensive, with no latency benefit given the boot overhead)
allowedChatTypes: ["direct"] (direct messages only; never used in practice)

What Config Changes CAN'T Fix

Switching to haiku won't help if 44s is burned before the API call
Raising timeoutMs to 60s just adds 50-60s latency per message (worse than disabled)
Disabling plugins would break other functionality
Caching bootstrap files doesn't address model-resolution, auth, or attempt-dispatch overhead
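A quick budget calculation illustrates the timeout points above. It is a sketch, assuming the deadline covers the whole embedded run and that startup and prep finish before the model call begins. The 49.6s overhead is the typical trace total from this report, and the function name is illustrative.

```python
OVERHEAD_S = 49.6  # startup + prep from the typical trace in this report


def time_left_for_model(deadline_s, overhead_s=OVERHEAD_S):
    """Seconds remaining for the LLM call once startup and prep are done."""
    return round(deadline_s - overhead_s, 1)


for label, deadline in [("plugin timeoutMs", 15), ("lane timeout", 45), ("raised to 60s", 60)]:
    left = time_left_for_model(deadline)
    verdict = "model never runs" if left <= 0 else f"{left}s left for the model"
    print(f"{label} ({deadline}s): {verdict}")
```

Under these assumptions, only the 60s deadline leaves any time for the model, and the message still waits about 50s before the call starts. A faster model shortens only the part after the overhead.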

Hypotheses for Investigation

Auth stage (4.5s): Should be env var lookup (~1ms). Is it making an API call to validate credentials? Or loading a large auth context?
Model-resolution (3s): Should be local config lookup (~1ms). Is it querying provider endpoints or doing some other initialization?
Core-plugin-tools (10s): Active-memory only needs 3 memory tools. Why load all 9 plugins' tools? Is there a tool-filtering mechanism for embedded runs?
System-prompt (8s): Building prompt for ~10KB of bootstrap files. Why 8 seconds? Is it doing file I/O, parsing, or context-building in a loop?
Stream-setup (8s): TLS handshake to Anthropic is 43ms. What accounts for the additional 7.9s?
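To make the core-plugin-tools question concrete, here is a minimal sketch of per-run tool filtering. It is not OpenClaw's API: `ToolLoader`, `load_tools`, `make_loader`, and every tool name are hypothetical, and only the plugin names come from this report. Each plugin's tools sit behind a loader function, and a run calls only the loaders it has allowlisted.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical registry type: plugin name -> function returning that plugin's tool names.
ToolLoader = Callable[[], List[str]]


def load_tools(registry: Dict[str, ToolLoader], allowed_plugins: Iterable[str]) -> List[str]:
    """Call only the loaders for allowed plugins, so other plugins are never initialized."""
    allowed = set(allowed_plugins)
    tools: List[str] = []
    for plugin, loader in registry.items():
        if plugin in allowed:
            tools.extend(loader())
    return tools


calls: List[str] = []  # records which loaders actually ran


def make_loader(plugin: str, tool_names: List[str]) -> ToolLoader:
    """Build a stand-in loader that notes its invocation and returns fixed tool names."""
    def loader() -> List[str]:
        calls.append(plugin)
        return tool_names
    return loader


# Tool names below are invented for illustration.
registry = {
    "memory-core": make_loader("memory-core", ["memory_search", "memory_get", "memory_write"]),
    "discord": make_loader("discord", ["discord_send"]),
    "slack": make_loader("slack", ["slack_send"]),
}

tools = load_tools(registry, allowed_plugins=["memory-core"])
print(tools, calls)
```

Whether this would recover the full 9-10s depends on where core-plugin-tools actually spends its time, which the trace does not show.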

Workaround

Active-memory is disabled globally (config.enabled: false). The plugin itself stays enabled so it can be re-activated later.

Next Steps

Would appreciate guidance on:

Is this expected performance for embedded runs in 2026.4.29?
Are there optimizations planned for the startup/prep phases?
Is there a way to reduce tool loading scope for sub-agents that only need 3 specific tools?
Should we expect improvements in the next release, or is this architectural?

Gateway: ubuntu-s-2vcpu-2gb-90gb-intel-lon1-01 (8 CPU, 32GB RAM, 533GB disk)
Time: 2026-05-02 08:00-08:18 UTC
Timezone: Europe/London
OpenClaw version: 2026.4.29 (a448042)

extent analysis

TL;DR

The most likely fix for the Active-Memory plugin's 45-second timeout issue is to optimize the startup and prep phases of the embedded run framework, focusing on reducing the overhead of auth, model-resolution, core-plugin-tools, system-prompt, and stream-setup.

Guidance

Investigate auth stage: Verify if the 4.5s auth stage is due to an API call or loading a large auth context, and consider optimizing or caching credential resolution.
Optimize model-resolution: Check if model-resolution is querying provider endpoints or doing unnecessary initialization, and simplify the local config lookup.
Filter core-plugin-tools: Explore the possibility of loading only the required 3 memory tools for active-memory, instead of all 9 plugins' tools, to reduce the 10s overhead.
Improve system-prompt and stream-setup: Analyze the system-prompt and stream-setup phases to identify performance bottlenecks, such as file I/O, parsing, or context-building, and optimize these processes.
Consider circuit breaker implementation: Implement a circuit breaker to prevent cascading failures and reduce the impact of timeouts on the system.
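The circuit breaker suggestion can be sketched as follows. This is a generic pattern, not an existing OpenClaw feature: the class, its method names, and the thresholds (3 consecutive failures, 300s cooldown) are assumptions chosen for illustration. After repeated timeouts the breaker skips active-memory so that messages stop paying the 45s penalty, then allows a single trial run once the cooldown has passed.

```python
import time


class CircuitBreaker:
    """Skip calls for `cooldown_s` after `max_failures` consecutive failures."""

    def __init__(self, max_failures=3, cooldown_s=300.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # time the breaker opened, or None while closed

    def allow(self):
        """Return True if a call may be attempted now."""
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.cooldown_s:
            # Cooldown over: permit one trial call; one more failure reopens the breaker.
            self.opened_at = None
            self.failures = self.max_failures - 1
            return True
        return False

    def record(self, ok):
        """Record the outcome of an attempted call."""
        if ok:
            self.failures = 0
            return
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()


# Demo with a fake clock: five inbound messages, every active-memory run times out.
now = [0.0]
breaker = CircuitBreaker(max_failures=3, cooldown_s=300.0, clock=lambda: now[0])
attempted = 0
for _ in range(5):
    if breaker.allow():
        attempted += 1
        breaker.record(ok=False)
print("runs attempted:", attempted)
```

In the demo only the first three messages attempt a run; the remaining two skip active-memory immediately. Injecting the clock keeps the example testable without real waiting.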

Example

The issue does not include a code fix. It concerns performance optimization and configuration rather than a specific code change.

Notes

The provided information suggests that the issue is not specific to the Active-Memory plugin, but rather a general performance problem with the embedded run framework. Optimizing the startup and prep phases may require changes to the OpenClaw framework or configuration.

Recommendation

Apply a workaround by disabling the Active-Memory plugin globally until the performance issues are addressed, as done in the provided workaround (config.enabled: false). This will prevent the 45-second timeouts and allow for further investigation and optimization of the embedded run framework.

openclaw issue: Active-memory embedded run startup overhead: 44-51s before LLM call, 100% timeout rate [2 comments, 3 participants]
