openclaw - ✅(Solved) Fix [Bug]: Active Memory timeoutMs clock starts at plugin level, not at LLM call — embedded run setup overhead causes 100% timeout [1 pull requests, 1 participants]

openclaw/openclaw#72606 · Fetched 2026-04-28 06:34:01
Active Memory's timeoutMs timer starts when the plugin begins its pre-reply hook, but the embedded agent run initialization (context building, workspace setup, skill resolution) consistently takes 18–19 seconds before the LLM call even begins. With the default timeoutMs: 15000, the abort signal fires before the embedded run starts, causing 100% timeout rate even though the actual LLM responds in ~1.5 seconds.


PR fix notes

PR #72620: fix(active-memory): preserve setup time outside recall timeout

Description (problem / solution / changelog)

Summary

  • keep Active Memory's configured timeoutMs scoped to the embedded recall/model run
  • add a plugin-level setup grace window so embedded-run initialization does not consume the recall timeout budget before the LLM call starts
  • cover the regression where wrapper/setup time exceeds timeoutMs but the recall itself still succeeds
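
As a rough illustration of the two-budget idea in the summary above, the sketch below keeps timeoutMs as the budget for the recall run and adds a separate grace window for setup. The names planTimeouts, wouldTimeOut, and setupGraceMs, and the 20,000 ms grace value, are hypothetical; the actual change in extensions/active-memory/index.ts may be structured differently.

```typescript
// Hypothetical sketch: none of these names come from the OpenClaw source.
interface TimeoutPlan {
  recallTimeoutMs: number; // budget for the embedded recall/model run only
  outerDeadlineMs: number; // budget for the whole pre-reply hook, setup included
}

function planTimeouts(timeoutMs: number, setupGraceMs: number): TimeoutPlan {
  if (timeoutMs <= 0 || setupGraceMs < 0) {
    throw new RangeError("timeoutMs must be > 0 and setupGraceMs must be >= 0");
  }
  return {
    recallTimeoutMs: timeoutMs,
    outerDeadlineMs: timeoutMs + setupGraceMs,
  };
}

// A run times out if the recall alone overruns its budget,
// or if setup plus recall overruns the outer deadline.
function wouldTimeOut(plan: TimeoutPlan, setupMs: number, recallMs: number): boolean {
  return recallMs > plan.recallTimeoutMs || setupMs + recallMs > plan.outerDeadlineMs;
}

// Timings taken from the issue's timeline: 19,046 ms setup, 3,240 ms recall.
console.log(wouldTimeOut(planTimeouts(15000, 0), 19046, 3240)); // true (old behavior)
console.log(wouldTimeOut(planTimeouts(15000, 20000), 19046, 3240)); // false (with grace)
```

With no grace window the outer deadline equals timeoutMs, which reproduces the reported failure; any grace window larger than the setup time lets the same run succeed.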

Fixes #72606

Testing

  • pnpm exec vitest run extensions/active-memory/index.test.ts
  • pnpm exec oxfmt --check extensions/active-memory/index.ts extensions/active-memory/index.test.ts
  • git diff --check

Changed files

  • extensions/active-memory/index.test.ts (modified, +36/-0)
  • extensions/active-memory/index.ts (modified, +9/-2)


[Bug]: Active Memory timeoutMs clock starts at plugin level, not at LLM call — embedded run setup overhead causes 100% timeout

Summary

Active Memory's timeoutMs timer starts when the plugin begins its pre-reply hook, but the embedded agent run initialization (context building, workspace setup, skill resolution) consistently takes 18–19 seconds before the LLM call even begins. With the default timeoutMs: 15000, the abort signal fires before the embedded run starts, causing 100% timeout rate even though the actual LLM responds in ~1.5 seconds.

Environment

  • OpenClaw: 2026.4.24
  • OS: Linux 5.15.0-173-generic (x64)
  • Runtime: Node v22.22.2, npm-global install
  • Provider: amazon-bedrock (Bedrock Converse Stream API, auth: aws-sdk)
  • Active Memory model: amazon-bedrock/amazon.nova-micro-v1:0
  • Active Memory config: timeoutMs: 15000, queryMode: recent, promptStyle: contextual, bootstrapContextMode: lightweight (hardcoded in plugin)
  • Skills: 50+ skills loaded (including 20+ lark-* symlink skills that trigger symlink-escape warnings)
  • Agents with active-memory: 3 (main, cbz001, clawdoctor)

Root Cause Analysis

The timing problem

12:03:25.482  [plugins] active-memory: start timeoutMs=15000
                ↓ 19.0s — embedded run initialization
12:03:40.482  ⏰ 15s timeout expires, abort signal fires
                ↓ but embedded run is still initializing...
12:03:44.528  Embedded run session actually starts (4s after timeout!)
12:03:44.572  Prompt submitted to Nova Micro
12:03:46.351  Nova Micro responds in 1.8s, calls memory_search
12:03:46.352  memory_search returns "Aborted" (abort signal already fired)
12:03:47.768  Model returns "NONE" (can't search memory)
              → Plugin reports: status=timeout, elapsedMs=22328, summaryChars=0
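
The offsets quoted in the timeline can be re-derived from its timestamps. The snippet below is only a worked-arithmetic sketch; toMs is a hypothetical helper, and the 12:03:47.810 value is the timestamp of the plugin's done log line.

```typescript
// Convert an "HH:MM:SS.mmm" wall-clock string to milliseconds since midnight.
function toMs(clock: string): number {
  const [h, m, s] = clock.split(":");
  // Math.round guards against floating-point noise in the seconds field.
  return (Number(h) * 3600 + Number(m) * 60) * 1000 + Math.round(Number(s) * 1000);
}

const timeoutMs = 15000;
const hookStart = toMs("12:03:25.482"); // plugin logs "start"
const sessionStart = toMs("12:03:44.528"); // embedded run session starts
const doneLogged = toMs("12:03:47.810"); // plugin logs "done"

const abortAt = hookStart + timeoutMs; // when the abort signal fires
const setupMs = sessionStart - hookStart; // time spent before the run starts
const lateByMs = sessionStart - abortAt; // how long after the abort the run starts
const elapsedMs = doneLogged - hookStart; // matches elapsedMs in the done line

console.log({ setupMs, lateByMs, elapsedMs });
// { setupMs: 19046, lateByMs: 4046, elapsedMs: 22328 }
```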

Direct API test proves model is fine

Tested Nova Micro directly via AWS SDK (@aws-sdk/client-bedrock-runtime Converse API):

Test                           Latency
Short prompt (149 chars)       1,103ms
Medium prompt (3,402 chars)    1,506ms
Long prompt (9,135 chars)      1,322ms
3 concurrent medium            1,366–1,519ms

The model responds in 1–1.5 seconds. The 19-second overhead is entirely in embedded run initialization.

Evidence from trajectory files

The persisted trajectory (active-memory-*.trajectory.jsonl) confirms the embedded run itself is fast once it starts:

session.started:  04:03:44.528Z
context.compiled: 04:03:44.572Z  (44ms)
prompt.submitted: 04:03:44.572Z  (0ms)
model.completed:  04:03:44.670Z  (98ms for first LLM turn)

But the session JSONL shows an openclaw:prompt-error at 04:03:44.574Z with message "active-memory timeout after 15000ms": the abort signal was already pending when the run started.

Statistics

24/24 active-memory runs timed out today (100% failure rate):

Total runs:     24
Timeouts:       24
Successes:      0
Timeout rate:   100%
Config timeout: 15,000ms
Actual elapsed: 19,062–31,207ms (avg ~21,227ms)
Setup overhead: 18,000–19,000ms (consistent)
Actual LLM:     ~1,500–3,200ms (when it gets to run)

Expected Behavior

timeoutMs should govern the LLM call duration and should not include the embedded run initialization overhead. If initialization takes 19 seconds, timeoutMs: 15000 should mean 19s setup + 15s LLM budget = 34s total, not a timeout before the LLM call starts.

Alternatively, the initialization overhead should be dramatically reduced so it doesn't dominate the timeout window.

Observed Behavior

  • timeoutMs timer starts at plugin hook entry
  • Embedded run initialization takes 18–19 seconds
  • By the time the LLM call begins, the abort signal has already fired
  • memory_search tool calls return "Aborted"
  • Model returns "NONE" (no useful memory recall)
  • Plugin reports status=timeout with summaryChars=0
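
The sequence above can be modeled with a deadline whose clock starts at hook entry. This is a minimal sketch, not the plugin's code: SignalLike is a stand-in for AbortSignal, and deadlineSignal and memorySearch are hypothetical names with an injected clock so the behavior is deterministic.

```typescript
// Minimal stand-in for AbortSignal so the sketch has no runtime dependencies.
interface SignalLike {
  readonly aborted: boolean;
}

// A deadline that counts from startMs; `now` is injected for determinism.
function deadlineSignal(startMs: number, timeoutMs: number, now: () => number): SignalLike {
  return {
    get aborted(): boolean {
      return now() - startMs >= timeoutMs;
    },
  };
}

// Hypothetical tool: refuses to run once the signal has fired.
function memorySearch(query: string, signal: SignalLike): string {
  return signal.aborted ? "Aborted" : `results for "${query}"`;
}

let clock = 0;
const now = () => clock;

// Clock started at hook entry (the reported behavior).
const hookSignal = deadlineSignal(0, 15000, now);

clock = 19046; // setup finishes, the embedded run starts
// Clock started when the LLM call begins (the behavior the issue asks for).
const recallSignal = deadlineSignal(clock, 15000, now);

clock += 1779; // the model responds and calls memory_search
console.log(memorySearch("recent context", hookSignal)); // Aborted
console.log(memorySearch("recent context", recallSignal)); // results for "recent context"
```

Both signals use the same 15,000 ms budget; only the starting point of the clock differs.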

Contributing Factors

  1. Skill resolution overhead: 20+ lark-* symlink skills trigger symlink-escape path checks during each embedded run, even though bootstrapContextMode: "lightweight" is set.

  2. Blocking embedded run: The runEmbeddedPiAgent call appears to have significant setup overhead (context compilation, workspace resolution, agent directory setup) that runs synchronously before the LLM call.

Suggested Fix

One or more of:

  1. Start timeoutMs clock at LLM call time, not plugin hook entry — the most impactful fix. The timeout should measure "how long we wait for the model," not "how long the entire pre-reply path takes."

  2. Exclude skill resolution from embedded run initialization — Active Memory only uses memory_search and memory_get (via toolsAllow), so full skill catalog resolution is unnecessary.

  3. Cache/reuse embedded run context across calls — if the same agent/session triggers active-memory repeatedly, the context compilation result could be cached.

  4. Report setup overhead separately in logs — add setupMs to the done log line so users can distinguish initialization time from LLM time:

    [plugins] active-memory: done status=ok elapsedMs=22000 setupMs=19000 llmMs=3000 summaryChars=42
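
A log line of the proposed shape could be produced from three timestamps, as in the sketch below. RunTimings and formatDoneLine are hypothetical names, and setupMs and llmMs are fields proposed in this issue, not fields the plugin is known to emit.

```typescript
// Hypothetical timing record; all values are epoch-style milliseconds.
interface RunTimings {
  hookStartMs: number; // plugin pre-reply hook entry
  llmStartMs: number; // moment the prompt is submitted to the model
  doneMs: number; // moment the plugin finishes
}

function formatDoneLine(status: string, t: RunTimings, summaryChars: number): string {
  const elapsedMs = t.doneMs - t.hookStartMs; // total time in the hook
  const setupMs = t.llmStartMs - t.hookStartMs; // initialization before the LLM call
  const llmMs = t.doneMs - t.llmStartMs; // time attributable to the model run
  return (
    `[plugins] active-memory: done status=${status} ` +
    `elapsedMs=${elapsedMs} setupMs=${setupMs} llmMs=${llmMs} summaryChars=${summaryChars}`
  );
}

console.log(formatDoneLine("ok", { hookStartMs: 0, llmStartMs: 19000, doneMs: 22000 }, 42));
```

Because setupMs and llmMs always sum to elapsedMs, a reader can see at a glance which phase consumed the budget.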

Workaround

Setting timeoutMs: 45000 (or higher) allows the embedded run to complete, but adds ~20 seconds of blocking latency to every reply, which defeats the purpose of active memory as a lightweight pre-reply enrichment.

Related Issues

  • #66849 — 2026.4.14 upgrade causes active-memory timeouts (closed)
  • #68825 — Active Memory + qmd chain timeout on 4.15 (closed as duplicate of #66849)
  • #65517 — Active-memory embedded sub-agent blocks event loop

Log Excerpts

Plugin start → timeout pattern (repeats for all 24 runs)

2026-04-27T12:03:25.482+08:00 [plugins] active-memory: agent=main session=agent:main:telegram:default:direct:REDACTED activeProvider=amazon-bedrock activeModel=amazon.nova-micro-v1:0 start timeoutMs=15000 queryChars=897
2026-04-27T12:03:47.808+08:00 [agent/embedded] embedded run failover decision: runId=active-memory-mogo9mzv-5b8ff721 stage=assistant decision=surface_error reason=timeout from=amazon-bedrock/amazon.nova-micro-v1:0 profile=-
2026-04-27T12:03:47.810+08:00 [plugins] active-memory: agent=main session=agent:main:telegram:default:direct:REDACTED activeProvider=amazon-bedrock activeModel=amazon.nova-micro-v1:0 done status=timeout elapsedMs=22328 summaryChars=0

Skill symlink-escape warnings (20+ per run, contributes to setup overhead)

2026-04-27T12:05:56.766+08:00 [skills] Skipping escaped skill path outside its configured root: source=openclaw-managed root=~/.openclaw/skills reason=symlink-escape requested=~/.openclaw/skills/lark-doc resolved=~/.agents/skills/lark-doc
(... 20+ similar lines for lark-attendance, lark-base, lark-calendar, etc.)
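
For readers unfamiliar with the warning, a symlink-escape check typically resolves the requested path and then tests whether the result still sits under the configured root. The sketch below shows only the containment test and is not OpenClaw's implementation; isInsideRoot is a hypothetical name, the /home/user prefix is an example expansion of ~, and a real check would first resolve the symlink (for example with fs.realpathSync), which is where per-skill filesystem cost would come from.

```typescript
import * as path from "node:path";

// Returns true when `resolved` is `root` itself or lies beneath it.
// Both arguments are expected to be absolute, already-resolved POSIX paths.
function isInsideRoot(root: string, resolved: string): boolean {
  const rel = path.posix.relative(root, resolved);
  const escapes = rel === ".." || rel.startsWith("../") || path.posix.isAbsolute(rel);
  return !escapes;
}

const root = "/home/user/.openclaw/skills";

// Mirrors the logged case: the symlink resolves outside the managed root.
console.log(isInsideRoot(root, "/home/user/.agents/skills/lark-doc")); // false

// A skill that really lives under the root passes the check.
console.log(isInsideRoot(root, "/home/user/.openclaw/skills/notes")); // true
```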

extent analysis

TL;DR

Increase the timeoutMs value or implement a fix to start the timeoutMs clock at LLM call time to account for the 18-19 second embedded run initialization overhead.

Guidance

  • Review the timeoutMs configuration and consider increasing its value to accommodate the embedded run initialization overhead.
  • Investigate starting the timeoutMs clock at LLM call time instead of plugin hook entry to ensure accurate timeout measurement.
  • Analyze the skill resolution overhead and consider excluding it from embedded run initialization or caching/reusing embedded run context across calls.
  • Add setupMs to the done log line to distinguish initialization time from LLM time and better understand the performance bottlenecks.

Example

No code snippet is provided as the issue is more related to configuration and performance optimization.

Notes

The issue attributes the failures to the timeoutMs clock starting at plugin hook entry, so the 18–19 seconds of embedded run initialization exhaust the 15-second budget before the LLM call begins. Raising timeoutMs works around the problem at the cost of extra blocking latency; PR #72620 addresses it by keeping timeoutMs scoped to the embedded recall run and adding a plugin-level setup grace window.

Recommendation

Apply a workaround by increasing the timeoutMs value to a higher value (e.g., 45000) to allow the embedded run to complete, but note that this may add blocking latency to every reply. A more permanent fix would involve starting the timeoutMs clock at LLM call time or optimizing the embedded run initialization process.
