openclaw - ✅(Solved) Fix [Bug]: Active Memory timeoutMs clock starts at plugin level, not at LLM call — embedded run setup overhead causes 100% timeout [1 pull requests, 1 participants]

openclaw · 2026-04-27 05:20:17


GitHub stats

openclaw/openclaw#72606 • Fetched 2026-04-28 06:34:01


Author: wujiaming88
Participants: wujiaming88
Timeline (top): cross-referenced ×2

Active Memory's timeoutMs timer starts when the plugin enters its pre-reply hook, but embedded agent run initialization (context building, workspace setup, skill resolution) consistently takes 18–19 seconds before the LLM call even begins. With the default timeoutMs: 15000, the abort signal fires before the embedded run starts, causing a 100% timeout rate even though the LLM itself responds in ~1.5 seconds.

Error Message

The session JSONL shows an openclaw:prompt-error at 04:03:44.574Z with the message "active-memory timeout after 15000ms"; the abort signal was already pending when the run started.


Workaround

Setting timeoutMs: 45000 (or higher) allows the embedded run to complete, but adds ~20 seconds of blocking latency to every reply, which defeats the purpose of active memory as a lightweight pre-reply enrichment.

PR fix notes

PR #72620: fix(active-memory): preserve setup time outside recall timeout

Repository: openclaw/openclaw
Author: hyspacex
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/72620

Description (problem / solution / changelog)

Summary

Keep Active Memory's configured timeoutMs scoped to the embedded recall/model run
Add a plugin-level setup grace window so embedded-run initialization does not consume the recall timeout budget before the LLM call starts
Cover the regression where wrapper/setup time exceeds timeoutMs but the recall itself still succeeds
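A minimal TypeScript sketch of the idea in this summary, assuming setup and recall can be awaited as two separate phases. The names withTimeout, runWithScopedTimeout, setupGraceMs and recallTimeoutMs are hypothetical and are not taken from the PR diff; this is one possible reading of "setup grace window", in which the recall timer starts only after setup finishes, so a 19 s setup can no longer exhaust a 15 s recall budget.

```typescript
// Hypothetical sketch: these names are illustrative, not OpenClaw's API.
type Phase = "setup" | "recall";

class PhaseTimeoutError extends Error {
  readonly phase: Phase;
  constructor(phase: Phase, ms: number) {
    super(`active-memory ${phase} timeout after ${ms}ms`);
    this.phase = phase;
  }
}

// Run `work` with its own AbortSignal; abort it and reject if it exceeds `ms`.
function withTimeout<T>(
  work: (signal: AbortSignal) => Promise<T>,
  ms: number,
  phase: Phase,
): Promise<T> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      controller.abort();
      reject(new PhaseTimeoutError(phase, ms));
    }, ms);
  });
  // Whichever settles first wins; the timer is always cleared afterwards.
  return Promise.race([work(controller.signal), deadline]).finally(() =>
    clearTimeout(timer),
  );
}

// Setup gets a separate grace budget; the recall clock starts only once
// setup has resolved, so initialization time never eats the recall budget.
async function runWithScopedTimeout<Ctx, R>(
  setup: (signal: AbortSignal) => Promise<Ctx>,
  recall: (ctx: Ctx, signal: AbortSignal) => Promise<R>,
  opts: { setupGraceMs: number; recallTimeoutMs: number },
): Promise<{ result: R; setupMs: number; recallMs: number }> {
  const t0 = Date.now();
  const ctx = await withTimeout(setup, opts.setupGraceMs, "setup");
  const t1 = Date.now();
  const result = await withTimeout(
    (signal) => recall(ctx, signal),
    opts.recallTimeoutMs,
    "recall",
  );
  return { result, setupMs: t1 - t0, recallMs: Date.now() - t1 };
}
```

Because each phase fails with its own PhaseTimeoutError, a slow setup and a slow model call stay distinguishable, which is also what the regression test described above needs to check.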

Fixes #72606

Testing

pnpm exec vitest run extensions/active-memory/index.test.ts
pnpm exec oxfmt --check extensions/active-memory/index.ts extensions/active-memory/index.test.ts
git diff --check

Changed files

extensions/active-memory/index.test.ts (modified, +36/-0)
extensions/active-memory/index.ts (modified, +9/-2)



[Bug]: Active Memory `timeoutMs` clock starts at plugin level, not at LLM call — embedded run setup overhead causes 100% timeout

Summary

Environment

OpenClaw: 2026.4.24
OS: Linux 5.15.0-173-generic (x64)
Runtime: Node v22.22.2, npm-global install
Provider: amazon-bedrock (Bedrock Converse Stream API, auth: aws-sdk)
Active Memory model: amazon-bedrock/amazon.nova-micro-v1:0
Active Memory config: timeoutMs: 15000, queryMode: recent, promptStyle: contextual, bootstrapContextMode: lightweight (hardcoded in plugin)
Skills: 50+ skills loaded (including 20+ lark-* symlink skills that trigger symlink-escape warnings)
Agents with active-memory: 3 (main, cbz001, clawdoctor)

Root Cause Analysis

The timing problem

12:03:25.482  [plugins] active-memory: start timeoutMs=15000
                ↓ 19.0s — embedded run initialization
12:03:40.482  ⏰ 15s timeout expires, abort signal fires
                ↓ but embedded run is still initializing...
12:03:44.528  Embedded run session actually starts (4s after timeout!)
12:03:44.572  Prompt submitted to Nova Micro
12:03:46.351  Nova Micro responds in 1.8s, calls memory_search
12:03:46.352  memory_search returns "Aborted" (abort signal already fired)
12:03:47.768  Model returns "NONE" (can't search memory)
              → Plugin reports: status=timeout, elapsedMs=22328, summaryChars=0

Direct API test proves model is fine

Tested Nova Micro directly via AWS SDK (@aws-sdk/client-bedrock-runtime Converse API):

| Test | Latency | Status |
| --- | --- | --- |
| Short prompt (149 chars) | 1,103 ms | ✅ |
| Medium prompt (3,402 chars) | 1,506 ms | ✅ |
| Long prompt (9,135 chars) | 1,322 ms | ✅ |
| 3 concurrent medium prompts | 1,366–1,519 ms | ✅ |

The model responds in 1–1.5 seconds. The 19-second overhead is entirely in embedded run initialization.

Evidence from trajectory files

The persisted trajectory (active-memory-*.trajectory.jsonl) confirms the embedded run itself is fast once it starts:

session.started:  04:03:44.528Z
context.compiled: 04:03:44.572Z  (44ms)
prompt.submitted: 04:03:44.572Z  (0ms)
model.completed:  04:03:44.670Z  (98ms for first LLM turn)

But the session JSONL shows an openclaw:prompt-error at 04:03:44.574Z with the message "active-memory timeout after 15000ms"; the abort signal was already pending when the run started.
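The gaps quoted above can be recomputed from raw timestamps. This is a small TypeScript sketch, assuming each trajectory event can be reduced to a name plus an ISO-8601 timestamp; the TrajectoryEvent shape and gapsMs are hypothetical, not the real JSONL schema. The date 2026-04-27 comes from the log excerpts because the trajectory lines show only times, and the plugin start at 12:03:25.482+08:00 is the same instant as 04:03:25.482Z.

```typescript
// Hypothetical, simplified event shape; the real trajectory JSONL may differ.
interface TrajectoryEvent {
  name: string;
  ts: string; // ISO-8601 timestamp with an offset or a trailing Z
}

// Milliseconds from each event to the next, keyed by the later event's name.
function gapsMs(events: TrajectoryEvent[]): Record<string, number> {
  const gaps: Record<string, number> = {};
  for (let i = 1; i < events.length; i++) {
    gaps[events[i].name] = Date.parse(events[i].ts) - Date.parse(events[i - 1].ts);
  }
  return gaps;
}

const events: TrajectoryEvent[] = [
  { name: "plugin.start", ts: "2026-04-27T12:03:25.482+08:00" }, // from the plugin log
  { name: "session.started", ts: "2026-04-27T04:03:44.528Z" },
  { name: "context.compiled", ts: "2026-04-27T04:03:44.572Z" },
  { name: "prompt.submitted", ts: "2026-04-27T04:03:44.572Z" },
  { name: "model.completed", ts: "2026-04-27T04:03:44.670Z" },
];

console.log(gapsMs(events));
// session.started: 19046 (setup before the run begins), then 44, 0 and 98
```

Placing the plugin start next to the trajectory events makes the split visible: about 19 s passes before session.started, and everything after it takes under 150 ms.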

Statistics

24/24 active-memory runs timed out today (100% failure rate):

Total runs:     24
Timeouts:       24
Successes:      0
Timeout rate:   100%
Config timeout: 15,000ms
Actual elapsed: 19,062–31,207ms (avg ~21,227ms)
Setup overhead: 18,000–19,000ms (consistent)
Actual LLM:     ~1,500–3,200ms (when it gets to run)
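A tally like this can be derived from the plugin's "done" lines (format as in the log excerpts). This is a TypeScript sketch; parseDoneLine and summarize are hypothetical helpers, and the regex only assumes that status= and elapsedMs= appear in that order, as they do in the quoted lines.

```typescript
interface RunRecord {
  status: string;
  elapsedMs: number;
}

// Extract status and elapsedMs from an active-memory "done" log line;
// returns undefined for any other line (e.g. the "start" line).
function parseDoneLine(line: string): RunRecord | undefined {
  const m = /active-memory: .*\bdone status=(\w+) elapsedMs=(\d+)/.exec(line);
  return m ? { status: m[1], elapsedMs: Number(m[2]) } : undefined;
}

// Tally runs the way the statistics above do. Assumes at least one run.
function summarize(runs: RunRecord[]) {
  const elapsed = runs.map((r) => r.elapsedMs);
  const timeouts = runs.filter((r) => r.status === "timeout").length;
  return {
    total: runs.length,
    timeouts,
    successes: runs.filter((r) => r.status === "ok").length,
    timeoutRate: timeouts / runs.length,
    minMs: Math.min(...elapsed),
    maxMs: Math.max(...elapsed),
    avgMs: elapsed.reduce((sum, ms) => sum + ms, 0) / runs.length,
  };
}

// Usage: feed it every line of the plugin log; non-"done" lines are skipped.
// These two lines are abbreviated from the log excerpts.
const logLines = [
  "... [plugins] active-memory: agent=main ... start timeoutMs=15000 queryChars=897",
  "... [plugins] active-memory: agent=main ... done status=timeout elapsedMs=22328 summaryChars=0",
];
const runs = logLines
  .map(parseDoneLine)
  .filter((r): r is RunRecord => r !== undefined);
console.log(summarize(runs));
```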

Expected Behavior

timeoutMs should govern the LLM call duration, not include the embedded run initialization overhead. If initialization takes 19 seconds, a timeoutMs: 15000 should mean 19s setup + 15s LLM budget = 34s total, not timeout before the LLM call starts.

Alternatively, the initialization overhead should be dramatically reduced so it doesn't dominate the timeout window.
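The arithmetic above can be written as a tiny TypeScript model, assuming a fixed setup cost followed by a single LLM call. The real run makes tool calls, so this is a simplification. recallOutcome and the ClockStart labels are hypothetical. Plugging in the report's figures (19 s setup, 1.8 s LLM) shows why 15000 always times out today, why the 45000 workaround passes, and that the expected behavior gives a 34 s deadline.

```typescript
// Illustrative labels for where the timeoutMs clock starts (not config values).
type ClockStart = "hookEntry" | "llmCall";

interface Outcome {
  status: "ok" | "timeout";
  deadlineMs: number; // measured from plugin hook entry
}

function recallOutcome(
  setupMs: number,
  llmMs: number,
  timeoutMs: number,
  clock: ClockStart,
): Outcome {
  // hookEntry: setup and LLM share one budget (current behavior).
  // llmCall:   setup sits outside the budget (expected behavior).
  const deadlineMs = clock === "hookEntry" ? timeoutMs : setupMs + timeoutMs;
  const status = setupMs + llmMs <= deadlineMs ? "ok" : "timeout";
  return { status, deadlineMs };
}

console.log(recallOutcome(19_000, 1_800, 15_000, "hookEntry")); // { status: 'timeout', deadlineMs: 15000 }
console.log(recallOutcome(19_000, 1_800, 15_000, "llmCall")); // { status: 'ok', deadlineMs: 34000 }
console.log(recallOutcome(19_000, 1_800, 45_000, "hookEntry")); // { status: 'ok', deadlineMs: 45000 }
```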

Observed Behavior

timeoutMs timer starts at plugin hook entry
Embedded run initialization takes 18–19 seconds
By the time the LLM call begins, the abort signal has already fired
memory_search tool calls return "Aborted"
Model returns "NONE" (no useful memory recall)
Plugin reports status=timeout with summaryChars=0

Contributing Factors

Skill resolution overhead: 20+ lark-* symlink skills trigger symlink-escape path checks during each embedded run, even though bootstrapContextMode: "lightweight" is set.
Blocking embedded run: The runEmbeddedPiAgent call appears to have significant setup overhead (context compilation, workspace resolution, agent directory setup) that runs synchronously before the LLM call.
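For context on the symlink-escape warnings, here is a minimal TypeScript sketch of the kind of containment check the log line describes: resolve the requested skill path through symlinks, then test whether it stays under the configured root. It is an assumption about what such a check does, not OpenClaw's skill loader, and escapesRoot is a hypothetical name. With a layout like ~/.openclaw/skills/lark-doc linking to ~/.agents/skills/lark-doc, it returns true.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// True when `requested` (nominally inside `root`) resolves, after following
// symlinks, to a location outside `root`.
function escapesRoot(root: string, requested: string): boolean {
  const realRoot = fs.realpathSync(root);
  const realTarget = fs.realpathSync(requested);
  const rel = path.relative(realRoot, realTarget);
  // "" means the root itself; a leading ".." segment or an absolute result
  // (different drive on Windows) means the target is outside the root.
  return rel === ".." || rel.startsWith(`..${path.sep}`) || path.isAbsolute(rel);
}
```

Resolving the root as well as the target matters when the root itself sits behind a symlink (for example /tmp on macOS), otherwise every path would look like an escape.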

Suggested Fix

One or more of:

Start timeoutMs clock at LLM call time, not plugin hook entry — the most impactful fix. The timeout should measure "how long we wait for the model," not "how long the entire pre-reply path takes."
Exclude skill resolution from embedded run initialization — Active Memory only uses memory_search and memory_get (via toolsAllow), so full skill catalog resolution is unnecessary.
Cache/reuse embedded run context across calls — if the same agent/session triggers active-memory repeatedly, the context compilation result could be cached.
Report setup overhead separately in logs — add setupMs to the done log line so users can distinguish initialization time from LLM time:
```
[plugins] active-memory: done status=ok elapsedMs=22000 setupMs=19000 llmMs=3000 summaryChars=42
```
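A sketch of the last suggestion in TypeScript, assuming the plugin can mark the moment setup hands off to the LLM call. PhaseClock and markSetupDone are hypothetical; the output format copies the proposed log line above. The clock function is injectable so the example is deterministic.

```typescript
// Hypothetical helper: splits a run's elapsed time at the setup/LLM boundary.
class PhaseClock {
  private readonly now: () => number;
  private readonly startedAt: number;
  private setupDoneAt: number | undefined;

  constructor(now: () => number = Date.now) {
    this.now = now;
    this.startedAt = now();
  }

  // Call once the embedded run is initialized and the LLM call is about to start.
  markSetupDone(): void {
    this.setupDoneAt = this.now();
  }

  // Builds the "done" line. If setup never finished, all time counts as setup.
  finish(status: string, summaryChars: number): string {
    const end = this.now();
    const setupEnd = this.setupDoneAt ?? end;
    const elapsedMs = end - this.startedAt;
    const setupMs = setupEnd - this.startedAt;
    const llmMs = end - setupEnd;
    return (
      `[plugins] active-memory: done status=${status} elapsedMs=${elapsedMs}` +
      ` setupMs=${setupMs} llmMs=${llmMs} summaryChars=${summaryChars}`
    );
  }
}

// Usage with a fake clock that returns 0, 19000, then 22000 ms.
const ticks = [0, 19_000, 22_000];
let tick = 0;
const clock = new PhaseClock(() => ticks[tick++]);
clock.markSetupDone();
console.log(clock.finish("ok", 42));
// [plugins] active-memory: done status=ok elapsedMs=22000 setupMs=19000 llmMs=3000 summaryChars=42
```

With this split, a run like the ones in this report would log setupMs close to 19000 and a small llmMs, pointing straight at initialization rather than the model.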

Workaround

Related Issues

#66849 — 2026.4.14 upgrade causes active-memory timeouts (closed)
#68825 — Active Memory + qmd chain timeout on 4.15 (closed as duplicate of #66849)
#65517 — Active-memory embedded sub-agent blocks event loop

Log Excerpts

Plugin start → timeout pattern (repeats for all 24 runs)

2026-04-27T12:03:25.482+08:00 [plugins] active-memory: agent=main session=agent:main:telegram:default:direct:REDACTED activeProvider=amazon-bedrock activeModel=amazon.nova-micro-v1:0 start timeoutMs=15000 queryChars=897
2026-04-27T12:03:47.808+08:00 [agent/embedded] embedded run failover decision: runId=active-memory-mogo9mzv-5b8ff721 stage=assistant decision=surface_error reason=timeout from=amazon-bedrock/amazon.nova-micro-v1:0 profile=-
2026-04-27T12:03:47.810+08:00 [plugins] active-memory: agent=main session=agent:main:telegram:default:direct:REDACTED activeProvider=amazon-bedrock activeModel=amazon.nova-micro-v1:0 done status=timeout elapsedMs=22328 summaryChars=0

Skill symlink-escape warnings (20+ per run, contributes to setup overhead)

2026-04-27T12:05:56.766+08:00 [skills] Skipping escaped skill path outside its configured root: source=openclaw-managed root=~/.openclaw/skills reason=symlink-escape requested=~/.openclaw/skills/lark-doc resolved=~/.agents/skills/lark-doc
(... 20+ similar lines for lark-attendance, lark-base, lark-calendar, etc.)

extent analysis

TL;DR

Increase the timeoutMs value or implement a fix to start the timeoutMs clock at LLM call time to account for the 18-19 second embedded run initialization overhead.

Guidance

Review the timeoutMs configuration and consider increasing its value to accommodate the embedded run initialization overhead.
Investigate starting the timeoutMs clock at LLM call time instead of plugin hook entry to ensure accurate timeout measurement.
Analyze the skill resolution overhead and consider excluding it from embedded run initialization or caching/reusing embedded run context across calls.
Add setupMs to the done log line to distinguish initialization time from LLM time and better understand the performance bottlenecks.

Example

No code snippet is provided as the issue is more related to configuration and performance optimization.

Notes

The evidence points to two interacting causes: the timeoutMs clock starts at plugin hook entry, and embedded run initialization takes 18–19 seconds. Increasing timeoutMs or starting the clock at LLM call time should avoid the timeouts; further investigation is needed to determine which part of initialization dominates the overhead and which fix is most effective.

Recommendation

Apply a workaround by increasing timeoutMs (e.g., to 45000) so the embedded run can complete, noting that this adds blocking latency to every reply. A more permanent fix would start the timeoutMs clock at LLM call time or optimize the embedded run initialization process.
