openclaw - 💡(How to fix) Fix Active Memory: timeoutMs unenforceable due to multi-turn agent loop; need single-shot mode [2 comments, 2 participants]

GitHub stats
openclaw/openclaw#72347 · Fetched 2026-04-27 05:31:12
Comments: 2 · Participants: 2 · Timeline events: 4 · Reactions: 0
Timeline (top): commented ×2, closed ×1, cross-referenced ×1


Active Memory: structural latency from multi-turn agent loop makes timeoutMs effectively unenforceable

Summary

Active Memory is documented and configured as a "fast preflight" with a timeoutMs budget (defaulting to 3000ms, our config 5000ms), but in practice it consistently runs 12–22s regardless of which model is used. This is because AM invokes runEmbeddedPiAgent — the same multi-turn pi-coding-agent runtime used for full conversational replies — which performs N model calls + tool round-trips before producing a summary. The timeoutMs is a soft abort that only takes effect at turn boundaries, so once the first tool call is in flight, elapsed time always exceeds budget.

Environment

  • OpenClaw 2026.4.24
  • macOS 26.4.1 arm64, Apple M1 Max, Node 25.9.0
  • Agent: main (single-user setup)
  • Channel: telegram direct
  • Memory backend: qmd (BM25 lexical, ~22k docs across 6 collections, no embeddings)

Reproduction (a single Telegram DM to the agent)

~/.openclaw/openclaw.json:

"plugins.entries.active-memory.config": {
  "agents": ["main"],
  "allowedChatTypes": ["direct"],
  "queryMode": "message",
  "promptStyle": "preference-only",
  "timeoutMs": 5000,
  "maxSummaryChars": 120,
  "logging": true
}

Send any DM. Observe in ~/.openclaw/logs/gateway.log:

[plugins] active-memory: ... activeProvider=openai activeModel=gpt-5.4-mini start timeoutMs=5000 queryChars=373
[plugins] active-memory: ... done status=timeout elapsedMs=12518 summaryChars=0
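To quantify the overrun from such log pairs, a small parser can help. This is a minimal sketch that assumes the key=value layout shown in the two lines above; parseActiveMemoryLine and budgetOverrunMs are hypothetical helper names, not part of OpenClaw, and the elided "..." portion of real lines may contain fields this sketch does not anticipate.

```javascript
// Hypothetical helper: extract key=value fields from an active-memory log line.
function parseActiveMemoryLine(line) {
  if (!line.includes("active-memory:")) return null;
  const fields = {};
  for (const match of line.matchAll(/(\w+)=(\S+)/g)) {
    const [, key, value] = match;
    // Purely numeric values become numbers; everything else stays a string.
    fields[key] = /^\d+$/.test(value) ? Number(value) : value;
  }
  return fields;
}

// Pair a "start" line with its "done" line and report the budget overrun in ms.
function budgetOverrunMs(startLine, doneLine) {
  const start = parseActiveMemoryLine(startLine);
  const done = parseActiveMemoryLine(doneLine);
  if (!start || !done) return null;
  return done.elapsedMs - start.timeoutMs;
}

const startLine =
  "[plugins] active-memory: ... activeProvider=openai activeModel=gpt-5.4-mini start timeoutMs=5000 queryChars=373";
const doneLine =
  "[plugins] active-memory: ... done status=timeout elapsedMs=12518 summaryChars=0";

console.log(budgetOverrunMs(startLine, doneLine)); // 7518
```

For the run above, the preflight exceeded its 5000ms budget by 7518ms.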

Repeated across 4 different models — same outcome:

Model                 | Provider path | Typical elapsed | summaryChars
openai-codex/gpt-5.5  | OAuth         | 12–22s          | 0
openai-codex/gpt-5.4  | OAuth         | 12–18s          | 0
openai/gpt-5.4-mini   | API           | 12–22s          | 0
ollama/qwen2.5:1.5b   | Local         | 13–27s          | 0

Direct API benchmark of the fastest model (gpt-5.4-mini) outside OpenClaw: 1.82s for a 5-token completion. So the model itself is fast — the AM wrapper isn't.

Root cause (read from source)

extensions/active-memory/index.js:910–937 — AM calls runEmbeddedPiAgent with:

toolsAllow: ["memory_search", "memory_get"],
disableMessageTool: true,
bootstrapContextMode: "lightweight",
timeoutMs: params.config.timeoutMs,

This is the full pi-coding-agent runtime, which:

  1. Sends prompt + tool schemas to the model (~1.8s API round-trip on gpt-5.4-mini).
  2. Receives a tool_use for memory_search.
  3. Executes qmd query (up to memory.qmd.limits.timeoutMs = 5000ms in our config).
  4. Sends tool result back to the model (another ~1.8s).
  5. Optionally repeats with memory_get.
  6. Eventually emits a final text response.

Best case: 2 model calls + 1 qmd query = ~5–7s. Common case: 3 model calls + slow qmd = ~12s. Worst case with retries / malformed tool calls: 22s+.
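The arithmetic behind these estimates can be sketched as a simple additive model. The ~1.8s round-trip and the 5000ms qmd cap come from the figures above; the 2s qmd duration in the first case is an illustrative assumption, and estimateRunMs is a hypothetical helper that ignores retries and runtime overhead.

```javascript
// Rough wall-clock model of one Active Memory run, in milliseconds.
// It only sums sequential model round-trips and qmd queries.
function estimateRunMs({ modelCalls, modelRttMs, qmdQueries, qmdMs }) {
  return modelCalls * modelRttMs + qmdQueries * qmdMs;
}

const budgetMs = 5000; // timeoutMs from the reproduction config
const modelRttMs = 1800; // ~1.8s per round-trip on gpt-5.4-mini

// Two model calls around one qmd query that takes 2s (assumed duration).
const twoCalls = estimateRunMs({ modelCalls: 2, modelRttMs, qmdQueries: 1, qmdMs: 2000 });
// Three model calls with one qmd query at its 5000ms cap.
const threeCalls = estimateRunMs({ modelCalls: 3, modelRttMs, qmdQueries: 1, qmdMs: 5000 });

console.log(twoCalls, twoCalls > budgetMs); // 5600 true
console.log(threeCalls, threeCalls > budgetMs); // 10400 true
```

Under this model, three model calls alone cost 5400ms, so the 5000ms budget is exceeded even if the qmd query were instantaneous.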

abortSignal is honored only at the next turn boundary (see runEmbeddedPiAgent and agent-session.js), so an in-flight HTTP call to the LLM provider always runs to completion. As a result, elapsedMs always exceeds timeoutMs once any tool call has fired.
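The difference between an abort checked at turn boundaries and one that cancels in-flight work can be illustrated with a timing simulation. This is a toy model, not the runtime's code: it assumes each step (model call or qmd query) blocks until it finishes, and both function names are hypothetical.

```javascript
// Simulated timing only: each entry is the duration of one blocking step (ms).
function softAbortElapsedMs(stepDurationsMs, budgetMs) {
  let elapsed = 0;
  for (const step of stepDurationsMs) {
    if (elapsed >= budgetMs) break; // the abort is only noticed between steps
    elapsed += step; // an in-flight step always runs to completion
  }
  return elapsed;
}

function hardAbortElapsedMs(stepDurationsMs, budgetMs) {
  const total = stepDurationsMs.reduce((sum, step) => sum + step, 0);
  return Math.min(total, budgetMs); // in-flight work is cancelled at the budget
}

const steps = [1800, 5000, 1800]; // model call, qmd query, model call
console.log(softAbortElapsedMs(steps, 5000)); // 6800
console.log(hardAbortElapsedMs(steps, 5000)); // 5000
```

With a boundary-checked abort the run only stops after the step that crossed the budget has finished, which is why the reported elapsed time overshoots the configured value.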

Why this matters

  • The plugin documentation and uiHints describe timeoutMs as a hard budget ("Timeout (ms)"). Operators tune it down expecting AM to bound reply-path latency. It cannot.
  • Operators cannot recover by switching to a faster model. We tried four (cloud OAuth, cloud API, local). All hit the same wall — the bottleneck is the agent loop, not the model.
  • AM's status=timeout summaryChars=0 results 100% of the time means AM contributes only latency, never injection. Fail-open keeps replies working but the feature is effectively dead.

Requested change

Either of the following would unblock operators in single-user / latency-sensitive deployments:

  1. Single-shot mode. A new config option, e.g. plugins.entries.active-memory.config.mode: "single-shot" that:

    • Runs the qmd search outside the model call (using queryMode to derive the query).
    • Sends prompt + qmd results to the model in one API call.
    • Asks for: "either NONE, or a ≤maxSummaryChars-char summary."
    • No tools, no agent loop. Wall-clock = 1 API round-trip + 1 qmd search ≈ 2–3s.
  2. Hard timeout. Make timeoutMs enforce a real wall-clock budget by aborting the in-flight HTTP request at the budget, not at the next turn boundary. Less ideal because it just turns long calls into hard failures, but at least the budget would be honest.
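For option 2, one way a wall-clock budget could be enforced is to race the work against a timer and abort the underlying request through an AbortSignal. This is a minimal sketch using only standard AbortController and Promise APIs; withDeadline is a hypothetical helper and says nothing about how runEmbeddedPiAgent is structured internally.

```javascript
// Hypothetical helper: run `task(signal)` under a hard wall-clock budget.
// The returned promise rejects at the budget even if the task ignores the signal.
function withDeadline(task, budgetMs) {
  const controller = new AbortController();
  let timer;
  const deadline = new Promise((_, reject) => {
    timer = setTimeout(() => {
      const err = new Error(`budget of ${budgetMs}ms exceeded`);
      controller.abort(err); // cancels in-flight work that listens to the signal
      reject(err);
    }, budgetMs);
  });
  const work = Promise.resolve().then(() => task(controller.signal));
  return Promise.race([work, deadline]).finally(() => clearTimeout(timer));
}
```

A caller would pass the signal down to the HTTP layer, for example withDeadline((signal) => fetch(url, { signal }), 5000), so that the request itself is cancelled rather than left to finish in the background.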

Option 1 matches the design intent that promptStyle: "preference-only" already implies: a cheap classifier+extractor, not an autonomous agent.
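The request/response contract for option 1 could look roughly like the following. This is a sketch under stated assumptions: the function names and the prompt wording are hypothetical, and only the "NONE or a summary of at most maxSummaryChars characters" rule comes from the proposal above.

```javascript
// Hypothetical helpers; names and prompt wording are illustrative only.
function buildSingleShotPrompt(message, searchResults, maxSummaryChars) {
  const context = searchResults.map((r, i) => `[${i + 1}] ${r}`).join("\n");
  return [
    "You are a memory preflight. Use only the search results below.",
    `Reply with either NONE, or a summary of at most ${maxSummaryChars} characters.`,
    "",
    "Search results:",
    context || "(no results)",
    "",
    `User message: ${message}`,
  ].join("\n");
}

function parseSingleShotReply(reply, maxSummaryChars) {
  const text = reply.trim();
  // NONE (or an empty reply) means there is nothing to inject.
  if (text === "" || text.toUpperCase() === "NONE") return null;
  // Enforce the cap locally in case the model overshoots.
  return text.slice(0, maxSummaryChars);
}
```

Because the search results are already in the prompt, the model needs no tools, and the reply can be handled with a trivial parser instead of an agent loop.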

Workaround we're shipping

Disabling Active Memory entirely (enabled: false, hot-reloadable). Memory remains accessible to the main agent via the existing memory_search tool when the model decides to call it directly. Lose: automatic preflight injection. Keep: consistent fast replies.

Notes

  • We verified timeoutMs IS hot-reloadable, but model, modelFallback, and most other AM keys force a SIGTERM full restart.
  • AM does fail-open correctly — main reply is unaffected by AM timeout. That part of the design is solid.
  • Per-agent models.json placeholder regeneration is a separate bug we hit while debugging this; happy to file separately if useful.

Logs available

Full gateway and structured JSON logs from 2026-04-25/26 available on request — left out for brevity.

extent analysis

TL;DR

Implement a single-shot mode for Active Memory so that preflight latency can realistically fit within the timeoutMs budget.

Guidance

  • Review the extensions/active-memory/index.js file to understand how the runEmbeddedPiAgent function is called and how it affects the timeoutMs budget.
  • Consider implementing a new config option, e.g., plugins.entries.active-memory.config.mode: "single-shot", to run the qmd search outside the model call and send the results to the model in one API call.
  • Evaluate the feasibility of aborting the in-flight HTTP request at the timeoutMs budget, rather than at the next turn boundary, to enforce a hard timeout.
  • Test the single-shot mode with different models and configurations to ensure it meets the performance requirements.

Example

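// Sketch of a single-shot branch. runQmdSearch, runModelCall and
// processModelResponse are hypothetical helpers, not existing OpenClaw APIs.
async function runActiveMemory(params, query, prompt) {
  if (params.config.mode === "single-shot") {
    // Run the qmd search outside the model call
    const qmdResults = await runQmdSearch(query);
    // Send prompt + qmd results to the model in one API call
    const modelResponse = await runModelCall(prompt, qmdResults);
    // Process the model response and return the summary
    return processModelResponse(modelResponse);
  }
  // Existing implementation using runEmbeddedPiAgent
  // (options-object call shape assumed from the excerpt quoted in the issue)
  return runEmbeddedPiAgent({
    toolsAllow: ["memory_search", "memory_get"],
    disableMessageTool: true,
    bootstrapContextMode: "lightweight",
    timeoutMs: params.config.timeoutMs,
    // ...remaining options unchanged
  });
}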

Notes

  • The single-shot mode implementation may require significant changes to the extensions/active-memory/index.js file and may have implications for the overall system performance.
  • The hard timeout approach may lead to increased error rates and should be carefully evaluated before implementation.

Recommendation

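Prefer the single-shot mode (option 1 in the requested change). It brings preflight latency down to roughly one model round-trip plus one qmd search, which makes the timeoutMs budget realistic, and it aligns with the design intent of Active Memory as a cheap classifier+extractor. Until such a mode exists, the reporter's workaround is to disable Active Memory.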
