openclaw - 💡(How to fix) Fix Active Memory: timeoutMs unenforceable due to multi-turn agent loop; need single-shot mode [2 comments, 2 participants]

thecolormaroun · 2026-04-26T20:04:50Z



Active Memory: structural latency from multi-turn agent loop makes timeoutMs effectively unenforceable

Summary

Active Memory is documented and configured as a "fast preflight" with a timeoutMs budget (defaulting to 3000ms, our config 5000ms), but in practice it consistently runs 12–22s regardless of which model is used. This is because AM invokes runEmbeddedPiAgent — the same multi-turn pi-coding-agent runtime used for full conversational replies — which performs N model calls + tool round-trips before producing a summary. The timeoutMs is a soft abort that only takes effect at turn boundaries, so once the first tool call is in flight, elapsed time always exceeds budget.

Environment

OpenClaw 2026.4.24
macOS 26.4.1 arm64, Apple M1 Max, Node 25.9.0
Agent: main (single-user setup)
Channel: telegram direct
Memory backend: qmd (BM25 lexical, ~22k docs across 6 collections, no embeddings)

Reproduction (a single Telegram DM to the agent)

~/.openclaw/openclaw.json:

"plugins.entries.active-memory.config": {
  "agents": ["main"],
  "allowedChatTypes": ["direct"],
  "queryMode": "message",
  "promptStyle": "preference-only",
  "timeoutMs": 5000,
  "maxSummaryChars": 120,
  "logging": true
}

Send any DM. Observe in ~/.openclaw/logs/gateway.log:

[plugins] active-memory: ... activeProvider=openai activeModel=gpt-5.4-mini start timeoutMs=5000 queryChars=373
[plugins] active-memory: ... done status=timeout elapsedMs=12518 summaryChars=0
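To check how often Active Memory overruns its budget across a whole gateway.log rather than for one request, the two log lines above can be paired up by a small parser. This is a minimal sketch that assumes every run logs a "start ... timeoutMs=" line followed by its "done ... elapsedMs=" line, in the key=value layout shown in the excerpt; the function names are hypothetical and not part of OpenClaw.

```javascript
// Hypothetical helper (not part of OpenClaw): pairs each active-memory "start"
// line with the following "done" line and reports how far the run overshot.
// Assumes the key=value layout shown in the gateway.log excerpt above.
function parseKeyValues(line) {
  const fields = {};
  for (const match of line.matchAll(/(\w+)=(\S+)/g)) {
    fields[match[1]] = match[2];
  }
  return fields;
}

function summarizeActiveMemoryRuns(logText) {
  const runs = [];
  let budgetMs = null; // timeoutMs from the most recent "start" line
  for (const line of logText.split("\n")) {
    if (!line.includes("active-memory:")) continue;
    const fields = parseKeyValues(line);
    if (/\bstart\b/.test(line) && fields.timeoutMs !== undefined) {
      budgetMs = Number(fields.timeoutMs);
    } else if (/\bdone\b/.test(line) && fields.elapsedMs !== undefined) {
      const elapsedMs = Number(fields.elapsedMs);
      runs.push({
        status: fields.status,
        timeoutMs: budgetMs,
        elapsedMs,
        overrunMs: budgetMs === null ? null : elapsedMs - budgetMs,
        summaryChars: Number(fields.summaryChars ?? 0),
      });
      budgetMs = null;
    }
  }
  return runs;
}

// Usage (Node.js):
//   const fs = require("node:fs");
//   const runs = summarizeActiveMemoryRuns(
//     fs.readFileSync(`${process.env.HOME}/.openclaw/logs/gateway.log`, "utf8"));
//   console.table(runs);
```

For the two log lines above, it returns a single run with status "timeout", elapsedMs 12518 and overrunMs 7518.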

Repeated across 4 different models, with the same outcome:

| Model | Provider path | Typical elapsed | summaryChars |
|---|---|---|---|
| `openai-codex/gpt-5.5` | OAuth | 12–22s | 0 |
| `openai-codex/gpt-5.4` | OAuth | 12–18s | 0 |
| `openai/gpt-5.4-mini` | API | 12–22s | 0 |
| `ollama/qwen2.5:1.5b` | Local | 13–27s | 0 |

A direct API benchmark of the fastest model (gpt-5.4-mini), run outside OpenClaw, took 1.82s for a 5-token completion. The model itself is fast; the AM wrapper is not.

Root cause (read from source)

extensions/active-memory/index.js:910–937 — AM calls runEmbeddedPiAgent with:

toolsAllow: ["memory_search", "memory_get"],
disableMessageTool: true,
bootstrapContextMode: "lightweight",
timeoutMs: params.config.timeoutMs,

This is the full pi-coding-agent runtime, which:

1. Sends prompt + tool schemas to the model (~1.8s API round-trip on gpt-5.4-mini).
2. Receives a tool_use for memory_search.
3. Executes the qmd query (up to memory.qmd.limits.timeoutMs = 5000ms in our config).
4. Sends the tool result back to the model (another ~1.8s).
5. Optionally repeats with memory_get.
6. Eventually emits a final text response.

Best case: 2 model calls + 1 qmd query = ~5–7s. Common case: 3 model calls + slow qmd = ~12s. Worst case with retries / malformed tool calls: 22s+.
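The arithmetic behind these cases can be written down as a back-of-the-envelope model. This is a sketch under a simplifying assumption (wall-clock is roughly model calls times round-trip plus qmd time, ignoring overheads and retries); the qmd durations marked "assumed" are illustrative, not measurements from the issue.

```javascript
// Back-of-the-envelope latency model for one Active Memory preflight.
// Simplifying assumption: wall-clock ≈ model calls × round-trip + qmd time.
// It ignores prompt/tool-schema overhead, retries and malformed tool calls,
// so real runs land above these numbers.
function estimatePreflightMs({ modelCalls, roundTripMs, qmdMs }) {
  return modelCalls * roundTripMs + qmdMs;
}

const ROUND_TRIP_MS = 1800; // ~1.8s gpt-5.4-mini round-trip quoted in the issue
const QMD_LIMIT_MS = 5000;  // memory.qmd.limits.timeoutMs in the reporter's config
const BUDGET_MS = 5000;     // active-memory timeoutMs in the reporter's config

const scenarios = [
  { name: "agent loop, best case", modelCalls: 2, roundTripMs: ROUND_TRIP_MS, qmdMs: 1500 }, // qmdMs assumed
  { name: "agent loop, slow qmd", modelCalls: 3, roundTripMs: ROUND_TRIP_MS, qmdMs: QMD_LIMIT_MS },
  { name: "single-shot (proposed)", modelCalls: 1, roundTripMs: ROUND_TRIP_MS, qmdMs: 1000 }, // qmdMs assumed
];

for (const s of scenarios) {
  const ms = estimatePreflightMs(s);
  console.log(`${s.name}: ~${ms} ms, within ${BUDGET_MS} ms budget: ${ms <= BUDGET_MS}`);
}
```

Under these assumptions even the optimistic agent-loop estimate (5100 ms) exceeds the 5000 ms budget, the slow-qmd case comes to 10400 ms (below the observed ~12s because overheads are ignored), and the single-shot estimate (2800 ms) fits.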

abortSignal is honored only at the next turn boundary (see runEmbeddedPiAgent and agent-session.js), so the in-flight HTTP call to the LLM provider always runs to completion. Once any tool call has fired, elapsedMs therefore always exceeds timeoutMs.
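The difference between a turn-boundary (soft) abort and a wall-clock (hard) abort can be shown on a virtual clock, treating each model call and tool execution as one indivisible step. This is an illustrative simulation, not OpenClaw code; the step durations are loosely based on the numbers above and the function name is hypothetical. With a 5000 ms budget, the soft abort returns after 6800 ms and the hard abort after exactly 5000 ms.

```javascript
// Virtual-clock simulation (no real network calls) of the two abort styles.
// Each step is indivisible from the agent loop's point of view: a soft abort
// only checks the deadline between steps, a hard abort can cut a step short.
function simulatePreflight(steps, timeoutMs, abortMode) {
  let elapsedMs = 0;
  for (const step of steps) {
    // Soft and hard: don't start a new step once the budget is spent.
    if (elapsedMs >= timeoutMs) {
      return { status: "timeout", elapsedMs };
    }
    // Hard only: abort the in-flight step exactly at the deadline.
    if (abortMode === "hard" && elapsedMs + step.ms > timeoutMs) {
      return { status: "timeout", elapsedMs: timeoutMs };
    }
    elapsedMs += step.ms; // soft: the step always runs to completion
  }
  return { status: elapsedMs > timeoutMs ? "timeout" : "ok", elapsedMs };
}

// Step durations are illustrative, loosely based on the numbers in the issue.
const steps = [
  { name: "model call (returns tool_use)", ms: 1800 },
  { name: "memory_search via qmd", ms: 5000 },
  { name: "model call (final text)", ms: 1800 },
];

console.log("soft:", simulatePreflight(steps, 5000, "soft"));
console.log("hard:", simulatePreflight(steps, 5000, "hard"));
```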

Why this matters

The plugin documentation and uiHints describe timeoutMs as a hard budget ("Timeout (ms)"). Operators tune it down expecting AM to bound reply-path latency. It cannot.
Operators cannot recover by switching to a faster model. We tried four (cloud OAuth, cloud API, local). All hit the same wall — the bottleneck is the agent loop, not the model.
Because AM returns status=timeout summaryChars=0 100% of the time, it contributes only latency, never injection. Fail-open keeps replies working, but the feature is effectively dead.

Requested change

Either of the following would unblock operators in single-user / latency-sensitive deployments:

1. Single-shot mode. A new config option, e.g. plugins.entries.active-memory.config.mode: "single-shot", that:
   - Runs the qmd search outside the model call (using queryMode to derive the query).
   - Sends prompt + qmd results to the model in one API call.
   - Asks for: "either NONE, or a ≤maxSummaryChars-char summary."
   - No tools, no agent loop. Wall-clock = 1 API round-trip + 1 qmd search ≈ 2–3s.
2. Hard timeout. Make timeoutMs enforce a real wall-clock budget by aborting the in-flight HTTP request at the budget, not at the next turn boundary. Less ideal because it just turns long calls into hard failures, but at least the budget would be honest.
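Option 2 usually comes down to handing an AbortSignal to the HTTP client and firing it from a timer. The sketch below assumes a Node.js runtime with AbortController (Node 17.2+ for abort reasons); withHardTimeout is a hypothetical helper, not an existing OpenClaw or pi-coding-agent API, and it does not show how runEmbeddedPiAgent would thread the signal through.

```javascript
// Hypothetical helper, not an existing OpenClaw API: run one unit of work
// under a real wall-clock budget. The AbortSignal handed to `work` should be
// passed down to the HTTP client (e.g. fetch(url, { signal })) so the
// in-flight request is cancelled; the race below rejects at the deadline
// even if `work` ignores the signal.
function withHardTimeout(work, timeoutMs) {
  const controller = new AbortController();
  let timer;
  const deadline = new Promise((_, reject) => {
    timer = setTimeout(() => {
      const err = new Error(`active-memory budget of ${timeoutMs}ms exceeded`);
      err.name = "TimeoutError";
      controller.abort(err); // cancels whatever is listening on the signal
      reject(err);
    }, timeoutMs);
  });
  // new Promise(...) turns a synchronous throw from `work` into a rejection.
  const task = new Promise((resolve) => resolve(work(controller.signal)));
  return Promise.race([task, deadline]).finally(() => clearTimeout(timer));
}

// Usage sketch (url and body are placeholders): fetch honours the signal, so
// the HTTP request is torn down at the deadline, not at the next turn boundary.
//   withHardTimeout((signal) => fetch(url, { method: "POST", body, signal }), 5000);
```

As the option itself notes, this makes the budget honest but converts slow calls into failures; it does not make the preflight any faster.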

Option 1 matches the design intent that promptStyle: "preference-only" already implies: a cheap classifier+extractor, not an autonomous agent.
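Option 1's two model-facing pieces, building one tool-free prompt from results fetched up front and accepting either NONE or a capped summary, could look roughly like this. It is a sketch under assumptions: the helper names and the { path, snippet } shape of a qmd hit are hypothetical, and the prompt wording is illustrative rather than OpenClaw's actual preference-only prompt.

```javascript
// Hypothetical helpers for the proposed single-shot mode (not OpenClaw APIs).
// Flow: one qmd search up front -> buildSingleShotPrompt -> one tool-free
// model call -> parseSingleShotReply. A qmd hit is assumed to be { path, snippet }.
function buildSingleShotPrompt(message, hits, maxSummaryChars) {
  const excerpts = hits
    .map((hit, i) => `[${i + 1}] ${hit.path}: ${hit.snippet}`)
    .join("\n");
  return [
    "You are a memory preflight. Use only the memory excerpts below.",
    `Reply with exactly NONE, or a summary of at most ${maxSummaryChars} characters`,
    "stating the user preferences relevant to the message.",
    "",
    "Memory excerpts:",
    excerpts || "(no results)",
    "",
    `User message: ${message}`,
  ].join("\n");
}

// Returns null when there is nothing to inject, otherwise the summary text.
function parseSingleShotReply(reply, maxSummaryChars) {
  const text = reply.trim();
  if (text === "" || text.toUpperCase() === "NONE") return null;
  // Enforce the cap locally rather than trusting the model to count characters.
  return text.slice(0, maxSummaryChars);
}
```

Truncating locally keeps maxSummaryChars a hard limit even when the model ignores the instruction; a production version might prefer cutting at a word boundary.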

Workaround we're shipping

We are disabling Active Memory entirely (enabled: false, which is hot-reloadable). Memory remains accessible to the main agent through the existing memory_search tool whenever the model decides to call it directly. We lose automatic preflight injection; we keep consistently fast replies.

Notes

We verified timeoutMs IS hot-reloadable, but model, modelFallback, and most other AM keys force a SIGTERM full restart.
AM does fail-open correctly — main reply is unaffected by AM timeout. That part of the design is solid.
Per-agent models.json placeholder regeneration is a separate bug we hit while debugging this; happy to file separately if useful.

Logs available

Full gateway and structured JSON logs from 2026-04-25/26 available on request — left out for brevity.

extent analysis

TL;DR

Implement a single-shot mode for Active Memory so a preflight costs one model call plus one qmd search and can realistically finish within its timeoutMs budget.

Guidance

Review the extensions/active-memory/index.js file to understand how the runEmbeddedPiAgent function is called and how it affects the timeoutMs budget.
Consider implementing a new config option, e.g., plugins.entries.active-memory.config.mode: "single-shot", to run the qmd search outside the model call and send the results to the model in one API call.
Evaluate the feasibility of aborting the in-flight HTTP request at the timeoutMs budget, rather than at the next turn boundary, to enforce a hard timeout.
Test the single-shot mode with different models and configurations to ensure it meets the performance requirements.

Example

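// Sketch of a single-shot branch in extensions/active-memory/index.js.
// runPreflight, runQmdSearch, runModelCall and processModelResponse are
// hypothetical names, not existing OpenClaw functions; both paths are async.
async function runPreflight(params, query, prompt) {
  if (params.config.mode === "single-shot") {
    // Run the qmd search outside the model call
    const qmdResults = await runQmdSearch(query);
    // Send prompt + qmd results to the model in one API call (no tools)
    const modelResponse = await runModelCall(prompt, qmdResults);
    // Return null for NONE, otherwise a summary capped at maxSummaryChars
    return processModelResponse(modelResponse, params.config.maxSummaryChars);
  }
  // Existing implementation: multi-turn agent loop via runEmbeddedPiAgent
  return runEmbeddedPiAgent({
    // ...other existing arguments
    toolsAllow: ["memory_search", "memory_get"],
    disableMessageTool: true,
    bootstrapContextMode: "lightweight",
    timeoutMs: params.config.timeoutMs,
  });
}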

Notes

The single-shot mode may require significant changes to extensions/active-memory/index.js. It also trades retrieval flexibility for speed: the model sees only the qmd results fetched up front and cannot follow up with memory_get.
The hard timeout approach may lead to increased error rates and should be carefully evaluated before implementation.

Recommendation

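Implement single-shot mode to reduce latency and keep Active Memory within its timeoutMs budget, since it aligns with the design intent of Active Memory as a cheap classifier+extractor. Until it exists, disabling Active Memory (the reporter's workaround) is the available option.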
