openclaw - 💡(How to fix) Fix AM embedded run aborts memory_search tool calls; classifies as timeout despite model completion [4 comments, 4 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
openclaw/openclaw#74586Fetched 2026-04-30 06:22:38
View on GitHub
Comments
4
Participants
4
Timeline
5
Reactions
2
Timeline (top)
commented ×4subscribed ×1

Active Memory's embedded recall sub-agent has four correlated findings that together break AM functionality on a healthy gateway:

  1. AM produced status=timeout with summaryChars=0 across 5 sequential trigger runs spanning 2 different models and 2 different timeout budgets. Pass 5 enabled persistTranscripts: true; the persisted transcript shows every memory_search call from inside the embedded run returned "Aborted" (isError: true). Passes 1-4 produce the same status-line pattern but did not capture transcripts.
  2. The same memory_search call from a non-embedded session works correctly (1.656s, returns hits/empty cleanly, no abort) — isolating the bug to embedded-runner / AM tool bridge / abort propagation, not memory-core globally.
  3. The embedded runner classifies the run as status=timeout even when the assistant returns a normal NONE response in ~18s, well under the configured 60s budget.
  4. Prior-run prompt-error events leak into the next run's context as injected custom:openclaw:prompt-error events.

Error Message

19:36:18.009Z session start 19:36:18.851Z model_change → ollama-cloud/minimax-m2.5 19:36:19.091Z prompt-error injected: "active-memory timeout after 60000ms | active-memory timeout after 60000ms" (bug 4 — stale prompt-error from prior pass leaks into this run) 19:36:19.107Z user message (system prompt + recent context + trigger) 19:36:28.246Z assistant: 9s reasoning, calls memory_search({query: "...", maxResults: 3}) 19:36:28.250Z toolResult: {content: [{type:"text",text:"Aborted"}], isError: true} 19:36:29.939Z assistant retries with simpler query → toolResult: "Aborted" 19:36:31.670Z assistant retries again → toolResult: "Aborted" 19:36:36.262Z assistant: returns "NONE" (correct fallback per system prompt instructions) 19:36:36.519Z embedded run: failover decision, reason=timeout, decision=surface_error (bug 3 — model returned NONE in 18s; classified timeout anyway)

Root Cause

AM is functionally unusable on this deployment until the embedded-runner abort-propagation issue is fixed. All cloud-recall-model tuning and budget tuning are dead-end paths because the model never gets useful tool results regardless of model speed. Documented in 5 sequential test passes with controlled variable changes.

Fix Action

Fix / Workaround

The bug is in the embedded runner's tool dispatch / abort propagation path, NOT in memory-core or the recall model. Specifically:

  1. Investigate why memory_search tool dispatch aborts inside embedded runs. Compare embedded vs direct tool dispatch path.
  2. Embedded-run timeout classification should respect model completion. If the model returns within budget (whether useful content OR NONE), status=ok is the correct classification.
  3. Clear prior-run prompt-error events from the next-run context. Cross-run state leakage of error metadata is unexpected.

Code Example

19:36:18.009Z  session start
19:36:18.851Z  model_change → ollama-cloud/minimax-m2.5
19:36:19.091Z  prompt-error injected: "active-memory timeout after 60000ms | active-memory timeout after 60000ms"
                (bug 4 — stale prompt-error from prior pass leaks into this run)
19:36:19.107Z  user message (system prompt + recent context + trigger)
19:36:28.246Z  assistant: 9s reasoning, calls memory_search({query: "...", maxResults: 3})
19:36:28.250Z  toolResult: {content: [{type:"text",text:"Aborted"}], isError: true}
19:36:29.939Z  assistant retries with simpler query → toolResult: "Aborted"
19:36:31.670Z  assistant retries again → toolResult: "Aborted"
19:36:36.262Z  assistant: returns "NONE" (correct fallback per system prompt instructions)
19:36:36.519Z  embedded run: failover decision, reason=timeout, decision=surface_error
                (bug 3 — model returned NONE in 18s; classified timeout anyway)

---

{
  "plugins.entries.active-memory": {
    "enabled": true,
    "config": {
      "enabled": true,
      "agents": ["trent"],
      "allowedChatTypes": ["direct"],
      "queryMode": "recent",
      "promptStyle": "balanced",
      "timeoutMs": 60000,
      "maxSummaryChars": 220,
      "persistTranscripts": true,
      "logging": true,
      "model": "ollama-cloud/minimax-m2.5",
      "modelFallback": "ollama-cloud/minimax-m2.5"
    }
  }
}
RAW_BUFFERClick to expand / collapse

AM embedded run aborts memory_search tool calls; classifies as timeout despite model completion

Version

OpenClaw 2026.4.26 (commit be8c246), running on .npm-global/lib/node_modules/openclaw/openclaw.mjs per governance/UPGRADE-SOP-NATIVE.md.

Plugin

active-memory (stock plugin, dist/extensions/active-memory/index.js).

Summary

Active Memory's embedded recall sub-agent has four correlated findings that together break AM functionality on a healthy gateway:

  1. AM produced status=timeout with summaryChars=0 across 5 sequential trigger runs spanning 2 different models and 2 different timeout budgets. Pass 5 enabled persistTranscripts: true; the persisted transcript shows every memory_search call from inside the embedded run returned "Aborted" (isError: true). Passes 1-4 produce the same status-line pattern but did not capture transcripts.
  2. The same memory_search call from a non-embedded session works correctly (1.656s, returns hits/empty cleanly, no abort) — isolating the bug to embedded-runner / AM tool bridge / abort propagation, not memory-core globally.
  3. The embedded runner classifies the run as status=timeout even when the assistant returns a normal NONE response in ~18s, well under the configured 60s budget.
  4. Prior-run prompt-error events leak into the next run's context as injected custom:openclaw:prompt-error events.

Reproduction

Trent (the AM-eligible agent on this deployment) is on openai-codex/gpt-5.5 session. AM config tested across 5 runs:

PassmodelmodelFallbacktimeoutMspersistTranscriptsactiveModel observedelapsedMsstatussummaryCharsmemory_search results
1(unset)(unset)15000falseopenai-codex/gpt-5.541991timeout0not visible
2(unset)ollama-cloud/minimax-m2.515000falseopenai-codex/gpt-5.546790timeout0not visible
3ollama-cloud/minimax-m2.5ollama-cloud/minimax-m2.515000falseollama-cloud/minimax-m2.545676timeout0not visible
4ollama-cloud/minimax-m2.5ollama-cloud/minimax-m2.560000falseollama-cloud/minimax-m2.597994timeout0not visible
5ollama-cloud/minimax-m2.5ollama-cloud/minimax-m2.560000trueollama-cloud/minimax-m2.594199 (plugin status); assistant returned NONE in ~18s (transcript)timeout0all "Aborted"

Pass 5's persisted transcript at ~/.openclaw/plugins/active-memory/transcripts/agents/trent/active-memory/active-memory-mokgeyo1-780d30df.jsonl shows:

19:36:18.009Z  session start
19:36:18.851Z  model_change → ollama-cloud/minimax-m2.5
19:36:19.091Z  prompt-error injected: "active-memory timeout after 60000ms | active-memory timeout after 60000ms"
                (bug 4 — stale prompt-error from prior pass leaks into this run)
19:36:19.107Z  user message (system prompt + recent context + trigger)
19:36:28.246Z  assistant: 9s reasoning, calls memory_search({query: "...", maxResults: 3})
19:36:28.250Z  toolResult: {content: [{type:"text",text:"Aborted"}], isError: true}
19:36:29.939Z  assistant retries with simpler query → toolResult: "Aborted"
19:36:31.670Z  assistant retries again → toolResult: "Aborted"
19:36:36.262Z  assistant: returns "NONE" (correct fallback per system prompt instructions)
19:36:36.519Z  embedded run: failover decision, reason=timeout, decision=surface_error
                (bug 3 — model returned NONE in 18s; classified timeout anyway)

Disambiguation test

From a non-embedded session on the same host/runtime: memory_search invoked directly completed in 1656ms, returned 0 hits, no abort. The same memory tool path is reachable outside the AM embedded run; only the AM embedded path aborts.

Conclusion

The bug is in the embedded runner's tool dispatch / abort propagation path, NOT in memory-core or the recall model. Specifically:

  • Whatever signals "abort" to the tool result is firing for every memory_search call within the embedded sub-agent
  • The model eventually handles it gracefully by returning NONE (note: it violates the prompt's one-tool-stop instruction by retrying 3x first, but it converges to a valid answer)
  • But the embedded runner declares the run a timeout even when the model returned cleanly within budget

System config (relevant excerpts)

{
  "plugins.entries.active-memory": {
    "enabled": true,
    "config": {
      "enabled": true,
      "agents": ["trent"],
      "allowedChatTypes": ["direct"],
      "queryMode": "recent",
      "promptStyle": "balanced",
      "timeoutMs": 60000,
      "maxSummaryChars": 220,
      "persistTranscripts": true,
      "logging": true,
      "model": "ollama-cloud/minimax-m2.5",
      "modelFallback": "ollama-cloud/minimax-m2.5"
    }
  }
}

tools.allow = [], tools.alsoAllow = []. Provider ollama-cloud configured with minimax-m2.5 allowlisted. Direct memory tool path on this host is responsive (per disambiguation test above) — failure scope is the AM embedded run.

Source pointers

  • dist/extensions/active-memory/index.js — around getModelRef candidates list (function definition at line 858, candidates array at lines 865-870 in this build) — chain-resolution model selection. (Related but separate doc/UX bug — see related filing.)
  • dist/extensions/active-memory/index.js — around the runEmbeddedPiAgent invocation (in this build, transcript path wiring runs roughly lines 893-955).
  • The Aborted toolResult appears to be coming from the embedded runner's abort controller. We did not narrow further — likely needs source-side investigation around the AM tool bridge / sub-agent setup path.

Suggested remediation areas (not authoritative — investigation needed)

  1. Investigate why memory_search tool dispatch aborts inside embedded runs. Compare embedded vs direct tool dispatch path.
  2. Embedded-run timeout classification should respect model completion. If the model returns within budget (whether useful content OR NONE), status=ok is the correct classification.
  3. Clear prior-run prompt-error events from the next-run context. Cross-run state leakage of error metadata is unexpected.

Impact

AM is functionally unusable on this deployment until the embedded-runner abort-propagation issue is fixed. All cloud-recall-model tuning and budget tuning are dead-end paths because the model never gets useful tool results regardless of model speed. Documented in 5 sequential test passes with controlled variable changes.

Reporter

islandpreneur007, with diagnostic support from Trent.

extent analysis

TL;DR

The most likely fix involves investigating and resolving the issue with the embedded runner's tool dispatch and abort propagation path, ensuring that memory_search calls within the embedded sub-agent do not abort and that the embedded runner correctly classifies run status based on model completion.

Guidance

  1. Investigate the embedded runner's tool dispatch path: Compare the tool dispatch path for embedded runs versus direct tool calls to identify why memory_search calls are being aborted within the embedded sub-agent.
  2. Review timeout classification logic: Ensure that the embedded runner's timeout classification respects model completion, classifying the run as status=ok if the model returns within the budget, regardless of whether it returns useful content or NONE.
  3. Clear prior-run error events: Implement a mechanism to clear prior-run prompt-error events from the next-run context to prevent cross-run state leakage of error metadata.
  4. Verify the fix: After implementing the fixes, run the same test passes as described in the issue to verify that the memory_search calls no longer abort and that the embedded runner correctly classifies run status.

Example

No specific code example can be provided without further investigation into the source code. However, the fix will likely involve modifying the runEmbeddedPiAgent invocation and the tool dispatch path in dist/extensions/active-memory/index.js.

Notes

The issue seems to be specific to the embedded runner's interaction with the memory_search tool, and the fix will require a detailed understanding of the source code. The suggested remediation areas provide a starting point for the investigation.

Recommendation

Apply a workaround by modifying the embedded runner's tool dispatch and timeout classification logic to prevent aborts and correctly classify run status. This will likely involve source code changes and may require additional debugging to fully resolve the issue.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

openclaw - 💡(How to fix) Fix AM embedded run aborts memory_search tool calls; classifies as timeout despite model completion [4 comments, 4 participants]