openclaw - ✅(Solved) Fix [Bug]: active-memory timeoutMs does not behave as a hard deadline for embedded run [1 pull requests, 4 comments, 3 participants]

GitHub stats
openclaw/openclaw#71629 · Fetched 2026-04-26 05:10:23
Comments: 4
Participants: 3
Timeline events: 10
Reactions: 0
Timeline (top): commented ×4, labeled ×2, referenced ×2, closed ×1

In OpenClaw 2026.4.23, active-memory loads timeoutMs=5000 but returns timeout after ~11–13s in both Telegram and Web, even with short messages and working memory_search/Ollama.

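Error Message

None. No Ollama error and no CredentialsProviderError appear in the logs; the only symptom is status=timeout with elapsedMs well above the configured timeoutMs.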

Root Cause

This does not appear to be caused by empty memory, broken Ollama, or credentials. External memory_search works and returns results in about 812ms. The issue appears to be that timeoutMs reaches active-memory and the embedded run, but cancellation behaves cooperatively rather than as a hard deadline.

Fix / Workaround

I am avoiding local patches and would prefer an upstream-supported fix, because a naive Promise.race around the recall could return early but might leave embedded runs, locks, or lanes in an inconsistent state.

PR fix notes

PR #71687: fix(active-memory): enforce timeoutMs as hard deadline via Promise.race [AI-assisted]

Description (problem / solution / changelog)

🤖 AI-assisted (built with Claude Code via Hermes orchestration, reviewed by Codex). Test level: fully tested. Prompt summary available on request.

Summary

  • Problem: maybeResolveActiveRecall in the active-memory extension awaits runRecallSubagent() cooperatively — the AbortController fires at timeoutMs but the function blocks until the embedded agent run checks the signal at internal checkpoints. With timeoutMs=5000, real-world wall-clock time reaches 11–13s.
  • Why it matters: Interactive conversations become noticeably slower; the configured timeout does not guarantee quick fail-open behavior.
  • What changed: Wrapped runRecallSubagent() with Promise.race against a timeout promise tied to the existing AbortController. When the timeout fires first, the function returns { status: "timeout" } immediately. Late subagent rejections are caught silently to prevent unhandled promise errors.
  • What did NOT change (scope boundary): The cooperative AbortController still fires to request graceful cleanup of the embedded run. No changes to runRecallSubagent, runEmbeddedPiAgent, or any core agent runtime code.
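
The change described in the Summary can be sketched roughly as follows. This is a minimal illustration of the pattern, not the PR's actual diff, which is not reproduced on this page. The names withHardDeadline, RecallResult, and the shape of the run callback are assumptions made for the example; only TIMEOUT_SENTINEL, the { status: "timeout" } result, and the silent handling of late rejections come from the PR description.

```typescript
// Hypothetical names: withHardDeadline and RecallResult are illustrative and are
// not copied from extensions/active-memory/index.ts.
type RecallResult =
  | { status: "ok"; summary: string }
  | { status: "timeout" };

const TIMEOUT_SENTINEL = Symbol("active-memory-timeout");

async function withHardDeadline(
  run: (signal: AbortSignal) => Promise<string>,
  timeoutMs: number,
): Promise<RecallResult> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;

  // Resolves with the sentinel at the deadline and, at the same moment,
  // asks the run to stop cooperatively.
  const deadline = new Promise<typeof TIMEOUT_SENTINEL>((resolve) => {
    timer = setTimeout(() => {
      controller.abort();
      resolve(TIMEOUT_SENTINEL);
    }, timeoutMs);
  });

  const work = run(controller.signal);
  // Swallow a late rejection so it cannot surface as an unhandled rejection.
  work.catch(() => undefined);

  try {
    // Whichever settles first wins; a rejection before the deadline propagates.
    const winner = await Promise.race([work, deadline]);
    if (winner === TIMEOUT_SENTINEL) return { status: "timeout" };
    return { status: "ok", summary: winner };
  } finally {
    clearTimeout(timer);
  }
}
```

In this sketch the caller gets { status: "timeout" } at the deadline even if run never looks at the signal, while the abort still fires so a well-behaved run can clean up in the background.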

Change Type (select all)

  • Bug fix

Scope (select all touched areas)

  • Integrations

Linked Issue/PR

  • Closes #71629
  • This PR fixes a bug or regression

Root Cause

  • Root cause: maybeResolveActiveRecall used a cooperative abort pattern — AbortController.abort() sets signal.aborted = true, but the await runRecallSubagent(...) call blocks until the embedded run's internal retry/failover loop reaches a checkpoint that checks the signal. Between checkpoints (during LLM API calls, tool execution), the abort is ignored.
  • Missing detection / guardrail: No Promise.race wrapper to enforce the configured timeout as a hard wall-clock deadline independent of internal abort-signal propagation.
  • Contributing context: The embedded agent runner (runEmbeddedPiAgent) has a complex retry loop with multiple cooperative abort checkpoints, but the active-memory caller had no mechanism to return early when those checkpoints are slow to reach.
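
To make the checkpoint behavior concrete, here is a small simulation of a cooperative run. It is only a stand-in: cooperativeRun, measureCooperativeAbort, and the step durations are hypothetical and do not reflect the real structure of runEmbeddedPiAgent. It shows how an abort that fires in the middle of a step goes unnoticed until the next checkpoint.

```typescript
// Illustrative only: cooperativeRun and its steps are hypothetical stand-ins
// for the embedded run's retry/failover loop, not the real runEmbeddedPiAgent.
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function cooperativeRun(
  signal: AbortSignal,
  stepDurationsMs: number[],
): Promise<{ completedSteps: number; aborted: boolean }> {
  let completedSteps = 0;
  for (const duration of stepDurationsMs) {
    // Checkpoint: the only place the abort signal is consulted.
    if (signal.aborted) return { completedSteps, aborted: true };
    // Stand-in for an LLM call or tool execution that never looks at the signal.
    await sleep(duration);
    completedSteps += 1;
  }
  return { completedSteps, aborted: signal.aborted };
}

async function measureCooperativeAbort(timeoutMs: number, stepDurationsMs: number[]) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  const startedAt = Date.now();
  const result = await cooperativeRun(controller.signal, stepDurationsMs);
  clearTimeout(timer);
  return { ...result, elapsedMs: Date.now() - startedAt };
}
```

Calling measureCooperativeAbort(50, [200, 200]) aborts at 50 ms, yet the function only returns after the first 200 ms step finishes. The reported 11–13 s against timeoutMs=5000 is the same effect at a larger scale.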

Regression Test Plan

  • Coverage level that should have caught this:
    • Unit test
  • Target test or file: extensions/active-memory/index.test.ts
  • Scenario the test should lock in: A subagent that blocks for 30s without checking the abort signal must still cause maybeResolveActiveRecall to return within timeoutMs + 500ms margin with status=timeout.
  • Why this is the smallest reliable guardrail: Tests the exact boundary between cooperative and preemptive timeout enforcement at the caller level.
  • Existing test that already covers this (if any): The existing "ignores late subagent payloads once the active-memory timeout signal has fired" test covers the cooperative path but does not assert wall-clock return time.
  • If no new test is added, why not: N/A — new test added.
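
The scenario in this test plan could look something like the sketch below. It is deliberately framework-free, because the test runner used by extensions/active-memory/index.test.ts is not stated on this page. checkHardDeadline, blockingSubagent, and racedRecall are hypothetical names, and racedRecall is only a tiny stand-in for the function under test so that the sketch runs on its own.

```typescript
// Hypothetical harness; names are not taken from extensions/active-memory/index.test.ts.
type Outcome = { status: "ok" | "timeout" };
type RecallFn = (subagent: () => Promise<unknown>, timeoutMs: number) => Promise<Outcome>;

// A subagent that ignores any abort signal and stays pending until released.
function blockingSubagent(blockMs: number) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const run = () =>
    new Promise<unknown>((resolve) => {
      timer = setTimeout(resolve, blockMs);
    });
  // Lets the check cancel the timer so the process can exit promptly.
  const release = () => clearTimeout(timer);
  return { run, release };
}

// Fails unless recall reports a timeout within timeoutMs + marginMs of wall-clock time.
async function checkHardDeadline(recall: RecallFn, timeoutMs: number, marginMs: number) {
  const subagent = blockingSubagent(30_000);
  const startedAt = Date.now();
  try {
    const outcome = await recall(subagent.run, timeoutMs);
    const elapsedMs = Date.now() - startedAt;
    if (outcome.status !== "timeout") throw new Error(`expected timeout, got ${outcome.status}`);
    if (elapsedMs > timeoutMs + marginMs) throw new Error(`returned after ${elapsedMs}ms`);
    return elapsedMs;
  } finally {
    subagent.release();
  }
}

// Minimal stand-in for the function under test, so the sketch is self-contained.
const racedRecall: RecallFn = (subagent, timeoutMs) =>
  new Promise<Outcome>((resolve, reject) => {
    const timer = setTimeout(() => resolve({ status: "timeout" }), timeoutMs);
    subagent().then(
      () => { clearTimeout(timer); resolve({ status: "ok" }); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
```

With checkHardDeadline(racedRecall, 200, 500), the check passes once the stand-in returns at roughly 200 ms. A purely cooperative implementation would instead wait for the 30 s subagent and fail the wall-clock assertion.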

User-visible / Behavior Changes

  • active-memory now returns status=timeout within the configured timeoutMs (plus small scheduling jitter), instead of potentially blocking for 2–3× longer.

Diagram (if applicable)

Before:
[setTimeout fires at timeoutMs] -> [signal.aborted = true] -> [await blocks until embedded run checks signal] -> [return timeout after 11-13s]

After:
[setTimeout fires at timeoutMs] -> [Promise.race resolves TIMEOUT_SENTINEL] -> [return timeout immediately] -> [embedded run continues cleanup in background]

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No

Repro + Verification

Environment

  • OS: macOS (Darwin, Apple Silicon)
  • Runtime: Node 22+, pnpm
  • Model/provider: N/A (unit test uses mocked embedded agent)

Steps

  1. Configure active-memory with timeoutMs: 200
  2. Mock runEmbeddedPiAgent to block for 30s without checking abort signal
  3. Call before_prompt_build and measure wall-clock return time

Expected

  • Function returns within ~700ms (200ms timeout + 500ms margin)

Actual

  • Before fix: blocks for 30s (or until embedded run cooperatively checks signal)
  • After fix: returns within ~205ms with status=timeout

Evidence

  • Failing test/log before + passing after
  • 64/64 extension tests pass
  • oxfmt --check clean

Changed files

  • extensions/active-memory/index.test.ts (modified, +38/-0)
  • extensions/active-memory/index.ts (modified, +40/-1)

Code Example

Relevant config, redacted:

{
  "main": "openai-codex/gpt-5.5",
  "active-memory": {
    "enabled": true,
    "config": {
      "agents": ["main"],
      "allowedChatTypes": ["direct", "group"],
      "model": "openai-codex/gpt-5.4",
      "queryMode": "message",
      "promptStyle": "strict",
      "thinking": "off",
      "timeoutMs": 5000,
      "maxSummaryChars": 120,
      "logging": true,
      "persistTranscripts": false
    }
  },
  "memorySearch": {
    "provider": "ollama",
    "model": "all-minilm:latest"
  }
}

Memory backend status:

~/.openclaw/memory/main.sqlite exists
files=13
chunks=81
meta=1
external memory_search: OK
approximate latency: 812ms
returned results: 8
effective backend: ollama / all-minilm:latest

Telegram real short message:

Message:
Prueba corta memoria

Logs:
active-memory start activeProvider=openai-codex activeModel=gpt-5.4 timeoutMs=5000 queryChars=403
embedded run failover decision reason=timeout from=openai-codex/gpt-5.4
active-memory done status=timeout elapsedMs=11841 summaryChars=0

Web real short message:

Message:
Prueba corta memoria web

Logs:
active-memory start activeProvider=openai-codex activeModel=gpt-5.4 timeoutMs=5000 queryChars=163
embedded run failover decision reason=timeout from=openai-codex/gpt-5.4
active-memory done status=timeout elapsedMs=13141 summaryChars=0
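
The overrun can be read straight off these log lines. The helper below is a hypothetical convenience for this write-up, not an OpenClaw utility. It assumes the whitespace-separated key=value layout seen in the quoted logs and divides elapsedMs by timeoutMs.

```typescript
// Parses key=value pairs from log lines shaped like the ones quoted above.
// parseLogFields and overrunRatio are illustrative helpers, not OpenClaw APIs.
function parseLogFields(line: string): Record<string, string> {
  const fields: Record<string, string> = {};
  const pattern = /(\w+)=(\S+)/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(line)) !== null) {
    fields[match[1]] = match[2];
  }
  return fields;
}

// Ratio of measured wall-clock time to the configured timeout.
function overrunRatio(startLine: string, doneLine: string): number {
  const timeoutMs = Number(parseLogFields(startLine).timeoutMs);
  const elapsedMs = Number(parseLogFields(doneLine).elapsedMs);
  return elapsedMs / timeoutMs;
}

const startLine =
  "active-memory start activeProvider=openai-codex activeModel=gpt-5.4 timeoutMs=5000 queryChars=163";
const telegramDone = "active-memory done status=timeout elapsedMs=11841 summaryChars=0";
const webDone = "active-memory done status=timeout elapsedMs=13141 summaryChars=0";

const telegramRatio = overrunRatio(startLine, telegramDone); // 11841 / 5000
const webRatio = overrunRatio(startLine, webDone); // 13141 / 5000
```

For the two runs above this gives about 2.37 (Telegram) and 2.63 (Web), in line with the "2–3× longer" figure in the PR description.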

Errors not present during these tests:

No CredentialsProviderError
No Ollama error
No visible terminated message in final user-facing output

Earlier additional signals:

stuck session: sessionKey=agent:main:telegram:direct:<redacted> state=processing
lane wait exceeded: lane=session:agent:main:telegram:direct:<redacted>

Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

No

Summary

In OpenClaw 2026.4.23, active-memory loads timeoutMs=5000 but returns timeout after ~11–13s in both Telegram and Web, even with short messages and working memory_search/Ollama.

Steps to reproduce

  1. Run OpenClaw 2026.4.23 with active-memory enabled for agent main.
  2. Configure active-memory with timeoutMs=5000, queryMode=message, promptStyle=strict, thinking=off, and model=openai-codex/gpt-5.4.
  3. Configure memorySearch with Ollama/all-minilm.
  4. Send a short real Telegram message: "Prueba corta memoria".
  5. Observe active-memory logs.
  6. Send a short real Web message: "Prueba corta memoria web".
  7. Observe active-memory logs.

Expected behavior

With active-memory timeoutMs=5000, active-memory should fail open close to the configured timeout, ideally around 5000–7000ms, and allow the main response to continue without blocking the Telegram/Web lane for ~11–13s.

Actual behavior

active-memory starts with timeoutMs=5000 but returns status=timeout after 11841ms in Telegram and 13141ms in Web. In both cases summaryChars=0 and embedded run failover reports reason=timeout.

OpenClaw version

2026.4.23

Operating system

Ubuntu 25.10, Linux 6.17.0-22-generic

Install method

npm global

Model

Main: openai-codex/gpt-5.5
active-memory config.model: openai-codex/gpt-5.4
memorySearch embeddings: ollama/all-minilm:latest

Provider / routing chain

OpenClaw -> openai-codex OAuth for main and active-memory embedded run; OpenClaw -> local Ollama for memorySearch embeddings.

Additional provider/model setup details

openai-codex OAuth is configured for the main agent. The main model is openai-codex/gpt-5.5. active-memory is explicitly configured with model=openai-codex/gpt-5.4. memorySearch uses local Ollama with all-minilm:latest. No API keys, tokens, or passwords included.

Impact and severity

Affected channels: Telegram and Web.

Severity: medium-high for interactive use. The agent still responds, but active-memory adds ~11–13s delay per tested message and may contribute to lane/session delays.

Frequency: reproduced in both tested channels with short real messages.

Consequence: timeoutMs=5000 does not guarantee quick fail-open behavior, and interactive conversations become noticeably slower.

Additional information

This does not appear to be caused by empty memory, broken Ollama, or credentials. External memory_search works and returns results in about 812ms. The issue appears to be that timeoutMs reaches active-memory and the embedded run, but cancellation behaves cooperatively rather than as a hard deadline.

I am avoiding local patches and would prefer an upstream-supported fix, because a naive Promise.race around the recall could return early but might leave embedded runs, locks, or lanes in an inconsistent state.
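
One way to reason about that concern is to return early and still keep a handle on the run that is finishing in the background, so that anything guarding shared state is released only once the run has really settled. The sketch below is speculative: raceWithSettle and its return shape are hypothetical, and nothing on this page indicates that OpenClaw or PR #71687 works this way.

```typescript
// Hypothetical helper; neither raceWithSettle nor its return shape comes from OpenClaw.
interface RacedOutcome<T> {
  result: { status: "ok"; value: T } | { status: "timeout" };
  // Resolves once the underlying run has really finished, whether or not it won the race.
  settled: Promise<void>;
}

async function raceWithSettle<T>(
  run: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number,
): Promise<RacedOutcome<T>> {
  const controller = new AbortController();
  const work = run(controller.signal);
  // Never rejects: late failures are absorbed here.
  const settled: Promise<void> = work.then(() => undefined, () => undefined);

  let timer: ReturnType<typeof setTimeout> | undefined;
  const timedOut = new Promise<"timeout">((resolve) => {
    timer = setTimeout(() => {
      controller.abort(); // still request cooperative cleanup
      resolve("timeout");
    }, timeoutMs);
  });

  try {
    const result = await Promise.race([
      work.then((value) => ({ status: "ok" as const, value })),
      timedOut.then(() => ({ status: "timeout" as const })),
    ]);
    return { result, settled };
  } finally {
    clearTimeout(timer);
  }
}
```

A caller could answer the user as soon as result arrives and chain any lock or lane release onto settled. Whether that fits OpenClaw's lane and session model is exactly the kind of question the reporter wanted settled upstream.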

extent analysis

TL;DR

active-memory does not enforce timeoutMs as a hard deadline: with timeoutMs=5000, recall returns status=timeout only after roughly 11–13 seconds in both the Telegram and Web channels.

Guidance

  • Confirm from the logs that timeoutMs is loaded. The line "active-memory start ... timeoutMs=5000" shows the value reaches the module, so a misconfiguration is unlikely.
  • Treat this as a cooperative-cancellation issue: the abort fires at timeoutMs, but the embedded run stops only when it reaches a checkpoint that checks the signal.
  • Check the installed OpenClaw version (the report is against 2026.4.23) and upgrade if a later release includes PR #71687.
  • Keep "logging": true for active-memory and compare elapsedMs with timeoutMs in the "active-memory done" lines to verify the behavior before and after upgrading.

Example

The issue itself contains no code snippet. According to PR #71687, the fix lives in extensions/active-memory/index.ts, where runRecallSubagent() is wrapped in Promise.race against a timeout promise tied to the existing AbortController.

Notes

The issue is specific to the active-memory module and its interaction with the embedded run. The logs show that timeoutMs=5000 is loaded and that the run eventually reports status=timeout, so the value is propagated correctly. What is missing is enforcement: the caller keeps awaiting the embedded run until the run notices the abort on its own.

Recommendation

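Prefer the upstream fix (PR #71687, which closes this issue) over a local patch. Lowering timeoutMs only makes the abort signal fire earlier; because the embedded run observes it cooperatively, that alone does not guarantee a prompt return. A custom cancellation wrapper is possible, but as the reporter notes, a naive Promise.race can leave embedded runs, locks, or lanes in an inconsistent state.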
