openclaw - ✅(Solved) Fix [Bug]: active-memory timeoutMs does not behave as a hard deadline for embedded run [1 pull requests, 4 comments, 3 participants]

CvaHA · 2026-04-25T14:59:25Z

[openclaw] In OpenClaw 2026.4.23, active-memory loads timeoutMs=5000 but returns timeout after ~11–13s in both Telegram and Web, even with short messages and w… In OpenClaw 2026.4.23, active-memory loads timeoutMs=5000 but returns timeout after ~11–13s in both Telegram and Web, even with short messages and working memory_search/Ollama. # PR #71687: fix(active-memory): enforce timeoutMs as hard deadline via Promise.race [AI-assisted] - Repository: openclaw/openclaw - Author: luyao618 - State: closed | merged: True - Link: https://github.com/openclaw/openclaw/pull/71687 ## Description (problem / solution / changelog) > 🤖 AI-assisted (built with Claude Code via Hermes orchestration, reviewed by Codex). Test level: fully tested. Prompt summary available on request. ## Summary - Problem: `maybeResolveActiveRecall` in the active-memory extension awaits `runRecallSubagent()` cooperatively — the `AbortController` fires at `timeoutMs` but the function blocks until the embedded agent run checks the signal at internal checkpoints. With `timeoutMs=5000`, real-world wall-clock time reaches 11–13s. - Why it matters: Interactive conversations become noticeably slower; the configured timeout does not guarantee quick fail-open behavior. - What changed: Wrapped `runRecallSubagent()` with `Promise.race` against a timeout promise tied to the existing `AbortController`. When the timeout fires first, the function returns `{ status: "timeout" }` immediately. Late subagent rejections are caught silently to prevent unhandled promise errors. - What did NOT change (scope boundary): The cooperative `AbortController` still fires to request graceful cleanup of the embedded run. No changes to `runRecallSubagent`, `runEmbeddedPiAgent`, or any core agent runtime code. ## Change Type (select all) - [x] Bug fix ## Scope (select all touched areas) - [x] Integrations ## Linked Issue/PR - Closes #71629 - [x] This PR fixes a bug or regression ## Root Cause - Root cause: `maybeResolveActiveRecall` used a cooperative abort pattern — `AbortController.abort()` sets `signal.aborted = true`, but the `await runRecallSubagent(...)` call blocks until the embedded run's internal retry/failover loop reaches a checkpoint that checks the signal. Between checkpoints (during LLM API calls, tool execution), the abort is ignored. - Missing detection / guardrail: No `Promise.race` wrapper to enforce the configured timeout as a hard wall-clock deadline independent of internal abort-signal propagation. - Contributing context: The embedded agent runner (`runEmbeddedPiAgent`) has a complex retry loop with multiple cooperative abort checkpoints, but the active-memory caller had no mechanism to return early when those checkpoints are slow to reach. ## Regression Test Plan - Coverage level that should have caught this: - [x] Unit test - Target test or file: `extensions/active-memory/index.test.ts` - Scenario the test should lock in: A subagent that blocks for 30s without checking the abort signal must still cause `maybeResolveActiveRecall` to return within `timeoutMs + 500ms` margin with `status=timeout`. - Why this is the smallest reliable guardrail: Tests the exact boundary between cooperative and preemptive timeout enforcement at the caller level. - Existing test that already covers this (if any): The existing "ignores late subagent payloads once the active-memory timeout signal has fired" test covers the cooperative path but does not assert wall-clock return time. - If no new test is added, why not: N/A — new test added. ## User-visible / Behavior Changes - `active-memory` now returns `status=timeout` within the configured `timeoutMs` (plus small scheduling jitter), instead of potentially blocking for 2–3× longer. ## Diagram (if applicable) ```text Before: [setTimeout fires at timeoutMs] -> [signal.aborted = true] -> [await blocks until embedded run checks signal] -> [return timeout after 11-13s] After: [setTimeout fires at timeoutMs] -> [Promise.race resolves TIMEOUT_SENTINEL] -> [return timeout immediately] -> [embedded run continues cleanup in background] ``` ## Security Impact (required) - New permissions/capabilities? No - Secrets/tokens handling changed? No - New/changed network calls? No - Command/tool execution surface changed? No - Data access scope changed? No ## Repro + Verification ### Environment - OS: macOS (Darwin, Apple Silicon) - Runtime: Node 22+, pnpm - Model/provider: N/A (unit test uses mocked embedded agent) ### Steps 1. Configure `active-memory` with `timeoutMs: 200` 2. Mock `runEmbeddedPiAgent` to block for 30s without checking abort signal 3. Call `before_prompt_build` and measure wall-clock return time ### Expected - Function returns within ~700ms (200ms timeout + 500ms margin) ### Actual - Before fix: blocks for 30s (or until embedded run cooperatively checks signal) - After fix: returns within ~205ms with `status=timeout` ## Evidence - [x] Failing test/log before + passing after - 64/64 extension tests pass - `oxfmt --check` clean ## Cha

openclaw2026-04-25 14:59:25

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

openclaw/openclaw#71629•Fetched 2026-04-26 05:10:23

View on GitHub

Comments

Participants

Timeline

Reactions

Author

Participants

Timeline (top)

commented ×4labeled ×2referenced ×2closed ×1

In OpenClaw 2026.4.23, active-memory loads timeoutMs=5000 but returns timeout after ~11–13s in both Telegram and Web, even with short messages and working memory_search/Ollama.

Error Message

No Ollama error

Root Cause

This does not appear to be caused by empty memory, broken Ollama, or credentials. External memory_search works and returns results in about 812ms. The issue appears to be that timeoutMs reaches active-memory and the embedded run, but cancellation behaves cooperatively rather than as a hard deadline.

Fix Action

Fix / Workaround

I am avoiding local patches and would prefer an upstream-supported fix, because a naive Promise.race around the recall could return early but might leave embedded runs, locks, or lanes in an inconsistent state.

PR fix notes

PR #71687: fix(active-memory): enforce timeoutMs as hard deadline via Promise.race [AI-assisted]

Repository: openclaw/openclaw
Author: luyao618
State: closed | merged: True
Link: https://github.com/openclaw/openclaw/pull/71687

Description (problem / solution / changelog)

🤖 AI-assisted (built with Claude Code via Hermes orchestration, reviewed by Codex). Test level: fully tested. Prompt summary available on request.

Summary

Problem: maybeResolveActiveRecall in the active-memory extension awaits runRecallSubagent() cooperatively — the AbortController fires at timeoutMs but the function blocks until the embedded agent run checks the signal at internal checkpoints. With timeoutMs=5000, real-world wall-clock time reaches 11–13s.
Why it matters: Interactive conversations become noticeably slower; the configured timeout does not guarantee quick fail-open behavior.
What changed: Wrapped runRecallSubagent() with Promise.race against a timeout promise tied to the existing AbortController. When the timeout fires first, the function returns { status: "timeout" } immediately. Late subagent rejections are caught silently to prevent unhandled promise errors.
What did NOT change (scope boundary): The cooperative AbortController still fires to request graceful cleanup of the embedded run. No changes to runRecallSubagent, runEmbeddedPiAgent, or any core agent runtime code.

Change Type (select all)

Bug fix

Scope (select all touched areas)

Integrations

Linked Issue/PR

Closes #71629
This PR fixes a bug or regression

Root Cause

Root cause: maybeResolveActiveRecall used a cooperative abort pattern — AbortController.abort() sets signal.aborted = true, but the await runRecallSubagent(...) call blocks until the embedded run's internal retry/failover loop reaches a checkpoint that checks the signal. Between checkpoints (during LLM API calls, tool execution), the abort is ignored.
Missing detection / guardrail: No Promise.race wrapper to enforce the configured timeout as a hard wall-clock deadline independent of internal abort-signal propagation.
Contributing context: The embedded agent runner (runEmbeddedPiAgent) has a complex retry loop with multiple cooperative abort checkpoints, but the active-memory caller had no mechanism to return early when those checkpoints are slow to reach.

Regression Test Plan

Coverage level that should have caught this:
- Unit test
Target test or file: extensions/active-memory/index.test.ts
Scenario the test should lock in: A subagent that blocks for 30s without checking the abort signal must still cause maybeResolveActiveRecall to return within timeoutMs + 500ms margin with status=timeout.
Why this is the smallest reliable guardrail: Tests the exact boundary between cooperative and preemptive timeout enforcement at the caller level.
Existing test that already covers this (if any): The existing "ignores late subagent payloads once the active-memory timeout signal has fired" test covers the cooperative path but does not assert wall-clock return time.
If no new test is added, why not: N/A — new test added.

User-visible / Behavior Changes

active-memory now returns status=timeout within the configured timeoutMs (plus small scheduling jitter), instead of potentially blocking for 2–3× longer.

Diagram (if applicable)

Before:
[setTimeout fires at timeoutMs] -> [signal.aborted = true] -> [await blocks until embedded run checks signal] -> [return timeout after 11-13s]

After:
[setTimeout fires at timeoutMs] -> [Promise.race resolves TIMEOUT_SENTINEL] -> [return timeout immediately] -> [embedded run continues cleanup in background]

Security Impact (required)

New permissions/capabilities? No
Secrets/tokens handling changed? No
New/changed network calls? No
Command/tool execution surface changed? No
Data access scope changed? No

Repro + Verification

Environment

OS: macOS (Darwin, Apple Silicon)
Runtime: Node 22+, pnpm
Model/provider: N/A (unit test uses mocked embedded agent)

Steps

Configure active-memory with timeoutMs: 200
Mock runEmbeddedPiAgent to block for 30s without checking abort signal
Call before_prompt_build and measure wall-clock return time

Expected

Function returns within ~700ms (200ms timeout + 500ms margin)

Actual

Before fix: blocks for 30s (or until embedded run cooperatively checks signal)
After fix: returns within ~205ms with status=timeout

Evidence

Failing test/log before + passing after
64/64 extension tests pass
oxfmt --check clean

Changed files

extensions/active-memory/index.test.ts (modified, +38/-0)
extensions/active-memory/index.ts (modified, +40/-1)

Code Example

Relevant config, redacted:

{
  "main": "openai-codex/gpt-5.5",
  "active-memory": {
    "enabled": true,
    "config": {
      "agents": ["main"],
      "allowedChatTypes": ["direct", "group"],
      "model": "openai-codex/gpt-5.4",
      "queryMode": "message",
      "promptStyle": "strict",
      "thinking": "off",
      "timeoutMs": 5000,
      "maxSummaryChars": 120,
      "logging": true,
      "persistTranscripts": false
    }
  },
  "memorySearch": {
    "provider": "ollama",
    "model": "all-minilm:latest"
  }
}

Memory backend status:

~/.openclaw/memory/main.sqlite exists
files=13
chunks=81
meta=1
external memory_search: OK
approximate latency: 812ms
returned results: 8
effective backend: ollama / all-minilm:latest

Telegram real short message:

Message:
Prueba corta memoria

Logs:
active-memory start activeProvider=openai-codex activeModel=gpt-5.4 timeoutMs=5000 queryChars=403
embedded run failover decision reason=timeout from=openai-codex/gpt-5.4
active-memory done status=timeout elapsedMs=11841 summaryChars=0

Web real short message:

Message:
Prueba corta memoria web

Logs:
active-memory start activeProvider=openai-codex activeModel=gpt-5.4 timeoutMs=5000 queryChars=163
embedded run failover decision reason=timeout from=openai-codex/gpt-5.4
active-memory done status=timeout elapsedMs=13141 summaryChars=0

Errors not present during these tests:

No CredentialsProviderError
No Ollama error
No visible terminated message in final user-facing output

Earlier additional signals:

stuck session: sessionKey=agent:main:telegram:direct:<redacted> state=processing
lane wait exceeded: lane=session:agent:main:telegram:direct:<redacted>

RAW_BUFFERClick to expand / collapse

Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

Summary

In OpenClaw 2026.4.23, active-memory loads timeoutMs=5000 but returns timeout after ~11–13s in both Telegram and Web, even with short messages and working memory_search/Ollama.

Steps to reproduce

Run OpenClaw 2026.4.23 with active-memory enabled for agent main.
Configure active-memory with timeoutMs=5000, queryMode=message, promptStyle=strict, thinking=off, and model=openai-codex/gpt-5.4.
Configure memorySearch with Ollama/all-minilm.
Send a short real Telegram message: "Prueba corta memoria".
Observe active-memory logs.
Send a short real Web message: "Prueba corta memoria web".
Observe active-memory logs.

Expected behavior

With active-memory timeoutMs=5000, active-memory should fail open close to the configured timeout, ideally around 5000–7000ms, and allow the main response to continue without blocking the Telegram/Web lane for ~11–13s.

Actual behavior

active-memory starts with timeoutMs=5000 but returns status=timeout after 11841ms in Telegram and 13141ms in Web. In both cases summaryChars=0 and embedded run failover reports reason=timeout.

OpenClaw version

2026.4.23

Operating system

Ubuntu 25.10, Linux 6.17.0-22-generic

Install method

npm global

Model

Main: openai-codex/gpt-5.5 active-memory config.model: openai-codex/gpt-5.4 memorySearch embeddings: ollama/all-minilm:latest

Provider / routing chain

OpenClaw -> openai-codex OAuth for main and active-memory embedded run; OpenClaw -> local Ollama for memorySearch embeddings.

Additional provider/model setup details

openai-codex OAuth is configured for the main agent. The main model is openai-codex/gpt-5.5. active-memory is explicitly configured with model=openai-codex/gpt-5.4. memorySearch uses local Ollama with all-minilm:latest. No API keys, tokens, or passwords included.

Logs, screenshots, and evidence

Relevant config, redacted:

{
  "main": "openai-codex/gpt-5.5",
  "active-memory": {
    "enabled": true,
    "config": {
      "agents": ["main"],
      "allowedChatTypes": ["direct", "group"],
      "model": "openai-codex/gpt-5.4",
      "queryMode": "message",
      "promptStyle": "strict",
      "thinking": "off",
      "timeoutMs": 5000,
      "maxSummaryChars": 120,
      "logging": true,
      "persistTranscripts": false
    }
  },
  "memorySearch": {
    "provider": "ollama",
    "model": "all-minilm:latest"
  }
}

Memory backend status:

~/.openclaw/memory/main.sqlite exists
files=13
chunks=81
meta=1
external memory_search: OK
approximate latency: 812ms
returned results: 8
effective backend: ollama / all-minilm:latest

Telegram real short message:

Message:
Prueba corta memoria

Logs:
active-memory start activeProvider=openai-codex activeModel=gpt-5.4 timeoutMs=5000 queryChars=403
embedded run failover decision reason=timeout from=openai-codex/gpt-5.4
active-memory done status=timeout elapsedMs=11841 summaryChars=0

Web real short message:

Message:
Prueba corta memoria web

Logs:
active-memory start activeProvider=openai-codex activeModel=gpt-5.4 timeoutMs=5000 queryChars=163
embedded run failover decision reason=timeout from=openai-codex/gpt-5.4
active-memory done status=timeout elapsedMs=13141 summaryChars=0

Errors not present during these tests:

No CredentialsProviderError
No Ollama error
No visible terminated message in final user-facing output

Earlier additional signals:

stuck session: sessionKey=agent:main:telegram:direct:<redacted> state=processing
lane wait exceeded: lane=session:agent:main:telegram:direct:<redacted>

Impact and severity

Affected channels: Telegram and Web.

Severity: medium-high for interactive use. The agent still responds, but active-memory adds ~11–13s delay per tested message and may contribute to lane/session delays.

Frequency: reproduced in both tested channels with short real messages.

Consequence: timeoutMs=5000 does not guarantee quick fail-open behavior, and interactive conversations become noticeably slower.

Additional information

extent analysis

TL;DR

The active-memory timeout is not being respected, causing a delay of around 11-13 seconds in both Telegram and Web channels.

Guidance

Review the active-memory configuration to ensure that the timeoutMs value is being correctly applied to the embedded run.
Investigate the possibility of a cooperative cancellation issue, where the embedded run is not being terminated immediately when the timeout is reached.
Check the OpenClaw version and consider upgrading to a newer version if available, as this issue may have been addressed in a later release.
Consider adding additional logging to the active-memory module to gain more insight into the timing and behavior of the embedded run.

Example

No code example is provided, as the issue appears to be related to the configuration and behavior of the active-memory module rather than a specific code snippet.

Notes

The issue appears to be specific to the active-memory module and its interaction with the embedded run. The fact that the timeoutMs value is being reached but not respected suggests a potential issue with the cancellation mechanism.

Recommendation

Apply a workaround by modifying the active-memory configuration to use a more aggressive timeout value or by implementing a custom cancellation mechanism to ensure that the embedded run is terminated immediately when the timeout is reached.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

FAQ

Expected behavior

#api #agent setup #task chaining #parallel task #integration issue

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

openclaw - ✅(Solved) Fix [Bug]: active-memory timeoutMs does not behave as a hard deadline for embedded run [1 pull requests, 4 comments, 3 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Root Cause

Fix Action

Fix / Workaround

PR fix notes

PR #71687: fix(active-memory): enforce timeoutMs as hard deadline via Promise.race [AI-assisted]

Description (problem / solution / changelog)

Summary

Change Type (select all)

Scope (select all touched areas)

Linked Issue/PR

Root Cause

Regression Test Plan

User-visible / Behavior Changes

Diagram (if applicable)

Security Impact (required)

Repro + Verification

Environment

Steps

Expected

Actual

Evidence

Changed files

Code Example

Bug type

Beta release blocker

Summary

Steps to reproduce

Expected behavior

Actual behavior

OpenClaw version

Operating system

Install method

Model

Provider / routing chain

Additional provider/model setup details

Logs, screenshots, and evidence

Impact and severity

Additional information

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

FAQ

Expected behavior

Still need to ship something?

RELATED_DISCOVERY

TRENDING