openclaw - ✅(Solved) Fix active-memory plugin: timeoutMs=30000 ignored — abort signal doesn't reach HTTP fetch, runs 65s [1 pull requests, 2 comments, 3 participants]

GitHub stats
openclaw/openclaw#72965 · Fetched 2026-04-28 06:29:23
Comments: 2 · Participants: 3 · Timeline events: 9 · Reactions: 1
Timeline (top): referenced ×6, commented ×2, cross-referenced ×1

Error Message

controller.abort(new Error(`active-memory timeout after ${params.config.timeoutMs}ms`));
const abortErr = new Error("Operation aborted", { cause: reason });

Fix Action

Fixed

PR fix notes

PR #73120: fix: cancel in-flight Anthropic SSE read when abort signal fires (#72…

Description (problem / solution / changelog)

Summary

  • Problem: active-memory timeout fired after 30 s, but startup blocked for 97+ seconds. The parseAnthropicSseBody SSE reader called reader.read() with no abort awareness, so the in-flight HTTP connection was never cancelled.
  • Why it matters: The orphaned subagent kept its embedded-runner concurrency queue slot until the provider's own timeout fired (~65 s), stalling anything queued behind it during startup.
  • What changed: Added readChunkOrAbort helper that races reader.read() against the abort signal; on abort it calls reader.cancel() and rejects immediately. Passed options?.signal into parseAnthropicSseBody. Fixed elapsedMs in the timeout result to use params.config.timeoutMs directly.
  • What did NOT change (scope boundary): Queue logic, retry/failover, timeout scheduling, and all other transport providers are untouched.
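The "What changed" bullet describes racing reader.read() against the abort signal. A minimal sketch of that idea follows. It is not the code from the PR: the signature, the error shape, and the cleanup details are assumptions made for illustration, and only the name readChunkOrAbort and the settled-flag idea come from the description.

```javascript
// Hypothetical sketch of an abort-aware chunk read. The real helper in the PR
// may differ in signature and behavior.
function readChunkOrAbort(reader, signal) {
  return new Promise((resolve, reject) => {
    let settled = false; // only one branch may settle the promise

    const onAbort = () => {
      if (settled) return;
      settled = true;
      // Swallow cancel() failures so they cannot surface as unhandled rejections.
      reader.cancel(signal.reason).catch(() => {});
      const err = new Error("Operation aborted", { cause: signal.reason });
      err.name = "AbortError";
      reject(err);
    };

    // Signal already aborted: reject without starting a read.
    if (signal?.aborted) {
      onAbort();
      return;
    }
    signal?.addEventListener("abort", onAbort, { once: true });

    reader.read().then(
      (result) => {
        if (settled) return;
        settled = true;
        signal?.removeEventListener("abort", onAbort);
        resolve(result);
      },
      (err) => {
        if (settled) return;
        settled = true;
        signal?.removeEventListener("abort", onAbort);
        reject(err);
      },
    );
  });
}
```

A read loop would call this in place of a bare reader.read(), so that a stalled body stream stops blocking as soon as the signal fires instead of waiting for the next chunk.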

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #72965
  • This PR fixes a bug or regression

Root Cause (if applicable)

  • Root cause: parseAnthropicSseBody iterated response.body via a bare reader.read() loop. The AbortSignal was passed to fetch() and client.messages.stream(), but nothing raced reader.read() against it. Once the body stream started, reader.read() blocked until the next chunk regardless of abort state.
  • Missing detection / guardrail: Existing active-memory abort tests mocked runEmbeddedPiAgent at a higher level, bypassing the actual SSE read path entirely.
  • Contributing context (if known): Promise.race in maybeResolveActiveRecall correctly returned the timeout sentinel at 30 s, but the orphaned subagent kept its queue slot until reader.read() eventually unblocked.

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: src/agents/anthropic-transport-stream.test.ts
  • Scenario the test should lock in: A ReadableStream sends one SSE event then stalls 5 s; abort fires at 50 ms; asserts wall-clock time is under 1 s.
  • Why this is the smallest reliable guardrail: Directly exercises the readChunkOrAbort seam without a live provider or embedded runner.
  • Existing test that already covers this (if any): None at the transport layer.
  • If no new test is added, why not: N/A

User-visible / Behavior Changes

active-memory recall subagents that hit the timeout now release their queue slot immediately. Startup latency is bounded to timeoutMs, not timeoutMs + provider-response-time. Timeout log lines now report elapsedMs equal to the configured value.

Diagram (if applicable)

Before:
[abort at T=30s] -> [Promise.race unblocks] -> [reader.read() blocks until T=65s+] -> [queue slot held]

After:
[abort at T=30s] -> [Promise.race unblocks] -> [readChunkOrAbort rejects] -> [reader.cancel(); queue slot released]

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No

Repro + Verification

Environment

  • OS: Linux
  • Runtime/container: Node 22+
  • Model/provider: Any Anthropic-compatible provider
  • Integration/channel (if any): active-memory plugin enabled
  • Relevant config (redacted): active-memory.timeoutMs = 30000

Steps

  1. Enable active-memory with timeoutMs: 30000
  2. Trigger a session where the provider response takes > 30 s
  3. Observe startup time

Expected

  • Unblocks within ~30 s

Actual

  • Blocked for 97+ s

Evidence

  • Failing test/log before + passing after — new transport test fails on main, passes with this change; 72 tests pass across both touched files

Human Verification (required)

  • Verified scenarios: pnpm check exits 0; pnpm test src/agents/anthropic-transport-stream.test.ts extensions/active-memory/index.test.ts — 10 transport + 62 active-memory tests pass
  • Edge cases checked: signal already aborted before readChunkOrAbort is called; reader.read() errors independently; both branches racing to settle
  • What you did not verify: live provider round-trip with a real slow Anthropic response

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No
  • Migration needed? No

Risks and Mitigations

  • Risk: reader.cancel() while reader.read() is in-flight may behave differently across Node.js versions.
    • Mitigation: settled flag ensures only one branch resolves the promise; reader.cancel() errors are swallowed so they cannot surface as unhandled rejections.

Changed files

  • extensions/active-memory/index.ts (modified, +1/-1)
  • src/agents/anthropic-transport-stream.test.ts (modified, +40/-0)
  • src/agents/anthropic-transport-stream.ts (modified, +51/-2)


Environment

  • OpenClaw version: 2026.4.25 (aa36ee6)
  • Node.js: v22.22.2
  • OS: Debian 12, Linux 6.1.0-34-amd64

Problem

The active-memory plugin has timeoutMs: 30000 configured, but the actual elapsed time was 65250ms — over double the configured limit. The configured timeout is completely ineffective.

Evidence

From /tmp/openclaw/openclaw-2026-04-28.log:

01:02:11 [plugins] active-memory: agent=main session=agent:main:main
  activeProvider=cpam activeModel=claude-opus-4-6-thinking
  start timeoutMs=30000 queryChars=2396

01:03:16 [plugins] active-memory: agent=main session=agent:main:main
  activeProvider=cpam activeModel=claude-opus-4-6-thinking
  done status=timeout elapsedMs=65250 summaryChars=0

01:03:48 [agent/embedded] embedded run failover decision:
  runId=active-memory-mohg34ok-db863fa9 stage=assistant
  decision=surface_error reason=timeout
  from=cpam/claude-opus-4-6-thinking profile=-

Three observations:

  1. timeoutMs=30000 is correctly logged at start
  2. elapsedMs=65250 — 35 seconds past the configured timeout
  3. A separate embedded agent failover fires 32 seconds AFTER the plugin's own timeout report

Root Cause Analysis

Looking at the active-memory plugin code (extensions/active-memory/index.js, maybeResolveActiveRecall):

const controller = new AbortController();
const timeoutId = setTimeout(() => {
    controller.abort(new Error(`active-memory timeout after ${params.config.timeoutMs}ms`));
}, params.config.timeoutMs);
timeoutId.unref?.();

const timeoutPromise = new Promise((resolve) => {
    controller.signal.addEventListener("abort", () => {
        resolve(TIMEOUT_SENTINEL);
    }, { once: true });
});

const subagentPromise = runRecallSubagent({
    ...params,
    modelRef: resolvedModelRef,
    abortSignal: controller.signal
});
subagentPromise.catch(() => void 0);
const raceResult = await Promise.race([subagentPromise, timeoutPromise]);

The timeout mechanism is correctly structured:

  1. setTimeout fires at 30s
  2. controller.abort() is called
  3. The abort event resolves timeoutPromise with TIMEOUT_SENTINEL
  4. Promise.race should resolve at 30s

However, the evidence shows elapsedMs=65250, meaning the code path taking the raceResult === TIMEOUT_SENTINEL branch at 30s is NOT being executed.
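To check the four steps in isolation, the race can be reproduced at a smaller scale. The sketch below mirrors the structure of the snippet above but is not the plugin's code: raceWithTimeout and its return shape are hypothetical.

```javascript
const TIMEOUT_SENTINEL = Symbol("timeout");

// Hypothetical helper: runs `run(signal)` and races it against a timer-driven abort.
async function raceWithTimeout(run, timeoutMs) {
  const controller = new AbortController();
  const timeoutId = setTimeout(() => {
    controller.abort(new Error(`timeout after ${timeoutMs}ms`));
  }, timeoutMs);

  // Resolves (never rejects) with the sentinel once the signal aborts.
  const timeoutPromise = new Promise((resolve) => {
    controller.signal.addEventListener("abort", () => resolve(TIMEOUT_SENTINEL), {
      once: true,
    });
  });

  const startedAt = Date.now();
  const taskPromise = run(controller.signal);
  taskPromise.catch(() => {}); // a late failure of the task must not go unhandled

  try {
    const result = await Promise.race([taskPromise, timeoutPromise]);
    return {
      timedOut: result === TIMEOUT_SENTINEL,
      result,
      elapsedMs: Date.now() - startedAt,
    };
  } finally {
    clearTimeout(timeoutId);
  }
}
```

With a task that ignores the signal and takes longer than timeoutMs, the race settles at roughly timeoutMs with timedOut set to true, but the task itself keeps running in the background. The race only stops the caller from waiting; it does not cancel the underlying work.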

The abortSignal is passed down through runRecallSubagent → params.api.runtime.agent.runEmbeddedPiAgent → pi-embedded-xwfWu_QR.js. The embedded agent passes the signal at line 1891, but the abort check at line 1419 returns early if the signal is NOT already aborted:

// pi-embedded-xwfWu_QR.js line 1419
if (!params.abortSignal?.aborted) return;
const reason = params.abortSignal.reason;
const abortErr = new Error("Operation aborted", { cause: reason });
abortErr.name = "AbortError";
throw abortErr;

This is a polling-style abort check — it only checks abortSignal.aborted at specific points in the run lifecycle, NOT during an active model API call. If the model API call (cpam/claude-opus-4-6-thinking) takes 60+ seconds, the abort is never checked during that window.

The result:

  1. controller.abort() fires at 30s
  2. The abort signal is marked as aborted
  3. But the embedded agent is mid-API-call and never polls the signal
  4. The model keeps running until its own provider timeout or the embedded framework's failover logic kicks in at ~65s
  5. The active-memory plugin reports elapsedMs=65250
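The difference between a polling-style check and an event-driven one can be sketched as follows. All names here are hypothetical, and slowCall is only a stand-in for a model API call that knows nothing about abort signals.

```javascript
// Stand-in for a slow model call that does not accept an abort signal.
const slowCall = (ms) =>
  new Promise((resolve) => setTimeout(() => resolve("response"), ms));

// Polling style: the signal is inspected only before and after the call.
async function runWithPolling(signal, callMs) {
  signal.throwIfAborted();
  const result = await slowCall(callMs); // an abort during this await goes unnoticed
  signal.throwIfAborted();
  return result;
}

// Event style: the await itself is raced against the abort event.
async function runWithListener(signal, callMs) {
  signal.throwIfAborted();
  let onAbort;
  const aborted = new Promise((_, reject) => {
    onAbort = () => reject(signal.reason);
    signal.addEventListener("abort", onAbort, { once: true });
  });
  try {
    return await Promise.race([slowCall(callMs), aborted]);
  } finally {
    signal.removeEventListener("abort", onAbort);
  }
}
```

If the signal aborts while slowCall is pending, runWithPolling only throws once the call has finished, whereas runWithListener throws as soon as the abort event fires. Note that the listener version stops waiting but does not cancel the underlying work; for that, the call itself has to honor the signal.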

Impact

This causes significant startup delays. During /new after restart:

  • active-memory runs for 65s instead of 30s (blocking the user's session)
  • Plus 32 more seconds for the embedded agent failover decision
  • Total: 97 seconds of unnecessary blocking

Suggested Fix

  1. Wire the abortSignal to the actual HTTP fetch — when calling the model API, pass the abortSignal to the Node.js fetch() call's signal option so the ongoing HTTP request is cancelled immediately on abort.

  2. Alternatively, enforce the timeout at the runner level — if the model provider doesn't support abort signals, wrap the agent run with a Promise.race against the timeoutMs at the runEmbeddedPiAgent level, not just at the plugin level.

  3. Fix the elapsedMs reporting — even when the embedded framework's own failover eventually stops the run, the plugin-level elapsedMs should reflect when the timeout was INTENDED to fire, not when the run actually stopped.
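The third suggestion can be illustrated with a small result builder. This is a sketch under assumptions: buildRecallResult and the "ok" status are hypothetical, while the status=timeout and elapsedMs fields follow the log lines quoted earlier.

```javascript
// Hypothetical result builder for the recall outcome.
function buildRecallResult({ timedOut, startedAt, timeoutMs, now = Date.now() }) {
  if (timedOut) {
    // Report the configured budget, not the time the orphaned run took to stop.
    return { status: "timeout", elapsedMs: timeoutMs };
  }
  // "ok" is an assumed status name for the non-timeout case.
  return { status: "ok", elapsedMs: now - startedAt };
}
```

One trade-off of this approach is that the log no longer shows how long the run actually overran. Logging the measured duration in a separate field would keep both pieces of information.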

Related

  • #72960 — Event loop blocked for ~97s during startup (startup performance)

extent analysis

TL;DR

The most likely fix is to wire the abortSignal to the actual HTTP fetch when calling the model API to immediately cancel the ongoing request on abort.

Guidance

  • Review the runRecallSubagent function to ensure it properly handles the abortSignal and cancels the HTTP request when aborted.
  • Consider implementing a Promise.race at the runEmbeddedPiAgent level to enforce the timeout if the model provider doesn't support abort signals.
  • Update the elapsedMs reporting to reflect when the timeout was intended to fire, rather than when the run actually stopped.

Example

const response = await fetch(url, {
  signal: params.abortSignal,
  // other options
});

This example shows how to pass the abortSignal to the fetch() call to cancel the request when aborted.

Notes

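The suggested fix assumes that the model API client honors abort signals. If it doesn't, an alternative approach may be needed. Note that PR #73120 reports the signal was already passed to fetch() and client.messages.stream(); the unguarded step was the reader.read() loop over the SSE response body, so passing the signal to fetch() alone may not be enough. Additionally, the elapsedMs reporting change needs care so that logged timings stay meaningful.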

Recommendation

Apply the fix by wiring the abortSignal through to the in-flight HTTP request so that it is cancelled immediately on abort. This approach is more targeted than enforcing the timeout at the runner level.
