openclaw - ✅(Solved) Fix [Bug]: Active Memory `timeoutMs` not respected — failover chain blocks replies for 40-125 seconds [1 pull requests, 1 participants]

sbmilburn · 2026-04-11T15:21:09Z


GitHub stats

openclaw/openclaw#64867 · Fetched 2026-04-12 13:26:28

Author: sbmilburn
Participants: sbmilburn
Assignees: Takhoffman
Timeline (top): labeled ×2, assigned ×1, closed ×1, cross-referenced ×1

The Active Memory plugin's config.timeoutMs setting (set to 15000ms) is not enforced as a hard cap on total execution time. When the initial LLM request times out, the embedded run failover chain (rotate auth profile → fallback model) adds additional retry attempts that are not bounded by timeoutMs. This results in Active Memory blocking the main reply for 40-125+ seconds instead of the configured 15 seconds.

This is a blocking issue — Active Memory runs synchronously before every reply in eligible sessions, so each timeout directly delays the user-visible response.

Error Message

Log Evidence

Timeout #2 (125.2 seconds, Haiku)

07:33:01 [plugins] active-memory: agent=main session=agent:main:main start timeoutMs=15000 queryChars=1391
07:33:49 [agent] embedded run failover decision: runId=active-memory-mnufpo6t-6e996d78 stage=assistant decision=rotate_profile reason=timeout provider=anthropic/claude-haiku-4-5 profile=sha256:154a23a3efe6
07:34:58 [agent] embedded run failover decision: runId=active-memory-mnufpo6t-6e996d78 stage=assistant decision=fallback_model reason=timeout provider=anthropic/claude-haiku-4-5 profile=sha256:ab919eb4413a
07:34:58 [diagnostic] lane task error: lane=session:agent:main:main:active-memory:735c698d8eb1 durationMs=117120 error="FailoverError: LLM request timed out."
07:35:06 [plugins] active-memory: agent=main session=agent:main:main done status=timeout elapsedMs=125243 summaryChars=0

Timeout #4 (123.8 seconds, Haiku)

07:57:29 [plugins] active-memory: agent=main session=agent:main:main start timeoutMs=15000 queryChars=1666
07:58:17 [agent] embedded run failover decision: runId=active-memory-mnugl5b0-1a2d5b36 stage=assistant decision=rotate_profile reason=timeout provider=anthropic/claude-haiku-4-5 profile=sha256:154a23a3efe6
07:59:33 [agent] embedded run failover decision: runId=active-memory-mnugl5b0-1a2d5b36 stage=assistant decision=fallback_model reason=timeout provider=anthropic/claude-haiku-4-5 profile=sha256:ab919eb4413a
07:59:33 [diagnostic] lane task error: lane=session:agent:main:main:active-memory:f3cb0a1f008b durationMs=123816 error="FailoverError: LLM request timed out."
07:59:33 [plugins] active-memory: agent=main session=agent:main:main done status=timeout elapsedMs=123821 summaryChars=0

Successful run immediately after timeout #4 (same session, 1.9 seconds)

08:01:40 [plugins] active-memory: agent=main session=agent:main:main start timeoutMs=15000 queryChars=1666
08:01:42 [plugins] active-memory: agent=main session=agent:main:main done status=ok elapsedMs=1913 summaryChars=204
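To quantify the overrun in log lines like these, a small parser can compare `elapsedMs` from the `done` line with `timeoutMs` from the `start` line. This is a minimal sketch that assumes the space-separated `key=value` layout seen in the excerpts; `parseFields` and `overrunFactor` are hypothetical helper names, not part of OpenClaw.

```typescript
// Hypothetical helpers for reading the active-memory log lines quoted above.
// They assume the space-separated key=value layout seen in those excerpts.

/** Extract key=value pairs from a single log line. */
function parseFields(line: string): Record<string, string> {
  const fields: Record<string, string> = {};
  const pattern = /(\w+)=(\S+)/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(line)) !== null) {
    fields[match[1]] = match[2];
  }
  return fields;
}

/** How many times longer than the configured timeout the run actually took. */
function overrunFactor(startLine: string, doneLine: string): number {
  const timeoutMs = Number(parseFields(startLine).timeoutMs);
  const elapsedMs = Number(parseFields(doneLine).elapsedMs);
  return elapsedMs / timeoutMs;
}

// Start and done lines copied from timeout #2.
const startLine =
  "07:33:01 [plugins] active-memory: agent=main session=agent:main:main start timeoutMs=15000 queryChars=1391";
const doneLine =
  "07:35:06 [plugins] active-memory: agent=main session=agent:main:main done status=timeout elapsedMs=125243 summaryChars=0";

console.log(overrunFactor(startLine, doneLine).toFixed(2)); // "8.35" (125243 / 15000)
```

For timeout #2 the run took roughly 8.3 times the configured 15000 ms.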

Root Cause

The failover chain operates independently of timeoutMs:

1. Initial LLM request starts
2. Request times out (per some internal timeout, not timeoutMs)
3. Failover step 1: rotate_profile, which tries the same model with a different auth profile
4. That also times out
5. Failover step 2: fallback_model, which tries a different model entirely
6. That also times out (or the overall lane errors out)
7. Active Memory finally returns status=timeout

Each step in the failover chain appears to get its own timeout window (~30-60s each), and timeoutMs does not abort the chain early. The timeoutMs value appears to only apply to the initial request, not the total embedded run lifecycle.
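What a hard cap over the whole chain could look like is sketched below: every attempt shares one `AbortSignal`, and a single timer bounds the total time. This is an illustration under stated assumptions, not OpenClaw's runner; `Attempt`, `runWithHardDeadline`, and `slowAttempt` are hypothetical names.

```typescript
// Sketch only: `Attempt`, `runWithHardDeadline`, and `slowAttempt` are
// hypothetical names, not OpenClaw's actual runner API.

type Attempt<T> = (signal: AbortSignal) => Promise<T>;

/**
 * Run attempts in order (e.g. initial request, rotate_profile, fallback_model)
 * under ONE shared deadline. Resolves to null when the deadline fires first
 * or when every attempt fails.
 */
async function runWithHardDeadline<T>(
  attempts: Attempt<T>[],
  timeoutMs: number,
): Promise<T | null> {
  const controller = new AbortController();
  // One timer for the whole chain: firing it is a caller-owned abort.
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  const deadline = new Promise<null>((resolve) => {
    controller.signal.addEventListener("abort", () => resolve(null));
  });

  const chain = (async (): Promise<T | null> => {
    for (const attempt of attempts) {
      // Never start another attempt once the shared deadline has fired.
      if (controller.signal.aborted) return null;
      try {
        return await attempt(controller.signal);
      } catch {
        // Provider-side failure: move on to the next attempt.
      }
    }
    return null;
  })();

  try {
    return await Promise.race([chain, deadline]);
  } finally {
    clearTimeout(timer);
  }
}

// Usage: each attempt needs 50 ms, but the total budget is only 20 ms.
const slowAttempt: Attempt<string> = (signal) =>
  new Promise((resolve, reject) => {
    const t = setTimeout(() => resolve("summary"), 50);
    signal.addEventListener("abort", () => {
      clearTimeout(t);
      reject(new Error("aborted"));
    });
  });

runWithHardDeadline([slowAttempt, slowAttempt], 20).then((result) => {
  console.log(result); // null: the 20 ms cap fired before any attempt finished
});
```

The key design point is that the timer lives outside the loop, so a failed attempt cannot reset the clock for the next one.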

Fix Action

Workaround

Disabled Active Memory (config.enabled: false) until the timeout enforcement is fixed. The feature works well when the LLM responds promptly (1-2 seconds), but the unbounded failover chain makes it unusable in production.

PR fix notes

PR #65046: fix(active-memory): stop caller timeouts from continuing failover

Repository: openclaw/openclaw
Author: Takhoffman
State: closed | merged: True
Link: https://github.com/openclaw/openclaw/pull/65046

Description (problem / solution / changelog)

Summary

- stop embedded active-memory runs from treating caller-owned aborts as retry/failover candidates
- ignore late active-memory payloads that arrive after the plugin timeout has already fired
- add focused regression coverage for external-abort failover decisions and late timeout payloads
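The first two points can be pictured roughly as follows. This is an illustrative sketch, not the code merged in PR #65046: `FailureInfo`, `decideFailover`, `createTimeoutGate`, and the `surface_error` decision are hypothetical, while `rotate_profile` and `fallback_model` are the decision names that appear in the logs.

```typescript
// Illustrative only: these types and functions are hypothetical and do not
// mirror the actual contents of failover-policy.ts or active-memory/index.ts.

type FailoverDecision = "rotate_profile" | "fallback_model" | "surface_error";

interface FailureInfo {
  reason: "timeout" | "rate_limit" | "auth";
  /** True when the caller (e.g. the plugin's own timer) aborted the run. */
  abortedByCaller: boolean;
  profilesRemaining: number;
  fallbackModelsRemaining: number;
}

function decideFailover(failure: FailureInfo): FailoverDecision {
  // A caller-owned abort is not a provider failure, so it must not trigger retries.
  if (failure.abortedByCaller) return "surface_error";
  if (failure.profilesRemaining > 0) return "rotate_profile";
  if (failure.fallbackModelsRemaining > 0) return "fallback_model";
  return "surface_error";
}

/** Gate that drops results delivered after the timeout has already fired. */
function createTimeoutGate<T>() {
  let timedOut = false;
  return {
    markTimedOut(): void {
      timedOut = true;
    },
    accept(payload: T): T | null {
      return timedOut ? null : payload;
    },
  };
}

const gate = createTimeoutGate<string>();
console.log(gate.accept("summary")); // "summary"
gate.markTimedOut();
console.log(gate.accept("late summary")); // null: arrived after the timeout
```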

Testing

pnpm vitest run src/agents/pi-embedded-runner/run/failover-policy.test.ts extensions/active-memory/index.test.ts

Fixes #64867

Changed files

extensions/active-memory/index.test.ts (modified, +32/-0)
extensions/active-memory/index.ts (modified, +12/-0)
extensions/codex/src/app-server/event-projector.ts (modified, +1/-0)
src/agents/harness/selection.test.ts (modified, +1/-0)
src/agents/pi-embedded-runner.run-embedded-pi-agent.auth-profile-rotation.e2e.test.ts (modified, +1/-0)
src/agents/pi-embedded-runner/run.overflow-compaction.fixture.ts (modified, +1/-0)
src/agents/pi-embedded-runner/run.ts (modified, +5/-0)
src/agents/pi-embedded-runner/run/assistant-failover.ts (modified, +3/-1)
src/agents/pi-embedded-runner/run/attempt.ts (modified, +3/-0)
src/agents/pi-embedded-runner/run/failover-policy.test.ts (modified, +41/-0)
src/agents/pi-embedded-runner/run/failover-policy.ts (modified, +14/-0)
src/agents/pi-embedded-runner/run/types.ts (modified, +2/-0)
src/agents/pi-embedded-runner/usage-reporting.test.ts (modified, +1/-0)
src/agents/test-helpers/pi-embedded-runner-e2e-fixtures.ts (modified, +1/-0)

Code Example

{
  "plugins": {
    "entries": {
      "active-memory": {
        "enabled": true,
        "config": {
          "enabled": true,
          "agents": ["main"],
          "allowedChatTypes": ["direct"],
          "modelFallbackPolicy": "default-remote",
          "queryMode": "recent",
          "promptStyle": "balanced",
          "timeoutMs": 15000,
          "maxSummaryChars": 220,
          "persistTranscripts": false,
          "logging": true,
          "model": "anthropic/claude-haiku-4-5"
        }
      }
    }
  }
}

Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

No

Steps to reproduce

Environment

OpenClaw version: 2026.4.10 (44e5b62)
Platform: Linux (Ubuntu 24.04), x64, Node v25.8.2
Channel: WhatsApp (direct messages)
Auth: Anthropic OAuth token (two profiles configured)
Memory backend: QMD
Context engine: lossless-claw 0.8.0

Expected behavior

config.timeoutMs: 15000 should be a hard ceiling on total Active Memory execution time, including any failover/retry attempts. If the initial request doesn't complete within 15 seconds, Active Memory should return NONE/empty and let the main reply proceed.

Actual behavior

Observed Behavior

Out of 14 Active Memory invocations in a ~40 minute window, 4 timed out with durations far exceeding the configured 15s timeout:

| # | Time | Model | Duration | Status | Failover Chain |
|---|------|-------|----------|--------|----------------|
| 1 | 07:28:59 | claude-opus-4-6* | 45.4s | timeout | rotate_profile → fallback_model |
| 2 | 07:33:01 | claude-haiku-4-5 | 125.2s | timeout | rotate_profile → fallback_model |
| 3 | 07:42:07 | claude-haiku-4-5 | 41.6s | timeout | rotate_profile → fallback_model |
| 4 | 07:57:29 | claude-haiku-4-5 | 123.8s | timeout | rotate_profile → fallback_model |

*Run #1 used Opus because the config.model patch hadn't restarted yet.

The remaining 10 invocations completed successfully in 1.2-2.2 seconds with status=empty or status=ok.

OpenClaw version

2026.4.10

Operating system

Ubuntu 24.04

Install method

npm global

Model

anthropic/opus-4-6

Provider / routing chain

Openclaw -> anthropic -> opus

Impact and severity

- Every timeout blocks the user-visible reply by 40-125 seconds
- Active Memory is a blocking pre-reply step; there is no async fallback
- The pattern is intermittent (4/14 = ~29% failure rate) but severe when it hits
- WhatsApp disconnects (status 408) correlate with the long blocking periods, suggesting the connection idles out while waiting

Additional information

Suggested Fix

1. Hard timeout enforcement: timeoutMs should wrap the entire embedded run lifecycle (initial request + all failover attempts). If the total elapsed time exceeds timeoutMs, abort immediately and return empty/NONE.
2. Or: disable failover for Active Memory runs. The Active Memory plugin is latency-sensitive by design. The failover chain (rotate profile → fallback model) is appropriate for main replies but counterproductive for a blocking pre-reply sub-agent where speed matters more than resilience.
3. Or: add a separate maxTotalTimeoutMs config that caps the entire chain, independent of per-attempt timeouts.
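The third option amounts to giving each attempt the smaller of its own timeout and whatever remains of the total budget. A minimal sketch of that arithmetic follows; `maxTotalTimeoutMs` is the config name proposed above, while `attemptBudgetMs` and the 10-second per-attempt value are invented for illustration.

```typescript
// Hypothetical sketch of the third option: `maxTotalTimeoutMs` is the config
// name proposed in the issue; `attemptBudgetMs` is an invented helper.

/**
 * Timeout to give the next attempt, or 0 when the total budget is spent
 * and the chain should stop instead of starting another attempt.
 */
function attemptBudgetMs(
  maxTotalTimeoutMs: number,
  elapsedMs: number,
  perAttemptTimeoutMs: number,
): number {
  const remainingMs = Math.max(0, maxTotalTimeoutMs - elapsedMs);
  return Math.min(perAttemptTimeoutMs, remainingMs);
}

// With a 15 s total cap and 10 s per attempt (example values):
console.log(attemptBudgetMs(15000, 0, 10000)); // 10000: first attempt gets its full window
console.log(attemptBudgetMs(15000, 10000, 10000)); // 5000: second attempt is clamped
console.log(attemptBudgetMs(15000, 15000, 10000)); // 0: budget spent, stop the chain
```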

Additional Context

- Two Anthropic auth profiles are configured (OAuth tokens). The failover chain rotates between them before falling back to a different model.
- The main session LLM requests (Opus) are working fine during the same time window, so this appears specific to the embedded Active Memory sub-agent runner.
- modelFallbackPolicy: "default-remote" was set per the docs' recommended setup. Switching to "resolved-only" might reduce the chain length but wouldn't fix the core issue of timeoutMs not being enforced.

extent analysis

TL;DR

Implement a hard timeout enforcement for the entire embedded run lifecycle of the Active Memory plugin, ensuring that the timeoutMs setting is respected across all failover attempts.

Guidance

- Review how the Active Memory plugin applies timeoutMs and identify where the failover chain runs outside that limit.
- Consider disabling the failover chain for Active Memory runs, or adding a separate maxTotalTimeoutMs config that caps the entire chain, as suggested in the issue.
- Verify that main session LLM requests work as expected, which confirms the problem is specific to the embedded Active Memory sub-agent runner.
- If you disable Active Memory as a workaround, confirm that replies are no longer delayed. The feature itself works well when the LLM responds promptly (1-2 seconds).

Example

The issue itself includes no code snippet for the fix. The fix landed as code changes in PR #65046; see the changed files listed under PR fix notes.

Notes

The issue highlights the importance of respecting the timeoutMs setting across all failover attempts in the Active Memory plugin to prevent blocking issues. The suggested fixes and workarounds aim to address this specific problem.

Recommendation

Apply workaround: on builds that do not include PR #65046, disable Active Memory (config.enabled: false) to prevent the blocking delays caused by the unbounded failover chain. This is a temporary measure; the merged PR stops caller-owned timeouts from continuing the failover chain.
