openclaw - 💡(How to fix) Fix Reliability: active-memory blocks replies and QMD boot initialization can overload multi-agent gateways [1 participants]

openclaw · 2026-04-26 06:40:41

openclaw/openclaw#72015•Fetched 2026-04-27 05:36:04

Author

0xCNAI

Participants

0xCNAI

On a multi-agent OpenClaw gateway, enabling the official active-memory plugin can make normal replies slow or unreliable. The main issue is that active-memory currently runs a full embedded agent/model call inside before_prompt_build, using the active conversation model by default, and waits for it before the actual user reply is built.

In the same environment, QMD memory startup initialization arms memory managers for all configured agents on gateway boot. Each manager can start its own boot update. This is useful, but when many agents are configured it can create a burst of QMD work at startup. Combined with active-memory running per user message, the gateway can experience high CPU, long response latency, and timeout cascades.

This is not a transcript duplication issue. It is a separate reliability/defaults concern.

Reliability suggestion: active-memory blocks replies and QMD boot initialization can overload gateway in 2026.4.24

Environment

OpenClaw version: 2026.4.24
Platform: macOS ARM64
Runtime: Gateway with multiple agents
Active model observed: openai-codex/gpt-5.5
QMD backend enabled for memory search
memory.qmd.update.onBoot = true
memory.qmd.update.embedInterval = 30m
memory.qmd.limits.timeoutMs = 40000
active-memory.timeoutMs = 30000
active-memory.queryMode = recent
active-memory.maxSummaryChars = 220

Observed behavior

When active-memory was enabled, every eligible interactive user message triggered lines like:

active-memory: agent=tino session=... activeProvider=openai-codex activeModel=gpt-5.5 start timeoutMs=30000 queryChars=1505
active-memory: agent=tino session=... activeProvider=openai-codex activeModel=gpt-5.5 done status=timeout elapsedMs=40808 summaryChars=0
active-memory: agent=tino session=... activeProvider=openai-codex activeModel=gpt-5.5 done status=timeout elapsedMs=63548 summaryChars=0
active-memory: agent=tino session=... activeProvider=openai-codex activeModel=gpt-5.5 done status=empty elapsedMs=29864 summaryChars=0

So a 30s active-memory timeout produced 30–60s of extra latency and often returned no usable memory (summaryChars=0).
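
The overshoot can be read directly off the log lines above. The following is a small sketch that assumes only the `done status=... elapsedMs=...` format shown in those lines; `timeoutOvershoot` is an illustrative helper name, not part of OpenClaw.

```javascript
// Extract status and elapsedMs from "done" log lines and compare them
// with the configured timeout. Lines without a "done" record are skipped.
function timeoutOvershoot(lines, timeoutMs) {
  const pattern = /done status=(\w+) elapsedMs=(\d+)/;
  return lines
    .map((line) => pattern.exec(line))
    .filter(Boolean)
    .map(([, status, elapsed]) => ({
      status,
      elapsedMs: Number(elapsed),
      overshootMs: Number(elapsed) - timeoutMs,
    }));
}

const prefix = "active-memory: agent=tino session=... activeProvider=openai-codex activeModel=gpt-5.5 ";
const lines = [
  prefix + "start timeoutMs=30000 queryChars=1505",
  prefix + "done status=timeout elapsedMs=40808 summaryChars=0",
  prefix + "done status=timeout elapsedMs=63548 summaryChars=0",
  prefix + "done status=empty elapsedMs=29864 summaryChars=0",
];

console.log(timeoutOvershoot(lines, 30000).map((row) => row.overshootMs));
// prints [ 10808, 33548, -136 ]
```

Two of the three completed recalls ran roughly 10.8 s and 33.5 s past the 30 s budget, and the third finished just under it with an empty result.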

Separately, on gateway boot/restart, logs repeatedly showed:

qmd memory startup initialization armed for 10 agents: "tino", "jonathan", "betalpha-social", "jccat", "analyst", "news", "reporter", "social", "travel-pm", "family-pm"

This means boot initialization is not limited to the currently active agent/session. It initializes QMD memory for every configured agent with memory search enabled.

Current implementation notes

active-memory blocks prompt construction

extensions/active-memory/index.js registers:

api.on("before_prompt_build", async (event, ctx) => {
  ...
  const result = await maybeResolveActiveRecall(...);
  if (!result.summary) return;
  return { prependContext: promptPrefix };
});

maybeResolveActiveRecall(...) calls runRecallSubagent(...).

runRecallSubagent(...) uses:

params.api.runtime.agent.runEmbeddedPiAgent({
  provider: modelRef.provider,
  model: modelRef.model,
  timeoutMs: params.config.timeoutMs,
  toolsAllow: ["memory_search", "memory_get"],
  bootstrapContextMode: "lightweight",
  silentExpected: true,
  ...
});

The model defaults to the current run model / agent primary model unless active-memory.config.model is explicitly set. In this environment that meant openai-codex/gpt-5.5 was used as a per-message memory recall subagent.

QMD onBoot initializes all agents

server-startup-memory-kT6lKCrb.js does:

const agentIds = listAgentIds(params.cfg);
for (const agentId of agentIds) {
  if (!resolveMemorySearchConfig(params.cfg, agentId)) continue;
  const resolved = resolveActiveMemoryBackendConfig({ cfg: params.cfg, agentId });
  if (resolved.backend !== "qmd") continue;
  await getActiveMemorySearchManager({ cfg: params.cfg, agentId });
  armedAgentIds.push(agentId);
}

Each QMD manager can then run boot update because qmd.update.onBoot is true. In qmd-manager-LLKxprVD.js, initialize(...) starts:

if (this.qmd.update.onBoot) {
  const bootRun = this.runUpdate("boot", true);
  ...
}

QMD update queueing is per qmdDir, so different agents can still start separate boot updates. The embed lock is global, but the update phase can still create a startup burst across agents.

Impact

User replies are delayed before the actual model run begins.
A failed/empty memory lookup can add tens of seconds while providing no useful context.
Gateway CPU can spike during boot or restart when many agents arm QMD memory managers.
Active-memory and QMD startup work can overlap, creating timeout cascades.
Operators may think normal chat or model latency is broken, when the delay is pre-prompt memory recall.

Why this is surprising

The feature name suggests a lightweight memory retrieval layer, but the default behavior is closer to: run another full embedded LLM turn before each eligible user reply, using the same active model unless configured otherwise.

That may be powerful, but it is unsafe as a default for slow/expensive models or high-traffic agents.

Suggested fixes for next release

1. Make active-memory fail-open and non-blocking by default

Do not block the actual user reply on active-memory unless explicitly configured.

Possible modes:

mode: "nonblocking" (default): start recall opportunistically; inject only if it returns very quickly.
mode: "blocking" (opt-in): current behavior, for operators who want maximum recall.
deadlineMs: hard budget for pre-prompt recall; default perhaps 1000–3000 ms.

If recall misses the deadline, skip injection and let the user reply proceed.
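
One way to picture the fail-open behavior is the sketch below. It is not the plugin's code: `recallOrSkip` and the `DEADLINE` sentinel are hypothetical names, `recall` stands in for whatever produces the summary, and the `{ prependContext }` return shape only mirrors the hook excerpt quoted earlier.

```javascript
// Sentinel used to tell "deadline hit" apart from an empty recall result.
const DEADLINE = Symbol("deadline");

// Hypothetical fail-open hook body: wait for recall at most `deadlineMs`,
// then either return context to prepend or return undefined so the reply proceeds.
async function recallOrSkip(recall, deadlineMs) {
  let timer;
  const deadline = new Promise((resolve) => {
    timer = setTimeout(() => resolve(DEADLINE), deadlineMs);
  });
  try {
    const summary = await Promise.race([Promise.resolve().then(recall), deadline]);
    if (summary === DEADLINE || !summary) return undefined; // late or empty: inject nothing
    return { prependContext: summary };
  } catch {
    return undefined; // recall failed: fail open
  } finally {
    clearTimeout(timer);
  }
}
```

With this shape, a slow or failing recall costs the user at most `deadlineMs` before the real model run starts.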

2. Use a cheap/fast recall model by default, not the current conversation model

If no active-memory.config.model is set, default to a lightweight model profile rather than ctx.modelProviderId/ctx.modelId or the agent primary model.

At minimum, warn loudly when active-memory inherits a slow/high-cost model.

3. Enforce hard timeout cancellation

Observed elapsed time exceeded configured timeoutMs substantially (30000 configured, 40808 / 63548 observed). The abort signal may not stop the embedded run promptly.

The active-memory timeout should be a hard wall-clock budget for the pre-prompt hook.
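
A hard wall-clock budget could look like the sketch below, which assumes the embedded run can be modeled as a function that accepts an `AbortSignal`. `withHardTimeout` is a hypothetical wrapper; whether `runEmbeddedPiAgent` honors an abort signal is not confirmed by the issue, which is why the wrapper does not wait for the task to comply.

```javascript
// Hypothetical wrapper: enforce a wall-clock budget around a task that
// accepts an AbortSignal. The caller is released at the deadline even if
// the task ignores the signal and keeps running in the background.
async function withHardTimeout(task, timeoutMs) {
  const controller = new AbortController();
  const startedAt = Date.now();
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => {
      controller.abort(); // ask the task to stop
      resolve({ status: "timeout" }); // but do not wait for it to comply
    }, timeoutMs);
  });
  const run = Promise.resolve()
    .then(() => task(controller.signal))
    .then(
      (value) => ({ status: "done", value }),
      (error) => ({ status: "error", error }),
    );
  try {
    const outcome = await Promise.race([run, timeout]);
    return { ...outcome, elapsedMs: Date.now() - startedAt };
  } finally {
    clearTimeout(timer);
  }
}
```

The design choice here is that `elapsedMs` reported to the caller stays close to `timeoutMs` on timeout, regardless of how long the underlying run takes to unwind.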

4. Add concurrency limits for active-memory recall

Per agent/session limits would prevent multiple simultaneous recall subagents from stacking during active chat bursts.

Suggested defaults:

one active-memory recall per agent
one active-memory recall per session
drop or reuse cached result when a recall is already running
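
The "reuse when a recall is already running" default can be sketched as a single-flight guard. This is an illustration under assumptions: `singleFlight` and the `inFlight` map are hypothetical, and the key could be an agent id or a session id.

```javascript
// Hypothetical single-flight guard: at most one recall in flight per key.
// Concurrent callers with the same key share the running promise.
const inFlight = new Map();

function singleFlight(key, start) {
  const existing = inFlight.get(key);
  if (existing) return existing; // reuse the running recall instead of stacking another
  const run = Promise.resolve()
    .then(start)
    .finally(() => inFlight.delete(key)); // free the slot on success or failure
  inFlight.set(key, run);
  return run;
}
```

During a burst of messages in one session, only the first call starts a recall subagent; the rest await the same result.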

5. Add QMD boot concurrency control / startup jitter across agents

qmd memory startup initialization should avoid boot-time bursts across all configured agents.

Possible approaches:

global max concurrent QMD boot updates, default 1
jitter per agent
lazy-initialize QMD manager on first memory search instead of arming every agent on boot
separate onBootAgents allowlist or onBoot: "activeOnly"
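
The first two approaches, a global concurrency cap and per-agent jitter, can be combined as in the sketch below. `runBootUpdates`, `maxConcurrent`, and `maxJitterMs` are hypothetical names, and `bootUpdate` stands in for whatever starts one agent's QMD boot update.

```javascript
// Hypothetical boot scheduler: run per-agent boot updates with a global
// concurrency cap and a random delay before each one.
async function runBootUpdates(agentIds, bootUpdate, { maxConcurrent = 1, maxJitterMs = 0 } = {}) {
  const queue = [...agentIds];
  const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

  async function worker() {
    while (queue.length > 0) {
      const agentId = queue.shift();
      await sleep(Math.random() * maxJitterMs); // spread start times
      try {
        await bootUpdate(agentId);
      } catch (error) {
        // One agent's failure should not block the remaining agents.
        console.warn(`boot update failed for ${agentId}:`, error);
      }
    }
  }

  const workerCount = Math.min(maxConcurrent, queue.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
}
```

With `maxConcurrent: 1`, the ten agents from the log above would update one after another instead of in a burst, at the cost of a longer total warm-up.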

6. Make the operational cost visible in status/doctor

openclaw status / doctor could flag:

active-memory is enabled and inherits a slow primary model
active-memory timeout is high
QMD onBoot is enabled for many agents
active-memory is returning mostly timeout/empty results
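
As a rough illustration of such checks, the sketch below turns the four conditions into warnings. Everything in it is an assumption: the function name, the input field names, and the thresholds (5000 ms, 3 agents, 50% unusable recalls) are made up for the example and are not OpenClaw's status/doctor API.

```javascript
// Hypothetical doctor check. Field names and thresholds are illustrative only.
function activeMemoryWarnings({ activeMemory, qmd, recentRecalls }) {
  const warnings = [];
  if (activeMemory.enabled && !activeMemory.model) {
    warnings.push("active-memory inherits the conversation model");
  }
  if (activeMemory.enabled && activeMemory.timeoutMs > 5000) {
    warnings.push(`active-memory timeout is high (${activeMemory.timeoutMs}ms)`);
  }
  if (qmd.onBoot && qmd.agentCount > 3) {
    warnings.push(`QMD onBoot is enabled for ${qmd.agentCount} agents`);
  }
  const unusable = recentRecalls.filter((s) => s === "timeout" || s === "empty").length;
  if (recentRecalls.length > 0 && unusable / recentRecalls.length >= 0.5) {
    warnings.push(`${unusable}/${recentRecalls.length} recent recalls returned timeout/empty`);
  }
  return warnings;
}
```

Fed with the values from this report (no explicit model, `timeoutMs` 30000, 10 agents with `onBoot` enabled, three recalls ending in timeout or empty), all four warnings would fire.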

Recommended safer default profile

For multi-agent gateways, a safer profile might be:

{
  "plugins": {
    "entries": {
      "active-memory": {
        "enabled": true,
        "config": {
          "enabled": true,
          "mode": "nonblocking",
          "timeoutMs": 2000,
          "model": "<fast-cheap-default>",
          "queryMode": "message",
          "recentUserTurns": 1,
          "recentAssistantTurns": 0,
          "cacheTtlMs": 60000
        }
      }
    }
  },
  "memory": {
    "qmd": {
      "update": {
        "onBoot": "activeOnly",
        "embedInterval": "60m",
        "maxConcurrentBootUpdates": 1
      }
    }
  }
}

Exact schema names can differ; the point is safer semantics.

Workaround used locally

Active-memory was disabled and memory-core dreaming was kept disabled. With those disabled, gateway CPU and reply latency returned to a stable baseline, while manual memory_search / QMD queries still worked.

This suggests the issue is not QMD search itself, but the combination of blocking per-message active-memory embedded LLM recall and broad QMD boot/update work.

extent analysis

TL;DR

To mitigate the reliability issues caused by active-memory and QMD boot initialization, disable active-memory or, at minimum, give it an explicit fast model and a much shorter timeoutMs. The non-blocking mode discussed in the issue is a proposal and may not exist in your release.

Guidance

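Set active-memory.config.model to a fast, low-cost model so recall does not inherit the conversation model (openai-codex/gpt-5.5 in the report).
Lower active-memory timeoutMs from 30000 to a small value such as 2000 to cap added latency. The report observed runs that exceeded the configured timeout, so this may not act as a hard limit.
Treat mode "nonblocking", QMD boot concurrency limits, and startup jitter as proposals from the issue rather than confirmed options; check your release's schema before relying on them.
The openclaw status / doctor checks are also only proposed. Until they exist, watch the active-memory log lines for status=timeout and status=empty, and compare elapsedMs with timeoutMs.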

Example

A safer default profile for active-memory could be:

{
  "active-memory": {
    "enabled": true,
    "config": {
      "mode": "nonblocking",
      "timeoutMs": 2000,
      "model": "<fast-cheap-default>"
    }
  }
}

Notes

Disabling active-memory removes automatic per-message recall, so the workaround may not suit every environment. In the reporter's setup, manual memory_search and QMD queries kept working with active-memory and memory-core dreaming disabled.

Recommendation

If automatic recall is not essential, disable active-memory as the reporter did. Otherwise, set an explicit fast model and a short timeoutMs, and adopt non-blocking mode if and when a release provides it.
