openclaw - Fix Active Memory: add single-shot mode (no embedded agent loop) for low-latency preflight injection [1 participant]

openclaw/openclaw#72359 (fetched 2026-04-27 05:31:00): 0 comments, 1 participant, 0 reactions, 1 timeline event (cross-referenced ×1)

Request a new option, plugins.entries.active-memory.config.mode: "single-shot" (or similar), that bypasses runEmbeddedPiAgent and produces the memory summary with one LLM round-trip plus one qmd query, instead of an N-turn agent loop.

This is the second of the two fixes proposed in #72347. That issue addressed the timeoutMs enforceability bug; this one is the latency-architecture sibling.


Proposal

Roughly:

// pseudocode for AM single-shot path
const queryText = deriveQuery(message, config.queryMode);              // existing
const hits = await qmd.search(queryText, config.qmd);                  // ONE search
const prompt = buildSingleShotPrompt(message, hits, config.promptStyle);
const reply = await llm.complete({                                     // ONE API call
  model: config.model,
  prompt,
  maxTokens: chars2tokens(config.maxSummaryChars),
  abortSignal: budgetSignal,
});
return parseNoneOrSummary(reply);

No tools, no agent loop, no runEmbeddedPiAgent. Wall-clock ≈ 1 API round-trip + 1 qmd search.
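To make the shape of that path concrete, here is a minimal TypeScript sketch, assuming the qmd search and the LLM completion can be injected as plain async functions. Every name in it (`SingleShotDeps`, `runSingleShot`, `budgetMs`, `charsToTokens`, and this version of `buildSingleShotPrompt`) is hypothetical and not openclaw's API. The point is the structure: one search, one completion, a single shared abort budget, and no tool loop.

```typescript
// Hypothetical interfaces: openclaw's real qmd and LLM clients may look different.
interface SingleShotDeps {
  search: (query: string, signal: AbortSignal) => Promise<string[]>;
  complete: (prompt: string, maxTokens: number, signal: AbortSignal) => Promise<string>;
}

interface SingleShotConfig {
  maxSummaryChars: number;
  budgetMs: number; // hypothetical: one wall-clock budget shared by search + completion
}

// Assumption: roughly 4 characters per token for English text.
function charsToTokens(chars: number): number {
  return Math.ceil(chars / 4);
}

// Hypothetical prompt builder: no tools are offered, so the model must answer in one reply.
function buildSingleShotPrompt(message: string, hits: string[]): string {
  const snippets = hits.map((h, i) => `[${i + 1}] ${h}`).join("\n");
  return [
    "Using only the memory snippets below, reply NONE if nothing is relevant;",
    "otherwise reply with a brief summary relevant to the user's message.",
    `Message: ${message}`,
    `Snippets:\n${snippets}`,
  ].join("\n");
}

// One search, one completion, one shared abort budget; returns null for NONE.
async function runSingleShot(
  message: string,
  cfg: SingleShotConfig,
  deps: SingleShotDeps,
): Promise<string | null> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), cfg.budgetMs);
  try {
    // Query derivation is omitted here: the raw message is used as the query.
    const hits = await deps.search(message, controller.signal);
    const prompt = buildSingleShotPrompt(message, hits);
    const reply = await deps.complete(prompt, charsToTokens(cfg.maxSummaryChars), controller.signal);
    const text = reply.trim();
    if (text === "" || text.toUpperCase() === "NONE") return null;
    return text.slice(0, cfg.maxSummaryChars); // hard cap on injected length
  } finally {
    clearTimeout(timer); // the budget timer must not outlive the request
  }
}
```

The budget only takes effect if `search` and `complete` honor the `AbortSignal` they receive; a client that ignores it will simply run to completion.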

Why this is needed in addition to the hard timeout from #72347

A hard timeout caps the worst case but does not lower the mean. Empirically, the AM tool loop takes 6–12s end-to-end even on fast models (a direct API benchmark of gpt-5.4-mini measures ~1.8s per call, and the loop makes 3–4 such calls). To make AM useful as a preflight step (not just a brake), the architecture itself needs to be lighter for promptStyle: "preference-only"-style use cases.
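The mean-latency argument reduces to simple arithmetic: if each model call costs about the same, an N-turn loop pays that cost N times, while single-shot pays it once plus one search. The back-of-the-envelope sketch below uses the ~1.8s per-call figure quoted above; the function names and the 100 ms `SEARCH_MS` value are illustrative assumptions. It counts model time only, so it is a lower bound rather than a prediction of the end-to-end figures reported in this issue.

```typescript
// Back-of-the-envelope latency model (assumption: every LLM call costs the same).
// Times are integer milliseconds to avoid floating-point noise.
function modelCallTimeMs(turns: number, perCallMs: number): number {
  return turns * perCallMs;
}

// One completion plus one qmd search, per the proposal.
function singleShotMs(perCallMs: number, searchMs: number): number {
  return perCallMs + searchMs;
}

const PER_CALL_MS = 1800; // ~1.8 s per call, as quoted for gpt-5.4-mini
const SEARCH_MS = 100; // hypothetical qmd search time, for illustration only

for (const turns of [3, 4]) {
  console.log(`${turns}-turn loop, model time only: ${modelCallTimeMs(turns, PER_CALL_MS)} ms`);
}
console.log(`single-shot: ${singleShotMs(PER_CALL_MS, SEARCH_MS)} ms`);
```

This prints 5400 ms and 7200 ms for the 3- and 4-turn loops and 1900 ms for single-shot. Tool execution and any other per-turn work only widen that gap.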

Scope (intentionally narrow)

  • In scope: new opt-in mode config value, single API call, no tool use, returns NONE or a ≤maxSummaryChars summary.
  • Out of scope: changing the existing multi-turn mode (which makes sense for recall-heavy / contextual styles), or auto-detecting which mode to use.
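Because the mode is opt-in and auto-detection is out of scope, config handling can stay trivial: anything other than an explicit "single-shot" keeps today's multi-turn behavior. This sketch assumes a string-literal union for the mode. The label "multi-turn" for the existing mode and the `resolveMode` helper are hypothetical, not openclaw identifiers.

```typescript
// Hypothetical mode names; only "single-shot" comes from the proposal.
type ActiveMemoryMode = "multi-turn" | "single-shot";

interface ActiveMemoryConfig {
  mode?: string; // would map to plugins.entries.active-memory.config.mode
  promptStyle?: string;
}

// Opt-in: only an explicit "single-shot" selects the new path. Nothing is inferred
// from promptStyle or any other field (auto-detection is out of scope).
function resolveMode(config: ActiveMemoryConfig): ActiveMemoryMode {
  if (config.mode === undefined || config.mode === "multi-turn") return "multi-turn";
  if (config.mode === "single-shot") return "single-shot";
  throw new Error(`unknown active-memory mode: ${config.mode}`);
}
```

Rejecting unknown values instead of silently falling back means a typo such as "singleshot" surfaces at config load rather than quietly running the slower loop.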

Repro showing why current architecture is too slow

(Same setup as #72347: a single Telegram DM to the agent, with AM enabled. Shipped logs show 12–22s elapsed across 4 models, including local ollama and the direct OpenAI API. A direct API benchmark of gpt-5.4-mini: 1.82s. The bottleneck is the loop, not the model. Full table in #72347.)

Note

Filing as a separate ticket per the suggestion in the closing comment of #72347. Single-purpose, hopefully easier to triage and not at risk of being auto-closed by the same bot pattern that mis-closed the parent.

extent analysis

TL;DR

Implement a new plugins.entries.active-memory.config.mode with a "single-shot" value to reduce latency by bypassing the runEmbeddedPiAgent loop.

Guidance

  • Introduce a new opt-in mode config value, "single-shot", to enable a single API call and no tool usage.
  • Modify the code to use a single LLM round-trip and one qmd query, as shown in the provided pseudocode.
  • Ensure the new mode returns NONE or a summary with a maximum length of maxSummaryChars characters.
  • Test the new mode with promptStyle: "preference-only"-style use cases to verify the latency improvement.
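The return contract in the third bullet (NONE, or a summary of at most maxSummaryChars characters) can be enforced on the client side regardless of what the model emits. Below is a sketch of a hypothetical `parseNoneOrSummary`, assuming the model is prompted to reply with the literal token NONE when nothing is relevant. Tolerating trailing punctuation and truncating at a word boundary are illustrative choices, not taken from openclaw.

```typescript
type MemoryResult = { kind: "none" } | { kind: "summary"; text: string };

// Hypothetical helper mirroring parseNoneOrSummary from the pseudocode.
function parseNoneOrSummary(reply: string, maxSummaryChars: number): MemoryResult {
  const text = reply.trim();
  // An empty reply, or NONE with optional trailing "." or "!", means nothing to inject.
  if (text === "" || /^none[.!]?$/i.test(text)) return { kind: "none" };
  if (text.length <= maxSummaryChars) return { kind: "summary", text };
  // Over budget: cut back to the last space within the limit so no word is split.
  const cut = text.slice(0, maxSummaryChars);
  const lastSpace = cut.lastIndexOf(" ");
  const trimmed = (lastSpace > 0 ? cut.slice(0, lastSpace) : cut).trimEnd();
  return { kind: "summary", text: trimmed };
}
```

Because the cap is applied after the model replies, `maxTokens` can stay a loose hint while `maxSummaryChars` remains a hard guarantee.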


Notes

The new "single-shot" mode is intended to coexist with the existing multi-turn mode, which will remain unchanged. The scope of this change is intentionally narrow, focusing on a specific use case.

Recommendation

Implement the new opt-in "single-shot" mode: it targets the latency problem while leaving the existing multi-turn mode unchanged. It is expected to reduce mean latency, which would make Active Memory more suitable for preflight use cases.
