openclaw - 💡(How to fix) Fix Active Memory: add single-shot mode (no embedded agent loop) for low-latency preflight injection [1 participants]

openclaw • 2026-04-26 21:04:43


openclaw/openclaw#72359•Fetched 2026-04-27 05:31:00


Author

thecolormaroun

Participants

thecolormaroun

Timeline (top)

cross-referenced ×1

Request: add a new option, plugins.entries.active-memory.config.mode: "single-shot" (or similar), that bypasses runEmbeddedPiAgent and produces the memory summary in one LLM round-trip plus one qmd query, instead of an N-turn agent loop.
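A configuration sketch for the requested option might look like the following. Only the key path and the "single-shot" value come from the issue; the "agent-loop" name for the existing behavior, the example values, and the resolveMode helper are assumptions for illustration, not OpenClaw's actual schema.

```typescript
// Hypothetical shape: only the key path and "single-shot" come from the issue.
type ActiveMemoryMode = "agent-loop" | "single-shot"; // "agent-loop" is an assumed name

interface ActiveMemoryConfig {
  mode?: ActiveMemoryMode;
  timeoutMs?: number;
  maxSummaryChars?: number;
  promptStyle?: string;
}

const amConfig: ActiveMemoryConfig = {
  mode: "single-shot",
  promptStyle: "preference-only",
  maxSummaryChars: 400, // illustrative value, not a documented default
};

// Mirrors the key path plugins.entries.active-memory.config from the issue.
const openclawConfig = {
  plugins: { entries: { "active-memory": { config: amConfig } } },
};

// Opt-in: anything other than an explicit "single-shot" keeps the existing loop.
function resolveMode(config: ActiveMemoryConfig): ActiveMemoryMode {
  return config.mode === "single-shot" ? "single-shot" : "agent-loop";
}
```

Defaulting to the existing behavior when mode is absent matches the issue's stated intent that the new mode is opt-in.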

This is the second of the two fixes proposed in #72347. That issue addressed the timeoutMs enforceability bug; this one is the latency-architecture sibling.




Proposal

Roughly:

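// Pseudocode for the Active Memory (AM) single-shot path.
// The wrapper name runSingleShot is illustrative; the helpers are those named in the issue.
async function runSingleShot(message, config, budgetSignal) {
  // Derive the search query from the incoming message (existing behavior).
  const queryText = deriveQuery(message, config.queryMode);
  // ONE qmd search, with no follow-up tool calls.
  const hits = await qmd.search(queryText, config.qmd);
  const prompt = buildSingleShotPrompt(message, hits, config.promptStyle);
  // ONE LLM API call, capped by the summary budget and cancellable via the signal.
  const reply = await llm.complete({
    model: config.model,
    prompt,
    maxTokens: chars2tokens(config.maxSummaryChars),
    abortSignal: budgetSignal,
  });
  // Either NONE or a summary.
  return parseNoneOrSummary(reply);
}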

No tools, no agent loop, no runEmbeddedPiAgent. Wall-clock ≈ 1 API round-trip + 1 qmd search.
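To make the "one search, one call" property concrete, here is a self-contained sketch with injected dependencies. Every type and name in it (MemoryHit, SingleShotDeps, singleShotRecall, the stubs) is an assumption made so the example stands alone; none of it is OpenClaw's real API, and the prompt wording is only a placeholder.

```typescript
// All names below are hypothetical stand-ins, not OpenClaw's actual types.
interface MemoryHit { text: string }

interface SingleShotDeps {
  search(query: string): Promise<MemoryHit[]>;
  complete(req: { prompt: string; maxTokens: number; signal: AbortSignal }): Promise<string>;
}

async function singleShotRecall(
  message: string,
  maxTokens: number,
  deps: SingleShotDeps,
  signal: AbortSignal,
): Promise<string | null> {
  const hits = await deps.search(message); // exactly one search
  const prompt = [
    "Summarize only the memories relevant to the message, or reply NONE.",
    `Message: ${message}`,
    ...hits.map((h, i) => `Memory ${i + 1}: ${h.text}`),
  ].join("\n");
  const reply = await deps.complete({ prompt, maxTokens, signal }); // exactly one LLM call
  const text = reply.trim();
  return text === "NONE" ? null : text;
}

// Counting stubs make the call counts checkable without a network.
const calls = { search: 0, complete: 0 };
const stubDeps: SingleShotDeps = {
  async search() {
    calls.search += 1;
    return [{ text: "User prefers metric units." }];
  },
  async complete() {
    calls.complete += 1;
    return "Prefers metric units.";
  },
};
```

Passing the search and completion functions in as dependencies keeps the path free of tool dispatch, so its cost is bounded by those two awaits.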

Why this is needed in addition to the hard timeout from #72347

A hard timeout caps the worst case but doesn't change the mean. Empirically, the AM tool loop runs 6–12s end-to-end on fast models (gpt-5.4-mini benchmarks at ~1.8s per direct API call; multiply by 3–4 turns). To make AM useful as a preflight (not just a brake), the architecture itself needs to be lighter for use cases such as promptStyle: "preference-only".
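The back-of-the-envelope estimate above can be written out explicitly. The sketch below multiplies the per-call latency quoted in the issue (about 1.8 s) by the turn count; the function name and the zero default for search time are assumptions. The end-to-end figures quoted in the issue are higher than this product, so the estimate covers model round-trips only.

```typescript
// Model-time estimate only: turns × per-call latency (+ optional search time).
// searchMs defaults to 0 because the issue gives no figure for qmd search latency.
function estimateModelMs(turns: number, perCallMs: number, searchMs = 0): number {
  return turns * perCallMs + searchMs;
}

const perCallMs = 1800; // ~1.8 s per call, as quoted in the issue

const loopLowMs = estimateModelMs(3, perCallMs);    // 3-turn agent loop
const loopHighMs = estimateModelMs(4, perCallMs);   // 4-turn agent loop
const singleShotMs = estimateModelMs(1, perCallMs); // one round-trip

console.log({ loopLowMs, loopHighMs, singleShotMs });
```

Under these numbers the single-shot path spends one third to one quarter of the loop's model time, before any tool or orchestration overhead is counted.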

Scope (intentionally narrow)

In scope: new opt-in mode config value, single API call, no tool use, returns NONE or a ≤maxSummaryChars summary.
Out of scope: changing the existing multi-turn mode (which makes sense for recall-heavy / contextual styles), or auto-detecting which mode to use.
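The "NONE or a ≤maxSummaryChars summary" contract could be enforced with helpers along these lines. The issue names chars2tokens and parseNoneOrSummary but does not define them, so the bodies here are guesses: the 4-characters-per-token ratio is a rough heuristic, and the extra maxSummaryChars parameter is an assumption added to make the clamp explicit.

```typescript
// Rough heuristic (assumption): about 4 characters per token.
function chars2tokens(chars: number, charsPerToken = 4): number {
  return Math.max(1, Math.ceil(chars / charsPerToken));
}

// Returns null for the NONE sentinel or an empty reply, otherwise a summary
// clamped to maxSummaryChars characters.
function parseNoneOrSummary(reply: string, maxSummaryChars: number): string | null {
  const text = reply.trim();
  if (text === "" || text.toUpperCase() === "NONE") return null;
  return text.length <= maxSummaryChars
    ? text
    : text.slice(0, maxSummaryChars).trimEnd();
}
```

Clamping after the call is a safety net: maxTokens limits the model's output only approximately, so the character limit is re-checked on the parsed text.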

Repro showing why current architecture is too slow

(Same setup as #72347 — single Telegram DM to the agent, AM enabled. Shipped logs show 12–22s elapsed across 4 models, including local ollama and direct OpenAI API. Direct API benchmark of gpt-5.4-mini: 1.82s. Bottleneck is the loop, not the model. Full table in #72347.)

Note

Filing as a separate ticket per the suggestion in the closing comment of #72347. Single-purpose, hopefully easier to triage and not at risk of being auto-closed by the same bot pattern that mis-closed the parent.

extent analysis

TL;DR

Implement a new plugins.entries.active-memory.config.mode with a "single-shot" value to reduce latency by bypassing the runEmbeddedPiAgent loop.

Guidance

Introduce a new opt-in mode config value, "single-shot", to enable a single API call and no tool usage.
Modify the code to use a single LLM round-trip and one qmd query, as shown in the provided pseudocode.
Ensure the new mode returns NONE or a summary with a maximum length of maxSummaryChars characters.
Test the new mode with promptStyle: "preference-only"-style use cases to verify the latency improvement.
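One way to check the latency improvement is to time the preflight call under an abort budget. The harness below is a sketch only: timePreflight, fakeSingleShot, and the delays used are illustrative, and a real test would pass the actual single-shot call in place of the stub.

```typescript
// Times an async preflight call and aborts it when the budget elapses.
async function timePreflight<T>(
  run: (signal: AbortSignal) => Promise<T>,
  budgetMs: number,
): Promise<{ result: T | null; elapsedMs: number; timedOut: boolean }> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), budgetMs);
  const start = Date.now();
  try {
    const result = await run(controller.signal);
    return { result, elapsedMs: Date.now() - start, timedOut: false };
  } catch (err) {
    // Treat a rejection after abort as a budget overrun; rethrow anything else.
    if (controller.signal.aborted) {
      return { result: null, elapsedMs: Date.now() - start, timedOut: true };
    }
    throw err;
  } finally {
    clearTimeout(timer);
  }
}

// Stub standing in for a single-shot call; rejects if aborted mid-flight.
function fakeSingleShot(delayMs: number) {
  return (signal: AbortSignal) =>
    new Promise<string>((resolve, reject) => {
      const t = setTimeout(() => resolve("summary"), delayMs);
      signal.addEventListener("abort", () => {
        clearTimeout(t);
        reject(new Error("aborted"));
      });
    });
}
```

Running the same harness against both modes with identical messages would give comparable elapsedMs samples for the mean, not just the worst case.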


Notes

The new "single-shot" mode is intended to coexist with the existing multi-turn mode, which will remain unchanged. The scope of this change is intentionally narrow, focusing on a specific use case.

Recommendation

Implement the new opt-in "single-shot" mode. It addresses the latency issue by adding a lighter path alongside the existing multi-turn mode, which stays unchanged. The change is expected to reduce mean latency, not just cap the worst case, making Active Memory more suitable for preflight use.

