openclaw: openviking context-engine frequently rebuilds prompt prefix, hurting prompt-cache reuse

GitHub stats: openclaw/openclaw#50912 (fetched 2026-04-08 01:06:40)
Comments: 1 · Participants: 1 · Reactions: 0
Timeline (3 events): closed ×1, commented ×1, locked ×1

Summary

When openviking is used as the active contextEngine, it does not rewrite session history on disk, but it does rebuild the model-facing prompt prefix on almost every turn via before_prompt_build.

This substantially reduces prompt-cache reuse compared with a more stable context engine such as lossless.

Why this matters

The current behavior makes cache utilization much worse in real usage because the prompt prefix becomes highly unstable across turns:

  • every turn re-runs retrieval
  • the retrieved set/order can drift
  • injected content is often the full L2 memory content, not a stable short reference
  • ingest-reply-assist may also prepend another dynamic block

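So even if the visible conversation history is mostly unchanged, the final prompt sent to the model changes frequently at the very top. Provider prompt caches generally reuse only an exact leading prefix, so a change there invalidates the cache for everything that follows it, including the unchanged history.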
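The amount a prefix-based cache can reuse is bounded by the longest common prefix of consecutive prompts. The sketch below uses a hypothetical `sharedPrefixLength` helper (not part of OpenClaw or OpenViking) to show this at character granularity; real providers match in tokens or fixed-size blocks, so treat it as an illustration only.

```typescript
// Hypothetical helper: number of leading characters two prompts have in common.
// A prefix-based prompt cache cannot reuse anything past this point.
function sharedPrefixLength(a: string, b: string): number {
  const n = Math.min(a.length, b.length);
  let i = 0;
  while (i < n && a[i] === b[i]) i++;
  return i;
}

// The visible conversation history is identical on both turns...
const history = "user: summarize the design doc\nassistant: (summary)\nuser: and the risks?";

// ...but a different recall block is prepended in front of it.
const turn1 = "<relevant-memories>\n- [memory] A\n</relevant-memories>\n\n" + history;
const turn2 = "<relevant-memories>\n- [memory] B\n</relevant-memories>\n\n" + history;

// Only the opening tag and the bullet prefix are shared; none of `history` is.
console.log(sharedPrefixLength(turn1, turn2), "of", turn2.length);

// By contrast, appending a new turn after an unchanged prefix keeps the old prompt reusable.
console.log(sharedPrefixLength(turn1, turn1 + "\nuser: next question") === turn1.length); // true
```

The same history placed behind a volatile block gets no reuse at all, which is why a stable prepended block matters more than a stable history.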



Code facts

1) assemble() does not rewrite session messages

extensions/openviking/context-engine.ts

async assemble(assembleParams): Promise<AssembleResult> {
  return {
    messages: assembleParams.messages,
    estimatedTokens: estimateTokens(assembleParams.messages),
  };
}

So the issue is not that the context engine rewrites session history itself.

2) before_prompt_build dynamically prepends new context

extensions/openviking/index.ts

api.on("before_prompt_build", async (event: unknown, ctx?: HookAgentContext) => {
  ...
  const prependContextParts: string[] = [];
  ...
  if (cfg.autoRecall && queryText.length >= 5) {
    ...
    prependContextParts.push(
      "<relevant-memories>\nThe following OpenViking memories may be relevant:\n" +
        `${memoryContext}\n` +
      "</relevant-memories>",
    );
  }
  ...
  if (cfg.ingestReplyAssist) {
    ...
    prependContextParts.push(
      "<ingest-reply-assist>\n" +
      ...
      "</ingest-reply-assist>",
    );
  }

  if (prependContextParts.length > 0) {
    return {
      prependContext: prependContextParts.join("\n\n"),
    };
  }
});
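The hook concatenates whatever retrieval returned, in the order it was returned. The sketch below mirrors that string-building with a hypothetical `renderRecallBlock` and a trimmed-down `MemoryItem` stand-in for `FindResultItem` (only the fields the excerpt uses), assuming memory lines are joined with newlines. It shows that the same two memories in a different order already yield a different block, and therefore a different prompt prefix.

```typescript
// Simplified stand-in for FindResultItem: only the fields used by the excerpt above.
interface MemoryItem {
  uri: string;
  category?: string;
  abstract?: string;
}

// Mirrors the shape of the <relevant-memories> block built in before_prompt_build.
function renderRecallBlock(items: MemoryItem[]): string {
  const memoryContext = items
    .map((item) => `- [${item.category ?? "memory"}] ${item.abstract ?? item.uri}`)
    .join("\n");
  return (
    "<relevant-memories>\nThe following OpenViking memories may be relevant:\n" +
    `${memoryContext}\n` +
    "</relevant-memories>"
  );
}

// Placeholder items; the URIs are illustrative only.
const a: MemoryItem = { uri: "memory-a", abstract: "Prefers TypeScript" };
const b: MemoryItem = { uri: "memory-b", category: "project", abstract: "Uses pnpm" };

// Same retrieved set, different order: the rendered block (and prompt prefix) differs.
console.log(renderRecallBlock([a, b]) === renderRecallBlock([b, a])); // false
```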

3) injected memory content is often the full L2 content

In the same hook:

const memoryLines = await Promise.all(
  memories.map(async (item: FindResultItem) => {
    if (item.level === 2) {
      try {
        const content = await client.read(item.uri);
        if (content && typeof content === "string" && content.trim()) {
          return `- [${item.category ?? "memory"}] ${content.trim()}`;
        }
      } catch {
        // fallback to abstract
      }
    }
    return `- [${item.category ?? "memory"}] ${item.abstract ?? item.uri}`;
  }),
);

This means the prepended prompt block is not just a stable list of IDs / short summaries. It can contain full memory content, which is much more likely to drift and invalidate prompt caching.

Observed behavior in logs

In practice, logs repeatedly show memory injection on many turns, e.g.:

  • openviking: injecting 6 memories into context
  • followed by inject-detail ...
  • often also ingest-reply-assist applied ...

This strongly suggests the prompt prefix is being rebuilt very frequently.
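These log lines show that injection happened, not whether the injected text actually changed. One way to measure it is to fingerprint the final `prependContext` per session and log whether it differs from the previous turn. The sketch below uses 32-bit FNV-1a over UTF-16 code units; `trackPrefix`, the session key, and the log wording are hypothetical, not existing OpenViking log lines.

```typescript
// 32-bit FNV-1a over UTF-16 code units; fine for spotting changes, not cryptographic.
function fnv1a32(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

type PrefixStatus = "first" | "unchanged" | "changed";

// Last fingerprint seen per session (hypothetical session key).
const lastFingerprint = new Map<string, string>();

// Records this turn's fingerprint and reports how it compares with the previous turn.
function trackPrefix(sessionKey: string, prependContext: string): PrefixStatus {
  const fp = fnv1a32(prependContext);
  const prev = lastFingerprint.get(sessionKey);
  lastFingerprint.set(sessionKey, fp);
  const status: PrefixStatus =
    prev === undefined ? "first" : prev === fp ? "unchanged" : "changed";
  console.log(`openviking (sketch): prepend fingerprint ${fp} (${status})`);
  return status;
}
```

Counting `changed` versus `unchanged` over a long session would show directly how often the cacheable prefix is being invalidated.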

Expected behavior

A context engine should be more cache-friendly by default, especially for long-running sessions.

Possible directions:

  1. prefer stable, short injection (L0/L1 summary) over full L2 body reads
  2. avoid re-injecting on every turn when query/session state has not materially changed
  3. stabilize ordering/selection within a short session window
  4. optionally cache or reuse the previously injected recall block until query novelty exceeds a threshold
  5. make the aggressive dynamic prepend behavior optional via config
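Directions 2 and 4 can be combined into a small per-session cache: keep the last injected recall block and reuse it byte-for-byte until the new query is sufficiently different from the one that produced it. The sketch below measures novelty as 1 minus the Jaccard similarity of lowercase ASCII word sets; `RecallBlockCache`, the 0.5 default threshold, and the tokenizer are illustrative assumptions, not OpenViking settings.

```typescript
// Lowercase ASCII word set of a query; deliberately crude.
function tokenSet(text: string): Set<string> {
  return new Set(text.toLowerCase().match(/[a-z0-9]+/g) ?? []);
}

// Jaccard similarity |A ∩ B| / |A ∪ B|; two empty sets count as identical.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 1;
  let inter = 0;
  a.forEach((t) => {
    if (b.has(t)) inter++;
  });
  return inter / (a.size + b.size - inter);
}

interface CachedRecall {
  queryTokens: Set<string>;
  block: string;
}

// Hypothetical per-session cache of the last injected recall block.
class RecallBlockCache {
  private readonly entries = new Map<string, CachedRecall>();
  private readonly noveltyThreshold: number;

  constructor(noveltyThreshold = 0.5) {
    this.noveltyThreshold = noveltyThreshold;
  }

  // Reuses the cached block while the query's novelty stays below the threshold;
  // otherwise runs `build` (retrieval + rendering) and caches the new block.
  async getOrBuild(
    sessionKey: string,
    query: string,
    build: () => Promise<string>,
  ): Promise<{ block: string; reused: boolean }> {
    const tokens = tokenSet(query);
    const prev = this.entries.get(sessionKey);
    if (prev && 1 - jaccard(prev.queryTokens, tokens) < this.noveltyThreshold) {
      return { block: prev.block, reused: true };
    }
    const block = await build();
    this.entries.set(sessionKey, { queryTokens: tokens, block });
    return { block, reused: false };
  }
}
```

Novelty is measured against the query that built the cached block, not the previous turn's query, so a slow drift of small changes eventually forces a refresh instead of keeping a stale block forever. Lowering the threshold trades cache hits for fresher recall.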

Minimal ask

Please consider treating cache stability as a first-class constraint for openviking context-engine injection. The current before_prompt_build strategy appears effective for retrieval but hostile to prompt caching.

Extended analysis

Fix Plan

To improve cache stability, we'll modify the before_prompt_build hook to:

  • Prefer stable, short injections (L0/L1 summaries) over full L2 body reads
  • Avoid re-injecting on every turn when query/session state has not materially changed
  • Stabilize ordering/selection within a short session window

Code Changes

// extensions/openviking/index.ts
// Sketch only. `cfg`, `client`, `queryText`, and `memories` come from the elided ("...")
// parts of the existing hook. `recallCache` and `ctx?.sessionKey` are assumptions: state
// written onto `ctx` is unlikely to survive to the next turn, so it is kept at module level.
const recallCache = new Map<string, { queryText: string; prependContext: string }>();

api.on("before_prompt_build", async (event: unknown, ctx?: HookAgentContext) => {
  // ...
  const cacheKey = ctx?.sessionKey ?? "default";

  // Avoid re-injecting when the query has not changed. This must run before retrieval
  // and return the same { prependContext } shape as the normal path.
  const cached = recallCache.get(cacheKey);
  if (cached && cached.queryText === queryText) {
    return { prependContext: cached.prependContext };
  }

  const prependContextParts: string[] = [];
  // ...

  if (cfg.autoRecall && queryText.length >= 5) {
    // Stabilize ordering first, then prefer short L0/L1 abstracts over full L2 body reads.
    // One block only, so the same memories are not injected twice.
    const memorySummaries = await getMemorySummaries(getStableMemories(memories));
    prependContextParts.push(
      "<relevant-memories>\nThe following OpenViking memories may be relevant:\n" +
        memorySummaries.join("\n") +
        "\n</relevant-memories>",
    );
  }

  // ... (remaining parts, such as the ingest-reply-assist block, as before)

  if (prependContextParts.length > 0) {
    const prependContext = prependContextParts.join("\n\n");
    // Remember what was injected so an unchanged query can reuse it on the next turn.
    recallCache.set(cacheKey, { queryText, prependContext });
    return { prependContext };
  }
});

// Helper functions

// Use the short abstract whenever one exists, regardless of level. Only read the full
// L2 body when no abstract is available, and always return a line (URI as last resort).
async function getMemorySummaries(memories: FindResultItem[]): Promise<string[]> {
  return Promise.all(
    memories.map(async (item: FindResultItem) => {
      const label = `- [${item.category ?? "memory"}]`;
      if (item.abstract) {
        return `${label} ${item.abstract}`;
      }
      if (item.level === 2) {
        try {
          const content = await client.read(item.uri);
          if (typeof content === "string" && content.trim()) {
            return `${label} ${content.trim()}`;
          }
        } catch {
          // Fall through to the URI below.
        }
      }
      return `${label} ${item.uri}`;
    }),
  );
}

// Hypothetical stability heuristic: order by URI with a plain (locale-independent)
// comparison, so small score changes in retrieval do not reorder the injected block.
function getStableMemories(memories: FindResultItem[]): FindResultItem[] {
  return [...memories].sort((a, b) => (a.uri < b.uri ? -1 : a.uri > b.uri ? 1 : 0));
}
