openclaw - 💡(How to fix) Fix: openviking context-engine frequently rebuilds prompt prefix, hurting prompt-cache reuse [1 comment, 1 participant]

Summary

When openviking is used as the active contextEngine, it does not rewrite session history on disk, but it does rebuild the model-facing prompt prefix on almost every turn via before_prompt_build.

This substantially reduces prompt-cache reuse compared with a more stable context-engine such as lossless.

Why this matters

The current behavior makes cache utilization much worse in real usage because the prompt prefix becomes highly unstable across turns:

every turn re-runs retrieval
the retrieved set/order can drift
injected content is often the full L2 memory content, not a stable short reference
ingest-reply-assist may also prepend another dynamic block

So even if the visible conversation history is mostly unchanged, the top of the final prompt sent to the model changes frequently.
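Why the top of the prompt matters: provider-side prompt caches generally reuse work only for an exact matching prefix, so any change near the top forfeits the cache for everything after it. The toy sketch below models this at the character level (real caches match on tokens or fixed-size blocks, which this simplification ignores); the prompt strings are made up for illustration, not real openviking output.

```typescript
// Toy model of prefix-based prompt caching: the reusable part of a new prompt
// is the longest prefix it shares with the previous prompt.
function sharedPrefixLength(a: string, b: string): number {
  const n = Math.min(a.length, b.length);
  let i = 0;
  while (i < n && a[i] === b[i]) i++;
  return i;
}

const history = "user: hi\nassistant: hello\nuser: next question\n";

// Case 1: the recall block at the top reorders between turns; history is identical.
const driftingPrev = "<relevant-memories>A, B</relevant-memories>\n" + history;
const driftingNext = "<relevant-memories>B, A</relevant-memories>\n" + history;

// Case 2: the top block is unchanged and the new turn is appended at the end.
const stablePrev = "<relevant-memories>A, B</relevant-memories>\n" + history;
const stableNext = stablePrev + "assistant: answer\n";

const drifting = sharedPrefixLength(driftingPrev, driftingNext);
const stable = sharedPrefixLength(stablePrev, stableNext);

// Drifting: only the opening "<relevant-memories>" tag (19 chars) is reusable.
console.log(`drifting: ${drifting} of ${driftingPrev.length} chars reusable`);
// Stable: the entire previous prompt is reusable.
console.log(`stable: ${stable} of ${stablePrev.length} chars reusable`);
```

Even though `history` is byte-identical in both drifting prompts, nothing after the first differing character can be served from cache.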

Code facts

1) `assemble()` does not rewrite session messages

extensions/openviking/context-engine.ts

async assemble(assembleParams): Promise<AssembleResult> {
  return {
    messages: assembleParams.messages,
    estimatedTokens: estimateTokens(assembleParams.messages),
  };
}

So the issue is not that the context engine rewrites session history itself.

2) `before_prompt_build` dynamically prepends new context

extensions/openviking/index.ts

api.on("before_prompt_build", async (event: unknown, ctx?: HookAgentContext) => {
  ...
  const prependContextParts: string[] = [];
  ...
  if (cfg.autoRecall && queryText.length >= 5) {
    ...
    prependContextParts.push(
      "<relevant-memories>\nThe following OpenViking memories may be relevant:\n" +
        `${memoryContext}\n` +
      "</relevant-memories>",
    );
  }
  ...
  if (cfg.ingestReplyAssist) {
    ...
    prependContextParts.push(
      "<ingest-reply-assist>\n" +
      ...
      "</ingest-reply-assist>",
    );
  }

  if (prependContextParts.length > 0) {
    return {
      prependContext: prependContextParts.join("\n\n"),
    };
  }
});

3) injected memory content is often the full L2 content

In the same hook:

const memoryLines = await Promise.all(
  memories.map(async (item: FindResultItem) => {
    if (item.level === 2) {
      try {
        const content = await client.read(item.uri);
        if (content && typeof content === "string" && content.trim()) {
          return `- [${item.category ?? "memory"}] ${content.trim()}`;
        }
      } catch {
        // fallback to abstract
      }
    }
    return `- [${item.category ?? "memory"}] ${item.abstract ?? item.uri}`;
  }),
);

This means the prepended prompt block is not just a stable list of IDs / short summaries. It can contain full memory content, which is much more likely to drift and invalidate prompt caching.
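One way to keep the block short and bounded is to render only the abstract (never the L2 body) and cap both the per-line length and the number of lines. A minimal sketch: `MemoryItem` is a hypothetical local type mirroring the `FindResultItem` fields used above (`uri`, `level`, `category`, `abstract`), `renderShortMemoryLines` is a hypothetical helper, and the limits are arbitrary example values.

```typescript
// Hypothetical stand-in for FindResultItem, limited to the fields used above.
interface MemoryItem {
  uri: string;
  level: number;
  category?: string;
  abstract?: string;
}

// Render one short, bounded line per memory from the abstract only.
// There is no client.read() call, so drifting L2 body text never reaches the prompt.
function renderShortMemoryLines(
  items: MemoryItem[],
  maxLines = 6,
  maxCharsPerLine = 160,
): string[] {
  return items.slice(0, maxLines).map((item) => {
    // Collapse whitespace so cosmetic edits to an abstract do not change the output
    const text = (item.abstract ?? item.uri).trim().replace(/\s+/g, " ");
    const clipped =
      text.length > maxCharsPerLine ? text.slice(0, maxCharsPerLine - 1) + "…" : text;
    return `- [${item.category ?? "memory"}] ${clipped}`;
  });
}

// Example URIs are made up for illustration.
const lines = renderShortMemoryLines([
  { uri: "mem://a", level: 2, category: "preference", abstract: "Prefers   concise answers" },
  { uri: "mem://b", level: 1 },
]);
console.log(lines.join("\n"));
```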

Observed behavior in logs

In practice, logs repeatedly show memory injection on many turns, e.g.:

openviking: injecting 6 memories into context
followed by inject-detail ...
often also ingest-reply-assist applied ...

This strongly suggests the prompt prefix is being rebuilt very frequently.
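To confirm this more directly than by counting inject log lines, one could fingerprint the prepend block on each turn and count how often it actually changes. A minimal diagnostic sketch using Node's `crypto.createHash`; `PrependChangeTracker` and the session keys are hypothetical and not part of openviking or openclaw.

```typescript
import { createHash } from "node:crypto";

// Tracks, per session, whether the prepended block changed since the previous turn.
class PrependChangeTracker {
  private lastHash = new Map<string, string>();
  turns = 0;
  changes = 0;

  // Returns true when this turn's prepend differs from the previous turn's.
  record(sessionKey: string, prependContext: string): boolean {
    const hash = createHash("sha256").update(prependContext).digest("hex");
    const prev = this.lastHash.get(sessionKey);
    this.lastHash.set(sessionKey, hash);
    this.turns++;
    // The first turn of a session has nothing to compare against.
    const changed = prev !== undefined && prev !== hash;
    if (changed) this.changes++;
    return changed;
  }
}

const tracker = new PrependChangeTracker();
tracker.record("s1", "<relevant-memories>A, B</relevant-memories>");
tracker.record("s1", "<relevant-memories>A, B</relevant-memories>");
tracker.record("s1", "<relevant-memories>B, A</relevant-memories>");
console.log(`${tracker.changes} change(s) over ${tracker.turns} turns`);
```

A change ratio close to 1 over a long session would confirm that the prefix is rebuilt on nearly every turn.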

Expected behavior

A context engine should be more cache-friendly by default, especially for long-running sessions.

Possible directions:

prefer stable, short injection (L0/L1 summary) over full L2 body reads
avoid re-injecting on every turn when query/session state has not materially changed
stabilize ordering/selection within a short session window
optionally cache or reuse the previously injected recall block until query novelty exceeds a threshold
make the aggressive dynamic prepend behavior optional via config
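The "avoid re-injecting" and "reuse until query novelty exceeds a threshold" directions can be combined: keep the last injected block per session and rebuild it only when the new query is sufficiently novel. A minimal sketch, assuming novelty is measured as one minus the Jaccard similarity of lowercase word sets (a deliberately simple stand-in; embedding distance would be another option). `RecallBlockCache`, `noveltyThreshold`, and the `rebuild` callback are hypothetical names.

```typescript
// Lowercase word set used for a cheap similarity measure.
function wordSet(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/\W+/).filter((w) => w.length > 0));
}

// Novelty = 1 - Jaccard similarity (0 = same words, 1 = no words in common).
function novelty(a: string, b: string): number {
  const sa = wordSet(a);
  const sb = wordSet(b);
  if (sa.size === 0 && sb.size === 0) return 0;
  let shared = 0;
  sa.forEach((w) => {
    if (sb.has(w)) shared++;
  });
  const union = sa.size + sb.size - shared;
  return 1 - shared / union;
}

interface CachedBlock {
  anchorQuery: string; // the query the cached block was built for
  block: string;
}

class RecallBlockCache {
  private bySession = new Map<string, CachedBlock>();
  private readonly noveltyThreshold: number;

  constructor(noveltyThreshold = 0.6) {
    this.noveltyThreshold = noveltyThreshold;
  }

  // Returns the cached block unless the query has drifted past the threshold;
  // otherwise runs `rebuild` (e.g. retrieval + rendering) and caches the result.
  async get(
    sessionKey: string,
    query: string,
    rebuild: (query: string) => Promise<string>,
  ): Promise<string> {
    const cached = this.bySession.get(sessionKey);
    if (cached && novelty(cached.anchorQuery, query) <= this.noveltyThreshold) {
      return cached.block;
    }
    const block = await rebuild(query);
    this.bySession.set(sessionKey, { anchorQuery: query, block });
    return block;
  }
}

async function demo(): Promise<number> {
  const cache = new RecallBlockCache(0.6);
  let rebuilds = 0;
  const rebuild = async (q: string): Promise<string> => {
    rebuilds++;
    return `<relevant-memories>for: ${q}</relevant-memories>`;
  };
  await cache.get("s1", "how do I configure the openviking plugin", rebuild);
  await cache.get("s1", "how do I configure the openviking plugin again", rebuild); // similar: reused
  await cache.get("s1", "write a haiku about autumn", rebuild); // novel: rebuilt
  return rebuilds;
}

demo().then((n) => console.log(`rebuilds: ${n}`));
```

Novelty is measured against the query the block was built for, not the previous turn's query, so a slow drift across many turns still eventually triggers a rebuild.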

Minimal ask

Please consider treating cache stability as a first-class constraint for openviking context-engine injection. The current before_prompt_build strategy appears retrieval-effective but cache-hostile.

Fix Plan

To improve cache stability, we'll modify the before_prompt_build hook to:

Prefer stable, short injections (L0/L1 summaries) over full L2 body reads
Avoid re-injecting on every turn when query/session state has not materially changed
Stabilize ordering/selection within a short session window
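For the ordering/selection point, one option is "sticky" selection: memories injected earlier in the session keep their position as long as retrieval still returns them, and new results only fill freed slots at the end. A minimal sketch over URIs only; `stickySelect` and its parameters are hypothetical, and resetting `previous` after some number of turns (the "short session window") is left to the caller.

```typescript
// Keep previously injected URIs, in their old order, while retrieval still returns them;
// then append newly retrieved URIs (deduplicated, in retrieval order) up to `limit`.
function stickySelect(previous: string[], retrieved: string[], limit: number): string[] {
  const retrievedSet = new Set(retrieved);
  const kept = previous.filter((uri) => retrievedSet.has(uri));
  const keptSet = new Set(kept);
  const added = retrieved.filter((uri, i) => !keptSet.has(uri) && retrieved.indexOf(uri) === i);
  return [...kept, ...added].slice(0, limit);
}

// Turn 1: nothing injected yet, so retrieval order is used as-is.
const first = stickySelect([], ["m/a", "m/b", "m/c"], 3);
// Turn 2: retrieval reorders and swaps m/c for m/d; m/a and m/b keep their positions.
const second = stickySelect(first, ["m/b", "m/d", "m/a"], 3);
console.log(first, second);
```

Because the block still sits at the top of the prompt, this shrinks rather than eliminates the changed region: score-order churn no longer reorders lines, and a genuinely new memory only alters the tail of the block.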

Code Changes

// extensions/openviking/index.ts

// Last injected block per session (unbounded here; a real version would evict ended sessions).
// `getSessionKey` is a hypothetical helper: which HookAgentContext field identifies
// the session is not shown in the excerpts above.
const lastInjection = new Map<string, { queryText: string; prependContext: string }>();

api.on("before_prompt_build", async (event: unknown, ctx?: HookAgentContext) => {
  // ... (derive queryText as before)
  const sessionKey = getSessionKey(ctx);

  // Avoid re-injecting when the query has not changed: check the cache *before*
  // running retrieval, and return the same { prependContext } shape as below
  const cached = sessionKey ? lastInjection.get(sessionKey) : undefined;
  if (cached && cached.queryText === queryText) {
    return { prependContext: cached.prependContext };
  }

  const prependContextParts: string[] = [];

  if (cfg.autoRecall && queryText.length >= 5) {
    // ... (retrieve `memories` as before)
    // Stabilize ordering, then inject short summaries instead of full L2 bodies
    const memoryLines = getStableMemories(memories).map(formatMemorySummary);
    prependContextParts.push(
      "<relevant-memories>\nThe following OpenViking memories may be relevant:\n" +
        memoryLines.join("\n") +
        "\n</relevant-memories>",
    );
  }

  // ... (cfg.ingestReplyAssist block as before)

  if (prependContextParts.length > 0) {
    const prependContext = prependContextParts.join("\n\n");
    // Remember this block so an unchanged query on the next turn reuses it verbatim
    if (sessionKey) {
      lastInjection.set(sessionKey, { queryText, prependContext });
    }
    return { prependContext };
  }
});

// Helper functions

// Use the short abstract (L0/L1-style summary) at every level; never call
// client.read() for the full L2 body, which is what makes the block drift
function formatMemorySummary(item: FindResultItem): string {
  return `- [${item.category ?? "memory"}] ${item.abstract ?? item.uri}`;
}

// Simple stability heuristic: sort by URI so the same retrieved set always
// renders in the same order, regardless of score-order drift between turns
function getStableMemories(memories: FindResultItem[]): FindResultItem[] {
  return [...memories].sort((a, b) => (a.uri < b.uri ? -1 : a.uri > b.uri ? 1 : 0));
}
