openclaw - ✅(Solved) Fix Feature Request: Prompt Cache-Aware Design [1 pull requests, 1 comments, 2 participants]

openclaw2026-03-29 09:18:09

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

openclaw/openclaw#56912•Fetched 2026-04-08 01:46:08

View on GitHub

Comments

Participants

Timeline

Reactions

Author

Participants

Assignees

Timeline (top)

assigned ×1closed ×1commented ×1cross-referenced ×1

PR fix notes

PR #25548: fix(cache): stabilize runtimeChannel across turn types for prompt cache efficiency

Repository: openclaw/openclaw
Author: liebpmp
State: closed | merged: False
Link: https://github.com/openclaw/openclaw/pull/25548

Description (problem / solution / changelog)

Summary

Internal turn types (heartbeat, cron-event) temporarily override sessionCtx.Provider with non-channel values like "heartbeat" or "cron-event". Since messageProvider — and thus runtimeChannel in the system prompt — is derived from the per-turn provider context, the system prompt differs on each turn type switch. This breaks Anthropic's prefix-based prompt cache, causing the entire conversation history to be rewritten as a new cache entry.

Root cause: resolveOriginMessageProvider() in origin-routing.ts uses a two-step fallback:

OriginatingChannel ?? Provider

For a WhatsApp session, consecutive turns produce different system prompts:

Turn	Trigger	`runtimeChannel`	Cache result
1	User message	`whatsapp`	✅ Cache write
2	Heartbeat	`heartbeat`	❌ Full cache miss — prompt changed
3	Tool result	`whatsapp`	❌ Full cache miss — prompt changed back
4	Cron event	`cron-event`	❌ Full cache miss — prompt changed again

Each switch changes the runtime line (channel=X | capabilities=Y), message tool description ("Current channel (X) supports: ..."), inline button hints, and capability-dependent sections.

Why this matters even more after v2026.2.24

v2026.2.24 switched the default heartbeat delivery target from last to none. This means:

OriginatingChannel = undefined (no delivery target → no originating channel)
Provider = "heartbeat"
Result: runtimeChannel = "heartbeat" — wrong cache prefix

Without this fix, heartbeats can no longer serve as cache warmers. A session idle for >1h (Anthropic's extended cache TTL) will face a full cache-write on the next user message (~$0.50+ for large histories). With this fix, heartbeats resolve to the session's real channel and keep the cache alive at minimal cost.

Why the last few percent matter enormously: A single full cache miss can cost 10x or more compared to a cache-hit call — the entire conversation history is rewritten at cache write pricing ($3.75/MTok). Even one miss in a 5-call tool chain can dominate the total cost of the entire turn.

Fix

Extend resolveOriginMessageProvider() with a lastChannel fallback from the session entry, using a three-step priority chain:

OriginatingChannel — explicit reply routing (already available on sessionCtx)
sessionEntry.lastChannel — the session's last known real messaging channel (persisted in session store, survives heartbeat/cron turns) ← new
sessionCtx.Provider — fallback for new sessions (existing behavior)

Steps 1 and 2 filter out internal/synthetic values (webchat, heartbeat, cron-event, exec-event) via isInternalMessageChannel() and a dedicated INTERNAL_PROVIDER_VALUES set.

This ensures a heartbeat or cron turn in a WhatsApp session still produces a WhatsApp-flavored system prompt, because any response will be delivered via WhatsApp.

Bonus: heartbeats as cache warmers

With a stable system prompt across turn types, heartbeat polls that return HEARTBEAT_OK (minimal output, transcript pruned) effectively keep the conversation cache alive at very low cost. Combined with extended-cache-ttl (1h TTL), heartbeats every 30 minutes prevent cache expiry during idle periods — even with delivery: "none".

Multi-channel note: Users with multiple channels sharing a session should be aware that the heartbeat warms only the last-used channel's cache. Per-session channels or per-channel heartbeat pings can address this.

Changes

origin-routing.ts: Extend resolveOriginMessageProvider() with lastChannel parameter, add INTERNAL_PROVIDER_VALUES set and filtering for internal/synthetic channel values
agent-runner-utils.ts: Thread sessionEntry through buildEmbeddedContextFromTemplate() and buildEmbeddedRunContexts() to provide lastChannel to the origin resolver
agent-runner-execution.ts: Pass sessionEntry from params.getActiveSessionEntry() into buildEmbeddedRunContexts()

Change Type

Bug fix

Scope

Gateway / orchestration

Linked Issue/PR

Related to #20597, #22220 (prompt cache stability improvements)
Builds on the origin-routing.ts refactor from #25864

User-visible / Behavior Changes

System prompt remains stable across heartbeat, cron, and user-message turns → significantly improved Anthropic prompt cache hit rates
No config changes needed; existing sessions benefit automatically
Sessions with cacheRetention: "long" + extended-cache-ttl see the largest improvement
Heartbeats with delivery: "none" (new default) now correctly warm the cache for the session's real channel

Security Impact

New permissions/capabilities? No
Secrets/tokens handling changed? No
New/changed network calls? No
Command/tool execution surface changed? No
Data access scope changed? No

Testing

Verified over 100+ API calls:

Scenario	Before	After
User message → heartbeat → user message	0-56% hit rate	99.8%
User message → cron wake → user message	0-56% hit rate	99.5%
Normal user message → tool calls	85-95%	99.5-99.9%
Multi-tool-call chain (5+ sequential)	~88% avg	99.7% avg
Heartbeat (delivery=none) → user message	0% (wrong prefix)	99.8%

Changed files

CHANGELOG.md (modified, +1/-0)
src/auto-reply/reply/agent-runner-execution.ts (modified, +1/-0)
src/auto-reply/reply/agent-runner-utils.test.ts (modified, +77/-0)
src/auto-reply/reply/agent-runner-utils.ts (modified, +13/-5)
src/auto-reply/reply/origin-routing.ts (modified, +51/-5)

Code Example

caching:
  anthropic:
    enabled: true
    freeze_tools: true        # lock tool set at session start
    freeze_system_prompt: true
    breakpoints:
      - after_system_prompt
      - after_tools
      - after_memory

RAW_BUFFERClick to expand / collapse

Feature Request: Prompt Cache-Aware Design

Problem Statement

Anthropic's API supports prompt caching — reusing previously processed prompt prefixes at 90% cost reduction. However, if the system prompt, tool definitions, or memory blocks change ordering or content between turns, the cache is invalidated. It's unclear whether OpenClaw currently optimizes for this, and the default behavior of dynamically injecting skills and tools likely breaks caching on most turns.

Proposed Solution

Stable Prompt Prefix

At conversation start, freeze the ordering of: system prompt → tool definitions → memory/context blocks.
Do not reorder, add, or remove tools mid-conversation unless explicitly requested.
Mark cache breakpoints using Anthropic's cache_control headers at strategic points.

Freeze Mechanism

caching:
  anthropic:
    enabled: true
    freeze_tools: true        # lock tool set at session start
    freeze_system_prompt: true
    breakpoints:
      - after_system_prompt
      - after_tools
      - after_memory

Dynamic Content Handling

Skills that inject tools dynamically should do so at session start only, not per-turn.
If a new tool must be added mid-session, append (don't insert) to preserve the cached prefix.
Memory updates append to the end of the memory block rather than rewriting it.

Observability

Track and surface cache hit rates in /status or session stats.
Log cache invalidation events with the reason (which component changed).
Cost dashboard shows cache savings.

User Impact

Cost reduction: Up to 90% savings on cached prompt prefixes for long Anthropic conversations.
Compounding benefit: Combined with dual-layer compression (Feature #1), compressed conversations retain their cached prefix, maximizing savings.
Transparent: Users don't need to understand caching; the system optimizes automatically.

Technical Considerations

Anthropic-specific: This optimization is primarily for Anthropic's API. Google's Gemini has similar caching; OpenAI's approach differs. Design should be provider-extensible.
Tool ordering: Skills that dynamically register/unregister tools will break caching. Need a "frozen toolset" mode.
Memory mutations: If MEMORY.md or context files change mid-session, the memory block in the prompt changes, invalidating cache after that point. Append-only updates minimize this.
Measurement: Without cache hit rate tracking, it's impossible to know if optimization is working. Instrumentation is essential.
Low effort: Mostly involves disciplined ordering and adding cache control headers — not a major architectural change.

Priority

MEDIUM. High ROI relative to implementation effort. Primarily a cost optimization, but for heavy Anthropic users (OpenClaw's primary audience), the savings are substantial. Ships independently but compounds with context compression.

extent analysis

Fix Plan

To implement a cache-aware design, follow these steps:

Freeze the ordering of system prompt, tool definitions, and memory/context blocks at conversation start.
Use the freeze_tools and freeze_system_prompt configuration options.
Implement cache breakpoints using Anthropic's cache_control headers.

Example configuration:

caching:
  anthropic:
    enabled: true
    freeze_tools: true
    freeze_system_prompt: true
    breakpoints:
      - after_system_prompt
      - after_tools
      - after_memory

Code Changes

To handle dynamic content, modify skills to inject tools at session start only:

def inject_tools(session):
    # Inject tools at session start
    if session.is_new:
        # Add tools to the session
        session.tools = get_tools()

To preserve the cached prefix when adding new tools mid-session, append to the tool list instead of inserting:

def add_tool(session, tool):
    # Append the new tool to the session's tool list
    session.tools.append(tool)

To minimize cache invalidation due to memory updates, append to the end of the memory block instead of rewriting it:

def update_memory(session, new_memory):
    # Append the new memory to the session's memory block
    session.memory += new_memory

Verification

To verify that the fix worked, track and surface cache hit rates in /status or session stats. Log cache invalidation events with the reason (which component changed). The cost dashboard should show cache savings. Check for a reduction in costs and an increase in cache hit rates after implementing the cache-aware design.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#api #optimization #logging issue #authentication issue #prompt issue

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.