openclaw - ✅(Solved) Fix Feature Request: Prompt Cache-Aware Design [1 pull requests, 1 comments, 2 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
openclaw/openclaw#56912Fetched 2026-04-08 01:46:08
View on GitHub
Comments
1
Participants
2
Timeline
5
Reactions
0
Author
Participants
Assignees
Timeline (top)
assigned ×1closed ×1commented ×1cross-referenced ×1

PR fix notes

PR #25548: fix(cache): stabilize runtimeChannel across turn types for prompt cache efficiency

Description (problem / solution / changelog)

Summary

Internal turn types (heartbeat, cron-event) temporarily override sessionCtx.Provider with non-channel values like "heartbeat" or "cron-event". Since messageProvider — and thus runtimeChannel in the system prompt — is derived from the per-turn provider context, the system prompt differs on each turn type switch. This breaks Anthropic's prefix-based prompt cache, causing the entire conversation history to be rewritten as a new cache entry.

Root cause: resolveOriginMessageProvider() in origin-routing.ts uses a two-step fallback:

OriginatingChannel ?? Provider

For a WhatsApp session, consecutive turns produce different system prompts:

TurnTriggerruntimeChannelCache result
1User messagewhatsapp✅ Cache write
2Heartbeatheartbeat❌ Full cache miss — prompt changed
3Tool resultwhatsapp❌ Full cache miss — prompt changed back
4Cron eventcron-event❌ Full cache miss — prompt changed again

Each switch changes the runtime line (channel=X | capabilities=Y), message tool description ("Current channel (X) supports: ..."), inline button hints, and capability-dependent sections.

Why this matters even more after v2026.2.24

v2026.2.24 switched the default heartbeat delivery target from last to none. This means:

  • OriginatingChannel = undefined (no delivery target → no originating channel)
  • Provider = "heartbeat"
  • Result: runtimeChannel = "heartbeat"wrong cache prefix

Without this fix, heartbeats can no longer serve as cache warmers. A session idle for >1h (Anthropic's extended cache TTL) will face a full cache-write on the next user message (~$0.50+ for large histories). With this fix, heartbeats resolve to the session's real channel and keep the cache alive at minimal cost.

Why the last few percent matter enormously: A single full cache miss can cost 10x or more compared to a cache-hit call — the entire conversation history is rewritten at cache write pricing ($3.75/MTok). Even one miss in a 5-call tool chain can dominate the total cost of the entire turn.

Fix

Extend resolveOriginMessageProvider() with a lastChannel fallback from the session entry, using a three-step priority chain:

  1. OriginatingChannel — explicit reply routing (already available on sessionCtx)
  2. sessionEntry.lastChannel — the session's last known real messaging channel (persisted in session store, survives heartbeat/cron turns) ← new
  3. sessionCtx.Provider — fallback for new sessions (existing behavior)

Steps 1 and 2 filter out internal/synthetic values (webchat, heartbeat, cron-event, exec-event) via isInternalMessageChannel() and a dedicated INTERNAL_PROVIDER_VALUES set.

This ensures a heartbeat or cron turn in a WhatsApp session still produces a WhatsApp-flavored system prompt, because any response will be delivered via WhatsApp.

Bonus: heartbeats as cache warmers

With a stable system prompt across turn types, heartbeat polls that return HEARTBEAT_OK (minimal output, transcript pruned) effectively keep the conversation cache alive at very low cost. Combined with extended-cache-ttl (1h TTL), heartbeats every 30 minutes prevent cache expiry during idle periods — even with delivery: "none".

Multi-channel note: Users with multiple channels sharing a session should be aware that the heartbeat warms only the last-used channel's cache. Per-session channels or per-channel heartbeat pings can address this.

Changes

  • origin-routing.ts: Extend resolveOriginMessageProvider() with lastChannel parameter, add INTERNAL_PROVIDER_VALUES set and filtering for internal/synthetic channel values
  • agent-runner-utils.ts: Thread sessionEntry through buildEmbeddedContextFromTemplate() and buildEmbeddedRunContexts() to provide lastChannel to the origin resolver
  • agent-runner-execution.ts: Pass sessionEntry from params.getActiveSessionEntry() into buildEmbeddedRunContexts()

Change Type

  • Bug fix

Scope

  • Gateway / orchestration

Linked Issue/PR

  • Related to #20597, #22220 (prompt cache stability improvements)
  • Builds on the origin-routing.ts refactor from #25864

User-visible / Behavior Changes

  • System prompt remains stable across heartbeat, cron, and user-message turns → significantly improved Anthropic prompt cache hit rates
  • No config changes needed; existing sessions benefit automatically
  • Sessions with cacheRetention: "long" + extended-cache-ttl see the largest improvement
  • Heartbeats with delivery: "none" (new default) now correctly warm the cache for the session's real channel

Security Impact

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No

Testing

Verified over 100+ API calls:

ScenarioBeforeAfter
User message → heartbeat → user message0-56% hit rate99.8%
User message → cron wake → user message0-56% hit rate99.5%
Normal user message → tool calls85-95%99.5-99.9%
Multi-tool-call chain (5+ sequential)~88% avg99.7% avg
Heartbeat (delivery=none) → user message0% (wrong prefix)99.8%

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • src/auto-reply/reply/agent-runner-execution.ts (modified, +1/-0)
  • src/auto-reply/reply/agent-runner-utils.test.ts (modified, +77/-0)
  • src/auto-reply/reply/agent-runner-utils.ts (modified, +13/-5)
  • src/auto-reply/reply/origin-routing.ts (modified, +51/-5)

Code Example

caching:
  anthropic:
    enabled: true
    freeze_tools: true        # lock tool set at session start
    freeze_system_prompt: true
    breakpoints:
      - after_system_prompt
      - after_tools
      - after_memory
RAW_BUFFERClick to expand / collapse

Feature Request: Prompt Cache-Aware Design

Problem Statement

Anthropic's API supports prompt caching — reusing previously processed prompt prefixes at 90% cost reduction. However, if the system prompt, tool definitions, or memory blocks change ordering or content between turns, the cache is invalidated. It's unclear whether OpenClaw currently optimizes for this, and the default behavior of dynamically injecting skills and tools likely breaks caching on most turns.

Proposed Solution

Stable Prompt Prefix

  • At conversation start, freeze the ordering of: system prompt → tool definitions → memory/context blocks.
  • Do not reorder, add, or remove tools mid-conversation unless explicitly requested.
  • Mark cache breakpoints using Anthropic's cache_control headers at strategic points.

Freeze Mechanism

caching:
  anthropic:
    enabled: true
    freeze_tools: true        # lock tool set at session start
    freeze_system_prompt: true
    breakpoints:
      - after_system_prompt
      - after_tools
      - after_memory

Dynamic Content Handling

  • Skills that inject tools dynamically should do so at session start only, not per-turn.
  • If a new tool must be added mid-session, append (don't insert) to preserve the cached prefix.
  • Memory updates append to the end of the memory block rather than rewriting it.

Observability

  • Track and surface cache hit rates in /status or session stats.
  • Log cache invalidation events with the reason (which component changed).
  • Cost dashboard shows cache savings.

User Impact

  • Cost reduction: Up to 90% savings on cached prompt prefixes for long Anthropic conversations.
  • Compounding benefit: Combined with dual-layer compression (Feature #1), compressed conversations retain their cached prefix, maximizing savings.
  • Transparent: Users don't need to understand caching; the system optimizes automatically.

Technical Considerations

  • Anthropic-specific: This optimization is primarily for Anthropic's API. Google's Gemini has similar caching; OpenAI's approach differs. Design should be provider-extensible.
  • Tool ordering: Skills that dynamically register/unregister tools will break caching. Need a "frozen toolset" mode.
  • Memory mutations: If MEMORY.md or context files change mid-session, the memory block in the prompt changes, invalidating cache after that point. Append-only updates minimize this.
  • Measurement: Without cache hit rate tracking, it's impossible to know if optimization is working. Instrumentation is essential.
  • Low effort: Mostly involves disciplined ordering and adding cache control headers — not a major architectural change.

Priority

MEDIUM. High ROI relative to implementation effort. Primarily a cost optimization, but for heavy Anthropic users (OpenClaw's primary audience), the savings are substantial. Ships independently but compounds with context compression.

extent analysis

Fix Plan

To implement a cache-aware design, follow these steps:

  • Freeze the ordering of system prompt, tool definitions, and memory/context blocks at conversation start.
  • Use the freeze_tools and freeze_system_prompt configuration options.
  • Implement cache breakpoints using Anthropic's cache_control headers.

Example configuration:

caching:
  anthropic:
    enabled: true
    freeze_tools: true
    freeze_system_prompt: true
    breakpoints:
      - after_system_prompt
      - after_tools
      - after_memory

Code Changes

To handle dynamic content, modify skills to inject tools at session start only:

def inject_tools(session):
    # Inject tools at session start
    if session.is_new:
        # Add tools to the session
        session.tools = get_tools()

To preserve the cached prefix when adding new tools mid-session, append to the tool list instead of inserting:

def add_tool(session, tool):
    # Append the new tool to the session's tool list
    session.tools.append(tool)

To minimize cache invalidation due to memory updates, append to the end of the memory block instead of rewriting it:

def update_memory(session, new_memory):
    # Append the new memory to the session's memory block
    session.memory += new_memory

Verification

To verify that the fix worked, track and surface cache hit rates in /status or session stats. Log cache invalidation events with the reason (which component changed). The cost dashboard should show cache savings. Check for a reduction in costs and an increase in cache hit rates after implementing the cache-aware design.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

openclaw - ✅(Solved) Fix Feature Request: Prompt Cache-Aware Design [1 pull requests, 1 comments, 2 participants]