openclaw - ✅ (Solved) Fix Proposal: Cache TTL Warmer — preserve Anthropic prompt-cache across idle periods independent of heartbeats [1 pull request, 2 comments, 2 participants]

GitHub stats
openclaw/openclaw#70418 (fetched 2026-04-23 07:25:00)
Comments: 2 · Participants: 2 · Timeline events: 4 · Reactions: 0
Timeline (top): commented ×2, cross-referenced ×2

Root Cause

On models with cacheRetention: "long" (1h TTL), prompt-cache entries expire after 1h without access. For agents with bursty user interaction, cache entries built up during active conversation age out during quiet periods, and the next user turn pays for a full prefix rewrite.

Heartbeats were likely intended to keep the cache warm, but as filed in the companion issue, heartbeat runs today build a separate cache chain (due to tool-set and system-prompt divergence) that doesn't refresh the conversation chain's TTL. Even once that's fixed, heartbeats carry other semantics that make them a noisy tool for a pure cache-keep-alive use case: user-visible reply guards (HEARTBEAT_OK), HEARTBEAT.md reading logic, and delivery routing.

Fix Action

Fix / Workaround

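Add a dedicated, opt-in Cache TTL Warmer that periodically sends a minimal request to refresh the prompt-cache TTL, independently of heartbeats. Heartbeats remain available for their other purposes (periodic task dispatch, due-date reminders, ambient monitoring); the two features don't overlap.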

PR fix notes

PR #70602: fix(heartbeat): keep full tool array during heartbeat runs

Description (problem / solution / changelog)

Summary

Fixes heartbeat runs that defeated their own cache-keep-alive purpose, by restoring tools-prefix cache alignment with conversation turns.

Problem

Heartbeat runs (where senderIsOwner=false) were filtering out owner-restricted tools (gateway, cron, nodes) entirely from the tools array, causing the tools-prefix hash to differ from conversation runs. This broke Anthropic's prompt cache — every heartbeat paid full price for a new cache chain instead of refreshing the conversation's TTL.

Solution

Instead of removing owner-only tools during heartbeat, keep them in the tool list with runtime guards that throw if invoked. This preserves the tools-prefix cache hash while maintaining the safety property that heartbeat runs cannot actually use privileged tools.

Changes

  • src/agents/tool-policy.ts — Added optional keepOwnerTools flag to applyOwnerOnlyToolPolicy. When true, owner-only tools are kept (wrapped with guard) instead of being removed.
  • src/agents/pi-embedded-runner/effective-tool-policy.ts — Detect heartbeat runs via bootstrapContextRunKind === "heartbeat" and pass keepOwnerTools: true.
  • src/agents/pi-embedded-runner/run/attempt.ts — Forward bootstrapContextRunKind to applyFinalEffectiveToolPolicy.
  • src/agents/tool-policy.test.ts — Added regression test for keepOwnerTools behavior.
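A minimal TypeScript sketch of the keep-and-guard idea, under stated assumptions. Only the name `applyOwnerOnlyToolPolicy`, its `keepOwnerTools` option, the `senderIsOwner` flag and the `"heartbeat"` run kind come from the PR notes; the `Tool` shape, the `ownerOnly` flag, `guardOwnerTool`, `ownerPolicyOptionsFor` and `toolsPrefix` are hypothetical stand-ins, and the real code in `src/agents/tool-policy.ts` may differ. It shows why the cache prefix survives: the heartbeat list keeps the same tool names in the same order as an owner turn, while invoking a guarded tool throws.

```typescript
// Hypothetical minimal tool shape; openclaw's real Tool type is not shown in the PR.
interface Tool {
  name: string;
  ownerOnly?: boolean; // assumption: owner-restricted tools (gateway, cron, nodes) carry a flag
  execute: (args: unknown) => string;
}

interface OwnerPolicyOptions {
  keepOwnerTools?: boolean;
}

// Keep the tool's name (and schema) in the list but make any invocation fail.
function guardOwnerTool(tool: Tool): Tool {
  return {
    ...tool,
    execute: () => {
      throw new Error(`tool "${tool.name}" is owner-only and cannot run here`);
    },
  };
}

function applyOwnerOnlyToolPolicy(
  tools: Tool[],
  senderIsOwner: boolean,
  options: OwnerPolicyOptions = {},
): Tool[] {
  if (senderIsOwner) return tools;
  if (options.keepOwnerTools) {
    // Same names, same order: the serialized tools prefix matches owner turns.
    return tools.map((t) => (t.ownerOnly ? guardOwnerTool(t) : t));
  }
  // Previous behavior: non-owner runs lose owner-only tools entirely.
  return tools.filter((t) => !t.ownerOnly);
}

// Hypothetical helper mirroring the effective-tool-policy.ts change.
function ownerPolicyOptionsFor(runKind: string | undefined): OwnerPolicyOptions {
  return { keepOwnerTools: runKind === "heartbeat" };
}

// Stand-in for the provider-visible tools prefix (real requests also include schemas).
function toolsPrefix(tools: Tool[]): string {
  return JSON.stringify(tools.map((t) => t.name));
}

const allTools: Tool[] = [
  { name: "read", execute: () => "ok" },
  { name: "cron", ownerOnly: true, execute: () => "scheduled" },
];
const conversationTools = applyOwnerOnlyToolPolicy(allTools, true);
const heartbeatTools = applyOwnerOnlyToolPolicy(allTools, false, ownerPolicyOptionsFor("heartbeat"));
const legacyHeartbeatTools = applyOwnerOnlyToolPolicy(allTools, false);

console.log(toolsPrefix(conversationTools) === toolsPrefix(heartbeatTools)); // true
console.log(toolsPrefix(conversationTools) === toolsPrefix(legacyHeartbeatTools)); // false
```

Wrapping rather than removing is what keeps the two prefixes byte-identical; the safety property moves from "the model never sees the tool" to "the tool refuses to run".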

Testing

  • applyOwnerOnlyToolPolicy(tools, false, { keepOwnerTools: true }) → owner tools are kept (wrapped), not removed
  • applyOwnerOnlyToolPolicy(tools, false) → unchanged (owner tools removed for non-owner, as before)

Related

  • Closes #70417
  • Related: #70418 (orthogonal cache-warmer proposal, independent of heartbeats)

Changed files

  • src/agents/pi-embedded-runner/effective-tool-policy.ts (modified, +2/-0)
  • src/agents/pi-embedded-runner/run/attempt.ts (modified, +1/-0)
  • src/agents/tool-policy.test.ts (modified, +13/-0)
  • src/agents/tool-policy.ts (modified, +15/-5)

Code Example

agents:
  defaults:
    cacheWarmer:
      enabled: true
      interval: "50m"   # < any cacheRetention TTL in use
      maxTokens: 1      # keep response tiny
  list:
    - id: main
      cacheWarmer:
        enabled: true
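The config above is only proposed, so the following is a hypothetical sketch of how it might be resolved: per-agent keys override `agents.defaults`, the interval string is parsed, and an interval at or above the cache TTL is rejected. `resolveWarmer`, `parseDurationMs`, the `s`/`m`/`h` unit grammar, and the fallbacks (`enabled: false` because the feature is opt-in, `interval: "50m"`, `maxTokens: 1`) are assumptions, not existing openclaw code.

```typescript
// Hypothetical types for the proposed (not yet implemented) cacheWarmer config.
interface CacheWarmerConfig {
  enabled?: boolean;
  interval?: string; // e.g. "50m"
  maxTokens?: number;
}

interface ResolvedWarmer {
  enabled: boolean;
  intervalMs: number;
  maxTokens: number;
}

const UNIT_MS = { s: 1000, m: 60000, h: 3600000 } as const;

// Parse "<integer><unit>" where unit is s, m or h (an assumed, deliberately small grammar).
function parseDurationMs(text: string): number {
  const match = /^(\d+)([smh])$/.exec(text.trim());
  if (!match) throw new Error(`unsupported duration: ${text}`);
  return Number(match[1]) * UNIT_MS[match[2] as keyof typeof UNIT_MS];
}

// Per-agent keys override agents.defaults; reject intervals that would let the cache expire.
function resolveWarmer(
  defaults: CacheWarmerConfig,
  agent: CacheWarmerConfig | undefined,
  cacheTtlMs: number,
): ResolvedWarmer {
  const merged: CacheWarmerConfig = { ...defaults, ...agent };
  const intervalMs = parseDurationMs(merged.interval ?? "50m");
  if (intervalMs >= cacheTtlMs) {
    throw new Error(`cacheWarmer.interval (${intervalMs} ms) must be shorter than the cache TTL`);
  }
  return {
    enabled: merged.enabled ?? false, // opt-in
    intervalMs,
    maxTokens: merged.maxTokens ?? 1,
  };
}

const ONE_HOUR_MS = 60 * 60 * 1000; // cacheRetention: "long"
const mainWarmer = resolveWarmer(
  { enabled: true, interval: "50m", maxTokens: 1 },
  { enabled: true },
  ONE_HOUR_MS,
);
console.log(mainWarmer.intervalMs); // 3000000
```

Validating against the TTL at load time turns a silently useless setting (for example `interval: "2h"` with a 1h TTL) into an immediate config error.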

Proposal: Cache TTL Warmer — preserve Anthropic prompt-cache across idle periods independent of heartbeats

Context

On models with cacheRetention: "long" (1h TTL), prompt-cache entries expire after 1h without access. For agents with bursty user interaction, cache entries built up during active conversation age out during quiet periods, and the next user turn pays for a full prefix rewrite.

Heartbeats were likely intended to keep the cache warm, but as filed in the companion issue, heartbeat runs today build a separate cache chain (due to tool-set and system-prompt divergence) that doesn't refresh the conversation chain's TTL. Even once that's fixed, heartbeats carry other semantics that make them a noisy tool for a pure cache-keep-alive use case: user-visible reply guards (HEARTBEAT_OK), HEARTBEAT.md reading logic, and delivery routing.

Proposal

Add a dedicated Cache TTL Warmer subsystem: a lightweight background task that periodically sends the minimum request necessary to refresh Anthropic's cache TTL for each agent's active prefix, without participating in the conversation semantics.

Shape

  • Per-agent, opt-in via config:
    agents:
      defaults:
        cacheWarmer:
          enabled: true
          interval: "50m"   # < any cacheRetention TTL in use
          maxTokens: 1      # keep response tiny
      list:
        - id: main
          cacheWarmer:
            enabled: true
  • Fires on a fixed interval (configurable; the default interval should be shorter than the cache TTL).
  • Builds the exact same request shape the agent's next conversation turn would build: same system, tools, and messages array through the current cache-control boundary. The only differences are the appended user message, a minimal marker (e.g. "PING"), and max_tokens: 1.
  • Discards the response (after consuming the minimal output to finalize the stream).
  • Never persists to session state, never delivers to any channel.
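A sketch of the request-shaping step, assuming a simplified subset of the Anthropic Messages API request body (`model`, `max_tokens`, `system`, `tools`, `messages`, and `cache_control` with an optional `ttl` of `"5m"` or `"1h"`). `buildWarmRequest` and the model id are hypothetical placeholders. The helper reuses the next turn's `system` and `tools` unchanged so the cached prefix matches, appends the marker as a new user message, caps `max_tokens`, and builds a new `messages` array so session history is never mutated. Returning `null` when history does not end with an assistant turn is an extra assumption made to keep user/assistant alternation.

```typescript
// Simplified subset of an Anthropic Messages API request body (only fields used here).
interface CacheControl {
  type: "ephemeral";
  ttl?: "5m" | "1h";
}
interface TextBlock {
  type: "text";
  text: string;
  cache_control?: CacheControl;
}
interface ChatMessage {
  role: "user" | "assistant";
  content: string | TextBlock[];
}
interface MessagesRequest {
  model: string;
  max_tokens: number;
  system?: TextBlock[];
  tools?: unknown[];
  messages: ChatMessage[];
}

// Hypothetical helper: derive a warm-up request from the request the next real turn would send.
function buildWarmRequest(
  nextTurn: MessagesRequest,
  marker = "PING",
  maxTokens = 1,
): MessagesRequest | null {
  const last = nextTurn.messages[nextTurn.messages.length - 1];
  // Assumption: only warm when history ends with an assistant turn, so alternation holds.
  if (!last || last.role !== "assistant") return null;
  return {
    ...nextTurn, // same model, system and tools (with their cache_control breakpoints)
    max_tokens: maxTokens,
    // New array: the session's own history is never modified.
    messages: [...nextTurn.messages, { role: "user", content: marker }],
  };
}

const nextTurnRequest: MessagesRequest = {
  model: "your-model-id", // placeholder
  max_tokens: 4096,
  system: [
    { type: "text", text: "You are the main agent.", cache_control: { type: "ephemeral", ttl: "1h" } },
  ],
  tools: [],
  messages: [
    { role: "user", content: "Summarize yesterday's notes." },
    { role: "assistant", content: "Here is the summary." },
  ],
};

const warm = buildWarmRequest(nextTurnRequest);
console.log(warm?.messages.length, warm?.max_tokens); // 3 1
```

Sharing the `system` and `tools` objects by reference (rather than re-rendering them) is the simplest way to guarantee the warm request cannot drift from the real turn's prefix.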

What this buys

  • Cache TTL refreshed at every marker position: the request reads all live entries at marker #1 / #2 / etc. in Anthropic's cache, which resets their TTL clocks.
  • Deterministic cost: cache_read on the stable prefix + minimal cache_write for the PING + 1 output token. On a 100K-token cached prefix:
    • cache_read: 100K × $0.50/M = $0.05 per refresh
    • cache_write: ~5 tokens × $10/M ≈ $0.00005 (negligible)
    • output: 1 × $25/M ≈ $0.000025 (negligible)
    • Total: ~$0.05 per refresh; at a 50-minute interval that is 1,440 / 50 = 28.8 refreshes, or ~$1.44/day/agent
  • Alternative cost (no warmer, cache expires twice a day and the next user turn pays to rewrite):
    • 100K × $10/M × 2 = $2/day/agent of avoidable cache-write spend
  • Net saving: ~$0.56/day/agent in the above scenario, scaling linearly with prefix size.
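The cost comparison above can be checked with a short calculation. The prices are the per-million-token figures the proposal uses (cache read $0.50, 1h cache write $10, output $25) and should be treated as example inputs rather than current pricing; the 5-token marker is likewise the proposal's estimate. At a 50-minute interval there are 28.8 refreshes per day.

```typescript
// USD per million tokens; example figures taken from the proposal, not live pricing.
interface Prices {
  cacheReadPerM: number;
  cacheWrite1hPerM: number;
  outputPerM: number;
}

const proposalPrices: Prices = { cacheReadPerM: 0.5, cacheWrite1hPerM: 10, outputPerM: 25 };

// One warm request: read the cached prefix, write the small marker, emit the output tokens.
function costPerRefresh(prefixTokens: number, markerTokens: number, outputTokens: number, p: Prices): number {
  return (
    (prefixTokens * p.cacheReadPerM + markerTokens * p.cacheWrite1hPerM + outputTokens * p.outputPerM) /
    1000000
  );
}

function warmerCostPerDay(prefixTokens: number, intervalMinutes: number, p: Prices): number {
  const refreshesPerDay = (24 * 60) / intervalMinutes; // 28.8 at a 50-minute interval
  return refreshesPerDay * costPerRefresh(prefixTokens, 5, 1, p);
}

// Without a warmer: each expiry makes the next user turn rewrite the whole prefix.
function rewriteCostPerDay(prefixTokens: number, expiriesPerDay: number, p: Prices): number {
  return (prefixTokens * p.cacheWrite1hPerM * expiriesPerDay) / 1000000;
}

const warmerDaily = warmerCostPerDay(100000, 50, proposalPrices);
const rewriteDaily = rewriteCostPerDay(100000, 2, proposalPrices);
console.log(warmerDaily.toFixed(2), rewriteDaily.toFixed(2), (rewriteDaily - warmerDaily).toFixed(2));
// 1.44 2.00 0.56
```

The warmer's daily cost is dominated by the cache read on the prefix; both sides scale linearly with prefix size, and the warmer only wins while the number of avoided expiries per day stays high enough to cover roughly 29 reads.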

Orthogonal to heartbeats

The warmer is purely about cache TTL. It does not:

  • Run user-visible logic (no reading HEARTBEAT.md, no reply delivery)
  • Affect session state, memory, message history
  • Need channel-specific prompt content — it uses whatever the current session would use for a real turn

Heartbeats remain available for their other purposes (periodic task dispatch, due-date reminders, ambient monitoring). The two features don't overlap.

Why not just fix heartbeats to do this?

The companion issue (heartbeat cache divergence) is necessary regardless. But even once heartbeats properly refresh conversation cache, they:

  • Run every configured interval even when the agent is actively being used (wasteful — active sessions refresh cache on their own).
  • Consume an agent turn that might trigger HEARTBEAT_OK delivery, logging, etc.
  • Tie cache-keep-alive to the heartbeat's cadence, which is tuned for task-surface needs, not TTL math.

A dedicated warmer:

  • Can skip firing during active sessions (e.g. when the last user turn was less than 5 minutes ago).
  • Has a single, narrow purpose with predictable cost.
  • Doesn't force users to configure heartbeats they don't otherwise need.

Open design questions

  1. Idle-skip heuristic: warmer skips firing when the session had a non-warmer request in the last N minutes (N < TTL). Suggested default: skip if any request in the last 5 minutes.
  2. Marker message shape: a single {role: "user", content: "PING"} at the end? A no-content synthetic turn? Needs validation that Anthropic accepts minimal completions.
  3. Max-tokens bound: max_tokens: 1 keeps output cost near zero, but forces a truncated response. Some harnesses may handle max_tokens=0 or max_tokens=1 + stop differently. Worth testing empirically.
  4. Scope — entries past the first cache_control marker: does the warmer need to include the full message history, or is refreshing only the system-prompt cache sufficient? The latter is simpler but less effective if the session's most expensive cached entries are message-history entries.
  5. Interaction with compaction / session reset: when a compaction happens, the warmer should either pause until the next real turn rebuilds cache, or fire immediately to warm the new prefix. Probably the former.
  6. Failure modes: warmer requests should silently fail on API errors (rate limits, 4xx on malformed minimal requests) without escalating — they're fire-and-forget maintenance.
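A hypothetical decision function combining questions 1 and 5: pause after compaction until the next real turn, skip while a real request happened within the idle-skip window, and otherwise fire once the interval has elapsed since the cache was last touched by either a real or a warmer request. The names (`decideWarm`, `WarmerSessionState`, `WarmerTiming`) and state fields are assumptions; the 50-minute interval and 5-minute idle window are the proposal's suggested values.

```typescript
// Hypothetical per-session bookkeeping; these fields do not exist in openclaw today.
interface WarmerSessionState {
  lastRealRequestAt: number; // ms timestamp of the last non-warmer request
  lastWarmAt: number | null; // ms timestamp of the last warmer request, if any
  compactedSinceLastTurn: boolean;
}

interface WarmerTiming {
  intervalMs: number; // must stay below the cache TTL
  idleSkipMs: number; // question 1: skip if a real request happened this recently
}

type WarmDecision = "fire" | "skip-active" | "skip-compacted" | "skip-not-due";

function decideWarm(state: WarmerSessionState, timing: WarmerTiming, now: number): WarmDecision {
  // Question 5: after compaction, wait for the next real turn to rebuild the cache.
  if (state.compactedSinceLastTurn) return "skip-compacted";
  // Question 1: recent real traffic has already refreshed the TTL.
  if (now - state.lastRealRequestAt < timing.idleSkipMs) return "skip-active";
  // The cache was last touched by whichever request came later.
  const lastTouch = Math.max(state.lastRealRequestAt, state.lastWarmAt ?? Number.NEGATIVE_INFINITY);
  return now - lastTouch >= timing.intervalMs ? "fire" : "skip-not-due";
}

const MINUTE_MS = 60 * 1000;
const timing: WarmerTiming = { intervalMs: 50 * MINUTE_MS, idleSkipMs: 5 * MINUTE_MS };
const idle: WarmerSessionState = { lastRealRequestAt: 0, lastWarmAt: null, compactedSinceLastTurn: false };

console.log(decideWarm(idle, timing, 2 * MINUTE_MS)); // skip-active
console.log(decideWarm(idle, timing, 30 * MINUTE_MS)); // skip-not-due
console.log(decideWarm(idle, timing, 50 * MINUTE_MS)); // fire
console.log(decideWarm({ ...idle, lastWarmAt: 50 * MINUTE_MS }, timing, 80 * MINUTE_MS)); // skip-not-due
```

Keeping the decision a pure function of timestamps makes the idle-skip and compaction rules easy to unit-test; the actual send, which question 6 says should fail silently, stays outside it.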

Scope of this issue

This is a design discussion, not a PR. Looking for maintainer reaction to:

  • Whether the warmer is a worthwhile addition to openclaw given heartbeats already exist.
  • Whether the implementation lives inside the heartbeat subsystem (as a new mode) or as a distinct subsystem.
  • Preferences on the config surface.
  • Answers to the open design questions.

Happy to prototype + PR once direction is agreed.

Related

  • Companion issue: heartbeat cache divergence (link when filed).
  • Prior closed issue #16076: tool-search defer_loading support — another lever for cache stability, orthogonal to the warmer.

Proposed with design input from Claude.

extent analysis

TL;DR

Implement a dedicated Cache TTL Warmer subsystem to periodically refresh Anthropic's cache TTL for each agent's active prefix, reducing avoidable cache-write spend.

Guidance

  • Consider adding a lightweight background task that sends minimal requests to refresh the cache TTL, without participating in conversation semantics.
  • Evaluate the proposed configuration options, such as enabled, interval, and maxTokens, to determine the best approach for your use case.
  • Investigate the open design questions, including idle-skip heuristic, marker message shape, and max-tokens bound, to ensure the warmer is effective and efficient.
  • Assess the potential interaction with compaction and session reset to determine the best course of action when these events occur.

Example

agents:
  defaults:
    cacheWarmer:
      enabled: true
      interval: "50m"
      maxTokens: 1

This example enables the cache warmer for all agents by default, firing every 50 minutes and capping each warm-up response at 1 output token.

Notes

The proposed Cache TTL Warmer subsystem is designed to be orthogonal to heartbeats, which have other semantics and purposes. The warmer's implementation should be evaluated in the context of the existing heartbeat subsystem and the overall architecture of the system.

Recommendation

Pursue the proposed Cache TTL Warmer to reduce avoidable cache-write spend. It is still a design proposal (the linked PR #70602 fixes the companion heartbeat cache-divergence issue, not the warmer), but a dedicated, lightweight subsystem can be tailored to each agent's needs without relying on the heartbeat subsystem.
