openclaw - ✅(Solved) Fix Proposal: Cache TTL Warmer — preserve Anthropic prompt-cache across idle periods independent of heartbeats [1 pull requests, 2 comments, 2 participants]

brianthinks · 2026-04-23T01:01:52Z



GitHub stats

openclaw/openclaw#70418 • Fetched 2026-04-23 07:25:00


Author

brianthinks

Participants

brianthinks

MaiHHConnect

Timeline (top)

commented ×2 · cross-referenced ×2

On models with cacheRetention: "long" (1h TTL), prompt-cache entries die after 1h of non-access. For agents that see bursty user interaction, this means cache entries built up during active conversation age out during quiet periods, and the next user turn pays for a full prefix rewrite.

Heartbeats were likely intended to keep cache warm, but as filed in the companion issue, heartbeat runs today build a separate cache chain (due to tool-set and system-prompt divergence) that doesn't refresh the conversation chain's TTL. Even once that's fixed, heartbeats have other semantics — user-visible reply guards (HEARTBEAT_OK), HEARTBEAT.md reading logic, delivery routing — that make them a noisy tool for a pure cache-keep-alive use case.

Fix / Workaround

Heartbeats remain available for their other purposes (periodic task dispatch, due-date reminders, ambient monitoring). The two features don't overlap.

PR fix notes

PR #70602: fix(heartbeat): keep full tool array during heartbeat runs

Repository: openclaw/openclaw
Author: chinar-amrutkar
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/70602

Description (problem / solution / changelog)

Summary

Fixes heartbeat runs defeating their own cache-keep-alive purpose by restoring the tools-prefix cache alignment with conversation turns.

Problem

Heartbeat runs (where senderIsOwner=false) were filtering out owner-restricted tools (gateway, cron, nodes) entirely from the tools array, causing the tools-prefix hash to differ from conversation runs. This broke Anthropic's prompt cache — every heartbeat paid full price for a new cache chain instead of refreshing the conversation's TTL.

Solution

Instead of removing owner-only tools during heartbeat, keep them in the tool list with runtime guards that throw if invoked. This preserves the tools-prefix cache hash while maintaining the safety property that heartbeat runs cannot actually use privileged tools.
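The keep-but-guard idea can be illustrated with a minimal sketch. The `Tool` shape, the `ownerOnly` field, and the function name `applyOwnerOnlyToolPolicySketch` are hypothetical stand-ins chosen for illustration; the real `applyOwnerOnlyToolPolicy` in `src/agents/tool-policy.ts` may be structured differently.

```typescript
// Hypothetical, simplified tool shape; the real openclaw types may differ.
interface Tool {
  name: string;
  ownerOnly?: boolean;
  execute: (args: unknown) => unknown;
}

interface OwnerPolicyOptions {
  keepOwnerTools?: boolean;
}

// Non-owners either lose owner-only tools (default behavior) or keep them
// behind a guard that throws when the tool is invoked.
function applyOwnerOnlyToolPolicySketch(
  tools: Tool[],
  senderIsOwner: boolean,
  options: OwnerPolicyOptions = {},
): Tool[] {
  if (senderIsOwner) return tools;
  if (!options.keepOwnerTools) return tools.filter((t) => !t.ownerOnly);
  return tools.map((t) =>
    t.ownerOnly
      ? {
          ...t,
          // Same name and position in the array, so the serialized tool
          // list is unchanged; only the runtime behavior differs.
          execute: () => {
            throw new Error(`Tool "${t.name}" is restricted to the owner`);
          },
        }
      : t,
  );
}
```

Because the guarded tools keep their names and order, the tools array sent to the model is identical in both run kinds, which is the property the cache prefix depends on.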

Changes

src/agents/tool-policy.ts — Added optional keepOwnerTools flag to applyOwnerOnlyToolPolicy. When true, owner-only tools are kept (wrapped with guard) instead of being removed.
src/agents/pi-embedded-runner/effective-tool-policy.ts — Detect heartbeat runs via bootstrapContextRunKind === "heartbeat" and pass keepOwnerTools: true.
src/agents/pi-embedded-runner/run/attempt.ts — Forward bootstrapContextRunKind to applyFinalEffectiveToolPolicy.
src/agents/tool-policy.test.ts — Added regression test for keepOwnerTools behavior.

Testing

applyOwnerOnlyToolPolicy(tools, false, { keepOwnerTools: true }) → owner tools are kept (wrapped), not removed
applyOwnerOnlyToolPolicy(tools, false) → unchanged (owner tools removed for non-owner, as before)

Closes #70417
Related: #70418 (orthogonal cache-warmer proposal, independent of heartbeats)

Changed files

src/agents/pi-embedded-runner/effective-tool-policy.ts (modified, +2/-0)
src/agents/pi-embedded-runner/run/attempt.ts (modified, +1/-0)
src/agents/tool-policy.test.ts (modified, +13/-0)
src/agents/tool-policy.ts (modified, +15/-5)



Proposal: Cache TTL Warmer — preserve Anthropic prompt-cache across idle periods independent of heartbeats

Context

Proposal

Add a dedicated Cache TTL Warmer subsystem: a lightweight background task that periodically sends the minimum request necessary to refresh Anthropic's cache TTL for each agent's active prefix, without participating in the conversation semantics.

Shape

Per-agent, opt-in via config:

agents:
  defaults:
    cacheWarmer:
      enabled: true
      interval: "50m"   # < any cacheRetention TTL in use
      maxTokens: 1      # keep response tiny
  list:
    - id: main
      cacheWarmer:
        enabled: true

Fires on a fixed interval (configurable; default interval < ttl).
Builds the exact same request shape the agent's next conversation turn would build: same system, tools, and messages array through the current cache-control boundary. The only difference is the appended user message, which is a minimal marker (e.g. "PING") and max_tokens: 1.
Discards the response (after consuming the minimal output to finalize the stream).
Never persists to session state, never delivers to any channel.
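The request-shape rule above can be sketched as a pure function. The `TurnRequest` and `Message` types and the name `buildWarmerRequest` are hypothetical, loosely modeled on a Messages-style request body; whether the API accepts this minimal request is one of the open questions listed further down.

```typescript
// Hypothetical request shape; field names follow the proposal's wording.
interface Message {
  role: "user" | "assistant";
  content: string;
}

interface TurnRequest {
  system: string;
  tools: { name: string }[];
  messages: Message[];
  max_tokens: number;
}

// Takes the request the next real turn would send and derives the warmer
// request: identical prefix, one appended marker message, one output token.
function buildWarmerRequest(nextTurn: TurnRequest, marker = "PING"): TurnRequest {
  return {
    ...nextTurn, // system and tools are reused untouched
    messages: [...nextTurn.messages, { role: "user", content: marker }],
    max_tokens: 1,
  };
}
```

The function copies the message array instead of pushing onto it, so the session's own history is never modified, in line with the "never persists to session state" requirement.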

What this buys

Cache TTL refreshed at every marker position: the request reads all live entries at marker #1 / #2 / etc. in Anthropic's cache, which resets their TTL clocks.
Deterministic cost: cache_read on the stable prefix + minimal cache_write for the PING + 1 output token. On a 100K-token cached prefix:
- cache_read: 100K × $0.50/M = $0.05 per refresh
- cache_write: ~5 tokens × $10/M ≈ $0.00005 (negligible)
- output: 1 × $25/M ≈ $0.000025 (negligible)
- Total: ~$0.05 per refresh; a 50-minute interval fires at most 28.8 times a day, so ~$1.44/day/agent (less when firings are skipped during active sessions)
Alternative cost (no warmer, cache expires twice a day and the next user turn pays to rewrite):
- 100K × $10/M × 2 = $2/day/agent of avoidable cache-write spend
Net saving: ~$0.56/day/agent in the above scenario, scaling linearly with prefix size.
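The arithmetic above can be reproduced with a small calculator. This is a sketch only: the function and field names are hypothetical, and the per-million-token prices are the figures quoted in the list, taken as assumptions and not as authoritative pricing.

```typescript
// Prices in USD per million tokens (assumed from the figures quoted above).
interface Prices {
  cacheReadPerM: number;
  cacheWritePerM: number;
  outputPerM: number;
}

// Cost of one warmer request: read the cached prefix, write the marker,
// emit the minimal output.
function refreshCostUsd(
  prefixTokens: number,
  markerTokens: number,
  outputTokens: number,
  p: Prices,
): number {
  return (
    (prefixTokens * p.cacheReadPerM +
      markerTokens * p.cacheWritePerM +
      outputTokens * p.outputPerM) /
    1_000_000
  );
}

// Upper bound on daily cost: assumes the warmer fires on every interval.
function dailyWarmerCostUsd(perRefreshUsd: number, intervalMinutes: number): number {
  const refreshesPerDay = (24 * 60) / intervalMinutes;
  return perRefreshUsd * refreshesPerDay;
}

// Cost of letting the cache expire and rewriting the prefix N times a day.
function dailyRewriteCostUsd(prefixTokens: number, expiriesPerDay: number, p: Prices): number {
  return (prefixTokens * p.cacheWritePerM * expiriesPerDay) / 1_000_000;
}
```

With a 50-minute interval and no skipped firings there are 28.8 refreshes per day, so the daily figure is an upper bound; any idle-skip logic only lowers it.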

Orthogonal to heartbeats

The warmer is purely about cache TTL. It does not:

Run user-visible logic (no reading HEARTBEAT.md, no reply delivery)
Affect session state, memory, message history
Need channel-specific prompt content — it uses whatever the current session would use for a real turn

Heartbeats remain available for their other purposes (periodic task dispatch, due-date reminders, ambient monitoring). The two features don't overlap.

Why not just fix heartbeats to do this?

Fixing the companion issue (heartbeat cache divergence) is necessary regardless. But even once heartbeats properly refresh conversation cache, they:

Run every configured interval even when the agent is actively being used (wasteful — active sessions refresh cache on their own).
Consume an agent turn that might trigger HEARTBEAT_OK delivery, logging, etc.
Tie cache-keep-alive to the heartbeat's cadence, which is tuned for task-surface needs, not TTL math.

A dedicated warmer:

Can be disabled during active sessions (e.g. skip firing if last user turn < 5 min ago).
Has a single, narrow purpose with predictable cost.
Doesn't force users to configure heartbeats they don't otherwise need.
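The "skip while active" behavior could look like the following minimal sketch. The function name, the millisecond timestamps, and the 5-minute default are assumptions for illustration; the default mirrors the example threshold mentioned above.

```typescript
// Returns true when the warmer should fire: the session has been quiet
// for at least idleSkipMs since the last non-warmer request.
function shouldFireWarmer(
  nowMs: number,
  lastRealRequestMs: number,
  idleSkipMs: number = 5 * 60_000, // assumed default: 5 minutes
): boolean {
  return nowMs - lastRealRequestMs >= idleSkipMs;
}
```

A real turn already refreshes the cache entries it reads, so skipping the warmer shortly after one avoids paying for a redundant cache read.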

Open design questions

Idle-skip heuristic: warmer skips firing when the session had a non-warmer request in the last N minutes (N < TTL). Suggested default: skip if any request in the last 5 minutes.
Marker message shape: a single {role: "user", content: "PING"} at the end? A no-content synthetic turn? Needs validation that Anthropic accepts minimal completions.
Max-tokens bound: max_tokens: 1 keeps output cost near zero, but forces a truncated response. Some harnesses may handle max_tokens=0 or max_tokens=1 + stop differently. Worth testing empirically.
Scope — entries past the first cache_control marker: does the warmer need to include the full message history, or is refreshing only the system-prompt cache sufficient? The latter is simpler but less effective if the session's most expensive cached entries are message-history entries.
Interaction with compaction / session reset: when a compaction happens, the warmer should either pause until the next real turn rebuilds cache, or fire immediately to warm the new prefix. Probably the former.
Failure modes: warmer requests should silently fail on API errors (rate limits, 4xx on malformed minimal requests) without escalating — they're fire-and-forget maintenance.
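Two of the questions above, pausing after compaction and failing silently, can be sketched together. Everything here is hypothetical: `WarmerState`, `fireWarmerSketch`, and the injected `send` callback are illustration-only names, and the pause-until-next-real-turn behavior is just the option the list calls "probably the former".

```typescript
// Hypothetical per-session warmer state.
interface WarmerState {
  // Set on compaction or session reset; cleared by the next real turn.
  pausedUntilNextRealTurn: boolean;
}

type WarmerOutcome = "sent" | "skipped" | "failed";

// Fire-and-forget: errors are logged and swallowed, never rethrown,
// so a failed warm-up cannot affect the conversation path.
async function fireWarmerSketch(
  send: () => Promise<unknown>,
  state: WarmerState,
  log: (message: string) => void = () => {},
): Promise<WarmerOutcome> {
  if (state.pausedUntilNextRealTurn) return "skipped";
  try {
    await send(); // response is intentionally discarded
    return "sent";
  } catch (err) {
    log(`cache warmer request failed: ${String(err)}`);
    return "failed";
  }
}
```

Returning an outcome instead of throwing keeps the caller trivial while still leaving room for metrics on how often warm-ups fail.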

Scope of this issue

This is a design discussion, not a PR. Looking for maintainer reaction to:

Whether the warmer is a worthwhile addition to openclaw given heartbeats already exist.
Whether the implementation lives inside the heartbeat subsystem (as a new mode) or as a distinct subsystem.
Preferences on the config surface.
Answers to the open design questions.

Happy to prototype + PR once direction is agreed.

Companion issue: heartbeat cache divergence (link when filed).
Prior closed issue #16076: tool-search defer_loading support — another lever for cache stability, orthogonal to the warmer.

Proposed with design input from Claude.

extent analysis

TL;DR

Implement a dedicated Cache TTL Warmer subsystem to periodically refresh Anthropic's cache TTL for each agent's active prefix, reducing avoidable cache-write spend.

Guidance

Consider adding a lightweight background task that sends minimal requests to refresh the cache TTL, without participating in conversation semantics.
Evaluate the proposed configuration options, such as enabled, interval, and maxTokens, to determine the best approach for your use case.
Investigate the open design questions, including idle-skip heuristic, marker message shape, and max-tokens bound, to ensure the warmer is effective and efficient.
Assess the potential interaction with compaction and session reset to determine the best course of action when these events occur.

Example

agents:
  defaults:
    cacheWarmer:
      enabled: true
      interval: "50m"
      maxTokens: 1

This example configuration enables the cache warmer with a 50-minute interval and a maximum of 1 token.
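The constraint that the interval must stay below the cache TTL could be checked when the config is loaded. The sketch below assumes duration strings of the form `"50m"` or `"1h"`; the accepted formats and both function names are hypothetical, since the proposal does not specify them.

```typescript
// Parses durations such as "30s", "50m", or "1h" into milliseconds.
// Only these three units are handled in this sketch.
function parseDurationMs(text: string): number {
  const match = /^(\d+)(s|m|h)$/.exec(text.trim());
  if (!match) throw new Error(`Unsupported duration: ${text}`);
  const unitMs = { s: 1_000, m: 60_000, h: 3_600_000 }[match[2] as "s" | "m" | "h"];
  return Number(match[1]) * unitMs;
}

// True when the warmer interval is strictly shorter than the cache TTL,
// so that every entry is touched before it can expire.
function isIntervalBelowTtl(interval: string, ttl: string): boolean {
  return parseDurationMs(interval) < parseDurationMs(ttl);
}
```

For the configuration shown, `isIntervalBelowTtl("50m", "1h")` holds, while an interval equal to the TTL would be rejected because the entry could expire just before the refresh arrives.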

Notes

The proposed Cache TTL Warmer subsystem is designed to be orthogonal to heartbeats, which have other semantics and purposes. The warmer's implementation should be evaluated in the context of the existing heartbeat subsystem and the overall architecture of the system.

Recommendation

Pursue the proposed Cache TTL Warmer to reduce avoidable cache-write spend. Note that it is a design proposal awaiting maintainer feedback, not a shipped workaround. A dedicated, lightweight warmer can be tuned to cache-TTL needs without relying on the heartbeat subsystem.
