hermes - ✅(Solved) Fix Feature request: make prompt cache TTL configurable (5m vs 1h) [1 pull request, 1 comment, 2 participants]

GitHub stats: NousResearch/hermes-agent#14971 (fetched 2026-04-24 10:43:54)
Comments: 1 · Participants: 2 · Timeline events: 9 · Reactions: 0
Timeline (top): labeled ×5, closed ×1, commented ×1, cross-referenced ×1

Hermes applies Anthropic prompt caching via agent/prompt_caching.py::apply_anthropic_cache_control, which accepts cache_ttl="5m" or "1h". The function itself supports both values.

However, run_agent.py hardcodes the TTL at "5m":

https://github.com/NousResearch/hermes-agent/blob/main/run_agent.py#L1010

self._cache_ttl = "5m"  # Default 5-minute TTL (1.25x write cost)

There is no way to opt into the 1h TTL without patching source.
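For context, here is a minimal sketch of where a TTL value ends up, assuming Anthropic's Messages API shape for prompt caching, in which a content block carries `cache_control: {"type": "ephemeral", "ttl": "5m" | "1h"}`. It is not Hermes's `apply_anthropic_cache_control`, whose internals are not shown here; `cache_control` and `mark_last_block` are hypothetical helper names.

```python
from typing import Any

VALID_TTLS = ("5m", "1h")


def cache_control(ttl: str = "5m") -> dict[str, str]:
    """Build an Anthropic-style cache_control marker for the given TTL tier."""
    if ttl not in VALID_TTLS:
        raise ValueError(f"unsupported cache TTL {ttl!r}; expected one of {VALID_TTLS}")
    # "ephemeral" is the cache type Anthropic prompt caching uses; "ttl" picks the tier.
    return {"type": "ephemeral", "ttl": ttl}


def mark_last_block(message: dict[str, Any], ttl: str = "5m") -> dict[str, Any]:
    """Return a copy of `message` whose last content block carries a cache breakpoint."""
    content = message["content"]
    if isinstance(content, str):
        # Normalize plain-string content into a single text block.
        content = [{"type": "text", "text": content}]
    blocks = [dict(block) for block in content]  # shallow-copy so the input is untouched
    blocks[-1]["cache_control"] = cache_control(ttl)
    return {**message, "content": blocks}
```

Placing the breakpoint on the last block of a long, stable prefix is what lets later turns read that prefix from cache; the TTL only decides how long an unused entry survives.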



Fix Action


PR fix notes

PR #15065: feat: configurable prompt_caching.cache_ttl (5m default, 1h opt-in) — salvage #12659

Description (problem / solution / changelog)

Salvage of #12659 by @ericnicolaides onto current main. Chosen over the parallel PRs #14812 (env-var only, which violates the '.env for secrets only' rule) and #3082 (threads a kwarg through the gateway, session, and state DB, which is unnecessarily heavy for a two-value config).

Closes #14971.

What this PR does

Exposes Anthropic's 1h prompt-cache TTL tier as a config option. The default remains 5m, so existing users see no behavior change.

# ~/.hermes/config.yaml
prompt_caching:
  cache_ttl: "1h"   # opt-in; default is "5m"

Motivated by the #14971 cost data: on a $246 Anthropic workload, 56.5% of spend was input_cache_write_5m (vs 34.3% for cache reads), because the 5m TTL kept expiring between turns and the cache had to be rewritten. The 1h tier costs 2x the base input price on write (vs 1.25x for 5m) but amortizes across a full session. The 1h path has been supported inside apply_anthropic_cache_control() since it landed; this PR just exposes it.
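The trade-off can be made concrete with a small per-prefix cost model. This is a sketch under stated assumptions: the 1.25x and 2x write multipliers come from the text above, the 0.1x cache-read multiplier is taken from Anthropic's published pricing and should be checked against current rates, and every gap between turns is assumed to be longer than 5 minutes but shorter than 1 hour. `prefix_cost` and `break_even_turns` are hypothetical names.

```python
WRITE_5M = 1.25  # 5-minute cache write, x base input price (from the text)
WRITE_1H = 2.0   # 1-hour cache write, x base input price (from the text)
READ = 0.1       # cache read, x base input price (assumed; check current pricing)


def prefix_cost(turns: int, ttl: str) -> float:
    """Cost of sending one cached prefix on `turns` turns, in base-price prefix units.

    Assumes every gap between turns is longer than 5 minutes but shorter than
    1 hour, so a 5m entry has always expired and a 1h entry never has.
    """
    if turns < 1:
        raise ValueError("turns must be >= 1")
    if ttl == "5m":
        return WRITE_5M * turns               # rewritten on every turn
    if ttl == "1h":
        return WRITE_1H + READ * (turns - 1)  # written once, then read
    raise ValueError(f"unknown ttl {ttl!r}")


def break_even_turns() -> int:
    """Smallest turn count at which the 1h tier costs no more than the 5m tier."""
    n = 1
    while prefix_cost(n, "1h") > prefix_cost(n, "5m"):
        n += 1
    return n
```

Under these assumptions the 1h tier already pays off on the second turn (2.1 vs 2.5 base-price units). If gaps are shorter than 5 minutes, the 5m entry is read rather than rewritten, and the 5m tier stays cheaper because its single write costs 1.25x instead of 2x.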

How

AIAgent.__init__ reads prompt_caching.cache_ttl via hermes_cli.config.load_config() and validates it against {"5m", "1h"}; any other value falls back to "5m" without raising. DEFAULT_CONFIG gains a prompt_caching section so the config merge resolves cleanly for users upgrading from older versions. The same conflict resolution was applied to match current main's refactor of the _anthropic_prompt_cache_policy() helper.
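A minimal sketch of that lookup, assuming the loaded config is a plain dict and that DEFAULT_CONFIG is deep-merged under the user's values. The real `hermes_cli.config.load_config()` and its merge logic are not shown in the PR notes, so `merge_config`, `resolve_cache_ttl`, the whitespace/case normalization, and the warning log are assumptions; only the `{"5m", "1h"}` validation and the non-raising `"5m"` fallback come from the description.

```python
import copy
import logging
from typing import Any, Optional

logger = logging.getLogger(__name__)

# Hypothetical stand-in for the prompt_caching section the PR adds to DEFAULT_CONFIG.
DEFAULT_CONFIG: dict[str, Any] = {"prompt_caching": {"cache_ttl": "5m"}}
VALID_CACHE_TTLS = {"5m", "1h"}


def merge_config(defaults: dict[str, Any], user: dict[str, Any]) -> dict[str, Any]:
    """Recursively overlay `user` onto a deep copy of `defaults` (assumed merge strategy)."""
    merged = copy.deepcopy(defaults)
    for key, value in user.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve_cache_ttl(user_config: Optional[dict[str, Any]]) -> str:
    """Return "5m" or "1h"; any other value falls back to "5m" without raising."""
    cfg = merge_config(DEFAULT_CONFIG, user_config or {})
    section = cfg.get("prompt_caching")
    ttl = section.get("cache_ttl") if isinstance(section, dict) else None
    # Only strings can be valid; normalization mirrors the issue's patch (assumption).
    normalized = ttl.strip().lower() if isinstance(ttl, str) else None
    if normalized in VALID_CACHE_TTLS:
        return normalized
    logger.warning("Invalid prompt_caching.cache_ttl=%r; falling back to '5m'", ttl)
    return "5m"
```

Because the defaults supply `prompt_caching.cache_ttl: "5m"`, an absent or empty `prompt_caching` section resolves to `"5m"` without hitting the fallback branch, which matches the rows of the validation table.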

Changes

  • @ericnicolaides (#12659 commit 1): config wiring + run_agent.py lookup + 3 unit tests + developer guide fix
  • @ericnicolaides (#12659 commit 2): cli-config.yaml.example documentation block
  • Follow-up: AUTHOR_MAP entry mapping the Cursor agent commit email to @ericnicolaides

Validation

| Config | Expected _cache_ttl | Got |
|---|---|---|
| (no prompt_caching section) | 5m | 5m |
| cache_ttl: "1h" | 1h | 1h |
| cache_ttl: "30m" (invalid) | 5m (fallback) | 5m |
| cache_ttl: "5m" (explicit) | 5m | 5m |
| prompt_caching: {} (empty) | 5m | 5m |
  • tests/run_agent/test_run_agent.py -k cache_ttl — 3/3 pass (new tests)
  • tests/agent/test_prompt_caching.py — 14/14 pass (regression guard)
  • E2E: 5 scenarios above verified with real config.yaml + AIAgent instantiation

Co-authored-by: @ericnicolaides

Changed files

  • cli-config.yaml.example (modified, +10/-0)
  • hermes_cli/config.py (modified, +6/-0)
  • run_agent.py (modified, +15/-2)
  • scripts/release.py (modified, +1/-0)
  • tests/run_agent/test_run_agent.py (modified, +60/-0)
  • website/docs/developer-guide/context-compression-and-caching.md (modified, +3/-3)


Why this matters (real data)

On a real multi-session workload over 3 days ($246.46 total spend via Anthropic API), cost breakdown by token type:

| token_type | cost | share |
|---|---|---|
| input_cache_write_5m | $139.37 | 56.5% |
| input_cache_read | $84.61 | 34.3% |
| output | $18.37 | 7.5% |
| input_no_cache | $4.11 | 1.7% |

The write-to-read spend ratio is about 1.65 ($139.37 / $84.61), i.e. more was spent writing the cache than reading it. This suggests the 5m TTL is expiring between turns in any workflow where the user pauses for more than 5 minutes (switching tasks, reading the previous response, thinking, interruptions, multi-hour sessions with gaps).

Anthropic's 1h TTL costs 2x the base input price on write (vs 1.25x for 5m) but amortizes across a full session. A rough estimate on the same workload, assuming 60% of writes become reads under the 1h TTL, puts the savings at ~$400–850/month for an active user.
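One way to reproduce the low end of that range from the table is sketched below. Assumptions: the $139.37 write spend is repriced at the base input price (divided by 1.25), 40% of it stays as 1h writes (x2) and 60% becomes cache reads at 0.1x (Anthropic's published read multiplier, assumed here), the rest of the bill is unchanged, and 3 observed days scale linearly to 30. `estimate_monthly_savings` is a hypothetical name; the issue does not say how the upper end of the range was derived.

```python
WRITE_5M_SPEND = 139.37  # input_cache_write_5m spend over 3 days (USD), from the table
WRITE_5M, WRITE_1H, READ = 1.25, 2.0, 0.1  # x base input price; READ is an assumption


def estimate_monthly_savings(
    converted_share: float,
    days_observed: float = 3.0,
    days_per_month: float = 30.0,
) -> float:
    """Monthly savings if `converted_share` of 5m writes become 1h cache reads."""
    base = WRITE_5M_SPEND / WRITE_5M                     # same tokens at base price
    writes_1h = (1 - converted_share) * base * WRITE_1H  # prefixes still written (2x)
    reads_1h = converted_share * base * READ             # former rewrites, now reads
    saved = WRITE_5M_SPEND - (writes_1h + reads_1h)
    return saved * days_per_month / days_observed
```

`estimate_monthly_savings(0.6)` gives about $435/month. Under the same assumptions the savings cross zero when about 39% of writes convert (1.9s = 0.75), so the estimate is sensitive to how many writes actually turn into reads.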

Proposed fix

Expose the TTL via config.yaml. Example:

model:
  name: claude-opus-4-7
  provider: anthropic
  prompt_cache_ttl: '1h'   # new; accepts '5m' (default) or '1h'

Minimal patch (tested locally on v0.10.0):

  1. Replace line 1010 with a config lookup:

    self._cache_ttl = self._resolve_cache_ttl()
  2. Add a helper near _anthropic_prompt_cache_policy:

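    def _resolve_cache_ttl(self) -> str:
        # Read model.prompt_cache_ttl from config.yaml; any problem keeps the 5m default.
        try:
            from cli import load_cli_config
            cfg = load_cli_config() or {}
            # Normalize so ' 1H ' and '1h' are treated the same.
            val = str((cfg.get("model") or {}).get("prompt_cache_ttl") or "").strip().lower()
            if val in ("5m", "1h"):
                return val
            if val:
                # Set but not a supported tier: warn and fall back.
                logger.warning(f"Invalid model.prompt_cache_ttl={val!r}; falling back to '5m'")
        except Exception as e:
            # Config could not be loaded or had an unexpected shape.
            logger.debug(f"_resolve_cache_ttl lookup failed: {e}")
        return "5m"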

Default stays "5m" so no behavior change for existing users.

Willing to PR

Happy to send a PR with this change + a unit test hitting both branches if maintainers think this is the right shape. Any config key naming the maintainers prefer is fine: prompt_cache_ttl under model: felt natural, but caching.ttl or anthropic.cache_ttl would also work.
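A sketch of such a test, assuming the proposed helper is refactored to take the config loader as a parameter so it can be exercised without `cli.load_cli_config`. The injected `load_cfg` parameter, the standalone `resolve_cache_ttl` function, and the test names are hypothetical; the validation logic is copied from the patch above, with lazy `%`-style logging arguments instead of f-strings.

```python
import logging
import unittest
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)


def resolve_cache_ttl(load_cfg: Callable[[], Optional[dict[str, Any]]]) -> str:
    """Same logic as the proposed _resolve_cache_ttl, with the config loader injected."""
    try:
        cfg = load_cfg() or {}
        val = str((cfg.get("model") or {}).get("prompt_cache_ttl") or "").strip().lower()
        if val in ("5m", "1h"):
            return val
        if val:
            logger.warning("Invalid model.prompt_cache_ttl=%r; falling back to '5m'", val)
    except Exception as e:  # any lookup failure keeps the 5m default
        logger.debug("_resolve_cache_ttl lookup failed: %s", e)
    return "5m"


class ResolveCacheTtlTest(unittest.TestCase):
    def test_valid_value_is_normalized(self):
        self.assertEqual(resolve_cache_ttl(lambda: {"model": {"prompt_cache_ttl": " 1H "}}), "1h")

    def test_invalid_value_warns_and_falls_back(self):
        with self.assertLogs(logger, level="WARNING"):
            self.assertEqual(resolve_cache_ttl(lambda: {"model": {"prompt_cache_ttl": "30m"}}), "5m")

    def test_missing_key_uses_default(self):
        self.assertEqual(resolve_cache_ttl(lambda: {}), "5m")
        self.assertEqual(resolve_cache_ttl(lambda: None), "5m")

    def test_loader_failure_uses_default(self):
        def broken() -> dict[str, Any]:
            raise OSError("config unreadable")
        self.assertEqual(resolve_cache_ttl(broken), "5m")
```

The tests run under pytest or `python -m unittest`; the last one covers the `except` branch by making the loader raise.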

Environment

  • Hermes v0.10.0
  • Anthropic provider, Claude Opus 4.7 main + Sonnet 4.6 fallback
  • WSL2 Ubuntu 24.04

extent analysis

TL;DR

To address the issue of excessive cache rewrites due to the hardcoded 5-minute TTL, expose the TTL via config.yaml and allow users to opt into a 1-hour TTL.

Guidance

  • The proposed fix involves modifying run_agent.py to read the TTL from config.yaml instead of hardcoding it, which can help reduce cache rewrites and costs.
  • To implement this fix, replace the hardcoded TTL line with a config lookup and add a helper function to resolve the cache TTL from the config file.
  • The issue proposed the key model.prompt_cache_ttl; the merged PR #15065 uses prompt_caching.cache_ttl instead. Either way, the only valid values are '5m' and '1h'.
  • Before implementing the fix, consider testing the proposed patch locally to ensure it works as expected.


Notes

The proposed fix assumes that config.yaml is loaded and parsed correctly. A missing or invalid TTL value is not fatal: both the issue's patch and PR #15065 fall back to '5m' (the issue's patch also logs a warning for invalid values). The fix only applies to the Anthropic provider and may not carry over to other providers.

Recommendation

Apply the proposed workaround by exposing the TTL via config.yaml and allowing users to opt into a 1-hour TTL, as it has the potential to significantly reduce cache rewrites and costs.
