hermes - ✅(Solved) Fix Feature request: make prompt cache TTL configurable (5m vs 1h) [1 pull request, 1 comment, 2 participants]

hermes · 2026-04-24 07:37:11

GitHub stats

NousResearch/hermes-agent#14971 • Fetched 2026-04-24 10:43:54

Author: futureworld678
Participants: alt-glitch, futureworld678
Timeline (top): labeled ×5, closed ×1, commented ×1, cross-referenced ×1

Hermes applies Anthropic prompt caching via agent/prompt_caching.py::apply_anthropic_cache_control, which accepts cache_ttl="5m" or "1h". The function itself supports both values.

However, run_agent.py hardcodes the TTL at "5m":

https://github.com/NousResearch/hermes-agent/blob/main/run_agent.py#L1010

self._cache_ttl = "5m"  # Default 5-minute TTL (1.25x write cost)

There is no way to opt into the 1h TTL without patching source.
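For context on what the TTL controls: Anthropic's Messages API marks a cache breakpoint with a cache_control object on a content block, {"type": "ephemeral"} for the default 5-minute TTL or {"type": "ephemeral", "ttl": "1h"} for the 1-hour tier. The sketch below shows only that request shape. with_cache_breakpoint is a hypothetical helper, not Hermes's apply_anthropic_cache_control, whose real signature and breakpoint placement may differ; depending on API version, the 1h tier may also need a beta header, so check Anthropic's prompt-caching docs.

```python
from __future__ import annotations

import copy
from typing import Any

VALID_TTLS = ("5m", "1h")


def with_cache_breakpoint(
    messages: list[dict[str, Any]], cache_ttl: str = "5m"
) -> list[dict[str, Any]]:
    """Hypothetical helper: return a copy of `messages` whose last content
    block carries an Anthropic cache_control marker with the given TTL."""
    if cache_ttl not in VALID_TTLS:
        raise ValueError(f"cache_ttl must be one of {VALID_TTLS}, got {cache_ttl!r}")
    out = copy.deepcopy(messages)  # never mutate the caller's history
    if not out:
        return out
    last = out[-1]
    # Plain-string content must become a block list before it can hold a marker.
    if isinstance(last["content"], str):
        last["content"] = [{"type": "text", "text": last["content"]}]
    if not last["content"]:
        return out
    marker: dict[str, str] = {"type": "ephemeral"}  # API default TTL is 5 minutes
    if cache_ttl == "1h":
        marker["ttl"] = "1h"  # opt into the 1-hour tier (2x write cost)
    last["content"][-1]["cache_control"] = marker
    return out
```

Passing cache_ttl="1h" puts {"type": "ephemeral", "ttl": "1h"} on the last block; the caller's list is left unmodified, and any value other than "5m" or "1h" raises ValueError.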

Error Message

None. This is a feature request rather than a bug report; the hardcoded TTL produces no error, only higher cache-write costs.

Root Cause

run_agent.py hardcodes self._cache_ttl = "5m" (around line 1010 at the time of the report), even though apply_anthropic_cache_control in agent/prompt_caching.py already accepts cache_ttl="1h". No config key reaches that assignment, so the 1h tier is unreachable without patching source.

Fix Action

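Fix / Workaround

Fixed by PR #15065 (merged): set prompt_caching.cache_ttl: "1h" in ~/.hermes/config.yaml; the default stays "5m". On versions without that PR (the report was against v0.10.0), the only option is to patch run_agent.py as in the reporter's minimal patch under Proposed fix, which was tested locally on v0.10.0.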

PR fix notes

PR #15065: feat: configurable prompt_caching.cache_ttl (5m default, 1h opt-in) — salvage #12659

Repository: NousResearch/hermes-agent
Author: teknium1
State: closed | merged: True
Link: https://github.com/NousResearch/hermes-agent/pull/15065

Description (problem / solution / changelog)

Salvage of #12659 by @ericnicolaides onto current main. Chosen over parallel PRs #14812 (env-var only, violates '.env for secrets only') and #3082 (threads kwarg through gateway + session + state DB — unnecessarily heavy for a 2-value config).

Closes #14971.

What this PR does

Exposes Anthropic's 1h prompt-cache TTL tier as a config option. Default remains 5m — zero behavior change for existing users.

# ~/.hermes/config.yaml
prompt_caching:
  cache_ttl: "1h"   # opt-in; default is "5m"

Motivated by #14971 cost data: on a $246 Anthropic workload, 56.5% of spend was input_cache_write_5m; more was spent writing the cache than reading from it because the 5m TTL kept expiring between turns. The 1h tier costs 2x on write (vs 1.25x for 5m) but amortizes across a full session. The 1h path has been supported inside apply_anthropic_cache_control() since it landed; this PR just exposes it.
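The 2x-versus-1.25x trade-off can be made concrete with a small cost model. The sketch below prices one cached prefix over a session of `turns` requests, in units of that prefix's uncached input cost. It assumes Anthropic's published multipliers (5m write 1.25x, 1h write 2x, cache read 0.1x), that every pause between turns is under an hour, and that `long_gaps` of those pauses exceed 5 minutes; the function names are illustrative only.

```python
# Price multipliers relative to the base input-token price (assumed from
# Anthropic's published prompt-caching pricing; verify against current docs).
WRITE_5M = 1.25  # write with the 5-minute TTL
WRITE_1H = 2.0   # write with the 1-hour TTL
READ = 0.1       # cache hit, either TTL


def session_cost_5m(turns: int, long_gaps: int) -> float:
    """Cost of one cached prefix over `turns` requests with the 5m TTL.

    Each of the `long_gaps` pauses longer than 5 minutes expires the cache,
    so the next request rewrites it; every other request after the first is a hit.
    """
    if not 0 <= long_gaps < turns:
        raise ValueError("need 0 <= long_gaps < turns")
    writes = 1 + long_gaps
    return writes * WRITE_5M + (turns - writes) * READ


def session_cost_1h(turns: int) -> float:
    """Same prefix with the 1h TTL: one write, then hits (all gaps under an hour)."""
    if turns < 1:
        raise ValueError("need at least one turn")
    return WRITE_1H + (turns - 1) * READ


if __name__ == "__main__":
    for gaps in range(3):
        print(f"long gaps={gaps}: 5m={session_cost_5m(10, gaps):.2f}  1h={session_cost_1h(10):.2f}")
```

For a 10-turn session the 5m tier is cheaper when no pause exceeds 5 minutes (2.15 vs 2.90), but one long pause already flips the comparison (3.30 vs 2.90). In general the 5m-minus-1h difference is 1.15 × long_gaps − 0.75, independent of session length, so a single expiry per session covers the 1h premium.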

How

AIAgent.__init__ reads prompt_caching.cache_ttl via hermes_cli.config.load_config() and validates it against {"5m", "1h"}; any other value falls back to "5m" without raising. DEFAULT_CONFIG gains a prompt_caching section so the config merge resolves cleanly for users upgrading from older versions. Merge conflicts were resolved to match current main's refactored _anthropic_prompt_cache_policy() helper.
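The merge-and-validate behavior described above can be sketched as follows. This is not the merged code: the exact contents of DEFAULT_CONFIG, the deep_merge helper, and the resolve_cache_ttl name are assumptions standing in for hermes_cli.config.load_config() and the lookup in AIAgent.__init__. It illustrates why shipping a prompt_caching default matters: an upgrader's config with no such section still resolves to "5m".

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Any

# Hypothetical defaults; the PR adds a prompt_caching section to DEFAULT_CONFIG.
DEFAULT_CONFIG: dict[str, Any] = {"prompt_caching": {"cache_ttl": "5m"}}
VALID_TTLS = frozenset({"5m", "1h"})


def deep_merge(base: Mapping[str, Any], override: Mapping[str, Any]) -> dict[str, Any]:
    """Recursively overlay user settings on defaults without mutating either."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, Mapping) and isinstance(merged.get(key), Mapping):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve_cache_ttl(user_config: Mapping[str, Any] | None) -> str:
    """Return "5m" or "1h"; anything else falls back to "5m" without raising."""
    cfg = deep_merge(DEFAULT_CONFIG, user_config or {})
    section = cfg.get("prompt_caching")
    if not isinstance(section, Mapping):  # e.g. an empty YAML key parses to None
        return "5m"
    # Lower-casing follows the issue's helper; the merged code may be stricter.
    ttl = str(section.get("cache_ttl") or "").strip().lower()
    return ttl if ttl in VALID_TTLS else "5m"
```

Against the PR's validation matrix, this returns "5m" for a missing section, an empty prompt_caching: {}, an explicit "5m", and the invalid "30m", and returns "1h" only when cache_ttl: "1h" is set.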

Changes

@ericnicolaides (#12659 commit 1): config wiring + run_agent.py lookup + 3 unit tests + developer guide fix
@ericnicolaides (#12659 commit 2): cli-config.yaml.example documentation block
Follow-up: AUTHOR_MAP entry mapping the Cursor agent commit email → @ericnicolaides

Validation

Config	Expected `_cache_ttl`	Got
(no `prompt_caching` section)	`5m`	`5m` ✓
`cache_ttl: "1h"`	`1h`	`1h` ✓
`cache_ttl: "30m"` (invalid)	`5m` (fallback)	`5m` ✓
`cache_ttl: "5m"` (explicit)	`5m`	`5m` ✓
`prompt_caching: {}` (empty)	`5m`	`5m` ✓

tests/run_agent/test_run_agent.py -k cache_ttl — 3/3 pass (new tests)
tests/agent/test_prompt_caching.py — 14/14 pass (regression guard)
E2E: 5 scenarios above verified with real config.yaml + AIAgent instantiation

Co-authored-by: @ericnicolaides

Changed files

cli-config.yaml.example (modified, +10/-0)
hermes_cli/config.py (modified, +6/-0)
run_agent.py (modified, +15/-2)
scripts/release.py (modified, +1/-0)
tests/run_agent/test_run_agent.py (modified, +60/-0)
website/docs/developer-guide/context-compression-and-caching.md (modified, +3/-3)

Original issue (#14971)

Why this matters (real data)

On a real multi-session workload over 3 days ($246.46 total spend via Anthropic API), cost breakdown by token type:

token_type	cost	share
`input_cache_write_5m`	$139.37	56.5%
`input_cache_read`	$84.61	34.3%
`output`	$18.37	7.5%
`input_no_cache`	$4.11	1.7%

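Cache writes cost about 1.65x as much as cache reads ($139.37 vs $84.61). Because a cache write is priced at 1.25x the base input rate and a read at 0.1x, this is a ratio of spend, not of how often the cache was rewritten versus read. Even so, write spend this high suggests the 5m TTL is expiring between turns in any workflow where the user pauses for more than 5 minutes (switching tasks, reading the previous response, thinking, interruptions, multi-hour sessions with gaps).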

Anthropic's 1h TTL costs 2x the base input price on write (vs 1.25x for 5m) but amortizes across a full session. Rough estimate on the same workload, assuming 60% of writes become reads under 1h: saves ~$400–850/month for an active user.
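The low end of that estimate can be reproduced from the table under one reading of the assumption: 60% of the volume currently written at 1.25x is instead read at 0.1x, and the remaining 40% is written at the 1h rate of 2x. The sketch also converts spend back into volume at the base input price. The multipliers and the 30-day month are assumptions; the dollar figures are the table's. Because the workload mixed Opus and Sonnet, "volume" here is base-price dollars, not raw token counts.

```python
# Spend from the 3-day workload in the table above (USD).
WRITE_5M_SPEND = 139.37
READ_SPEND = 84.61
DAYS = 3

WRITE_5M, WRITE_1H, READ = 1.25, 2.0, 0.1  # multiples of the base input price


def base_units(spend: float, multiplier: float) -> float:
    """Volume expressed as what it would cost at the base input price."""
    return spend / multiplier


def savings_with_1h(converted_share: float) -> float:
    """USD saved over DAYS if `converted_share` of 5m writes become cache reads
    and the rest are written at the 1h rate instead."""
    written = base_units(WRITE_5M_SPEND, WRITE_5M)
    new_cost = (
        written * (1 - converted_share) * WRITE_1H
        + written * converted_share * READ
    )
    return WRITE_5M_SPEND - new_cost


write_volume = base_units(WRITE_5M_SPEND, WRITE_5M)  # ~111.5
read_volume = base_units(READ_SPEND, READ)           # ~846.1
saved = savings_with_1h(0.6)                         # ~43.48 over 3 days
monthly = saved / DAYS * 30                          # ~435
```

That lands at about $435/month, consistent with the low end of the $400–850 range; the issue does not state the assumptions behind the upper end. By volume, reads (≈846 base-price units) are roughly 7.6 times writes (≈111.5), so write spend dominates because writes are priced 12.5x higher than reads, not because writes outnumber reads.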

Proposed fix

Expose the TTL via config.yaml. Example:

model:
  name: claude-opus-4-7
  provider: anthropic
  prompt_cache_ttl: '1h'   # new; accepts '5m' (default) or '1h'

Minimal patch (tested locally on v0.10.0):

Replace line 1010 with a config lookup:

self._cache_ttl = self._resolve_cache_ttl()

Add a helper near _anthropic_prompt_cache_policy:

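def _resolve_cache_ttl(self) -> str:
    # Read model.prompt_cache_ttl from config.yaml; any failure or bad value
    # yields the existing "5m" default. Uses run_agent.py's module-level logger.
    try:
        from cli import load_cli_config  # imported lazily inside the method
        cfg = load_cli_config() or {}
        # Missing/None becomes "", then the value is trimmed and lower-cased.
        val = str((cfg.get("model") or {}).get("prompt_cache_ttl") or "").strip().lower()
        if val in ("5m", "1h"):
            return val
        if val:
            # Set but invalid: warn and fall through to the default.
            logger.warning(f"Invalid model.prompt_cache_ttl={val!r}; falling back to '5m'")
    except Exception as e:
        # Config lookup errors are logged at debug level and otherwise ignored.
        logger.debug(f"_resolve_cache_ttl lookup failed: {e}")
    return "5m"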

Default stays "5m" so no behavior change for existing users.

Willing to PR

Happy to send a PR with this change + a unit test hitting both branches if maintainers think this is the right shape. Any preferred config key naming is fine — prompt_cache_ttl under model: felt natural but caching.ttl / anthropic.cache_ttl would also work.
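A test along the lines offered here could stub out the cli module, since the helper imports load_cli_config lazily. The sketch below copies the proposed _resolve_cache_ttl unchanged onto a hypothetical _AgentStub class (real tests would target AIAgent) and uses unittest.mock.patch.dict on sys.modules to inject a fake cli module per case. The merged PR reads prompt_caching.cache_ttl instead, so this exercises the proposal, not the shipped code.

```python
import logging
import sys
import types
import unittest
from unittest import mock

logger = logging.getLogger(__name__)


class _AgentStub:
    """Hypothetical stand-in for AIAgent carrying only the proposed helper."""

    def _resolve_cache_ttl(self) -> str:
        try:
            from cli import load_cli_config
            cfg = load_cli_config() or {}
            val = str((cfg.get("model") or {}).get("prompt_cache_ttl") or "").strip().lower()
            if val in ("5m", "1h"):
                return val
            if val:
                logger.warning(f"Invalid model.prompt_cache_ttl={val!r}; falling back to '5m'")
        except Exception as e:
            logger.debug(f"_resolve_cache_ttl lookup failed: {e}")
        return "5m"


def _fake_cli(load_cli_config):
    """Build a stand-in `cli` module exposing the given loader."""
    module = types.ModuleType("cli")
    module.load_cli_config = load_cli_config
    return module


class ResolveCacheTtlTest(unittest.TestCase):
    def resolve_with(self, loader):
        # patch.dict restores sys.modules when the block exits.
        with mock.patch.dict(sys.modules, {"cli": _fake_cli(loader)}):
            return _AgentStub()._resolve_cache_ttl()

    def test_valid_value_is_normalized(self):
        cfg = {"model": {"prompt_cache_ttl": " 1H "}}
        self.assertEqual(self.resolve_with(lambda: cfg), "1h")

    def test_invalid_value_warns_and_falls_back(self):
        cfg = {"model": {"prompt_cache_ttl": "30m"}}
        with self.assertLogs(logger, level="WARNING"):
            self.assertEqual(self.resolve_with(lambda: cfg), "5m")

    def test_missing_config_defaults(self):
        self.assertEqual(self.resolve_with(lambda: None), "5m")

    def test_loader_error_defaults(self):
        def boom():
            raise OSError("config unreadable")
        self.assertEqual(self.resolve_with(boom), "5m")
```

Saved as a test module, this runs with python -m unittest; the four cases cover the valid branch, the invalid-value warning, a missing config, and the exception path.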

Environment

Hermes v0.10.0
Anthropic provider, Claude Opus 4.7 main + Sonnet 4.6 fallback
WSL2 Ubuntu 24.04

extent analysis

TL;DR

The hardcoded 5-minute TTL caused frequent cache rewrites for workflows with pauses longer than 5 minutes. PR #15065 (merged) fixes this by exposing the TTL as prompt_caching.cache_ttl in config.yaml: "5m" by default, "1h" as an opt-in.

Guidance

The fix makes run_agent.py read the TTL from config.yaml instead of hardcoding it, which can reduce cache rewrites and costs for sessions with long pauses.
The issue proposed replacing the hardcoded line with a config lookup plus a helper (_resolve_cache_ttl) reading model.prompt_cache_ttl; the merged PR instead reads prompt_caching.cache_ttl via hermes_cli.config.load_config() in AIAgent.__init__.
Valid values are "5m" and "1h"; anything else falls back to "5m" without raising.
On versions that include PR #15065, no patch is needed. On older versions such as v0.10.0, test the reporter's patch locally before relying on it.

Notes

The setting only takes effect when config.yaml loads successfully; a missing or invalid value falls back to "5m" rather than raising an error. It also only affects requests that go through Anthropic prompt caching (apply_anthropic_cache_control), so it may not apply to other providers.

Recommendation

Upgrade to a build that includes PR #15065 and set prompt_caching.cache_ttl: "1h" if your sessions regularly pause for more than 5 minutes between turns. For short, continuous sessions the 5m default is cheaper, since 1h writes cost 2x the base input price instead of 1.25x.
