hermes: OpenRouter Grok prompt caching likely misses xAI server-affinity header [2 pull requests]

When using Grok models through OpenRouter, Hermes appears to miss xAI's cache-affinity requirement, which likely causes poor prompt-cache hit rates and higher token costs.

Root Cause

xAI prompt caching is unusually sensitive to server affinity. For repeated requests to hit the same cache, xAI expects a stable conversation identifier to be sent (for chat-completions-style calls, this is typically the x-grok-conv-id header; for Responses-style flows, prompt_cache_key is used).

Without that affinity signal, requests can be routed to different backend servers, causing frequent cache misses even when the prompt prefix is stable.
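The two affinity signals described above can be sketched as a small helper. This is a minimal illustration, not Hermes code: the function name `affinity_kwargs` and the `api_style` strings are hypothetical, and only the `x-grok-conv-id` header name and the `prompt_cache_key` field come from the issue text.

```python
from typing import Any, Dict, Optional

# Header name taken from the issue text.
GROK_CONV_HEADER = "x-grok-conv-id"


def affinity_kwargs(api_style: str, session_id: Optional[str]) -> Dict[str, Any]:
    """Return the extra request kwargs that carry the affinity signal.

    `api_style` is a hypothetical label: "chat_completions" or "responses".
    """
    if not session_id:
        # No stable identifier available, so no affinity signal can be sent.
        return {}
    if api_style == "chat_completions":
        # Chat-completions-style calls carry the identifier as a header.
        return {"extra_headers": {GROK_CONV_HEADER: session_id}}
    if api_style == "responses":
        # Responses-style flows carry it as a top-level field.
        return {"prompt_cache_key": session_id}
    raise ValueError(f"unknown api_style: {api_style!r}")
```

The important property is that the same `session_id` is reused for every request in a conversation; a value that changes per request would defeat the purpose.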

Fix Action

Fixed

Code Example

extra_headers = {"x-grok-conv-id": session_id}

Current Hermes behavior

From the current call path:

  • run_agent.py
  • agent/transports/chat_completions.py
  • plugins/model-providers/openrouter/__init__.py

Hermes already has a stable session_id, but on the OpenRouter chat completions path it does not appear to be used for Grok cache affinity.

More specifically:

  1. agent/transports/chat_completions.py calls profile.build_api_kwargs_extras(...)
  2. that path does not appear to propagate session_id into the provider extras context
  3. plugins/model-providers/openrouter/__init__.py therefore has no way to derive and attach an OpenRouter/xAI-specific affinity header for Grok models

There is related logic in the Responses/Codex path (prompt_cache_key = session_id), but that does not help the OpenRouter chat-completions path used for x-ai/grok-* models.

Suspected result

For OpenRouter Grok usage, Hermes likely sends repeated requests without x-grok-conv-id, so xAI server affinity is lost and prompt caching underperforms.

Suggested fix

1) Pass session_id through the OpenRouter chat-completions provider hook

In agent/transports/chat_completions.py, include session_id in the context passed to:

  • profile.build_api_kwargs_extras(...)
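A rough sketch of step 1, assuming the hook receives a single context mapping. The real signature of `build_api_kwargs_extras` is not shown in the issue, so `build_extras_context`, `RecordingProfile`, and the context keys below are all hypothetical stand-ins.

```python
from typing import Any, Dict, Optional


def build_extras_context(model: str, session_id: Optional[str]) -> Dict[str, Any]:
    """Assemble the context handed to the provider hook (hypothetical shape)."""
    ctx: Dict[str, Any] = {"model": model}
    if session_id:
        # The proposed change: forward the stable session identifier.
        ctx["session_id"] = session_id
    return ctx


class RecordingProfile:
    """Stand-in for a provider profile; it only records what it was given."""

    def __init__(self) -> None:
        self.seen: Optional[Dict[str, Any]] = None

    def build_api_kwargs_extras(self, ctx: Dict[str, Any]) -> Dict[str, Any]:
        self.seen = ctx
        return {}


# "x-ai/grok-example" is a placeholder model slug, not a real model name.
profile = RecordingProfile()
profile.build_api_kwargs_extras(build_extras_context("x-ai/grok-example", "sess-123"))
```

After this call, `profile.seen` contains `session_id`, which is the precondition for any Grok-specific logic in the provider profile.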

2) Add Grok-specific affinity logic in the OpenRouter provider profile

In plugins/model-providers/openrouter/__init__.py, when:

  • provider is OpenRouter
  • model is x-ai/grok-* (and possibly xai/grok-*)
  • session_id is present

attach:

extra_headers = {"x-grok-conv-id": session_id}

as top-level request kwargs.
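The three conditions above might look like this in the provider profile. This is a sketch under assumptions: the function name `grok_affinity_extras`, the `provider` string `"openrouter"`, and the use of glob patterns for model matching are illustrative choices, not the actual plugin code.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict, Optional

# Model patterns from the issue text; "xai/grok-*" is the "possibly" variant.
GROK_MODEL_PATTERNS = ("x-ai/grok-*", "xai/grok-*")


def grok_affinity_extras(
    provider: str, model: str, session_id: Optional[str]
) -> Dict[str, Any]:
    """Return top-level request kwargs carrying the Grok affinity header."""
    if provider != "openrouter" or not session_id:
        return {}
    if not any(fnmatchcase(model, pattern) for pattern in GROK_MODEL_PATTERNS):
        # Non-Grok models are left untouched.
        return {}
    return {"extra_headers": {"x-grok-conv-id": session_id}}
```

Returning an empty dict for every non-matching case keeps the hook safe to call unconditionally for all OpenRouter models.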

3) Preserve provider-added extra_headers when request overrides are present

After digging further, there appears to be a second issue in the same path:

  • even if the OpenRouter profile returns extra_headers, the final request assembly can still lose them if request_overrides.extra_headers is applied with last-write-wins semantics

So this likely needs a small merge fix in agent/transports/chat_completions.py:

  • merge provider-generated extra_headers with user-supplied request_overrides.extra_headers
  • do not clobber the provider header when overrides are present

Without this second fix, adding x-grok-conv-id in the provider profile may still not survive into the final request kwargs.
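One possible shape for the merge, sketched with hypothetical names (`merge_request_kwargs` is not a Hermes function). It assumes both inputs are plain dicts and that, when the same header name appears on both sides, the user-supplied value should win; the issue does not specify that tie-break.

```python
from typing import Any, Dict


def merge_request_kwargs(
    provider_kwargs: Dict[str, Any], request_overrides: Dict[str, Any]
) -> Dict[str, Any]:
    """Apply overrides on top of provider kwargs, merging extra_headers."""
    merged = dict(provider_kwargs)
    for key, value in request_overrides.items():
        if key != "extra_headers":
            # Ordinary keys keep last-write-wins semantics.
            merged[key] = value

    # Headers are merged key by key instead of replaced wholesale, so a
    # provider-added header survives unless the user sets the same name.
    headers = {
        **(provider_kwargs.get("extra_headers") or {}),
        **(request_overrides.get("extra_headers") or {}),
    }
    if headers:
        merged["extra_headers"] = headers
    return merged
```

With this approach a user override such as `{"extra_headers": {"X-Title": "demo"}}` no longer removes `x-grok-conv-id` from the final request kwargs.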
