hermes - 💡(How to fix) Fix OpenRouter Grok prompt caching likely misses xAI server-affinity header [2 pull requests]

hermes2026-05-09 16:47:14

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

When using Grok models through OpenRouter, Hermes appears to miss xAI's cache-affinity requirement, which likely causes poor prompt-cache hit rates and higher token costs.

Root Cause

xAI prompt caching is unusually sensitive to server affinity. For repeated requests to hit the same cache, xAI expects a stable conversation identifier to be sent (for chat-completions-style calls, this is typically the x-grok-conv-id header; for Responses-style flows, prompt_cache_key is used).

Without that affinity signal, requests can be routed to different backend servers, causing frequent cache misses even when the prompt prefix is stable.

Fix Action

Fixed

Fixed by PR: fix(openrouter): add x-grok-conv-id header for Grok models to improve prompt cache hit rates (https://github.com/NousResearch/hermes-agent/pull/22708)
Fixed by PR: fix(openrouter): add x-grok-conv-id header for Grok models (carve-out of #22708) (https://github.com/NousResearch/hermes-agent/pull/22809)

Code Example

extra_headers = {"x-grok-conv-id": session_id}

RAW_BUFFERClick to expand / collapse

Summary

When using Grok models through OpenRouter, Hermes appears to miss xAI's cache-affinity requirement, which likely causes poor prompt-cache hit rates and higher token costs.

Why this matters

Without that affinity signal, requests can be routed to different backend servers, causing frequent cache misses even when the prompt prefix is stable.

Current Hermes behavior

From the current call path:

run_agent.py
agent/transports/chat_completions.py
plugins/model-providers/openrouter/__init__.py

Hermes already has a stable session_id, but on the OpenRouter chat completions path it does not appear to be used for Grok cache affinity.

More specifically:

agent/transports/chat_completions.py calls profile.build_api_kwargs_extras(...)
that path does not appear to propagate session_id into the provider extras context
plugins/model-providers/openrouter/__init__.py therefore has no way to derive and attach an OpenRouter/xAI-specific affinity header for Grok models

There is related logic in the Responses/Codex path (prompt_cache_key = session_id), but that does not help the OpenRouter chat-completions path used for x-ai/grok-* models.

Suspected result

For OpenRouter Grok usage, Hermes likely sends repeated requests without x-grok-conv-id, so xAI server affinity is lost and prompt caching underperforms.

Suggested fix

1) Pass `session_id` through the OpenRouter chat-completions provider hook

In agent/transports/chat_completions.py, include session_id in the context passed to:

profile.build_api_kwargs_extras(...)

2) Add Grok-specific affinity logic in the OpenRouter provider profile

In plugins/model-providers/openrouter/__init__.py, when:

provider is OpenRouter
model is x-ai/grok-* (and possibly xai/grok-*)
session_id is present

attach:

extra_headers = {"x-grok-conv-id": session_id}

as top-level request kwargs.

3) Preserve provider-added `extra_headers` when request overrides are present

After digging further, there appears to be a second issue in the same path:

even if the OpenRouter profile returns extra_headers, the final request assembly can still lose them if request_overrides.extra_headers is applied with last-write-wins semantics

So this likely needs a small merge fix in agent/transports/chat_completions.py:

merge provider-generated extra_headers with user-supplied request_overrides.extra_headers
do not clobber the provider header when overrides are present

Without this second fix, adding x-grok-conv-id in the provider profile may still not survive into the final request kwargs.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#api #dependency error #configuration error #environment variable #network issue

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

hermes - 💡(How to fix) Fix OpenRouter Grok prompt caching likely misses xAI server-affinity header [2 pull requests]

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Fix Action

Fixed

Code Example

Summary

Why this matters

Current Hermes behavior

Suspected result

Suggested fix

1) Pass `session_id` through the OpenRouter chat-completions provider hook

2) Add Grok-specific affinity logic in the OpenRouter provider profile

3) Preserve provider-added `extra_headers` when request overrides are present

Still need to ship something?

TRENDING

hermes - 💡(How to fix) Fix OpenRouter Grok prompt caching likely misses xAI server-affinity header [2 pull requests]

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Fix Action

Fixed

Code Example

Summary

Why this matters

Current Hermes behavior

Suspected result

Suggested fix

1) Pass session_id through the OpenRouter chat-completions provider hook

2) Add Grok-specific affinity logic in the OpenRouter provider profile

3) Preserve provider-added extra_headers when request overrides are present

Still need to ship something?

RELATED_DISCOVERY

TRENDING

1) Pass `session_id` through the OpenRouter chat-completions provider hook

3) Preserve provider-added `extra_headers` when request overrides are present