hermes - [Bug]: Auto-detect context length not working correctly with oMLX

Bug Description

When running MiniMax-M2.7 on oMLX, Hermes reported an incorrect context size in its status line: ⚕ MiniMax-M2.7-oQ5 │ 59.1K/256K │ The actual context size was 186,000 tokens, not 256K.

Steps to Reproduce

  1. Run hermes model (CLI) and configure a custom endpoint (http://192.168.5.125:8098/v1) served by oMLX. Note that oMLX is running on a separate machine from Hermes Agent (both are on the local network).
  2. Run hermes agent (via CLI / terminal). An incorrect context size is reported: the default of 256K.

Expected Behavior

oMLX does not include context size in the response to the standard /v1/models/ endpoint. The oMLX response:

{
  "object": "list",
  "data": [
    {"id": "MiniMax-M2.7-4bit-mxfp4", "object": "model", "created": 1779587763, "owned_by": "omlx"},
    {"id": "MiniMax-M2.7-oQ5", "object": "model", "created": 1779587763, "owned_by": "omlx"},
    {"id": "Qwen3.6-35B-A3B-MLX-8bit", "object": "model", "created": 1779587763, "owned_by": "omlx"},
    {"id": "gemma-4-26B-A4B-it-MLX-8bit", "object": "model", "created": 1779587763, "owned_by": "omlx"},
    {"id": "gemma-4-E4B-it-MLX-6bit", "object": "model", "created": 1779587763, "owned_by": "omlx"}
  ]
}

Instead, it reports the context size at the /v1/models/status/ endpoint. For example:

{
  "id": "MiniMax-M2.7-oQ5",
  "loaded": true,
  "max_context_window": 186000,
  "max_tokens": 32768
}

Note that oMLX also requires Bearer auth when an api_token is configured, as in: curl -s -H "Authorization: Bearer api_token" http://192.168.5.125:8098/v1/models/status/
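
For reference, here is a minimal sketch of the same authenticated request in Python, using only the standard library. The helper names build_status_request and fetch_status are hypothetical and are not part of Hermes or oMLX; the base URL and the literal token "api_token" are the placeholders from the curl command above.

```python
import json
import urllib.request
from typing import Any, Optional


def build_status_request(base_url: str, api_token: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request for the /v1/models/status/ endpoint (hypothetical helper)."""
    # base_url is expected to end in /v1, as in the custom endpoint configured above.
    url = base_url.rstrip("/") + "/models/status/"
    headers = {"Accept": "application/json"}
    if api_token:
        # Bearer auth is only needed when an api_token is configured on the server.
        headers["Authorization"] = f"Bearer {api_token}"
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_status(base_url: str, api_token: Optional[str] = None, timeout: float = 5.0) -> Any:
    """Send the request and return the decoded JSON payload."""
    request = build_status_request(base_url, api_token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling fetch_status("http://192.168.5.125:8098/v1", "api_token") should issue the same request as the curl command, assuming the server is reachable from the machine running Hermes.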

Actual Behavior

The log contained the following entry: "Could not detect context length ... defaulting to 256,000 tokens ...", which matches the DEFAULT_FALLBACK_CONTEXT = 256000 setting.

In some previous sessions, this caused Hermes to exceed the available context, resulting in the following error: Prompt too long: 190451 tokens exceeds max context window of 186000 tokens

In response to the error, it added a context_size parameter to config.yaml, which defeats the purpose of auto-detect.
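
The arithmetic behind that failure can be checked with a few lines of Python. This is only an illustration built from the numbers quoted in this report; the fits helper is hypothetical and is not how Hermes actually budgets its context.

```python
DEFAULT_FALLBACK_CONTEXT = 256000  # fallback value named in the log entry
ACTUAL_CONTEXT_WINDOW = 186000     # max_context_window reported by oMLX
PROMPT_TOKENS = 190451             # prompt size from the error message


def fits(prompt_tokens: int, context_window: int) -> bool:
    """Return True when the prompt fits inside the given window (hypothetical check)."""
    return prompt_tokens <= context_window


# Against the assumed 256,000-token window the prompt looks fine...
assumed_ok = fits(PROMPT_TOKENS, DEFAULT_FALLBACK_CONTEXT)
# ...but the server enforces the real 186,000-token window.
actual_ok = fits(PROMPT_TOKENS, ACTUAL_CONTEXT_WINDOW)
overshoot = PROMPT_TOKENS - ACTUAL_CONTEXT_WINDOW

print(assumed_ok, actual_ok, overshoot)  # True False 4451
```

With the fallback value, a client-side check has no reason to compress or trim the conversation before the server rejects the prompt, which overshoots the real window by 4,451 tokens.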

Affected Component

Agent Core (conversation loop, context compression, memory)

Messaging Platform (if gateway-related)

No response

Debug Report

Debug report uploaded:
  Report       https://paste.rs/KHlQB
  agent.log    https://paste.rs/HZx2I
  gateway.log  https://paste.rs/f7UUw

--- hermes dump ---
version:          0.14.0 (2026.5.16) [39c41d0f]
os:               Darwin 24.6.0 arm64
python:           3.11.15
openai_sdk:       2.24.0
profile:          default
hermes_home:      ~/.hermes
model:            MiniMax-M2.7-oQ5
provider:         custom
terminal:         local

Operating System

macOS (Darwin 24.6.0 arm64)

Python Version

3.11.15

Hermes Version

0.14.0 (2026.5.16) [39c41d0f]

Additional Logs / Traceback (optional)

Root Cause Analysis (optional)

No response

Proposed Fix (optional)

  1. When the context size is not found in the response from the /v1/models/ endpoint, check whether the inference engine is oMLX by looking for "owned_by": "omlx" in that response.
  2. If the inference engine is oMLX, read the model metadata from /v1/models/status instead.

Codex patched model_metadata.py in order to implement the fix described here.
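
The two proposed steps could look roughly like the following Python sketch. It is not the actual patch to model_metadata.py: every function name here is hypothetical, and the shape of the /v1/models/status payload beyond the single entry shown in this report is an assumption (the code accepts a single entry, a list of entries, or a list wrapped under "models" or "data").

```python
from typing import Any, Callable, Optional

DEFAULT_FALLBACK_CONTEXT = 256000  # fallback value named in the log entry


def is_omlx(models_response: dict, model_id: str) -> bool:
    """Step 1: check whether the /v1/models entry for model_id has owned_by == "omlx"."""
    for entry in models_response.get("data", []):
        if entry.get("id") == model_id:
            return entry.get("owned_by") == "omlx"
    return False


def context_from_status(status_payload: Any, model_id: str) -> Optional[int]:
    """Step 2: pull max_context_window for model_id out of a status payload."""
    # Assumption: the payload shape is not documented here, so accept several forms.
    if isinstance(status_payload, dict) and "id" in status_payload:
        entries = [status_payload]
    elif isinstance(status_payload, dict):
        entries = status_payload.get("models") or status_payload.get("data") or []
    elif isinstance(status_payload, list):
        entries = status_payload
    else:
        entries = []
    for entry in entries:
        if isinstance(entry, dict) and entry.get("id") == model_id:
            value = entry.get("max_context_window")
            # Reject booleans explicitly, since bool is a subclass of int.
            if isinstance(value, int) and not isinstance(value, bool) and value > 0:
                return value
    return None


def detect_context_length(
    models_response: dict,
    model_id: str,
    fetch_status: Callable[[], Any],
) -> int:
    """Use the oMLX status endpoint when available, otherwise fall back to the default."""
    if is_omlx(models_response, model_id):
        try:
            found = context_from_status(fetch_status(), model_id)
        except (OSError, ValueError):
            # Network failures and malformed JSON fall through to the default.
            found = None
        if found is not None:
            return found
    return DEFAULT_FALLBACK_CONTEXT
```

Here fetch_status stands for any zero-argument callable that returns the decoded /v1/models/status payload, which keeps the decision logic testable without a running server. Only the status endpoint is queried when the model is owned by "omlx", so other custom endpoints keep their current behavior.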

Are you willing to submit a PR for this?

  • I'd like to fix this myself and submit a PR
