hermes - [Bug]: LM Studio custom_providers per-model context_length broken in 0.14.0 (regressed to 64K)

Bug Description

Per-model context_length values under custom_providers are ignored when LM Studio is the provider. This affects all models. The behavior regressed after updating to 0.14.0: models previously showed 256K, and now every model shows 64K regardless of the context_length set in config.yaml.

Config example:

    custom_providers:
      - name: lmstudio-qwen3.6-35b-a3b
        base_url: http://localhost:1234/v1
        model: qwen3.6-35b-a3b
        models:
          qwen3.6-35b-a3b:
            context_length: 262144

      - name: lmstudio-nemotron-3-nano-4b
        base_url: http://localhost:1234/v1
        model: nemotron-3-nano-4b
        models:
          nemotron-3-nano-4b:
            context_length: 1048576

Both of these models, and every other configured model, show 64K.

(Note: the reported context was already wrong before the update, but it was definitely better.)

The logs also contain this message: "Could not detect context length for model 'nvidia/nemotron-3-nano-4b' at http://127.0.0.1:1234/v1 — defaulting to 256,000 tokens (probe-down). Set model.context_length in config.yaml to override."

Steps to Reproduce

  1. Add custom_providers to ~/.hermes/config.yaml with per-model context_length for LM Studio models
  2. Run hermes chat
  3. Check context shown in the status bar at the bottom

Expected Behavior

Each model should use the context_length defined for it in custom_providers. Examples:

  • qwen3.6-35b-a3b → 262,144 tokens
  • glm-4.7-flash → 202,752 tokens
  • nemotron-3-nano-4b → 1,048,576 tokens
  • gemma-3-1b → 32,768 tokens
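The expected lookup can be sketched as follows. This is a minimal illustration, not Hermes's actual implementation: `resolve_context_length`, `normalize_base_url`, and the in-memory `CUSTOM_PROVIDERS` list are hypothetical names that mirror the config example from this report. The sketch also treats `localhost` and `127.0.0.1` as the same endpoint and retries the lookup without a vendor prefix, because the log line in this report names the model as `nvidia/nemotron-3-nano-4b` at `127.0.0.1` while the config uses `nemotron-3-nano-4b` at `localhost`. Whether that mismatch plays any part in the bug is an unconfirmed assumption.

```python
from urllib.parse import urlsplit

# Hypothetical in-memory copy of the custom_providers section from the report.
CUSTOM_PROVIDERS = [
    {
        "name": "lmstudio-qwen3.6-35b-a3b",
        "base_url": "http://localhost:1234/v1",
        "model": "qwen3.6-35b-a3b",
        "models": {"qwen3.6-35b-a3b": {"context_length": 262144}},
    },
    {
        "name": "lmstudio-nemotron-3-nano-4b",
        "base_url": "http://localhost:1234/v1",
        "model": "nemotron-3-nano-4b",
        "models": {"nemotron-3-nano-4b": {"context_length": 1048576}},
    },
]

LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def normalize_base_url(url):
    """Map loopback host spellings to 'localhost' and drop trailing slashes."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if host in LOOPBACK_HOSTS:
        host = "localhost"
    port = f":{parts.port}" if parts.port else ""
    return f"{parts.scheme}://{host}{port}{parts.path.rstrip('/')}"


def resolve_context_length(custom_providers, base_url, model, default=None):
    """Return the configured context_length for (base_url, model), else default.

    Assumption: a model ID with a vendor prefix ('nvidia/foo') may refer to the
    same model as the bare config key ('foo'), so both spellings are tried.
    """
    target = normalize_base_url(base_url)
    bare = model.split("/", 1)[-1]
    for provider in custom_providers:
        if normalize_base_url(provider.get("base_url", "")) != target:
            continue
        models = provider.get("models") or {}
        for key in (model, bare):
            entry = models.get(key)
            if isinstance(entry, dict) and "context_length" in entry:
                return int(entry["context_length"])
    return default
```

With this sketch, the model and URL exactly as they appear in the log line resolve to the value configured for nemotron-3-nano-4b, and a model with no entry falls through to `default`, so the caller can decide whether to probe or to use a fallback.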

Actual Behavior

Every single model shows 64,000 tokens regardless of what's configured. The configured context_length values are completely ignored.

Affected Component

Configuration (config.yaml, .env, hermes setup)

Messaging Platform (if gateway-related)

N/A (CLI only)

Debug Report

https://paste.rs/4YutI
https://paste.rs/q99Vj

Operating System

Windows 11

Python Version

Python 3.14.0

Hermes Version

0.14.0 (2026.5.16)

Additional Logs / Traceback (optional)

Could not detect context length for model 'nvidia/nemotron-3-nano-4b' at http://127.0.0.1:1234/v1 — defaulting to 256,000 tokens (probe-down)
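To check whether the model IDs served by LM Studio match the keys used under `models:` in config.yaml, the IDs can be listed from the OpenAI-compatible `/v1/models` endpoint. The sketch below is a diagnostic aid only. The helper names are hypothetical, the default URL is taken from the log line above, and the response is assumed to follow the usual OpenAI list shape, with a `data` array of objects that each carry an `id`.

```python
import json
from urllib.request import urlopen


def extract_model_ids(payload):
    """Return the `id` of each entry in an OpenAI-style model list response."""
    return [item["id"] for item in payload.get("data", []) if "id" in item]


def fetch_model_ids(base_url="http://127.0.0.1:1234/v1", timeout=5):
    """Query <base_url>/models on a running server and return the model IDs."""
    with urlopen(f"{base_url.rstrip('/')}/models", timeout=timeout) as response:
        return extract_model_ids(json.load(response))


def missing_from_config(served_ids, configured_keys):
    """List served model IDs that have no identically named key in the config."""
    configured = set(configured_keys)
    return [model_id for model_id in served_ids if model_id not in configured]
```

With LM Studio running, passing the result of `fetch_model_ids()` and the keys from config.yaml to `missing_from_config` shows any ID that differs from its config key, for example by a vendor prefix such as `nvidia/`.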

Root Cause Analysis (optional)

No response

Proposed Fix (optional)

No response

Are you willing to submit a PR for this?

  • I'd like to fix this myself and submit a PR
