hermes - 💡(How to fix) Fix [Bug]: custom_providers unstable with Baidu Coding Plan — multi-model picker broken + wrong context lengths causing truncation [1 comments, 2 participants]

GitHub stats
NousResearch/hermes-agent#23318 · Fetched 2026-05-11 03:30:02
Comments: 1 · Participants: 2 · Timeline: 5 · Reactions: 0
Timeline (top): labeled ×4, commented ×1

Error Message

Not applicable — hermes debug share captures crashes, tracebacks, and environment details. This bug is a deterministic logic error in the model resolution code path (wrong values in DEFAULT_CONTEXT_LENGTHS fuzzy matching + picker flattening). No crash occurs; the symptoms are silent (wrong context_length= logged, missing picker entries).

The Root Cause Analysis section below provides the equivalent diagnostic: exact file paths, line numbers, and the resolution chain steps involved.

If a debug share is required to validate the fix, I can provide one after the native provider PR is submitted.

Fix Action

Fix / Workaround

  1. Multi-model picker broken: defining multiple models: under a single custom provider entry breaks the /model picker, which surfaces only one model (#20582). The workaround is one list entry per model, which means 7 duplicate base_url + api_key blocks for Baidu Coding alone. A single-model entry looks like this:

    custom_providers:
      - display_name: Baidu Coding - deepseek-v4-flash
        base_url: https://qianfan.baidubce.com/v2/coding
        api_key: ${BAIDU_CODING_API_KEY}
        models:
          - deepseek-v4-flash

  2. Wrong context lengths: the single-model workaround does not fix context length resolution. deepseek-v4-flash still resolves to 128K (the "deepseek" catch-all in DEFAULT_CONTEXT_LENGTHS) instead of the real 1M window, and glm-5.1 still resolves to 202,752 tokens while Baidu silently truncates at 198K, producing incomplete output and a retry loop. The issue proposes a native baidu-coding provider to address both bugs.

Bug Description

Baidu Qianfan Coding Plan provides an OpenAI-compatible endpoint with 7 curated models, launched February 2026 and explicitly designed for Claude Code, Cursor, and similar tools.

Hermes has no native provider for Baidu Coding Plan, forcing users into custom_providers — which breaks in two independent ways:

  1. Multi-model picker broken — Defining multiple models: under a single custom provider entry breaks the /model picker; it only surfaces one model (#20582). The workaround (one list entry per model) means 7 duplicate base_url + api_key blocks for Baidu Coding alone.

  2. Wrong context lengths → output truncation → token waste loop — As a custom provider, context length resolution hits DEFAULT_CONTEXT_LENGTHS fuzzy matching (#12977). For Baidu Coding models, the catch-alls are wrong:

| Model | Actual (Baidu) | Catch-all | Delta |
| --- | --- | --- | --- |
| glm-5.1 | 198,000 | 202,752 ("glm") | +4,752 |
| deepseek-v4-flash | 1,000,000 | 128,000 ("deepseek") | -872K |
| kimi-k2.5 | 256,000 | 262,144 ("kimi") | +6,144 |
| minimax-m2.5 | 192,000 | 204,800 ("minimax") | +12,800 |
| ernie-4.5-turbo | 128,000 | no match → 256K fallback | +128K |

Overstated values cause Hermes to send prompts exceeding the real context window. Baidu's API silently truncates mid-generation, producing incomplete outputs. The agent detects the truncation and restarts generation in the same session — burning tokens in a loop until the context limit is hit.

The understated deepseek-v4-flash value (128K vs real 1M) wastes 87% of the available window and triggers unnecessary trajectory compression.
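The deltas in the table and the 87% figure can be re-derived with a few lines of Python. This is only a worked check of the arithmetic, not Hermes code: the `WINDOWS` dictionary and both helper names are made up for illustration, and the 256K fallback is assumed to mean 256,000 tokens.

```python
# (actual window on Baidu, window Hermes resolves), taken from the table above.
# Assumption: the "256K fallback" is 256,000 tokens.
WINDOWS = {
    "glm-5.1": (198_000, 202_752),
    "deepseek-v4-flash": (1_000_000, 128_000),
    "kimi-k2.5": (256_000, 262_144),
    "minimax-m2.5": (192_000, 204_800),
    "ernie-4.5-turbo": (128_000, 256_000),
}


def classify(actual: int, resolved: int) -> str:
    """Label a resolved window relative to the provider's real limit."""
    if resolved > actual:
        return "overstated"   # Hermes may send more than Baidu accepts
    if resolved < actual:
        return "understated"  # Hermes compresses earlier than necessary
    return "exact"


def report() -> dict[str, tuple[int, str]]:
    """Map each model to (resolved - actual, label)."""
    return {m: (r - a, classify(a, r)) for m, (a, r) in WINDOWS.items()}


if __name__ == "__main__":
    for model, (delta, label) in report().items():
        print(f"{model:18} {delta:+10,} {label}")
    actual, resolved = WINDOWS["deepseek-v4-flash"]
    print(f"deepseek-v4-flash unused share: {1 - resolved / actual:.1%}")
```

Four of the five entries come out overstated, which is the direction that leads to provider-side truncation; only deepseek-v4-flash is understated.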

Steps to Reproduce

Prerequisite: A Baidu Coding Plan API key (available at https://qianfan.baidubce.com).

Bug 1: Multi-model picker broken

  1. Add the following to ~/.hermes/config.yaml:
    custom_providers:
      - display_name: Baidu Coding
        base_url: https://qianfan.baidubce.com/v2/coding
        api_key: ${BAIDU_CODING_API_KEY}
        models:
          - glm-5.1
          - deepseek-v4-flash
          - kimi-k2.5
          - minimax-m2.5
          - ernie-4.5-turbo
          - qwen3-coder-plus
          - qwen3-235b-a22b
  2. Run hermes chat
  3. Type /model to open the model picker
  4. Only one model from the list appears — the other 6 are invisible

Bug 2: Wrong context lengths

  1. Configure Baidu Coding as a single-model custom provider (workaround for Bug 1):
    custom_providers:
      - display_name: Baidu Coding - deepseek-v4-flash
        base_url: https://qianfan.baidubce.com/v2/coding
        api_key: ${BAIDU_CODING_API_KEY}
        models:
          - deepseek-v4-flash
  2. Run hermes chat and select deepseek-v4-flash
  3. Send a long prompt (~130K tokens)
  4. Observe: Hermes compresses the context at 128K (the "deepseek" catch-all in DEFAULT_CONTEXT_LENGTHS) instead of using the real 1M window
  5. Alternatively, select glm-5.1 and observe: Hermes over-sends to 202,752 tokens → Baidu silently truncates at 198K → incomplete output → retry loop
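Until the picker is fixed, the one-entry-per-model workaround can be generated rather than written by hand. The sketch below uses only the config keys that appear in this issue (display_name, base_url, api_key, models); the function name `per_model_entries` is hypothetical, and the YAML is emitted as plain text to avoid assuming any YAML library.

```python
MODELS = [
    "glm-5.1",
    "deepseek-v4-flash",
    "kimi-k2.5",
    "minimax-m2.5",
    "ernie-4.5-turbo",
    "qwen3-coder-plus",
    "qwen3-235b-a22b",
]


def per_model_entries(display_name, base_url, api_key_ref, models):
    """Render one custom_providers entry per model as YAML text."""
    lines = ["custom_providers:"]
    for model in models:
        lines += [
            f"  - display_name: {display_name} - {model}",
            f"    base_url: {base_url}",
            f"    api_key: {api_key_ref}",
            "    models:",
            f"      - {model}",
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(
        per_model_entries(
            "Baidu Coding",
            "https://qianfan.baidubce.com/v2/coding",
            "${BAIDU_CODING_API_KEY}",
            MODELS,
        ),
        end="",
    )
```

The output would replace any existing custom_providers block in ~/.hermes/config.yaml, so it is worth reviewing and merging by hand. It only works around Bug 1; the context lengths stay wrong.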

Expected Behavior

  • The /model picker should display all 7 models defined under a single custom provider entry.
  • Context lengths should match the provider's actual limits — not generic DEFAULT_CONTEXT_LENGTHS fuzzy matches that were designed for other endpoints (e.g., "deepseek" → 128K was set for older DeepSeek V2/V3 via non-Baidu endpoints, not the Coding Plan's 1M-window V4 Flash).

Actual Behavior

  • Picker: Only one model from a multi-model custom_providers entry is shown. The rest are silently dropped. (Tracked in #20582.)
  • Context lengths: All 5 Baidu Coding models listed above resolve to wrong values in agent/model_metadata.py. Four hit wrong fuzzy-match values in DEFAULT_CONTEXT_LENGTHS (step 8 of the resolution chain). The most severe case is deepseek-v4-flash receiving 128K instead of 1M, wasting 87% of the context window. The second-most severe is ernie-4.5-turbo, which has no match at all and falls through to the 256K default: double the real 128K, causing truncation loops.

Affected Component

CLI (interactive chat), Configuration (config.yaml, .env, hermes setup), Agent Core (conversation loop, context compression, memory)

Messaging Platform (if gateway-related)

N/A (CLI only)

Debug Report

Not applicable — `hermes debug share` captures crashes, tracebacks, and environment details. This bug is a deterministic logic error in the model resolution code path (wrong values in `DEFAULT_CONTEXT_LENGTHS` fuzzy matching + picker flattening). No crash occurs; the symptoms are silent (wrong `context_length=` logged, missing picker entries).

The Root Cause Analysis section below provides the equivalent diagnostic: exact file paths, line numbers, and the resolution chain steps involved.

If a debug share is required to validate the fix, I can provide one after the native provider PR is submitted.

Operating System

Ubuntu 24.04 (reproduced in development environment; bug is OS-independent — it's a logic error in model resolution and picker code)

Python Version

3.11

Hermes Version

Hermes Agent v0.13.0 (2026.5.7)

Additional Logs / Traceback (optional)

No traceback — this is a silent logic error, not a crash. The symptoms are:
- Missing models in the `/model` picker (no error logged)
- Wrong context lengths applied (logged as `context_length=128000` for `deepseek-v4-flash` instead of `1000000`)
- Truncation loops manifest as repeated "continuing generation" messages with identical or degraded output

Root Cause Analysis (optional)

Two independent root causes:

Bug 1: Multi-model picker

In hermes_cli/models.py, the /model picker iterates _PROVIDER_MODELS[provider] for native providers. Custom providers are handled separately — they flatten multi-model entries into a single display row. When a custom provider has multiple models:, only the first model is surfaced in the picker. The custom:name:model triple syntax (line ~1486–1494) supports named custom providers but the picker UI still shows one row per custom_providers entry rather than one row per model.

This is the same root cause as #20582.
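The expected picker behavior amounts to expanding each custom_providers entry into one row per model. The following is a minimal sketch of that expansion, not the code in hermes_cli/models.py: the function `picker_rows`, the row shape, and the use of display_name inside the custom:name:model triple are all assumptions made for illustration.

```python
def picker_rows(custom_providers):
    """Expand custom provider entries into one (selector, label) row per model.

    Assumption: a row is selected via the custom:name:model triple syntax,
    with the entry's display_name standing in for the name part.
    """
    rows = []
    for entry in custom_providers:
        name = entry["display_name"]
        for model in entry.get("models", []):
            rows.append((f"custom:{name}:{model}", f"{name} / {model}"))
    return rows


if __name__ == "__main__":
    config = [
        {
            "display_name": "Baidu Coding",
            "models": ["glm-5.1", "deepseek-v4-flash", "kimi-k2.5"],
        }
    ]
    for selector, label in picker_rows(config):
        print(selector, "|", label)
```

With the flattening described above, only the first of these rows would be shown.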

Bug 2: Wrong context lengths

In agent/model_metadata.py, the resolve_context_length() function walks a resolution chain with steps numbered 0–10. Custom providers have no provider-specific step, so their models fall through to:

  • Step 8: DEFAULT_CONTEXT_LENGTHS fuzzy matching (substring match, longest key first):

    • deepseek-v4-flash matches "deepseek" → 128,000 (set as a legacy fallback for older DeepSeek models, not the 1M-window V4 Flash on Baidu)
    • glm-5.1 matches "glm" → 202,752 (Z.AI's actual value; Baidu Qianfan caps it at 198,000)
    • kimi-k2.5 matches "kimi" → 262,144 (Moonshot's value; Baidu's Coding Plan variant is 256,000)
    • minimax-m2.5 matches "minimax" → 204,800 (MiniMax's own API value; Baidu's variant is 192,000)
    • ernie-4.5-turbo has no match in DEFAULT_CONTEXT_LENGTHS
  • Step 10: default fallback of 256K (for ernie-4.5-turbo with no fuzzy match)
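The fall-through described in steps 8 and 10 can be reproduced with a small stand-in. This is a sketch under stated assumptions, not the contents of agent/model_metadata.py: `CATCH_ALLS` holds only the four catch-all entries quoted in this issue, `fuzzy_context_length` is a hypothetical name, and the 256K fallback is assumed to be 256,000.

```python
# Four catch-all entries quoted in this issue; the real table is larger.
CATCH_ALLS = {
    "deepseek": 128_000,
    "glm": 202_752,
    "kimi": 262_144,
    "minimax": 204_800,
}
FALLBACK = 256_000  # assumption: "256K" means 256,000 tokens


def fuzzy_context_length(model_id: str) -> int:
    """Substring match against the table, longest key first, else fallback."""
    lowered = model_id.lower()
    for key in sorted(CATCH_ALLS, key=len, reverse=True):
        if key in lowered:
            return CATCH_ALLS[key]
    return FALLBACK


if __name__ == "__main__":
    for model in ("deepseek-v4-flash", "glm-5.1", "ernie-4.5-turbo"):
        print(model, fuzzy_context_length(model))
```

Nothing in this lookup sees which endpoint serves the model, so a value that is right for the model creator's own API is applied unchanged to Baidu's variant.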

The core issue: DEFAULT_CONTEXT_LENGTHS values are sourced from each model creator's own API (e.g., "deepseek" → 128K from DeepSeek V2/V3 docs), but the same model IDs served through Baidu Coding Plan can have different context windows. The fuzzy-match table has no notion of the serving provider: deepseek-v4-flash has a 1M window both on api.deepseek.com and on Baidu Coding Plan, yet under a custom provider it is caught by the legacy "deepseek" → 128K catch-all before the specific "deepseek-v4-flash" → 1M entry.

Native providers solve this via provider-specific steps in the resolution chain (e.g., step 1b for Bedrock in agent/bedrock_adapter.py). Custom providers have no such step.

Proposed Fix (optional)

Add Baidu Coding Plan as a native provider (baidu-coding) that bypasses both bugs:

  1. Provider plugin at plugins/model-providers/baidu-coding/ — registers the provider, declares 7 curated Coding Plan models with correct context lengths
  2. Provider-scoped context length table — step 1c in the resolution chain (following the Bedrock step 1b pattern in agent/bedrock_adapter.py), with a static table in agent/baidu_coding_context.py:
    glm-5.1 → 198,000     (not 202,752)
    deepseek-v4-flash → 1,000,000  (not 128,000)
    kimi-k2.5 → 256,000   (not 262,144)
    minimax-m2.5 → 192,000 (not 204,800)
    ernie-4.5-turbo → 128,000 (not 256K fallback)
  3. Env vars: BAIDU_CODING_API_KEY (primary) / BAIDU_API_KEY (fallback) + BAIDU_CODING_BASE_URL (overridable)
  4. 7 curated model entries in _PROVIDER_MODELS["baidu-coding"] — Coding Plan models only, not the full Qianfan catalog

This is the same approach used by all bundled providers: native registration gives correct picker display + correct context resolution + proper /doctor env checks.
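A provider-scoped table consulted before the generic lookup might look roughly like the sketch below. It is not the prototype mentioned in this issue: `BAIDU_CODING_CONTEXT`, `resolve`, and the `generic_lookup` callback are hypothetical names. Only the five values stated in the issue are included, so the two qwen3 models fall through to the generic path here.

```python
from typing import Callable

# Values stated in this issue for the Baidu Coding Plan variants.
# The issue gives no values for qwen3-coder-plus or qwen3-235b-a22b.
BAIDU_CODING_CONTEXT = {
    "glm-5.1": 198_000,
    "deepseek-v4-flash": 1_000_000,
    "kimi-k2.5": 256_000,
    "minimax-m2.5": 192_000,
    "ernie-4.5-turbo": 128_000,
}


def resolve(provider: str, model_id: str,
            generic_lookup: Callable[[str], int]) -> int:
    """Prefer the provider-scoped table; otherwise defer to the generic chain."""
    if provider == "baidu-coding":
        exact = BAIDU_CODING_CONTEXT.get(model_id)
        if exact is not None:
            return exact
    return generic_lookup(model_id)


if __name__ == "__main__":
    def legacy(_model_id: str) -> int:
        return 128_000  # stand-in for the "deepseek" catch-all

    print(resolve("baidu-coding", "deepseek-v4-flash", legacy))
    print(resolve("custom", "deepseek-v4-flash", legacy))
```

Keying on the provider first means the same model ID can resolve differently per endpoint, which the substring table cannot express.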

I have a working prototype of the baidu-coding provider — plugin registration, context length table, 7 curated models, and tests. Currently validating against the repo's test conventions before submitting as a PR.
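The env var precedence listed in the proposed fix could be resolved along these lines. The variable names come from this issue; the function `baidu_coding_credentials` is hypothetical, and using the custom_providers base_url shown earlier as the default is an assumption.

```python
import os

# Assumption: the default matches the base_url used in the custom_providers
# examples in this issue.
DEFAULT_BASE_URL = "https://qianfan.baidubce.com/v2/coding"


def baidu_coding_credentials(env=None):
    """Return (api_key, base_url) using primary/fallback env var precedence."""
    env = os.environ if env is None else env
    # Empty strings are treated as unset.
    api_key = env.get("BAIDU_CODING_API_KEY") or env.get("BAIDU_API_KEY")
    base_url = env.get("BAIDU_CODING_BASE_URL") or DEFAULT_BASE_URL
    return api_key, base_url


if __name__ == "__main__":
    key, url = baidu_coding_credentials()
    print("api key set:", key is not None)
    print("base url:", url)
```

Passing a plain dictionary as `env` keeps the precedence logic testable without touching the real environment.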

Are you willing to submit a PR for this?

  • I'd like to fix this myself and submit a PR
