hermes - ✅(Solved) Fix [Bug]: custom_providers.models.context_length not propagated to auxiliary compression feasibility check [1 pull requests]

Root Cause

In run_agent.py, around lines 2080-2085, the feasibility check calls:

aux_context = get_model_context_length(
    aux_model,
    base_url=aux_base_url,
    api_key=aux_api_key,
    config_context_length=getattr(self, "_aux_compression_context_length_config", None),
)

This only passes _aux_compression_context_length_config (from auxiliary.compression.context_length in config), but does NOT resolve custom_providers.models context_length for the auxiliary model.

Meanwhile, the main model (lines 1499-1536) correctly resolves custom_providers.models context_length and stores it in _config_context_length. The auxiliary path skips this resolution entirely.

When the compression model is the same as the main model (default behavior), get_model_context_length receives config_context_length=None and falls back to built-in defaults (128K for glm-5.1).
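
The fallback described above can be illustrated with a minimal sketch. The `lookup_context_length` function and the `BUILTIN_DEFAULTS` table are hypothetical stand-ins, not the real `get_model_context_length` from Hermes. They assume only that an explicit config value wins over a built-in default, and that glm-5.1 defaults to 128K as the issue reports.

```python
from typing import Optional

# Hypothetical stand-in for the built-in defaults table; only the glm-5.1
# value (128K) comes from the issue report.
BUILTIN_DEFAULTS = {"glm-5.1": 128_000}


def lookup_context_length(
    model: str, config_context_length: Optional[int] = None
) -> Optional[int]:
    """Return the config override if one is given, else the built-in default."""
    if config_context_length is not None:
        return config_context_length
    return BUILTIN_DEFAULTS.get(model)


# Main-model path: the custom_providers override was resolved and passed in.
main_context = lookup_context_length("glm-5.1", config_context_length=200_000)

# Auxiliary path before the fix: nothing was resolved, so None is passed.
aux_context = lookup_context_length("glm-5.1", config_context_length=None)

print(main_context, aux_context)
```

The same model name yields two different context sizes depending on whether the caller resolved the override first, which is the mismatch the issue describes.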

PR fix notes

PR #13540: fix(agent): propagate custom provider context_length to compression feasibility check

Description (problem / solution / changelog)

What changed and why

When custom_providers sets a context_length for the main model (e.g. 200K), the compression feasibility check didn't propagate this to the auxiliary model's get_model_context_length() call when the aux model falls back to the main model. This caused a false-positive "compression model too small" warning and unnecessary threshold auto-lowering.

Example from the issue: custom_providers sets context_length=200000 and threshold=0.65 (130K). Without the fix, the aux query returns the built-in default of 128K, which is below 130K and triggers a spurious warning.
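
The arithmetic behind the example can be checked with a short sketch. The helper names `threshold_tokens` and `is_feasible` are hypothetical, and the comparison (the aux context must be at least the threshold) is an assumption inferred from the warning text, not taken from the Hermes source.

```python
def threshold_tokens(context_length: int, threshold: float) -> int:
    """Token count at which compression triggers for the main model."""
    return round(context_length * threshold)


def is_feasible(aux_context: int, main_context: int, threshold: float) -> bool:
    """Assumed rule: the aux model must hold everything up to the threshold."""
    return aux_context >= threshold_tokens(main_context, threshold)


needed = threshold_tokens(200_000, 0.65)          # 0.65 x 200,000
before_fix = is_feasible(128_000, 200_000, 0.65)  # aux sees the 128K default
after_fix = is_feasible(200_000, 200_000, 0.65)   # aux sees the 200K override
print(needed, before_fix, after_fix)
```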

Three-part fix in run_agent.py:

  1. Persist _config_context_length after custom_providers resolution in __init__ (was only set before the providers loop)
  2. Propagate to aux model query when aux_model == self.model and base URLs match (the fallback scenario)
  3. Re-derive threshold from get_model_context_length() with the correct config override, instead of reading the potentially-stale threshold_tokens from the compressor
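
The propagation rule in step 2 might look roughly like the sketch below. The function `aux_config_context_length` and its parameters are hypothetical names for illustration and do not reproduce the PR's code. Two assumptions are labeled in the comments: an explicit auxiliary override takes precedence, and base URLs are compared ignoring a trailing slash.

```python
from typing import Optional


def _same_endpoint(a: Optional[str], b: Optional[str]) -> bool:
    """Compare base URLs, ignoring a trailing slash (assumed normalization)."""
    return (a or "").rstrip("/") == (b or "").rstrip("/")


def aux_config_context_length(
    aux_override: Optional[int],
    main_override: Optional[int],
    aux_model: str,
    main_model: str,
    aux_base_url: Optional[str],
    main_base_url: Optional[str],
) -> Optional[int]:
    """Pick the config_context_length to pass for the auxiliary model."""
    if aux_override is not None:
        # Assumption: an explicit auxiliary.compression.context_length wins.
        return aux_override
    if aux_model == main_model and _same_endpoint(aux_base_url, main_base_url):
        # Fallback scenario: the aux model is the main model, so reuse its override.
        return main_override
    # A different aux model must not inherit the main model's override.
    return None


URL = "http://localhost:8317/v1"
same = aux_config_context_length(None, 200_000, "glm-5.1", "glm-5.1", URL, URL + "/")
other = aux_config_context_length(None, 200_000, "other-model", "glm-5.1", URL, URL)
print(same, other)
```

Returning `None` for a different aux model keeps the existing built-in lookup in charge of that model, which matches the regression test described in the PR.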

How to test it

pytest tests/run_agent/test_compression_feasibility.py -v

The suite has 19 tests, including 3 new regression tests for the exact false-positive scenario from the issue: custom provider propagation when aux matches main, a different aux model NOT getting main's override, and end-to-end elimination of the false positive.

Platform tested on

macOS 15 (Apple Silicon)

Closes #12977

Changed files

  • run_agent.py (modified, +37/-3)
  • tests/run_agent/test_compression_feasibility.py (modified, +235/-25)


Bug Description

When context_length is set in custom_providers.models, it correctly applies to the main model context window, but the auxiliary compression feasibility check (_check_compression_model_feasibility) does NOT resolve it for the compression model when the compression model falls back to the main model.

This produces an incorrect warning about context mismatch, and auto-lowers the compression threshold unnecessarily:

⚠ Compression model (glm-5.1) context is 128,000 tokens, but the main model's compression threshold was 130,000 tokens. Auto-lowered this session's threshold to 128,000 tokens so compression can run.

Even though the user has configured context_length: 200000 for the model.

Steps to Reproduce

  1. Configure custom_providers with a model that has a context_length override different from the built-in default:
model:
  default: glm-5.1
  provider: custom
  base_url: http://localhost:8317/v1
  api_key: sk-xxx

custom_providers:
  - name: Local (localhost:8317)
    base_url: http://localhost:8317/v1
    api_key: sk-xxx
    model: glm-5.1
    models:
      glm-5.1:
        context_length: 200000
  2. Do NOT set auxiliary.compression.model or auxiliary.compression.context_length (so compression falls back to the main model).
  3. Set compression.threshold: 0.65 (default).
  4. Start Hermes.
  5. Observe the warning on startup: the 200K context should be recognized, but 128K is reported instead.

Expected Behavior

When the compression model matches a model defined in custom_providers.models, the feasibility check should resolve the context_length from custom_providers the same way the main model does (lines 1499-1536). No warning should appear since 0.65 × 200,000 = 130,000 < 200,000.

Actual Behavior

The auxiliary feasibility check calls get_model_context_length() with config_context_length=None, falling back to the built-in default (128K for glm-5.1), ignoring the user's custom_providers.models.glm-5.1.context_length: 200000. This triggers a false warning and auto-lowers the compression threshold.

Affected Component

  • Agent Core (conversation loop, context compression, memory)
  • Configuration (config.yaml, .env, hermes setup)

Messaging Platform (if gateway-related)

N/A (CLI only)

Debug Report

Running on self-hosted VPS with CLIProxyAPI as local LLM gateway. Issue reproduced on CLI and Discord gateway.

Operating System

Ubuntu 24.04 VPS

Python Version

3.11

Hermes Version

v0.1.0 (editable install from NousResearch/hermes-agent main branch)

Proposed Fix

In _check_compression_model_feasibility, before calling get_model_context_length, resolve the custom_providers.models context_length for aux_model (mirroring the logic at lines 1499-1536) and pass it as config_context_length.

Alternatively, extract the custom_providers.models context resolution into a reusable function that both the main model and auxiliary paths call.

Are you willing to submit a PR for this?

  • I'd like to fix this myself and submit a PR

extent analysis

TL;DR

The issue can be fixed by modifying the _check_compression_model_feasibility function to resolve the custom_providers.models context length for the auxiliary model.

Guidance

  1. Identify the root cause: The auxiliary feasibility check does not resolve the context_length from custom_providers for the compression model, causing it to fall back to the built-in default.
  2. Modify the _check_compression_model_feasibility function: Before calling get_model_context_length, resolve the custom_providers.models context length for aux_model and pass it as config_context_length.
  3. Extract the context resolution into a reusable function: Consider creating a separate function to resolve the custom_providers.models context length, which can be called by both the main model and auxiliary paths.
  4. Verify the fix: After applying the changes, restart Hermes and check if the warning about context mismatch is resolved and the compression threshold is set correctly.

Example

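def resolve_custom_provider_context_length(custom_providers, model_name, base_url):
    # Hypothetical helper (not existing Hermes code): return the context_length
    # set under custom_providers[].models.<model_name>, or None if none matches.
    for provider in custom_providers or []:
        if (provider.get("base_url") or "").rstrip("/") != (base_url or "").rstrip("/"):
            continue
        entry = (provider.get("models") or {}).get(model_name) or {}
        if entry.get("context_length"):
            return int(entry["context_length"])
    return None

def _check_compression_model_feasibility(self):
    # ...
    # Assumption: an explicit auxiliary.compression.context_length still wins.
    config_context_length = getattr(self, "_aux_compression_context_length_config", None)
    if config_context_length is None:
        # custom_providers stands for the parsed config list (access path assumed).
        config_context_length = resolve_custom_provider_context_length(
            custom_providers, aux_model, aux_base_url
        )
    aux_context = get_model_context_length(
        aux_model,
        base_url=aux_base_url,
        api_key=aux_api_key,
        config_context_length=config_context_length,
    )
    # ...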

Notes

The proposed fix assumes that the custom_providers.models context length resolution logic is similar to the one used for the main model. If the logic is different, the fix may need to be adjusted accordingly.

Recommendation

Apply the fix by modifying the _check_compression_model_feasibility function to resolve the custom_providers.models context length for the auxiliary model. This prevents both the incorrect warning and the unnecessary auto-lowering of the compression threshold.
