hermes - 💡(How to fix) Fix Bug: auxiliary runtime context-length tracking is still fragmented across compression/web_extract paths [2 pull requests]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

Fix Action

Fixed

Code Example

python -m pytest tests/agent/test_context_compressor.py \
  tests/tools/test_web_tools_config.py \
  tests/run_agent/test_compression_feasibility.py \
  -q -o 'addopts='
RAW_BUFFERClick to expand / collapse

Bug Description

Hermes still treats auxiliary-model context-length tracking as a set of partially separate code paths instead of one shared auxiliary-runtime metadata flow.

I hit this locally while debugging auxiliary-model context reporting for compression/web_extract and ended up with a broader fix than the narrow auxiliary.compression.context_length bug that was already addressed in earlier issues/PRs.

The key problem is that the main-model path and the auxiliary-model path do not consistently share the same "resolved runtime -> detected context length -> exposed metadata" lifecycle.

In practice this can produce inconsistencies such as:

  • the main model showing the correct context window,
  • the auxiliary runtime actually using the intended provider/model,
  • but compression/web_extract metadata and diagnostics not consistently reflecting the final live auxiliary client/model/context window,
  • especially after retry/auth-refresh/payment-fallback rerouting.

This is broader than the already-known narrow bug where _check_compression_model_feasibility() ignored explicit auxiliary.compression.context_length / custom-provider overrides.

Observed Behavior

Locally, the repair that made the behavior consistent required all of the following:

  1. Shared auxiliary metadata storage in agent/auxiliary_client.py

    • _store_auxiliary_task_metadata(...)
    • get_auxiliary_task_metadata(task)
    • detect_auxiliary_context_length(...)
    • _track_auxiliary_task_metadata(...)
    • resolve_auxiliary_context_length(...)
  2. Tracking metadata at all common auxiliary entry points

    • get_text_auxiliary_client(...)
    • get_async_text_auxiliary_client(...)
    • call_llm(...)
    • async_call_llm(...)
  3. Updating metadata after rerouting branches

    • auth-refresh retry
    • payment fallback
    • retry client/model substitution
  4. Feeding the resolved auxiliary compression runtime back into ContextCompressor

    • ContextCompressor.update_summary_model(model, context_length, base_url, api_key)
    • storing summary_context_length, summary_base_url, summary_api_key
  5. Making web_extract use the same generic auxiliary metadata store

    • tools.web_tools.get_web_extract_auxiliary_metadata() delegates to get_auxiliary_task_metadata("web_extract")
    • tools.web_tools._resolve_web_extract_auxiliary(...) stores the detected context length for the final resolved runtime

Without that shared tracking, it is easy for one path to know the right model while another path reports stale/incomplete/no context metadata.

Why This Still Matters Even After Related Fixes

There are already related issues/PRs around compression runtime context overrides, for example the narrow feasibility-check bug and explicit auxiliary.compression.context_length support.

Those fix only part of the problem.

The broader issue is that auxiliary runtime metadata is not treated as a first-class shared runtime state the same way the main model path effectively is.

That means the following can drift apart:

  • resolved auxiliary client
  • resolved auxiliary model
  • detected auxiliary context length
  • metadata exposed to diagnostics / helper accessors
  • actual final runtime after retries/fallbacks

Proposed Solution

Promote auxiliary runtime metadata tracking to a shared mechanism in agent/auxiliary_client.py and make all auxiliary consumers use it.

Proposed shape:

  1. In agent/auxiliary_client.py

    • keep/introduce a task-generic metadata store keyed by auxiliary task name
    • provide shared helpers:
      • _store_auxiliary_task_metadata(...)
      • get_auxiliary_task_metadata(task)
      • detect_auxiliary_context_length(...)
      • _track_auxiliary_task_metadata(...)
      • resolve_auxiliary_context_length(...) -> (client, model, context_length)
  2. Ensure all common auxiliary entry points call the shared tracker

    • sync getter path
    • async getter path
    • sync LLM path
    • async LLM path
    • retry/auth-refresh/fallback branches must update stored metadata for the final actual runtime used
  3. In run_agent.py::_check_compression_model_feasibility()

    • stop independently rebuilding context detection logic
    • call resolve_auxiliary_context_length("compression", ...)
    • persist the resolved runtime onto self.context_compressor
  4. In agent/context_compressor.py

    • add a tiny setter like:
      • update_summary_model(model, context_length, base_url, api_key)
    • track the resolved auxiliary summary runtime explicitly
  5. In tools/web_tools.py

    • make get_web_extract_auxiliary_metadata() read from the shared generic task store instead of a separate web-only cache/path

Why This Design Helps

  • one source of truth for auxiliary runtime metadata
  • compression and web_extract stop diverging
  • diagnostics can report the final live runtime actually used
  • retries/fallbacks no longer leave stale metadata behind
  • future auxiliary tasks (session_search, skills_hub, approval, mcp, flush_memories, title_generation, etc.) can adopt the same pattern instead of each inventing a separate metadata path

Minimal Repro Category

This is easiest to see when:

  • the auxiliary runtime is different from the main runtime,
  • context length must be inferred from provider metadata / override logic,
  • and the auxiliary path can reroute during execution (retry, auth refresh, or payment fallback).

Local Validation

I validated the repair locally with focused tests covering compression/web_extract metadata tracking:

python -m pytest tests/agent/test_context_compressor.py \
  tests/tools/test_web_tools_config.py \
  tests/run_agent/test_compression_feasibility.py \
  -q -o 'addopts='

Result:

  • 135 passed

Suggested Regression Coverage

Please add/keep tests for:

  • sync auxiliary getter metadata tracking
  • async auxiliary getter metadata tracking
  • sync call_llm(...) metadata tracking
  • async async_call_llm(...) metadata tracking
  • compression feasibility path using shared resolution helper
  • ContextCompressor summary runtime metadata update
  • web_extract metadata accessor returning the same shared task metadata
  • retry/auth-refresh/payment-fallback updating metadata to the final runtime actually used

Environment

  • Hermes Agent: v0.12.0 local source install
  • Repo: NousResearch/hermes-agent
  • Local validation done against a dirty worktree with focused tests only

Related Context

This seems related to, but broader than:

  • prior compression feasibility / custom-provider context-length propagation issues
  • explicit auxiliary.compression.context_length support

The broad ask here is: make auxiliary runtime metadata tracking a shared first-class path, not a set of partially duplicated special cases.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING