hermes - 💡(How to fix) Fix Bug: auxiliary runtime context-length tracking is still fragmented across compression/web_extract paths

StepCodex · 2026-05-07T05:54:05Z

Fixed

- Fixed by PR: fix(web): preserve compact metadata in web_extract output (https://github.com/NousResearch/hermes-agent/pull/21075)
- Fixed by PR: fix(auxiliary): track final runtime metadata across retries (https://github.com/NousResearch/hermes-agent/pull/21095)

Bug Description

Hermes still treats auxiliary-model context-length tracking as a set of partially separate code paths instead of one shared auxiliary-runtime metadata flow.

I hit this locally while debugging auxiliary-model context reporting for compression/web_extract and ended up with a broader fix than the narrow auxiliary.compression.context_length bug that was already addressed in earlier issues/PRs.

The key problem is that the main-model path and the auxiliary-model path do not consistently share the same "resolved runtime -> detected context length -> exposed metadata" lifecycle.
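A minimal sketch of that lifecycle, assuming a hypothetical frozen AuxiliaryRuntime record (the class and function names are illustrative, not Hermes code): the runtime is resolved once, the context length is detected for that exact runtime, and the exposed metadata is derived from the same record instead of being recomputed by each consumer.

```python
from dataclasses import asdict, dataclass, replace
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class AuxiliaryRuntime:
    """Hypothetical snapshot of one resolved auxiliary runtime."""

    provider: str
    model: str
    base_url: str
    context_length: Optional[int] = None


def resolve_runtime(provider: str, model: str, base_url: str) -> AuxiliaryRuntime:
    # Stage 1: the provider/model/endpoint the auxiliary call will really use.
    return AuxiliaryRuntime(provider=provider, model=model, base_url=base_url)


def with_detected_context_length(runtime: AuxiliaryRuntime, detected: int) -> AuxiliaryRuntime:
    # Stage 2: attach the context length detected for *this* runtime (new copy).
    return replace(runtime, context_length=detected)


def exposed_metadata(runtime: AuxiliaryRuntime) -> Dict[str, Any]:
    # Stage 3: diagnostics read a plain dict built from the same snapshot.
    return asdict(runtime)


runtime = with_detected_context_length(
    resolve_runtime("example-provider", "example-aux-model", "https://aux.example.invalid/v1"),
    128_000,
)
print(exposed_metadata(runtime))
```

Because the record is frozen, a reroute has to produce a whole new snapshot rather than patching one field, which is what keeps model and context length from drifting apart.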

In practice this can produce inconsistencies such as:

- the main model showing the correct context window,
- the auxiliary runtime actually using the intended provider/model,
- but compression/web_extract metadata and diagnostics not consistently reflecting the final live auxiliary client/model/context window,
- especially after retry/auth-refresh/payment-fallback rerouting.

This is broader than the already-known narrow bug where _check_compression_model_feasibility() ignored explicit auxiliary.compression.context_length / custom-provider overrides.

Observed Behavior

Locally, the repair that made the behavior consistent required all of the following:

1. Shared auxiliary metadata storage in agent/auxiliary_client.py
   - _store_auxiliary_task_metadata(...)
   - get_auxiliary_task_metadata(task)
   - detect_auxiliary_context_length(...)
   - _track_auxiliary_task_metadata(...)
   - resolve_auxiliary_context_length(...)
2. Tracking metadata at all common auxiliary entry points
   - get_text_auxiliary_client(...)
   - get_async_text_auxiliary_client(...)
   - call_llm(...)
   - async_call_llm(...)
3. Updating metadata after rerouting branches
   - auth-refresh retry
   - payment fallback
   - retry client/model substitution
4. Feeding the resolved auxiliary compression runtime back into ContextCompressor
   - ContextCompressor.update_summary_model(model, context_length, base_url, api_key)
   - storing summary_context_length, summary_base_url, summary_api_key
5. Making web_extract use the same generic auxiliary metadata store
   - tools.web_tools.get_web_extract_auxiliary_metadata() delegates to get_auxiliary_task_metadata("web_extract")
   - tools.web_tools._resolve_web_extract_auxiliary(...) stores the detected context length for the final resolved runtime

Without that shared tracking, it is easy for one path to know the right model while another path reports stale, incomplete, or missing context metadata.

Why This Still Matters Even After Related Fixes

There are already related issues/PRs around compression runtime context overrides, for example the narrow feasibility-check bug and explicit auxiliary.compression.context_length support.

Those fix only part of the problem.

The broader issue is that auxiliary runtime metadata is not treated as first-class shared runtime state, the way the main model path effectively is.

That means the following can drift apart:

- resolved auxiliary client
- resolved auxiliary model
- detected auxiliary context length
- metadata exposed to diagnostics / helper accessors
- actual final runtime after retries/fallbacks

Proposed Solution

Promote auxiliary runtime metadata tracking to a shared mechanism in agent/auxiliary_client.py and make all auxiliary consumers use it.

Proposed shape:

1. In agent/auxiliary_client.py
   - keep/introduce a task-generic metadata store keyed by auxiliary task name
   - provide shared helpers:
     - _store_auxiliary_task_metadata(...)
     - get_auxiliary_task_metadata(task)
     - detect_auxiliary_context_length(...)
     - _track_auxiliary_task_metadata(...)
     - resolve_auxiliary_context_length(...) -> (client, model, context_length)
2. Ensure all common auxiliary entry points call the shared tracker
   - sync getter path
   - async getter path
   - sync LLM path
   - async LLM path
   - retry/auth-refresh/fallback branches must update stored metadata for the final actual runtime used
3. In run_agent.py::_check_compression_model_feasibility()
   - stop independently rebuilding context detection logic
   - call resolve_auxiliary_context_length("compression", ...)
   - persist the resolved runtime onto self.context_compressor
4. In agent/context_compressor.py
   - add a tiny setter like:
     - update_summary_model(model, context_length, base_url, api_key)
   - track the resolved auxiliary summary runtime explicitly
5. In tools/web_tools.py
   - make get_web_extract_auxiliary_metadata() read from the shared generic task store instead of a separate web-only cache/path

Why This Design Helps

- one source of truth for auxiliary runtime metadata
- compression and web_extract stop diverging
- diagnostics can report the final live runtime actually used
- retries/fallbacks no longer leave stale metadata behind
- future auxiliary tasks (session_search, skills_hub, approval, mcp, flush_memories, title_generation, etc.) can adopt the same pattern instead of each inventing a separate metadata path

Minimal Repro Category

This is easiest to see when:

- the auxiliary runtime is different from the main runtime,
- context length must be inferred from provider metadata / override logic,
- and the auxiliary path can reroute during execution (retry, auth refresh, or payment fallback).
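For the second condition, one way to make the override logic explicit is a single precedence function. This is a hypothetical sketch: the order (per-task config, then custom-provider limit, then provider-reported metadata, then a default) and the 8,192-token default are assumptions, not Hermes's documented behavior.

```python
from typing import Any, Mapping, Optional

DEFAULT_CONTEXT_LENGTH = 8_192  # hypothetical conservative fallback


def detect_context_length(
    task: str,
    model: str,
    config: Mapping[str, Any],
    custom_provider_limits: Mapping[str, int],
    provider_reported: Optional[int] = None,
) -> int:
    """Pick a context length for one auxiliary task's resolved model."""
    # 1. Explicit per-task override, e.g. auxiliary.compression.context_length.
    task_cfg = (config.get("auxiliary") or {}).get(task) or {}
    override = task_cfg.get("context_length")
    if override:  # None or 0 means "not set"
        return int(override)
    # 2. Per-model limit declared on a custom provider.
    if model in custom_provider_limits:
        return custom_provider_limits[model]
    # 3. Whatever the provider's model metadata reported.
    if provider_reported:
        return provider_reported
    # 4. Conservative default so feasibility checks still have a number.
    return DEFAULT_CONTEXT_LENGTH


cfg = {"auxiliary": {"compression": {"context_length": 64_000}}}
print(detect_context_length("compression", "aux-model", cfg, {}, 128_000))  # 64000
print(detect_context_length("web_extract", "aux-model", cfg, {"aux-model": 32_000}, 128_000))  # 32000
print(detect_context_length("web_extract", "unknown-model", cfg, {}))  # 8192
```

Keeping the precedence in one function means compression and web_extract cannot disagree about which source wins.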

Local Validation

I validated the repair locally with focused tests covering compression/web_extract metadata tracking:

```bash
python -m pytest tests/agent/test_context_compressor.py \
  tests/tools/test_web_tools_config.py \
  tests/run_agent/test_compression_feasibility.py \
  -q -o 'addopts='
```

Result:

135 passed

Suggested Regression Coverage

Please add/keep tests for:

- sync auxiliary getter metadata tracking
- async auxiliary getter metadata tracking
- sync call_llm(...) metadata tracking
- async async_call_llm(...) metadata tracking
- compression feasibility path using shared resolution helper
- ContextCompressor summary runtime metadata update
- web_extract metadata accessor returning the same shared task metadata
- retry/auth-refresh/payment-fallback updating metadata to the final runtime actually used
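A sketch of how a few of those cases could look as pytest-style tests. The helpers at the top are hypothetical stand-ins so the file runs on its own; in the real suite they would be imported from agent.auxiliary_client and tools.web_tools, and the provider client would be mocked.

```python
import asyncio
from typing import Any, Dict, Optional

# Hypothetical stand-ins so this file runs on its own; the real tests would
# import these from agent.auxiliary_client and tools.web_tools instead.
_STORE: Dict[str, Dict[str, Any]] = {}


def _track(task: str, **metadata: Any) -> None:
    _STORE[task] = dict(metadata)  # replace the entry, never merge into it


def get_auxiliary_task_metadata(task: str) -> Dict[str, Any]:
    return dict(_STORE.get(task, {}))


def get_web_extract_auxiliary_metadata() -> Dict[str, Any]:
    return get_auxiliary_task_metadata("web_extract")


def call_llm(task: str, model: str, context_length: int, base_url: Optional[str] = None) -> str:
    metadata: Dict[str, Any] = {"model": model, "context_length": context_length}
    if base_url is not None:
        metadata["base_url"] = base_url
    _track(task, **metadata)
    return "ok"


async def async_call_llm(task: str, model: str, context_length: int) -> str:
    _track(task, model=model, context_length=context_length)
    return "ok"


def test_sync_and_async_paths_record_identical_metadata() -> None:
    _STORE.clear()
    call_llm("compression", "aux-model", 32_000)
    sync_metadata = get_auxiliary_task_metadata("compression")
    _STORE.clear()
    asyncio.run(async_call_llm("compression", "aux-model", 32_000))
    assert get_auxiliary_task_metadata("compression") == sync_metadata


def test_substituted_runtime_leaves_no_stale_fields() -> None:
    _STORE.clear()
    call_llm("compression", "first-model", 200_000, base_url="https://a.example.invalid/v1")
    # Retry substitutes a different model that carries no base_url.
    call_llm("compression", "retry-model", 32_000)
    assert get_auxiliary_task_metadata("compression") == {
        "model": "retry-model",
        "context_length": 32_000,
    }


def test_web_extract_accessor_reads_shared_store() -> None:
    _STORE.clear()
    call_llm("web_extract", "aux-model", 64_000)
    assert get_web_extract_auxiliary_metadata() == get_auxiliary_task_metadata("web_extract")
    assert get_web_extract_auxiliary_metadata()["context_length"] == 64_000
```

Each test clears the store first, so they pass regardless of the order pytest runs them in.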

Environment

- Hermes Agent: v0.12.0 local source install
- Repo: NousResearch/hermes-agent
- Local validation done against a dirty worktree with focused tests only

Related Context

This seems related to, but broader than:

- prior compression feasibility / custom-provider context-length propagation issues
- explicit auxiliary.compression.context_length support

The broad ask here is: make auxiliary runtime metadata tracking a shared first-class path, not a set of partially duplicated special cases.
