hermes - 💡(How to fix) Fix feat(vision): add agent.max_vision_images_in_context config to limit base64 image accumulation [1 comments, 1 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
NousResearch/hermes-agent#28446Fetched 2026-05-20 04:03:42
View on GitHub
Comments
1
Participants
1
Timeline
6
Reactions
0
Participants
Timeline (top)
labeled ×4mentioned ×1subscribed ×1

Root Cause

The native vision fast path stores images as {"type": "image_url", "image_url": {"url": "data:image/..."}} in the tool role message. These blobs remain in the context history on every subsequent turn. The context compressor (_prune_old_tool_results) only strips them when compression.enabled: true and the context exceeds the compression threshold.

Fix Action

Fix / Workaround

Workarounds (current)

RAW_BUFFERClick to expand / collapse

Problem

When using vision_analyze (native vision fast path) in a multi-image session, base64 image payloads accumulate in the conversation context indefinitely. Each image can be 1–10 MB of base64, and 7+ images in a single session easily reaches 14M chars (~4M tokens) — consuming the entire context window.

Root Cause

The native vision fast path stores images as {"type": "image_url", "image_url": {"url": "data:image/..."}} in the tool role message. These blobs remain in the context history on every subsequent turn. The context compressor (_prune_old_tool_results) only strips them when compression.enabled: true and the context exceeds the compression threshold.

Observed session

  • 7 screenshots in one session: 14.5M chars / ~4M tokens
  • Image ratio: ~100% of context
  • compression.enabled: false was the default → zero pruning

Proposed Solution

Add a config knob: agent.max_vision_images_in_context: N (default 3–5).

When the agent loop appends a new native vision tool result, proactively replace older image payloads beyond the N most recent with [screenshot removed — kept N most recent to save context]. This should happen before the API call, not after context overflow.

Implementation hint: after appending a new _multimodal tool result to messages, scan backward for image_url type parts and strip those beyond position N, similar to _strip_image_parts_from_parts.

Workarounds (current)

  1. compression.enabled: true + threshold: 0.30 — strips old images at 30% context usage
  2. agent.image_input_mode: text — converts images to text descriptions upfront (quality tradeoff)

Impact

  • Sessions with repeated screenshots (UI QA, slide review) hit context limit silently
  • No user-visible warning until API returns 400/413

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING