hermes: Pre-resize inbound images in gateway before they enter context window


Problem

When users send screenshots (especially retina/HiDPI) via Telegram or other gateway platforms, the full-resolution image bytes are cached as-is and then base64-encoded into the context window. A typical CleanShot @2x screenshot is 3-5 MB and 3000+ pixels wide, which is far more resolution than any vision model needs and burns a huge number of tokens.
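To put a rough number on the overhead, here is a small sketch of how much a raw image grows once it is base64-encoded. It assumes standard padded base64 (4 output characters per 3 input bytes). The helper name `base64_length` is made up for illustration, and how a given provider converts that payload into tokens is not covered here.

```python
import math


def base64_length(n_bytes):
    """Length in characters of the padded base64 encoding of n_bytes bytes."""
    return 4 * math.ceil(n_bytes / 3)


# Screenshot sizes from the description above (3-5 MB), taken here as MiB.
for size_mib in (3, 5):
    raw = size_mib * 1024 * 1024
    encoded = base64_length(raw)
    print(f"{size_mib} MiB raw -> {encoded / raw:.2f}x as base64 text")
```

Encoding adds roughly a third on top of the raw size, so any bytes saved by downscaling are saved again in the encoded payload.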

The existing auto-resize logic in tools/vision_tools.py (_resize_image_for_vision) only triggers reactively after an API call fails with a size error. By that point, the oversized image has already been encoded into the conversation context.

Proposed Solution

Downscale images proactively in cache_image_from_bytes() in gateway/platforms/base.py, before they are saved to the cache. This is the single funnel point that all platform adapters (Telegram, Discord, Signal, Matrix, etc.) pass through, so one change covers every platform.

Suggested behavior:

  • If either dimension exceeds a configurable threshold (default: 1600px), downscale proportionally using Lanczos resampling
  • JPEG output (quality 85) for opaque images, PNG for images with alpha
  • Pillow as a soft dependency — if not installed, pass through unchanged
  • Add a config key like gateway.max_image_dimension: 1600 so users can tune it

Impact

  • Significantly reduces token consumption for image-heavy workflows (recipe screenshots, UI reviews, photo logging)
  • Eliminates the reactive resize-on-failure path for most cases
  • No loss of useful information in typical cases: vision models generally don't benefit from >1600px resolution

Local Patch

I have a working local patch (adds _downscale_image_bytes() helper called from cache_image_from_bytes()) that I'm happy to submit as a PR if this approach looks right.
