hermes - 💡(How to fix) Fix [Feature] computer_use: route screenshots through auxiliary.vision when main model lacks vision

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

Error Message

computer_use captures screenshot → returns as _multimodal tool result → fed back to main model (MiniMax/M2) → _model_supports_vision() returns False → error: "computer_use returned screenshot/image content, but the active model/provider does not support image input"

Fix Action

Fix / Workaround

Proposed fix: Patch _tool_result_content_for_active_model() (or add a routing check in run_agent.py) so that when:

  • Tool name is computer_use
  • Result has image content (_content_has_image_parts() returns True)
  • Main model does NOT support vision (_model_supports_vision() returns False)
  • auxiliary.vision is configured

Alternative workaround for users: Use browser_vision instead, which correctly routes through auxiliary.vision. Or manually use computer_use capture + send base64 to OpenRouter VL model separately.

Code Example

computer_use captures screenshot → returns as _multimodal tool result →
fed back to main model (MiniMax/M2)_model_supports_vision() returns Falseerror: \"computer_use returned screenshot/image content, but the active model/provider does not support image input\"
RAW_BUFFERClick to expand / collapse

Problem: The computer_use tool captures screenshots correctly but cannot describe their visual content when the main model (e.g., MiniMax/M2) lacks vision capability. Screenshots are returned as base64 images in tool results, but _tool_result_content_for_active_model() in run_agent.py:3327 checks _model_supports_vision() on the main model only — it does not route to auxiliary.vision.

Current flow:

computer_use captures screenshot → returns as _multimodal tool result →
fed back to main model (MiniMax/M2) → _model_supports_vision() returns False →
error: \"computer_use returned screenshot/image content, but the active model/provider does not support image input\"

auxiliary.vision only applies to the vision_analyze tool, not computer_use. The computer_use tool results are always processed by the main model, regardless of auxiliary.vision config.

Reproduction:

  1. Set model.default = MiniMax/M2, model.provider = minimax
  2. Configure auxiliary.vision.provider = openrouter, auxiliary.vision.model = nvidia/nemotron-nano-12b-v2-vl:free
  3. Use computer_use with action=capture — screenshot captured successfully
  4. Error returned: main model does not support image input

Proposed fix: Patch _tool_result_content_for_active_model() (or add a routing check in run_agent.py) so that when:

  • Tool name is computer_use
  • Result has image content (_content_has_image_parts() returns True)
  • Main model does NOT support vision (_model_supports_vision() returns False)
  • auxiliary.vision is configured

Then route the screenshot base64 through resolve_vision_provider_client() instead of returning an error.

Alternative workaround for users: Use browser_vision instead, which correctly routes through auxiliary.vision. Or manually use computer_use capture + send base64 to OpenRouter VL model separately.

Affected area: run_agent.py — tool result handling for multimodal results from non-vision main models.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING