hermes - 💡 (How to fix) fix(run_agent): non-vision models get JSON error instead of text summary for computer_use captures [1 pull request]

hermes 2026-05-31 10:48:40


Error Message

When a non-vision model (DeepSeek, Qwen, etc.) calls computer_use with action='capture', the method _tool_result_content_for_active_model in run_agent.py returns a JSON object with an error field instead of silently falling back to the text summary (AX tree). The error reads: "computer_use returned screenshot/image content, but the active model/provider does not support image input. Switch to a vision-capable model...". The model sees this message and gets confused: it complains about not being vision-capable instead of using the text summary it already received. Non-vision models should receive the text summary (AX tree) silently, without the error block; the logger.warning that follows is sufficient for debugging. The local fix removed the if tool_name == "computer_use": return json.dumps({error: ...}) block from _tool_result_content_for_active_model (~line 3892 in run_agent.py), so the remaining code path returns the text summary directly.

Fix Action

Fixed

Fixed by PR: fix(run_agent): return text summary directly for non-vision computer_use captures (fixes #35817) (https://github.com/NousResearch/hermes-agent/pull/35837)

Code Example

{
  "error": "computer_use returned screenshot/image content, but the active model/provider does not support image input. Switch to a vision-capable model...",
  "text_summary": "..."
}

RAW_BUFFER

Description: When a non-vision model (DeepSeek, Qwen, etc.) calls computer_use with action='capture', the method _tool_result_content_for_active_model in run_agent.py returns a JSON object with an error field instead of silently falling back to the text summary (AX tree).

{
  "error": "computer_use returned screenshot/image content, but the active model/provider does not support image input. Switch to a vision-capable model...",
  "text_summary": "..."
}

The model sees the error message and gets confused — it complains about not being vision-capable instead of using the text summary it already received.

Expected behaviour: Non-vision models should receive the text summary (AX tree) silently, without the error block. The logger.warning that follows is sufficient for debugging.

Reproduction:

1. Use a non-vision model (e.g. deepseek-v4-flash)
2. Call computer_use(action='capture')
3. Model receives JSON with error field and complains

Local fix: Removed the if tool_name == "computer_use": return json.dumps({error: ...}) block from _tool_result_content_for_active_model (~line 3892 in run_agent.py). The remaining code path returns the text summary directly.
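
The behaviour after the local fix can be sketched roughly as follows. This is a simplified, hypothetical stand-in, not the actual code in run_agent.py: the function name tool_result_content_for_model, the supports_vision flag, and the shape of the result dict (an "image" entry plus a "text_summary" entry) are all assumptions made for illustration.

```python
import logging

logger = logging.getLogger(__name__)


def tool_result_content_for_model(tool_name, result, supports_vision):
    """Hypothetical stand-in for _tool_result_content_for_active_model.

    Assumes `result` is a dict that may carry an "image" entry and a
    "text_summary" entry (the AX tree); the real data shape may differ.
    """
    # Vision-capable models, or results without an image, pass through as-is.
    if supports_vision or "image" not in result:
        return result

    # Non-vision model: log for debugging, but do not surface an error
    # object to the model. Return only the text summary.
    logger.warning(
        "%s returned image content but the active model lacks image input; "
        "falling back to the text summary",
        tool_name,
    )
    return result.get("text_summary", "")
```

With this shape, a non-vision model calling computer_use(action='capture') would see only the AX tree text, and the warning would remain available in the logs.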

v0.15.1 status: Still reproducible. The block was removed in a previous version but has been re-introduced.
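
Since the block reportedly came back after having been removed, a small regression check might help keep it out. The helper below is a sketch under assumptions: has_error_envelope is a hypothetical name, and it only inspects the string that would be handed to the model, so wiring it to the real _tool_result_content_for_active_model is left to the project's own test suite.

```python
import json


def has_error_envelope(content):
    """Return True if `content` is a JSON object carrying an "error" key."""
    try:
        parsed = json.loads(content)
    except (TypeError, ValueError):
        # Plain text (for example an AX tree summary) is not JSON: fine.
        return False
    return isinstance(parsed, dict) and "error" in parsed
```

A test could call the tool-result path with a non-vision model configured and assert that has_error_envelope(...) is False for the returned content.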
