hermes - 💡(How to fix) fix(run_agent): non-vision models get JSON error instead of text summary for computer_use captures [1 pull request]

Error Message

When a non-vision model (DeepSeek, Qwen, etc.) calls computer_use with action='capture', the method _tool_result_content_for_active_model in run_agent.py returns a JSON object with an error field instead of silently falling back to the text summary (AX tree). The error field reads:

"error": "computer_use returned screenshot/image content, but the active model/provider does not support image input. Switch to a vision-capable model..."

The model reads this error and complains about not being vision-capable instead of using the text summary it already received.

Fix Action

Fixed

Code Example

{
  "error": "computer_use returned screenshot/image content, but the active model/provider does not support image input. Switch to a vision-capable model...",
  "text_summary": "..."
}

Description: When a non-vision model (DeepSeek, Qwen, etc.) calls computer_use with action='capture', the method _tool_result_content_for_active_model in run_agent.py returns a JSON object with an error field instead of silently falling back to the text summary (AX tree).

{
  "error": "computer_use returned screenshot/image content, but the active model/provider does not support image input. Switch to a vision-capable model...",
  "text_summary": "..."
}

The model sees the error message and gets confused: it complains about not being vision-capable instead of using the text summary it already received.

Expected behaviour: Non-vision models should receive the text summary (AX tree) silently, without the error block. The logger.warning that follows is sufficient for debugging.

Reproduction:

  1. Use a non-vision model (e.g. deepseek-v4-flash)
  2. Call computer_use(action='capture')
  3. Model receives JSON with error field and complains
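
The symptom in step 3 can also be checked programmatically, for example in a regression test. The following is only a minimal sketch: the helper name looks_like_error_envelope is hypothetical and is not part of run_agent.py. It simply tests whether the string handed to the model is a JSON object carrying an error field.

```python
import json


def looks_like_error_envelope(content):
    """Return True if content parses as a JSON object with an "error" key.

    Hypothetical helper for illustration; not taken from run_agent.py.
    """
    try:
        parsed = json.loads(content)
    except (TypeError, ValueError):
        # Not a string, or not valid JSON: treat it as plain text.
        return False
    return isinstance(parsed, dict) and "error" in parsed
```

With the bug present, the tool result for a capture on a non-vision model would make this helper return True; with the expected behaviour, the result is the plain text summary and the helper returns False.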

Local fix: Removed the if tool_name == "computer_use": return json.dumps({error: ...}) block from _tool_result_content_for_active_model (~line 3892 in run_agent.py). The remaining code path returns the text summary directly.
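
The intended behaviour after the local fix might look roughly like the sketch below. This is not the actual code from run_agent.py: the function name, its parameters, and the shape of the result dict (an optional "image" key and a "text_summary" key) are assumptions made for illustration. The point is only that the non-vision path logs a warning and returns the text summary, with no JSON error wrapper.

```python
import logging

logger = logging.getLogger(__name__)


def tool_result_content_for_model(tool_name, result, supports_vision):
    """Pick what to hand to the model for a tool result.

    Hypothetical stand-in for _tool_result_content_for_active_model.
    `result` is assumed to be a dict with "text_summary" and, for
    captures, an "image" entry.
    """
    if result.get("image") is None or supports_vision:
        # Nothing to strip, or the model can consume the image as-is.
        return result

    # Non-vision model: keep the warning for debugging, but do not
    # surface an error to the model. Return the text summary only.
    logger.warning(
        "%s returned image content, but the active model does not "
        "support image input; falling back to text summary",
        tool_name,
    )
    return result.get("text_summary", "")
```

For example, calling tool_result_content_for_model("computer_use", {"text_summary": "AXWindow ...", "image": b"..."}, supports_vision=False) would return only the "AXWindow ..." string.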

v0.15.1 status: Still reproducible. The block was removed in a previous version but has been re-introduced.
