hermes - Bug: `vision_analyze` tool fails to send images when using Docker terminal backend


Environment

  • Hermes Agent version: latest (as of 2026-05-26)
  • Terminal backend: Docker (nikolaik/python-nodejs:python3.11-nodejs20)
  • Main model: DeepSeek V4 Pro (DeepSeek official API, text-only)
  • Auxiliary vision model: qwen3.6-plus (Alibaba DashScope / Bailian, domestic endpoint) — the recommended multimodal model on Bailian with native vision understanding

Configuration

```yaml
model:
  default: deepseek-v4-pro
  provider: deepseek
  base_url: https://api.deepseek.com/

auxiliary:
  vision:
    provider: alibaba
    model: qwen3.6-plus
    base_url: https://dashscope.aliyuncs.com/compatible-mode/v1
    api_key: sk-xxxxxxxx  # explicitly set, valid and tested
```

Steps to Reproduce

  1. Configure an auxiliary vision model as above
  2. Send an image to the agent via a messaging platform (tested on Feishu)
  3. The agent receives the image and calls vision_analyze

Expected Behavior

The vision_analyze tool reads the image file, embeds it in the API request to the auxiliary vision model, and returns a description of the image content.
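
As a rough illustration of that expected flow, the sketch below reads a local file, base64-encodes it, and builds the OpenAI-compatible message shape that the DashScope compatible-mode endpoint accepts. It is not Hermes code: `build_vision_payload` is a hypothetical helper, and guessing the MIME type from the file extension (with a JPEG fallback) is an assumption.

```python
import base64
import mimetypes
from pathlib import Path


def build_vision_payload(image_path: str, prompt: str, model: str) -> dict:
    """Hypothetical helper: embed a local image in a chat-completions payload."""
    path = Path(image_path).expanduser()
    # read_bytes() raises if the file is missing or unreadable, so the caller
    # fails loudly instead of silently sending a text-only request.
    raw = path.read_bytes()
    # Guess the MIME type from the extension; fall back to JPEG (assumption).
    mime = mimetypes.guess_type(path.name)[0] or "image/jpeg"
    data_url = f"data:{mime};base64,{base64.b64encode(raw).decode('ascii')}"
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }],
    }
```

The important property is that a read failure surfaces as an exception before any request is built, so a text-only payload can never be sent by accident.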

Actual Behavior

Across the four file paths tested, the tool fails in one of three ways and never delivers the image to the vision model:

| File Path | Error |
| --- | --- |
| ~/.hermes/image_cache/img_xxx.jpg (host path) | API returns 200, but the model responds: "no image was attached or provided"; the image data is not embedded in the API request body |
| ~/.hermes/cache/images/img_xxx.jpg (sandbox path, owned by non-root user, mode 600) | [Errno 13] Permission denied |
| ~/.hermes/media_cache/img_xxx.jpg (sandbox path, owned by root, mode 644) | [Errno 13] Permission denied |
| /tmp/test_vision.jpg (sandbox path, owned by root, mode 666) | Invalid image source. Provide an HTTP/HTTPS URL or a valid local file path. |
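
To narrow down which of these failures applies to a given path, a small diagnostic can be run in the same process context as the tool (for example, inside the Docker container). This is a minimal sketch, not part of Hermes: `diagnose_image_path` is a hypothetical name, and it assumes a POSIX system because it uses `os.getuid()`.

```python
import os
import stat
from pathlib import Path


def diagnose_image_path(image_path: str) -> dict:
    """Hypothetical diagnostic: report why this process may fail to read a file."""
    path = Path(image_path).expanduser()
    # os.path.exists() returns False on any OSError, including a parent
    # directory that this user is not allowed to traverse.
    report = {"resolved": str(path), "exists": os.path.exists(path)}
    if not report["exists"]:
        return report
    st = path.stat()
    report["owner_uid"] = st.st_uid
    report["process_uid"] = os.getuid()  # POSIX only
    report["mode"] = oct(stat.S_IMODE(st.st_mode))
    report["readable"] = os.access(path, os.R_OK)
    # A world-readable file is still unreachable if its directory is not searchable.
    report["parent_searchable"] = os.access(path.parent, os.X_OK)
    return report
```

Comparing `owner_uid` with `process_uid`, and checking `parent_searchable`, helps separate a genuine permission mismatch from a path that simply does not exist where the tool is looking.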

Verification That the API & Model Work

A direct curl call to the DashScope API with the same model, same endpoint, same API key, and same image (base64-encoded) works correctly:

```shell
curl -s https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
  -H "Authorization: Bearer $DASHS...KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.6-plus",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe this image."},
        {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,'"$IMG_B64"'"}}
      ]
    }]
  }'
# → Returns a valid description of the image content
```

This confirms the API key, endpoint, model, and image are all valid — the failure is in how Hermes handles the file and constructs the request.

Probable Root Cause

The vision_analyze tool appears to have these issues when running under the Docker backend:

  1. Path resolution: Host-side image cache paths are not correctly mapped to sandbox-accessible paths, or the file content is read from the wrong location
  2. File permission handling: Even when the file exists at a sandbox-accessible path with world-readable permissions, the tool either reports "Permission denied" or "Invalid image source", suggesting the file read logic has a user/permission mismatch with the Docker container's user context
  3. API request construction (most critical): When the tool does manage to make an API call (host path case), the image data is not embedded in the request body — the vision model receives a text-only message, confirming the image was never read or encoded into the request
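
The third item could in principle be caught before the request leaves the agent. The sketch below is one possible guard, assuming the request body uses the OpenAI-compatible `messages` / `content` shape; `count_image_parts` is a hypothetical helper and does not reflect how Hermes builds its requests.

```python
def count_image_parts(payload: dict) -> int:
    """Count image_url parts that carry a data URL or an HTTP(S) URL."""
    count = 0
    for message in payload.get("messages", []):
        content = message.get("content")
        # Plain-string content is text-only by definition.
        if not isinstance(content, list):
            continue
        for part in content:
            if part.get("type") != "image_url":
                continue
            url = (part.get("image_url") or {}).get("url", "")
            if url.startswith(("data:image/", "http://", "https://")):
                count += 1
    return count
```

A caller could refuse to send a vision request when this returns 0, which would turn the silent "no image was attached" response into an explicit local error.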

Impact

Users with text-only main models (e.g., DeepSeek) who rely on auxiliary vision models for image understanding cannot use any image-based features when running the Docker backend. This breaks all image-related workflows: screenshot analysis, document scanning, visual QA, etc.

Workaround

None identified. Direct API calls work, but there is no way to route the agent's image handling through a manual curl path.
