hermes - Bug: `vision_analyze` tool fails to send images when using Docker terminal backend

Environment

Hermes Agent version: latest (as of 2026-05-26)
Terminal backend: Docker (nikolaik/python-nodejs:python3.11-nodejs20)
Main model: DeepSeek V4 Pro (DeepSeek official API, text-only)
Auxiliary vision model: qwen3.6-plus (Alibaba DashScope / Bailian, domestic endpoint) — the recommended multimodal model on Bailian with native vision understanding

Configuration

model:
  default: deepseek-v4-pro
  provider: deepseek
  base_url: https://api.deepseek.com/

auxiliary:
  vision:
    provider: alibaba
    model: qwen3.6-plus
    base_url: https://dashscope.aliyuncs.com/compatible-mode/v1
    api_key: sk-xxxxxxxx  # explicitly set, valid and tested

Steps to Reproduce

1. Configure an auxiliary vision model as above.
2. Send an image to the agent via a messaging platform (tested on Feishu).
3. The agent receives the image and calls `vision_analyze`.

Expected Behavior

The vision_analyze tool reads the image file, embeds it in the API request to the auxiliary vision model, and returns a description of the image content.
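As a rough illustration of the expected flow, the sketch below reads a local image, base64-encodes it, and builds an OpenAI-compatible `messages` list in the same shape as the curl request in this report. The function name `build_vision_messages` and the JPEG fallback are assumptions made for illustration; none of this is taken from the Hermes codebase.

```python
import base64
import mimetypes
from pathlib import Path


def build_vision_messages(image_path: str, prompt: str) -> list[dict]:
    """Return a chat `messages` list with the image embedded as a data URL.

    Hypothetical helper for illustration; not Hermes' actual implementation.
    """
    path = Path(image_path).expanduser()
    data = path.read_bytes()  # raises FileNotFoundError / PermissionError on failure
    if not data:
        raise ValueError(f"image file is empty: {path}")

    # Guess the MIME type from the extension; fall back to JPEG (an assumption).
    mime, _ = mimetypes.guess_type(path.name)
    if mime is None or not mime.startswith("image/"):
        mime = "image/jpeg"

    encoded = base64.b64encode(data).decode("ascii")
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}},
        ],
    }]
```

If the file cannot be read, this version raises instead of sending a text-only request, so a missing image surfaces as an error rather than as a confused model reply.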

Actual Behavior

Across the four file paths tested, the tool fails in one of three ways and never delivers the image to the vision model:

| File path | Error |
| --- | --- |
| `~/.hermes/image_cache/img_xxx.jpg` (host path) | API returns 200, but the model responds "no image was attached or provided"; the image data is not embedded in the API request body |
| `~/.hermes/cache/images/img_xxx.jpg` (sandbox path, owned by non-root user, mode 600) | `[Errno 13] Permission denied` |
| `~/.hermes/media_cache/img_xxx.jpg` (sandbox path, owned by `root`, mode 644) | `[Errno 13] Permission denied` |
| `/tmp/test_vision.jpg` (sandbox path, owned by `root`, mode 666) | `Invalid image source. Provide an HTTP/HTTPS URL or a valid local file path.` |
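To tell these failure modes apart, a small diagnostic along the following lines could be run with the same paths. It is only a sketch: `diagnose_image_path` is a hypothetical helper, not part of Hermes, and it reports what the current process can see, so its result depends on which user and filesystem (host or container) it runs in.

```python
import os
from pathlib import Path


def diagnose_image_path(raw_path: str) -> str:
    """Classify why a local image path might fail to load.

    Returns one of: "ok", "missing", "not_a_file", "permission_denied", "empty".
    Hypothetical diagnostic helper, not part of Hermes.
    """
    # "~" expands to the home of the *current* process user, which inside a
    # container may differ from the host user's home directory.
    path = Path(os.path.expanduser(raw_path))
    try:
        if not path.exists():
            return "missing"
        if not path.is_file():
            return "not_a_file"
        # Actually open the file: metadata checks alone can disagree with
        # what the process is allowed to read.
        with path.open("rb") as fh:
            first = fh.read(1)
    except PermissionError:
        return "permission_denied"
    return "ok" if first else "empty"
```

Running it once on the host and once inside the container (for example through `docker exec`) for each path in the table would show whether a given file is visible and readable in the context where the tool actually executes.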

Verification That the API & Model Work

A direct curl call to the DashScope API with the same model, same endpoint, same API key, and same image (base64-encoded) works correctly:

curl -s https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
  -H "Authorization: Bearer $DASHS...KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.6-plus",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe this image."},
        {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,'"$IMG_B64"'"}}
      ]
    }]
  }'
# → Returns a valid description of the image content

This confirms the API key, endpoint, model, and image are all valid — the failure is in how Hermes handles the file and constructs the request.
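One way to narrow this down is to inspect the request body before it is sent. The sketch below counts the non-empty `image_url` parts in an OpenAI-compatible payload; a count of zero would be consistent with the "no image was attached or provided" response. How to capture the outgoing payload from Hermes is not covered by this report, and `count_image_parts` is a hypothetical helper.

```python
def count_image_parts(payload: dict) -> int:
    """Count content parts that carry a non-empty image URL or data URL."""
    count = 0
    for message in payload.get("messages", []):
        content = message.get("content")
        # Text-only messages use a plain string; multimodal ones use a list of parts.
        if not isinstance(content, list):
            continue
        for part in content:
            if not isinstance(part, dict) or part.get("type") != "image_url":
                continue
            image_url = part.get("image_url")
            url = image_url.get("url", "") if isinstance(image_url, dict) else ""
            if not isinstance(url, str):
                continue
            if url.startswith("data:"):
                # A data URL with nothing after the comma carries no image bytes.
                if url.partition(",")[2]:
                    count += 1
            elif url.startswith(("http://", "https://")):
                count += 1
    return count
```

Applied to the body of the working curl request, this should return 1; applied to a request that contains only the text prompt, it returns 0.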

Probable Root Cause

The vision_analyze tool appears to have these issues when running under the Docker backend:

1. Path resolution: Host-side image cache paths are not correctly mapped to sandbox-accessible paths, or the file content is read from the wrong location.
2. File permission handling: Even when the file exists at a sandbox-accessible path with world-readable permissions, the tool reports either "Permission denied" or "Invalid image source", which suggests the file read logic runs as a different user than the Docker container's user context expects.
3. API request construction (most critical): When the tool does manage to make an API call (the host-path case), the image data is not embedded in the request body. The vision model receives a text-only message, which indicates the image was never read or encoded into the request.
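The path-resolution hypothesis can be illustrated with a small mapping sketch. It assumes the Docker backend bind-mounts host directories into the container and that the mount table is available to the tool. The mount pairs and the name `map_host_to_sandbox` are invented for illustration and do not reflect Hermes' actual configuration.

```python
from pathlib import PurePosixPath
from typing import Optional


def map_host_to_sandbox(host_path: str, mounts: dict[str, str]) -> Optional[str]:
    """Translate a host path to its in-container path using bind-mount pairs.

    `mounts` maps host directories to container directories (hypothetical
    data). Returns None when the path is not under any mount, meaning the
    container cannot see the file at all.
    """
    path = PurePosixPath(host_path)
    # Prefer the longest (most specific) host prefix when mounts are nested.
    for host_dir in sorted(mounts, key=len, reverse=True):
        try:
            relative = path.relative_to(PurePosixPath(host_dir))
        except ValueError:
            continue  # host_path is not under this mount
        return str(PurePosixPath(mounts[host_dir]) / relative)
    return None
```

A `None` result would mean the file has to be read on the host side, or copied into a mounted directory, before the request is built. Whether Hermes reads the image on the host or inside the container is not established by this report.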

Impact

Users with text-only main models (e.g., DeepSeek) who rely on auxiliary vision models for image understanding cannot use any image-based features when running the Docker backend. This breaks all image-related workflows: screenshot analysis, document scanning, visual QA, etc.

Workaround

None identified. Direct API calls work, but there is no way to route the agent's image handling through a manual curl path.
