openclaw - 💡(How to fix) Fix: images pasted directly by the user are silently dropped when the primary model does not support images, and are not routed to imageModel [1 comment, 2 participants]

GitHub stats
openclaw/openclaw#70212 (fetched 2026-04-23 07:27:44)
Comments: 1
Participants: 2
Timeline: 1 (commented ×1)
Reactions: 0


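Problem description

When a user pastes or uploads an image directly in webchat and the current agent's primary model does not support image input (its input list does not contain "image"), the image is silently dropped (parseMessageWithAttachments: 1 attachment(s) dropped — model does not support images) instead of being routed automatically to the configured imageModel for recognition.

More seriously, if the primary model's input list declares "image" but the underlying API does not actually accept images (for example GLM-5-Turbo), the image is sent to the primary model and triggers a 400 error. This leaves the whole conversation session stuck: no subsequent message gets a normal response.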

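Steps to reproduce

Scenario 1: the image is silently dropped

  1. Configure the agent's primary model as a text-only model (for example glm-5-turbo with input: ["text"])
  2. Configure agents.defaults.imageModel as a vision-capable model (for example zai/glm-4.6v)
  3. Paste an image directly into webchat and send it
  4. Actual result: the log shows parseMessageWithAttachments: 1 attachment(s) dropped — model does not support images, the image is dropped, and the message the agent receives contains no image content at all
  5. Expected result: OpenClaw should route the image to imageModel automatically for recognition and inject the recognition result into the agent's conversation context

Scenario 2: the image leaves the session stuck

  1. Configure the agent's primary model so that its input list contains "image" although the model API does not actually support images (for example GLM-5-Turbo)
  2. Paste an image directly into webchat and send it
  3. Actual result: the API returns a 400 error. Because the image stays in the conversation history, every subsequent message triggers the same 400 error and the session becomes completely unusable
  4. Expected result: the image should not be sent to a primary model that does not support it, and even if the API call fails, the session should not get stuck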
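
The session lock in Scenario 2 comes from the image remaining in the history that is resent on every turn. One possible mitigation, sketched below, is to strip image parts from the history before calling a model that cannot accept them. This is not OpenClaw code: `stripImageParts` is a hypothetical helper, and the message shape (OpenAI-style chat messages whose `content` is either a string or an array of typed parts such as `image_url`) is an assumption, since the issue does not show OpenClaw's internal message format.

```javascript
// Hypothetical helper, not OpenClaw code. Assumes OpenAI-style chat messages
// whose `content` is either a string or an array of typed parts.
function stripImageParts(
  messages,
  placeholder = "[image omitted: model does not accept images]"
) {
  return messages.map((msg) => {
    if (!Array.isArray(msg.content)) return msg; // plain string content: keep as-is
    const kept = msg.content.filter((part) => part.type !== "image_url");
    if (kept.length === msg.content.length) return msg; // nothing to remove
    // Leave a visible marker so the model knows something was removed.
    return { ...msg, content: [...kept, { type: "text", text: placeholder }] };
  });
}

const history = [
  {
    role: "user",
    content: [
      { type: "text", text: "What is in this picture?" },
      { type: "image_url", image_url: { url: "data:image/png;base64,..." } },
    ],
  },
  { role: "user", content: "Hello?" },
];

console.log(JSON.stringify(stripImageParts(history), null, 2));
```

The helper returns new message objects rather than mutating the stored history, so the original attachment stays available if a vision-capable model is selected later.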

Current configuration

// Key settings from openclaw.json
{
  "models": {
    "providers": {
      "zai": {
        "baseUrl": "https://open.bigmodel.cn/api/coding/paas/v4",
        "api": "openai-completions",
        "models": [
          {
            "id": "glm-5-turbo",
            "input": ["text"],  // text-only model
            "reasoning": true
          },
          {
            "id": "glm-4.6v",
            "input": ["text", "image"]  // vision model
          }
        ]
      }
    }
  },
  "agents": {
    "defaults": {
      "model": { "primary": "zai/glm-5-turbo" },
      "imageModel": { "primary": "zai/glm-4.6v" }
    }
  }
}
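
To make the mismatch concrete, the sketch below reads a trimmed copy of this configuration and checks whether a model reference declares "image" in its `input` list. The helpers `findModel` and `declaresImageInput` are hypothetical and are not OpenClaw APIs; the sketch also assumes that a model reference has the form `provider/id`, as in `zai/glm-5-turbo`.

```javascript
// Hypothetical helpers, not part of OpenClaw: they only illustrate how the
// declared `input` list in the configuration determines image support.
const config = {
  models: {
    providers: {
      zai: {
        models: [
          { id: "glm-5-turbo", input: ["text"], reasoning: true },
          { id: "glm-4.6v", input: ["text", "image"] },
        ],
      },
    },
  },
  agents: {
    defaults: {
      model: { primary: "zai/glm-5-turbo" },
      imageModel: { primary: "zai/glm-4.6v" },
    },
  },
};

// Assumption: a model reference has the form "<provider>/<model id>".
function findModel(cfg, ref) {
  const slash = ref.indexOf("/");
  if (slash < 0) return undefined;
  const provider = cfg.models?.providers?.[ref.slice(0, slash)];
  return provider?.models?.find((m) => m.id === ref.slice(slash + 1));
}

// True only when the model exists and lists "image" among its inputs.
function declaresImageInput(cfg, ref) {
  const found = findModel(cfg, ref);
  return Array.isArray(found?.input) && found.input.includes("image");
}

const defaults = config.agents.defaults;
console.log(declaresImageInput(config, defaults.model.primary)); // false
console.log(declaresImageInput(config, defaults.imageModel.primary)); // true
```

Note that such a check only reflects what the configuration declares. Scenario 2 shows that a declaration can be wrong, so the declared capability cannot replace error handling around the actual API call.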

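Source code analysis

  • The parseMessageWithAttachments function at subsystem-Cgmckbux.js:324 drops images outright when it finds that the primary model does not support them (attachment(s) dropped)
  • runner-GjYg-C-v.js contains resolveImageModelFromAgentDefaults and media-understanding routing logic, but this appears to take effect only on image tool calls and is not triggered when the user pastes an image directly
  • The logs show Skipping image understanding: primary model supports vision natively (when the primary model is marked as supporting images) and attachment(s) dropped (when it is marked as not supporting them), but no log entry indicates automatic routing to imageModel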

Environment

  • OpenClaw version: 2026.4.15 (041266a)
  • Node.js: v24.14.0
  • OS: macOS (Darwin 25.4.0, arm64)
  • Channel: webchat
  • Primary model: zai/glm-5-turbo (Zhipu Coding Plan)

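Expected behavior

  1. Automatic routing: when the user pastes an image and the primary model does not support images, the image should be sent to imageModel automatically for recognition and the result injected into the conversation
  2. Graceful degradation: even if imageModel also fails, the session should not get stuck
  3. User notification: if an image cannot be processed, the user should be told explicitly instead of the image being dropped silently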
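
The three expectations above can be combined in one small routine, sketched here under stated assumptions. Nothing in it is OpenClaw code: `resolveAttachments` is a hypothetical name, and `describeImage` is an injected callback standing in for whatever call would send one attachment to the configured imageModel and return a text description.

```javascript
// Hypothetical sketch, not OpenClaw code. `describeImage` stands in for
// whatever call sends one attachment to the configured imageModel.
async function resolveAttachments({ text, attachments, primarySupportsImages, describeImage }) {
  // Nothing to route: pass the message through unchanged.
  if (attachments.length === 0 || primarySupportsImages) {
    return { text, attachments, notices: [] };
  }
  const descriptions = [];
  const notices = [];
  for (const [i, attachment] of attachments.entries()) {
    try {
      descriptions.push(`[Image ${i + 1}] ${await describeImage(attachment)}`);
    } catch (err) {
      // Graceful degradation: record a user-visible notice and carry on.
      notices.push(`Image ${i + 1} could not be processed: ${err.message}`);
    }
  }
  // Inject the recognition results as text; no raw image reaches the primary model.
  return { text: [text, ...descriptions].join("\n"), attachments: [], notices };
}

// Usage: the second image fails, the first is still described.
resolveAttachments({
  text: "What is this?",
  attachments: [{ name: "a.png" }, { name: "b.png" }],
  primarySupportsImages: false,
  describeImage: async (a) => {
    if (a.name === "b.png") throw new Error("vision API unavailable");
    return "a screenshot of a terminal";
  },
}).then((result) => console.log(result));
```

Returning the notices separately from the text lets the caller decide how to surface them, for example as a system message in webchat, without mixing them into the prompt sent to the primary model.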

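Temporary workaround

For now, the only option is to have the agent's system prompt instruct users to save the image to a local file path, after which the agent reads it manually through the image tool or an MCP tool (such as zai-mcp-server's analyze_image). This seriously degrades the user experience.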

extent analysis

TL;DR

Modify the parseMessageWithAttachments function to automatically route images to the imageModel when the primary model does not support images.

Guidance

  1. Update the parseMessageWithAttachments function: In subsystem-Cgmckbux.js, modify the function to check if the primary model supports images. If not, use the resolveImageModelFromAgentDefaults function to get the imageModel and route the image to it for processing.
  2. Implement a fallback for image processing failures: To prevent session freezes, add a try-catch block around the image processing code to handle any errors that may occur. If an error occurs, log the error and continue with the conversation.
  3. Provide user feedback for unsupported images: If an image cannot be processed, display a message to the user indicating that the image was not supported, rather than silently dropping it.

Example

// Modified parseMessageWithAttachments function.
// Declared async because routing the image awaits the imageModel call.
async function parseMessageWithAttachments(message) {
  // ...
  if (!primaryModelSupportsImages) {
    const imageModel = resolveImageModelFromAgentDefaults();
    // Route the image to the imageModel for processing
    const imageResult = await processImageWithModel(imageModel, message.attachments[0]);
    // ...
  }
  // ...
}

Notes

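This sketch assumes that resolveImageModelFromAgentDefaults, which the issue reports as present in runner-GjYg-C-v.js, can be called from this code path, and that a helper such as processImageWithModel exists; the latter name is a placeholder and does not appear in the issue. Because the example uses await, the function must be declared async and its callers must await it. Additional changes may be needed to handle specific error cases or edge conditions.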

Recommendation

Modify parseMessageWithAttachments so that images are routed automatically to the imageModel when the primary model does not support images, and report failures to the user instead of dropping attachments silently. Until such a change lands, the system-prompt workaround described in the issue remains the only option.
