openclaw - ✅(Solved) Fix Feature Request: Auto-switch to vision model when image attachment is detected [1 pull requests, 2 comments, 2 participants]

openclaw, 2026-04-26 02:13:25


openclaw/openclaw#71892 • Fetched 2026-04-27 05:37:42


Author: Guo-bird
Participants: clawsweeper[bot], Guo-bird
Timeline (top): commented ×2, closed ×1, cross-referenced ×1

When a user drags/drops or pastes an image into the chat, automatically switch to a configured vision-capable model instead of silently dropping the image or requiring manual intervention.

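Root Cause

Per the issue's implementation notes, when the active model does not support images (supportsImages === false) and attachments exist, parseMessageWithAttachments in src/gateway/server-methods/chat.ts logs a warning and returns empty images. The image is therefore dropped without any notice to the user, who must switch models manually.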

Fix Action

Proposed

Proposed in PR (listed as open and not merged at fetch time): feat: auto-switch to vision model when image attachment is detected (https://github.com/Guo-bird/openclaw/pull/1)

PR fix notes

PR #1: feat: auto-switch to vision model when image attachment is detected

Repository: Guo-bird/openclaw
Author: Guo-bird
State: open | merged: False
Link: https://github.com/Guo-bird/openclaw/pull/1

Description (problem / solution / changelog)

Summary

When a user drags and drops an image and the current model doesn't support images, automatically switch to the imageModel configured under agents.defaults, if one is set.

Problem

Currently (per issue #49518), when a user attaches an image but the active model does not support image input:

- The image is silently dropped by the gateway
- The assistant replies as if no image was provided
- No warning is shown to the user

This creates a poor UX where users must manually switch models using the /model command.

Solution

This PR adds automatic vision model switching:

- When a message with an image attachment is received
- And the current model doesn't support images
- Check whether agents.defaults.imageModel is configured
- If it is, automatically switch to that vision model for processing
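The steps above can be sketched as one pure decision function. This is a minimal sketch under stated assumptions, not the PR's code: ModelInfo, GatewayConfig, ModelChoice, and pickModelForMessage are hypothetical names, and only the agents.defaults.imageModel key comes from the PR description (assumed here to be a plain string).

```typescript
// Hypothetical types for illustration; the real gateway types may differ.
interface ModelInfo {
  id: string;
  supportsImages: boolean;
}

interface GatewayConfig {
  agents?: { defaults?: { imageModel?: string } };
}

interface ModelChoice {
  modelId: string;
  switched: boolean; // true when the image model replaced the current model
}

// Decide which model should handle a message, following the steps listed above.
function pickModelForMessage(
  current: ModelInfo,
  imageCount: number,
  cfg: GatewayConfig,
): ModelChoice {
  // No images, or the current model can already see them: nothing to do.
  if (imageCount === 0 || current.supportsImages) {
    return { modelId: current.id, switched: false };
  }
  const imageModel = cfg.agents?.defaults?.imageModel?.trim();
  if (!imageModel) {
    // imageModel is not configured: behaviour stays unchanged.
    return { modelId: current.id, switched: false };
  }
  return { modelId: imageModel, switched: true };
}
```

Keeping the decision free of side effects means the caller can log the switch event and apply it in one place, and the unconfigured case falls through to the existing behaviour.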

Changes

Modified src/gateway/server-methods/chat.ts:
- Added import for normalizeModelSelection and parseModelRef
- Added logic to check cfg.agents.defaults.imageModel
- When image attachment detected but current model doesn't support images, automatically switch to imageModel
- Added logging for model switch events

Configuration

Users can configure the fallback vision model in their openclaw.json:

{
  "agents": {
    "defaults": {
      "imageModel": "llama-3.2-vision"
    }
  }
}
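Reading this key defensively might look like the sketch below. It assumes openclaw.json is strict JSON and that imageModel is a plain string, as in the example above; readImageModel and asRecord are hypothetical helpers, not functions from the openclaw codebase. If the real loader accepts comments or trailing commas, JSON.parse would reject such files.

```typescript
// Narrow an unknown value to a plain object, or return undefined.
function asRecord(value: unknown): Record<string, unknown> | undefined {
  return typeof value === "object" && value !== null && !Array.isArray(value)
    ? (value as Record<string, unknown>)
    : undefined;
}

// Read agents.defaults.imageModel from the text of openclaw.json.
// Returns undefined when the key is missing, blank, or not a string.
// Throws (via JSON.parse) when the text is not valid JSON.
function readImageModel(configText: string): string | undefined {
  const parsed: unknown = JSON.parse(configText);
  const agents = asRecord(parsed)?.["agents"];
  const defaults = asRecord(agents)?.["defaults"];
  const value = asRecord(defaults)?.["imageModel"];
  if (typeof value !== "string") {
    return undefined;
  }
  const trimmed = value.trim();
  return trimmed === "" ? undefined : trimmed;
}
```

Treating a blank or non-string value the same as a missing key keeps the "not configured" path identical to today's behaviour.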

Related Issues

Fixes #71892
Related to #49518

Backward Compatibility

This is purely additive - if imageModel is not configured, behavior remains unchanged.

Changed files

src/gateway/server-methods/chat.ts (modified, +27/-5)

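Code Example (from the original issue, which names the option visionFallbackModel; the PR above uses imageModel)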

{
  "agents": {
    "defaults": {
      "visionFallbackModel": "llama-3.2-vision"
    }
  }
}


Feature Request: Auto-switch to vision model when image attachment is detected

Summary

When a user drags/drops or pastes an image into the chat, automatically switch to a configured vision-capable model instead of silently dropping the image or requiring manual intervention.

Motivation

Currently (per issue #49518), when a user attaches an image but the active model does not support image input:

- The image is silently dropped by the gateway
- The assistant replies as if no image was provided
- No warning is shown to the user

This creates a poor user experience where:

- The image appears to be attached in the UI
- Sending succeeds
- But the assistant cannot see the image
- The user must manually switch models or use a command like /model vision-model

Proposed Solution

Add a new configuration option, visionFallbackModel, that specifies a vision-capable model to use when images are attached. When an image attachment is detected and the current model does not support images, the gateway:

- Automatically routes the request to the configured vision model
- Processes the image successfully
- (Optionally) Returns to the original model afterwards

Example configuration:

{
  "agents": {
    "defaults": {
      "visionFallbackModel": "llama-3.2-vision"
    }
  }
}
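One way to get the optional "return to the original model" behaviour is to make the switch request-scoped, so the session's own model is never overwritten. The sketch below is an assumption about how that could be structured; Session, RoutedRequest, and routeRequest are hypothetical names, and images are represented as opaque strings for brevity.

```typescript
// Hypothetical shapes for illustration only.
interface Session {
  modelId: string; // the user's chosen model; never modified by routing
}

interface RoutedRequest {
  modelId: string; // model that will handle this one request
  text: string;
  images: string[];
}

// Pick the model for a single request without touching session state.
function routeRequest(
  session: Readonly<Session>,
  sessionModelSupportsImages: boolean,
  text: string,
  images: string[],
  visionFallbackModel?: string,
): RoutedRequest {
  const cannotSeeImages = images.length > 0 && !sessionModelSupportsImages;
  if (!cannotSeeImages) {
    return { modelId: session.modelId, text, images };
  }
  if (visionFallbackModel) {
    // Override applies to this request only.
    return { modelId: visionFallbackModel, text, images };
  }
  // No fallback configured: images are dropped, as they are today.
  return { modelId: session.modelId, text, images: [] };
}
```

Because session.modelId is only read, the next text-only message is routed to the original model with no extra "switch back" step.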

User Experience After Implementation

Before: User drags image → Image dropped silently → Assistant: "I cannot see the image"
After: User drags image → Automatically switches to vision model → Image processed successfully

Alternative Solutions Considered

- User-facing warning at send time: Show a warning before sending if the model doesn't support images. But this still requires manual action.
- Always use vision model when available: Could be confusing if user wants text-only responses for text conversations.
- Per-conversation model setting: Allow users to set model per conversation. Adds complexity.

The auto-switch approach is the most seamless for the user.

Implementation Notes

Based on code review of parseMessageWithAttachments in src/gateway/server-methods/chat.ts:

- The function already checks opts?.supportsImages to determine if the model supports images
- When supportsImages === false and attachments exist, it currently just logs a warning and returns empty images
- The fix would add logic to:
  - Check if visionFallbackModel is configured
  - If yes, temporarily use that model for the request
  - Continue normal processing with the vision-capable model

Key files to modify:

- src/gateway/server-methods/chat.ts: add visionFallbackModel logic in parseMessageWithAttachments or its callers
- Configuration schema to accept the new visionFallbackModel option
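The behaviour described in these notes can be illustrated with a simplified stand-in. This is not the real parseMessageWithAttachments: its actual signature is not shown in the issue, so Attachment, ParseOptions, ParsedMessage, and parseWithFallback are hypothetical, as is the modelOverride field used to hand the chosen model back to the caller. Only the opts?.supportsImages check and the warn-and-return-empty behaviour come from the notes above.

```typescript
// Simplified stand-in; NOT the real parseMessageWithAttachments signature.
interface Attachment {
  mimeType: string;
  data: string; // e.g. base64-encoded content
}

interface ParseOptions {
  supportsImages?: boolean;
  visionFallbackModel?: string; // option proposed in the issue
  warn?: (message: string) => void;
}

interface ParsedMessage {
  message: string;
  images: Attachment[];
  modelOverride?: string; // set only when the fallback is used
}

function parseWithFallback(
  message: string,
  attachments: Attachment[],
  opts?: ParseOptions,
): ParsedMessage {
  const images = attachments.filter((a) => a.mimeType.startsWith("image/"));
  // Only an explicit supportsImages === false triggers the special handling.
  if (images.length === 0 || opts?.supportsImages !== false) {
    return { message, images };
  }
  const fallback = opts?.visionFallbackModel;
  if (fallback) {
    // Proposed behaviour: keep the images and ask the caller to use the fallback.
    return { message, images, modelOverride: fallback };
  }
  // Current behaviour per the notes: log a warning and return empty images.
  opts?.warn?.(`dropping ${images.length} image attachment(s): model does not support images`);
  return { message, images: [] };
}
```

Returning the override rather than switching inside the parser keeps parsing and model selection separate, which matches the note that the logic could live in the function "or its callers".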

Backward Compatibility

This is purely additive — if visionFallbackModel is not configured, behavior remains unchanged (images are dropped with warning for text-only models).

Priority

Medium-High — this is a significant UX gap that causes confusion and makes the image attachment feature effectively unusable for users with text-only primary models.

Related: #49518 (same underlying issue, this is a specific solution proposal)

extent analysis

TL;DR

Implement the proposed visionFallbackModel configuration option to automatically switch to a vision-capable model when an image attachment is detected.

Guidance

- Modify the parseMessageWithAttachments function in src/gateway/server-methods/chat.ts to check for the visionFallbackModel configuration and temporarily use it when an image attachment is detected and the current model does not support images.
- Update the configuration schema to accept the new visionFallbackModel option.
- Test that the request switches to the vision model when an image is attached and the current model does not support images.
- If returning to the original model is desired, verify that later requests use it again after the image has been processed.

Example

{
  "agents": {
    "defaults": {
      "visionFallbackModel": "llama-3.2-vision"
    }
  }
}

This example configuration specifies the llama-3.2-vision model as the fallback vision model.

Notes

The implementation should ensure backward compatibility by retaining the current behavior when visionFallbackModel is not configured.

Recommendation

Apply the proposed workaround by implementing the visionFallbackModel configuration option, as it provides a seamless user experience and addresses the significant UX gap caused by silently dropping images when the active model does not support image input.
