openclaw - ✅ (Solved) Feature Request: Auto-switch to vision model when image attachment is detected [1 pull request, 2 comments, 2 participants]

GitHub stats
openclaw/openclaw#71892 (fetched 2026-04-27 05:37:42)
Comments: 2
Participants: 2
Timeline: 4
Reactions: 0
Timeline (top): commented ×2, closed ×1, cross-referenced ×1

When a user drags/drops or pastes an image into the chat, automatically switch to a configured vision-capable model instead of silently dropping the image or requiring manual intervention.

Root Cause

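When the active model does not support image input, the gateway drops the attachment: parseMessageWithAttachments in src/gateway/server-methods/chat.ts sees supportsImages === false, logs a warning, and returns an empty image list. The assistant then replies as if no image was provided, and the user sees no warning.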

Fix Action

Fixed

PR fix notes

PR #1: feat: auto-switch to vision model when image attachment is detected

Description (problem / solution / changelog)

Summary

When a user drags and drops an image and the current model doesn't support images, automatically switch to the imageModel configured under agents.defaults, if one is set.

Problem

Currently (per issue #49518), when a user attaches an image but the active model does not support image input:

  • The image is silently dropped by the gateway
  • The assistant replies as if no image was provided
  • No warning is shown to the user

This creates a poor UX: users must switch models manually with the /model command before the image can be processed.

Solution

This PR adds automatic vision model switching:

  1. When a message with image attachment is received
  2. And the current model doesn't support images
  3. Check if agents.defaults.imageModel is configured
  4. If yes, automatically switch to that vision model for processing
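The four steps above can be sketched as a small decision function. This is a minimal illustration, not the code from the PR: the ActiveModel and ModelChoice types and the chooseModelForMessage name are hypothetical, and the real gateway resolves model capabilities in its own way.

```typescript
// Hypothetical shapes for illustration; the real gateway types may differ.
interface ActiveModel {
  id: string;
  supportsImages: boolean;
}

interface ModelChoice {
  modelId: string;
  switched: boolean;
}

// Mirrors the four steps: image present, model cannot accept it,
// imageModel configured, then switch.
function chooseModelForMessage(
  active: ActiveModel,
  imageCount: number,
  imageModel: string | undefined,
): ModelChoice {
  const keepActive: ModelChoice = { modelId: active.id, switched: false };
  if (imageCount === 0) return keepActive; // step 1 not met: no image attached
  if (active.supportsImages) return keepActive; // step 2 not met: model handles images
  const configured = imageModel?.trim();
  if (!configured) return keepActive; // step 3 not met: nothing configured
  return { modelId: configured, switched: true }; // step 4: use the vision model
}
```

Returning a switched flag alongside the model id gives the caller a natural place to emit the model-switch log entry.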

Changes

  • Modified src/gateway/server-methods/chat.ts:
    • Added import for normalizeModelSelection and parseModelRef
    • Added logic to check cfg.agents.defaults.imageModel
    • When image attachment detected but current model doesn't support images, automatically switch to imageModel
    • Added logging for model switch events

Configuration

Users can configure the fallback vision model in their openclaw.json:

{
  "agents": {
    "defaults": {
      "imageModel": "llama-3.2-vision"
    }
  }
}
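
For illustration, a setting like the one above could be read defensively as follows. This is a sketch under assumptions: readImageModel and asRecord are hypothetical helpers, not part of openclaw, and only the plain string form shown above is handled.

```typescript
// Narrow an unknown value to a plain object, or return undefined.
function asRecord(value: unknown): Record<string, unknown> | undefined {
  return typeof value === "object" && value !== null && !Array.isArray(value)
    ? (value as Record<string, unknown>)
    : undefined;
}

// Hypothetical helper: returns agents.defaults.imageModel when it is a
// non-empty string, otherwise undefined (meaning "not configured").
function readImageModel(configText: string): string | undefined {
  const root = asRecord(JSON.parse(configText));
  const agents = asRecord(root?.["agents"]);
  const defaults = asRecord(agents?.["defaults"]);
  const value = defaults?.["imageModel"];
  return typeof value === "string" && value.trim() !== ""
    ? value.trim()
    : undefined;
}
```

Treating a missing or non-string value as "not configured" matches the backward-compatibility claim: without the setting, nothing changes.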

Related Issues

  • Fixes #71892
  • Related to #49518

Backward Compatibility

This is purely additive - if imageModel is not configured, behavior remains unchanged.

Changed files

  • src/gateway/server-methods/chat.ts (modified, +27/-5)

Code Example (as proposed in the issue; the PR above reads agents.defaults.imageModel instead)

{
  "agents": {
    "defaults": {
      "visionFallbackModel": "llama-3.2-vision"
    }
  }
}

Feature Request: Auto-switch to vision model when image attachment is detected

Summary

When a user drags/drops or pastes an image into the chat, automatically switch to a configured vision-capable model instead of silently dropping the image or requiring manual intervention.

Motivation

Currently (per issue #49518), when a user attaches an image but the active model does not support image input:

  1. The image is silently dropped by the gateway
  2. The assistant replies as if no image was provided
  3. No warning is shown to the user

This creates a poor user experience where:

  • The image appears to be attached in the UI
  • Sending succeeds
  • But the assistant cannot see the image
  • The user must manually switch models or use a command like /model vision-model

Proposed Solution

Add a new configuration option visionFallbackModel that:

  1. Specifies a vision-capable model to use when images are attached
  2. When an image attachment is detected and the current model does not support images:
    • Automatically routes the request to the configured vision model
    • Processes the image successfully
    • (Optionally) Returns to the original model after
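
One way to get the optional "return to the original model" behavior is to pick the model per turn and never write the override back to the session. The sketch below assumes hypothetical ChatSession and Turn shapes and a hypothetical modelForTurn function; it illustrates the idea and is not taken from the openclaw code.

```typescript
// Hypothetical session shape; only the fields needed for the sketch.
interface ChatSession {
  model: string; // the user's chosen primary model
  supportsImages: boolean; // whether that primary model accepts images
}

interface Turn {
  text: string;
  imageCount: number;
}

// Picks the model for one turn without writing to the session, so the
// next text-only turn falls back to the primary model automatically.
function modelForTurn(
  session: Readonly<ChatSession>,
  turn: Turn,
  visionFallbackModel?: string,
): string {
  const needsVision = turn.imageCount > 0 && !session.supportsImages;
  return needsVision && visionFallbackModel
    ? visionFallbackModel
    : session.model;
}
```

Because the session is only read, no separate "switch back" step is needed after the image is processed.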

Example configuration:

{
  "agents": {
    "defaults": {
      "visionFallbackModel": "llama-3.2-vision"
    }
  }
}

User Experience After Implementation

Before: User drags image → Image dropped silently → Assistant: "I cannot see the image"
After: User drags image → Automatically switches to vision model → Image processed successfully

Alternative Solutions Considered

  1. User-facing warning at send time: Show a warning before sending if the model doesn't support images. But this still requires manual action.

  2. Always use vision model when available: Could be confusing if user wants text-only responses for text conversations.

  3. Per-conversation model setting: Allow users to set model per conversation. Adds complexity.

The auto-switch approach is the most seamless for the user.

Implementation Notes

Based on code review of parseMessageWithAttachments in src/gateway/server-methods/chat.ts:

  1. The function already checks opts?.supportsImages to determine if the model supports images
  2. When supportsImages === false and attachments exist, it currently just logs a warning and returns empty images
  3. The fix would add logic to:
    • Check if visionFallbackModel is configured
    • If yes, temporarily use that model for the request
    • Continue normal processing with the vision-capable model
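
The notes above can be sketched with a simplified stand-in for the attachment handling. Everything here is hypothetical: the real parseMessageWithAttachments signature is not shown in this issue, and the Attachment, ParseOptions, and ParsedAttachments types, the warn callback, and the parseAttachmentsSketch name are assumptions made for illustration.

```typescript
// Hypothetical, simplified stand-in for the behavior described above.
interface Attachment {
  mimeType: string;
  data: string; // e.g. a base64 payload
}

interface ParseOptions {
  supportsImages?: boolean;
  visionFallbackModel?: string;
  warn?: (message: string) => void;
}

interface ParsedAttachments {
  images: Attachment[];
  modelOverride?: string; // set only when the fallback should handle the request
}

function parseAttachmentsSketch(
  attachments: Attachment[],
  opts: ParseOptions = {},
): ParsedAttachments {
  const images = attachments.filter((a) => a.mimeType.startsWith("image/"));
  // Nothing to do when there are no images or the model accepts them.
  if (images.length === 0 || opts.supportsImages !== false) {
    return { images };
  }
  // Proposed behavior: keep the images and route to the fallback model.
  if (opts.visionFallbackModel) {
    return { images, modelOverride: opts.visionFallbackModel };
  }
  // Current behavior as described: warn and return no images.
  opts.warn?.(`dropping ${images.length} image attachment(s): model does not support images`);
  return { images: [] };
}
```

Keeping the warn-and-drop branch last preserves the existing behavior whenever no fallback is configured.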

Key files to modify:

  • src/gateway/server-methods/chat.ts — add visionFallbackModel logic in parseMessageWithAttachments or its callers
  • Configuration schema to accept the new visionFallbackModel option

Backward Compatibility

This is purely additive: if visionFallbackModel is not configured, behavior remains unchanged (images are dropped with a warning for text-only models).

Priority

Medium-High — this is a significant UX gap that causes confusion and makes the image attachment feature effectively unusable for users with text-only primary models.


Related: #49518 (same underlying issue, this is a specific solution proposal)

extent analysis

TL;DR

Implement the proposed visionFallbackModel configuration option to automatically switch to a vision-capable model when an image attachment is detected.

Guidance

  • Modify the parseMessageWithAttachments function in src/gateway/server-methods/chat.ts to check for the visionFallbackModel configuration and temporarily use it when an image attachment is detected and the current model does not support images.
  • Update the configuration schema to accept the new visionFallbackModel option.
  • Test the implementation to ensure seamless switching to the vision model when an image is attached and the current model does not support images.
  • If returning to the original model is desired, verify that the session uses it again once the image has been processed.

Example

{
  "agents": {
    "defaults": {
      "visionFallbackModel": "llama-3.2-vision"
    }
  }
}

This example configuration specifies the llama-3.2-vision model as the fallback vision model.

Notes

The implementation should ensure backward compatibility by retaining the current behavior when visionFallbackModel is not configured.

Recommendation

Apply the proposed workaround by implementing the visionFallbackModel configuration option, as it provides a seamless user experience and addresses the significant UX gap caused by silently dropping images when the active model does not support image input.
