openclaw - 💡(How to fix) Fix [Bug]: Ollama models with vision capability not recognized as supporting images [1 comments, 2 participants]

openclaw · 2026-04-07 13:32:10

GitHub stats

openclaw/openclaw#62519 • Fetched 2026-04-08 03:03:09

Author: suffamoph
Participants: BruceMacD, suffamoph
Timeline (top): labeled ×2, closed ×1, commented ×1

Root Cause

`buildOllamaModelDefinition` (in the compiled dist file `stream-DaZ9JB7F.js`) hardcodes `input: ["text"]` for every Ollama model:

```javascript
function buildOllamaModelDefinition(modelId, contextWindow) {
    return {
        input: ["text"],  // ← always text-only, never checks vision
        // ...other fields omitted
    };
}
```

Additionally, `enrichOllamaModelsWithContext` already calls Ollama's `/api/show` for each model to get `context_length`. It ignores the `capabilities` field, however, and that field lists `"vision"` for vision-capable models.

The `mergeProviderModels` function always overwrites `input` with the implicit (auto-discovered) value, so manual edits to `models.json` are also lost on every startup.
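Manual `models.json` edits would survive if an explicitly configured `input` took precedence over the auto-discovered one during the merge. The sketch below is hypothetical. `mergeModelEntry` and `mergeModelLists` are illustrative names, not OpenClaw's `mergeProviderModels`, and the sketch assumes entries shaped like `{ id, input, contextWindow }`.

```javascript
// Hypothetical merge helpers (not OpenClaw's mergeProviderModels).
// Assumed entry shape: { id, input, contextWindow, ... }.
function mergeModelEntry(explicitEntry, implicitEntry) {
  if (!explicitEntry) return { ...implicitEntry };
  if (!implicitEntry) return { ...explicitEntry };
  return {
    ...implicitEntry, // start from auto-discovered values
    ...explicitEntry, // user-configured fields from models.json take precedence
    // Keep an explicit `input` list; fall back to the discovered one otherwise.
    input: Array.isArray(explicitEntry.input) ? explicitEntry.input : implicitEntry.input,
  };
}

// Merge explicit (models.json) and implicit (auto-discovered) lists by model id.
function mergeModelLists(explicitList, implicitList) {
  const byId = new Map();
  for (const entry of implicitList) byId.set(entry.id, { implicit: entry });
  for (const entry of explicitList) {
    const slot = byId.get(entry.id) ?? {};
    slot.explicit = entry;
    byId.set(entry.id, slot);
  }
  return [...byId.values()].map(({ explicit, implicit }) =>
    mergeModelEntry(explicit, implicit),
  );
}
```

With this precedence, a user who sets `input: ["text", "image"]` for a model in `models.json` keeps that setting across restarts. Models without an explicit entry still pick up the discovered value.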

Suggested Fix
- When calling `/api/show` in the enrichment phase, also read `data.capabilities`.
- If `capabilities` includes `"vision"`, set `input: ["text", "image"]`.
- Pass `supportsVision` through `enrichOllamaModelsWithContext` → `buildOllamaModelDefinition`.
- No extra HTTP requests are needed, because the context-window enrichment already hits `/api/show`.
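The capability check itself is small. This is a minimal sketch. It assumes the parsed JSON body of `/api/show` is passed in and that its `capabilities` field is an array of strings such as `"completion"` and `"vision"`. `inputFromShowResponse` is a hypothetical helper name.

```javascript
// Derive a model's input modalities from the parsed /api/show response.
// If `capabilities` is missing (e.g. an older Ollama build), treat the model as text-only.
function inputFromShowResponse(showResponse) {
  const capabilities = Array.isArray(showResponse?.capabilities)
    ? showResponse.capabilities
    : [];
  const supportsVision = capabilities.includes("vision");
  return {
    supportsVision,
    input: supportsVision ? ["text", "image"] : ["text"],
  };
}
```

For example, `inputFromShowResponse({ capabilities: ["completion", "vision", "tools", "thinking"] })` returns `{ supportsVision: true, input: ["text", "image"] }`.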
***********************************************************************************************
Environment:
OpenClaw version: 2026.4.5
Ollama version: 0.17.1
Affected models: any Ollama model with vision capability

### Steps to reproduce

1/ Use a local Ollama model that is known to have the "vision" capability.
2/ Attach an image to a message.
3/ OpenClaw behaves as if it never saw the image: "I can not find any attached image in your message".
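To rule out the model itself, you can send the same image straight to Ollama and bypass OpenClaw. This sketch uses Ollama's `/api/chat` endpoint, which accepts base64-encoded images in a message's `images` array. It assumes Node 18+ (for the global `fetch`) and Ollama running on its default address, `http://localhost:11434`. `buildImageChatBody` and `askOllamaAboutImage` are hypothetical helper names.

```javascript
const fs = require("node:fs");

// Build an Ollama /api/chat request body that attaches one image.
// Ollama expects images as base64 strings in the message's `images` field.
function buildImageChatBody(model, prompt, imageBuffer) {
  return {
    model,
    stream: false, // request a single JSON response instead of a stream
    messages: [
      { role: "user", content: prompt, images: [imageBuffer.toString("base64")] },
    ],
  };
}

// Send the request directly to Ollama, bypassing OpenClaw.
async function askOllamaAboutImage(model, imagePath, baseUrl = "http://localhost:11434") {
  const body = buildImageChatBody(model, "Describe this image.", fs.readFileSync(imagePath));
  const res = await fetch(`${baseUrl}/api/chat`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
  const data = await res.json();
  return data.message.content;
}
```

If the model handles vision correctly, `askOllamaAboutImage("gemma4:31b", "./test.png").then(console.log)` should print a description of the image. That would point the problem at OpenClaw rather than at the model.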

### Expected behavior

Ollama models that support the vision capability should be able to accept and process image attachments.

### Actual behavior

When using local Ollama models that support the `vision` capability (e.g. `qwen3.5:35b`, `gemma4:31b`), OpenClaw drops all image attachments with the warning:

[gateway] parseMessageWithAttachments: 1 attachment(s) dropped — model does not support images
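The warning suggests that the gateway decides which attachments to keep based on the model definition's `input` list. The following is a hypothetical illustration of that kind of gate, not OpenClaw's `parseMessageWithAttachments`, and the attachment shape `{ mimeType }` is assumed. It shows why a hardcoded `input: ["text"]` drops every image.

```javascript
// Hypothetical capability gate: keep image attachments only if the model
// definition lists "image" in its `input` modalities.
function filterAttachments(modelDefinition, attachments) {
  const acceptsImages = (modelDefinition.input ?? []).includes("image");
  const kept = acceptsImages
    ? attachments
    : attachments.filter((a) => !a.mimeType.startsWith("image/"));
  // A real gateway would log a warning when `dropped` is non-zero.
  return { kept, dropped: attachments.length - kept.length };
}
```

With `input: ["text"]`, every `image/*` attachment is counted as dropped. With `input: ["text", "image"]`, nothing is dropped.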

### OpenClaw version

2026.4.5

### Operating system

Windows 11

### Install method

npm global

### Model

any Ollama model with vision capability

### Provider / routing chain

ollama

### Additional provider/model setup details

_No response_

### Logs, screenshots, and evidence

Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

Summary

Bug Description

When using local Ollama models that support the vision capability (e.g. qwen3.5:35b, gemma4:31b), OpenClaw drops all image attachments with the warning:

[gateway] parseMessageWithAttachments: 1 attachment(s) dropped — model does not support images

The model correctly reports vision support in the `ollama show` output:

```text
Capabilities
  completion
  vision
  tools
  thinking
```
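The same capability list is available over HTTP, and the enrichment step already queries that endpoint. This sketch assumes Node 18+ and Ollama's default address, and it uses the `model` field of the `/api/show` request body. `getOllamaCapabilities` is a hypothetical helper.

```javascript
// Query Ollama's /api/show for a model's capabilities (uses Node 18+ global fetch).
async function getOllamaCapabilities(model, baseUrl = "http://localhost:11434") {
  const res = await fetch(`${baseUrl}/api/show`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model }),
  });
  if (!res.ok) throw new Error(`/api/show failed for ${model}: HTTP ${res.status}`);
  const data = await res.json();
  // Older Ollama builds may not return this field; treat that as "no capabilities".
  return Array.isArray(data.capabilities) ? data.capabilities : [];
}
```

For the models named above, `getOllamaCapabilities("qwen3.5:35b").then(console.log)` should include `"vision"`.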

Impact and severity

No response

Additional information

No response

extent analysis

TL;DR

Pass a vision flag derived from the model's `/api/show` capabilities into `buildOllamaModelDefinition`, and have it set `input` to `["text", "image"]` when the model supports vision.

Guidance

- Check the `capabilities` field in the response from Ollama's `/api/show` endpoint to determine whether the model supports vision.
- Pass the `supportsVision` flag from `enrichOllamaModelsWithContext` to `buildOllamaModelDefinition` to set the correct input type.
- Update `buildOllamaModelDefinition` to set `input` to `["text", "image"]` when the model supports vision.
- Verify that `mergeProviderModels` does not overwrite the updated `input` value.
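Putting the guidance together, the flag can be computed from the existing `/api/show` result and handed to the definition builder. This is a hypothetical sketch, not OpenClaw's code. `enrichOllamaModels` stands in for `enrichOllamaModelsWithContext`, and the `id`/`contextWindow` fields of the returned definition are assumed. `showModel` is an injected function assumed to return a normalized `{ contextWindow, capabilities }` object, which lets the flow run without an Ollama server.

```javascript
// Hypothetical builder: `supportsVision` defaults to false so existing callers keep text-only input.
function buildOllamaModelDefinition(modelId, contextWindow, supportsVision = false) {
  return {
    id: modelId,
    contextWindow,
    input: supportsVision ? ["text", "image"] : ["text"],
  };
}

// Hypothetical enrichment: one show call per model, reused for both
// the context window and the vision flag (no extra requests).
async function enrichOllamaModels(modelIds, showModel) {
  return Promise.all(
    modelIds.map(async (id) => {
      const info = await showModel(id);
      const supportsVision =
        Array.isArray(info.capabilities) && info.capabilities.includes("vision");
      return buildOllamaModelDefinition(id, info.contextWindow, supportsVision);
    }),
  );
}
```

Giving `supportsVision` a default keeps the change backward compatible for any call site that still passes only two arguments.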

Example

```javascript
function buildOllamaModelDefinition(modelId, contextWindow, supportsVision) {
    const input = supportsVision ? ["text", "image"] : ["text"];
    return {
        input,
        // ...
    };
}
```

Notes

This fix assumes that the enrichOllamaModelsWithContext function is correctly calling Ollama's /api/show endpoint and parsing the response. Additional logging or debugging may be necessary to verify this.
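For the extra debugging the note mentions, a small cross-check can list the models that Ollama reports as vision-capable but whose definitions are still text-only. This is a hypothetical sketch. It assumes `definitions` is an array of `{ id, input }` and that `capabilitiesById` maps model ids to their `/api/show` capabilities.

```javascript
// Hypothetical diagnostic: ids of models that Ollama marks as "vision"
// but whose definition does not list "image" in `input`.
function findVisionMismatches(definitions, capabilitiesById) {
  return definitions
    .filter((def) => (capabilitiesById[def.id] ?? []).includes("vision"))
    .filter((def) => !(def.input ?? []).includes("image"))
    .map((def) => def.id);
}
```

For example, `findVisionMismatches([{ id: "gemma4:31b", input: ["text"] }], { "gemma4:31b": ["completion", "vision"] })` returns `["gemma4:31b"]`. Once the definitions include `"image"`, the same check should return an empty array.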

Recommendation

Apply the suggested fix to update the buildOllamaModelDefinition function to correctly handle models with vision capability. This will allow OpenClaw to parse image attachments for models that support vision.
