openclaw - 💡(How to fix) Fix Feature Request: Ollama Media-Understanding Provider [1 comments, 2 participants]

openclaw · 2026-03-28 07:02:22


openclaw/openclaw#56246 • Fetched 2026-04-08 01:43:08


Author: anabellbot-pixel

Participants: anabellbot-pixel, mlaihk




Summary

OpenClaw supports api: "ollama" as a valid model API type, but does not have a media-understanding provider for Ollama. This causes the image tool to fail when using Ollama vision models.

Current Behavior

When configuring imageModel with an Ollama vision model:

{
  "defaults": {
    "imageModel": {
      "primary": "ollama/qwen3-vl:235b-cloud",
      "fallbacks": []
    }
  }
}

The image tool fails with:

[tools] image failed: All image models failed (2): 
ollama/qwen3-vl: No media-understanding provider registered for ollama
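
The config value bundles a provider ID and a model name in one string. A minimal sketch of how such a reference might be split, assuming the provider ID is everything before the first slash. `splitModelRef` and `ModelRef` are hypothetical names for illustration, not OpenClaw's actual parser.

```typescript
// Hypothetical helper: splits "provider/model" at the first slash only,
// so model names that contain ":" or further "/" stay intact.
interface ModelRef {
  provider: string;
  model: string;
}

function splitModelRef(ref: string): ModelRef {
  const slash = ref.indexOf("/");
  // Reject references with no slash, an empty provider, or an empty model.
  if (slash <= 0 || slash === ref.length - 1) {
    throw new Error(`Expected "provider/model", got "${ref}"`);
  }
  return {
    provider: ref.slice(0, slash).toLowerCase(),
    model: ref.slice(slash + 1),
  };
}
```

The provider half is what the media-understanding lookup would receive, which is why a missing `ollama` entry fails regardless of which Ollama model is named.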

Root Cause Analysis

OpenClaw has two separate systems:

Model API Types (api: "ollama" | "openai-completions" | etc.) - Defines how to call text models
Media-Understanding Providers - Separate plugins that implement describeImage, transcribeAudio, etc.

The api: "ollama" is valid for the schema, but there's no ollamaMediaUnderstandingProvider registered.

Registered Media-Understanding Providers

Provider ID	File
google	`media-understanding-provider-DllGbg-v.js`
anthropic	`media-understanding-provider-DKEqPSf8.js`
openai	`media-understanding-provider-D4Ek4YHa.js`
minimax	`media-understanding-provider-CxBraAgD.js`
moonshot	`media-understanding-provider-BDq4rFL-.js`
mistral	`media-understanding-provider-DGRgnApZ.js`
zai	`media-understanding-provider--kzrfOQX.js`

Ollama is missing.

Code Reference

From src/media-understanding/provider-registry.ts:

const PROVIDERS = [groqMediaUnderstandingProvider, deepgramMediaUnderstandingProvider];

function buildMediaUnderstandingRegistry(overrides, cfg) {
  const registry = new Map();
  // Built-in providers first
  for (const provider of PROVIDERS) mergeProviderIntoRegistry(registry, provider);
  // Load plugins...
  for (const entry of pluginRegistry?.mediaUnderstandingProviders ?? [])
    mergeProviderIntoRegistry(registry, entry.provider);
  return registry; // presumably elided from the excerpt
}

function getMediaUnderstandingProvider(id, registry) {
  // Returns undefined for 'ollama'; the caller then throws the error above
  return registry.get(normalizeMediaProviderId(id));
}
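
To make the failure mode concrete, here is a self-contained sketch of a registry keyed by provider ID. The `MediaProvider` type, `buildRegistry`, and `requireProvider` are assumptions for illustration, and the `"audio"` capability values are placeholders. Only the error text mirrors the log quoted earlier.

```typescript
// Simplified stand-in for the registry; these types are assumptions.
interface MediaProvider {
  id: string;
  capabilities: string[];
}

function buildRegistry(providers: MediaProvider[]): Map<string, MediaProvider> {
  const registry = new Map<string, MediaProvider>();
  // Normalize IDs on insert so lookups are case-insensitive.
  for (const p of providers) registry.set(p.id.trim().toLowerCase(), p);
  return registry;
}

// Hypothetical helper: turns a missing entry into an explicit error.
function requireProvider(id: string, registry: Map<string, MediaProvider>): MediaProvider {
  const provider = registry.get(id.trim().toLowerCase());
  if (!provider) {
    throw new Error(`No media-understanding provider registered for ${id}`);
  }
  return provider;
}

// Capability values here are placeholders, not taken from OpenClaw.
const registry = buildRegistry([
  { id: "groq", capabilities: ["audio"] },
  { id: "deepgram", capabilities: ["audio"] },
]);
```

With only these two entries, `requireProvider("ollama", registry)` throws, which matches the behavior the issue reports.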

Proposed Solution

Add an ollamaMediaUnderstandingProvider that uses Ollama's /api/chat or /api/generate endpoint with vision models.
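
For comparison with the `/api/chat` example that follows, here is a sketch of the `/api/generate` variant. Ollama's generate endpoint takes a flat `prompt` plus an `images` array of base64 strings and, with `stream: false`, returns the text in a `response` field. The function and type names are hypothetical.

```typescript
interface GenerateRequest {
  model: string;
  prompt: string;
  images: string[]; // base64-encoded image data
  stream: false;    // ask for one JSON object instead of a stream
}

function buildGenerateRequest(model: string, prompt: string, imagesBase64: string[]): GenerateRequest {
  return { model, prompt, images: imagesBase64, stream: false };
}

// Pulls the generated text out of a non-streaming /api/generate reply.
function extractGenerateText(payload: unknown): string {
  if (typeof payload === "object" && payload !== null && "response" in payload) {
    const text = (payload as { response: unknown }).response;
    if (typeof text === "string") return text.trim();
  }
  throw new Error("Unexpected /api/generate payload: missing string `response`");
}
```

Either endpoint can serve a single-prompt image description. `/api/chat` is the more natural fit if the provider later needs multi-turn context.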

Example Implementation

// extensions/ollama/media-understanding-provider.ts

const DEFAULT_OLLAMA_BASE_URL = "http://127.0.0.1:11434";
const DEFAULT_OLLAMA_VISION_MODEL = "llava";

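async function describeOllamaImage(params: DescribeImageParams) {
  const baseUrl = normalizeBaseUrl(params.baseUrl, DEFAULT_OLLAMA_BASE_URL);
  const model = params.model?.trim() || DEFAULT_OLLAMA_VISION_MODEL;

  const response = await fetch(`${baseUrl}/api/chat`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model,
      stream: false, // /api/chat streams by default; request a single JSON object
      messages: [{
        role: "user",
        content: params.prompt || "Describe this image",
        images: [params.buffer.toString("base64")]
      }]
    })
  });

  if (!response.ok) {
    throw new Error(`Ollama image request failed: HTTP ${response.status}`);
  }
  // A non-streaming /api/chat reply carries the text in message.content.
  const data = await response.json();
  // Assumption: the return shape OpenClaw expects is not shown in the issue.
  return { text: data.message?.content ?? "", model };
}

const ollamaMediaUnderstandingProvider = {
  id: "ollama",
  capabilities: ["image"],
  describeImage: describeOllamaImage,
  // describeOllamaImages (multi-image variant) is referenced but not defined in the issue.
  describeImages: describeOllamaImages
};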
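
The example relies on `normalizeBaseUrl` and `describeOllamaImages`, neither of which is shown in the issue. The sketch below illustrates what such helpers might do, assuming one chat message can carry several images and that Ollama wants plain base64 rather than a data URL. Every name here is hypothetical and not an OpenClaw API.

```typescript
// All names below are hypothetical; they are not OpenClaw APIs.
const DEFAULT_BASE_URL = "http://127.0.0.1:11434";

// Falls back to the default and strips trailing slashes so that
// `${baseUrl}/api/chat` never contains a double slash.
function normalizeBaseUrlSketch(value: string | undefined, fallback: string = DEFAULT_BASE_URL): string {
  const trimmed = (value ?? "").trim();
  return (trimmed || fallback).replace(/\/+$/, "");
}

// Drops a "data:image/...;base64," prefix if present, leaving plain base64.
function stripDataUrlPrefix(image: string): string {
  const comma = image.indexOf(",");
  return image.startsWith("data:") && comma !== -1 ? image.slice(comma + 1) : image;
}

// One user message can carry several images in its `images` array.
function buildChatBody(model: string, prompt: string, images: string[]) {
  if (images.length === 0) throw new Error("At least one image is required");
  return {
    model,
    stream: false as const,
    messages: [{ role: "user" as const, content: prompt, images: images.map(stripDataUrlPrefix) }],
  };
}

// With stream: false, the reply text is in message.content.
function extractChatText(payload: { message?: { content?: unknown } }): string {
  const text = payload.message?.content;
  if (typeof text !== "string") throw new Error("Ollama reply has no message.content");
  return text.trim();
}
```

Keeping request building and response parsing as pure functions makes them easy to unit test without a running Ollama server.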

Benefits

Full Ollama support - Text + Vision in one provider
Local vision - No API keys needed for local models like llava, moondream, qwen3-vl
Cloud vision - Works with qwen3-vl:235b-cloud, minimax-m2.5:cloud via Ollama cloud
Consistency - Same provider for text and vision

Environment

OpenClaw version: 2026.3.23-2 / 2026.3.24
Node.js: v24.14.0
OS: Windows 10

Workaround (Current)

Use Browser + Gemini web for OCR of handwritten content, or use a different provider (Google, Anthropic) for vision when API quota is available.

extent analysis

Fix Plan

To resolve the issue, we need to add an ollamaMediaUnderstandingProvider that uses Ollama's API with vision models. Here are the steps:

Create a new file media-understanding-provider.ts in the extensions/ollama directory with the following code:

// extensions/ollama/media-understanding-provider.ts

const DEFAULT_OLLAMA_BASE_URL = "http://127.0.0.1:11434";
const DEFAULT_OLLAMA_VISION_MODEL = "llava";

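async function describeOllamaImage(params: DescribeImageParams) {
  const baseUrl = normalizeBaseUrl(params.baseUrl, DEFAULT_OLLAMA_BASE_URL);
  const model = params.model?.trim() || DEFAULT_OLLAMA_VISION_MODEL;

  const response = await fetch(`${baseUrl}/api/chat`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model,
      stream: false, // /api/chat streams by default; request a single JSON object
      messages: [{
        role: "user",
        content: params.prompt || "Describe this image",
        images: [params.buffer.toString("base64")]
      }]
    })
  });

  if (!response.ok) {
    throw new Error(`Ollama image request failed: HTTP ${response.status}`);
  }
  // A non-streaming /api/chat reply carries the text in message.content.
  const data = await response.json();
  // Assumption: the return shape OpenClaw expects is not shown in the issue.
  return { text: data.message?.content ?? "", model };
}

const ollamaMediaUnderstandingProvider = {
  id: "ollama",
  capabilities: ["image"],
  describeImage: describeOllamaImage,
  // describeOllamaImages (multi-image variant) still has to be implemented.
  describeImages: describeOllamaImages
};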

export default ollamaMediaUnderstandingProvider;

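Then register the provider in src/media-understanding/provider-registry.ts by importing it and appending it to PROVIDERS:

// Relative path assumes extensions/ sits at the repository root; adjust to the actual layout.
import ollamaMediaUnderstandingProvider from '../../extensions/ollama/media-understanding-provider';
const PROVIDERS = [groqMediaUnderstandingProvider, deepgramMediaUnderstandingProvider, ollamaMediaUnderstandingProvider];

The buildMediaUnderstandingRegistry function needs no further change, because it already iterates over PROVIDERS:

function buildMediaUnderstandingRegistry(overrides, cfg) {
  const registry = new Map();
  for (const provider of PROVIDERS) mergeProviderIntoRegistry(registry, provider);
  // Load plugins...
  for (const entry of pluginRegistry?.mediaUnderstandingProviders ?? [])
    mergeProviderIntoRegistry(registry, entry.provider);
  return registry; // presumably elided from the excerpt
}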

Verification

To verify that the fix worked, try using the image tool with an Ollama vision model:

{
  "defaults": {
    "imageModel": {
      "primary": "ollama/qwen3-vl:235b-cloud",
      "fallbacks": []
    }
  }
}

The image tool should no longer fail with the "No media-understanding provider registered for ollama" error.
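
If the error persists or changes, it can help to confirm that Ollama itself knows the model. Ollama lists installed models at `GET /api/tags`, and a name without a tag resolves to `:latest`. The sketch below checks a tags payload for a model. `isModelAvailable` is a hypothetical helper, and whether cloud-hosted models appear in this list is an assumption to verify.

```typescript
// Shape of the relevant part of the /api/tags reply.
interface TagsPayload {
  models?: { name: string }[];
}

// Hypothetical helper: true if the wanted model is in the tags list.
function isModelAvailable(payload: TagsPayload, wanted: string): boolean {
  // Ollama treats an untagged name as "<name>:latest".
  const target = wanted.includes(":") ? wanted : `${wanted}:latest`;
  return (payload.models ?? []).some((m) => m.name === target);
}
```

Feeding it the parsed JSON from `/api/tags` separates "provider not registered" failures from "model not pulled" failures.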

Extra Tips

Make sure the ollamaMediaUnderstandingProvider handles errors from the Ollama API, such as a non-2xx status or an unreachable server.
Consider adding logging to help diagnose failures.
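
As one way to make failures easier to diagnose, the sketch below turns an HTTP status and response body into a readable message. It assumes Ollama reports errors as JSON with an `error` string and falls back to the raw body otherwise. `formatOllamaError` is a hypothetical name.

```typescript
// Hypothetical helper: builds a log-friendly message from a failed response.
function formatOllamaError(status: number, bodyText: string): string {
  let detail = bodyText.trim();
  try {
    const parsed: unknown = JSON.parse(bodyText);
    if (typeof parsed === "object" && parsed !== null && "error" in parsed) {
      const err = (parsed as { error: unknown }).error;
      if (typeof err === "string") detail = err;
    }
  } catch (_e) {
    // Body was not JSON; keep the raw text as the detail.
  }
  return detail
    ? `Ollama request failed (HTTP ${status}): ${detail}`
    : `Ollama request failed (HTTP ${status})`;
}
```

Throwing an `Error` with this message from the provider would surface the underlying cause in the image tool's failure log.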
