openclaw - 💡(How to fix) Fix Feature Request: Ollama Media-Understanding Provider [1 comments, 2 participants]

openclaw/openclaw#56246 · Fetched 2026-04-08 01:43:08

Summary

OpenClaw supports api: "ollama" as a valid model API type, but does not have a media-understanding provider for Ollama. This causes the image tool to fail when using Ollama vision models.

Current Behavior

When configuring imageModel with an Ollama vision model:

{
  "defaults": {
    "imageModel": {
      "primary": "ollama/qwen3-vl:235b-cloud",
      "fallbacks": []
    }
  }
}

The image tool fails with:

[tools] image failed: All image models failed (2): 
ollama/qwen3-vl: No media-understanding provider registered for ollama
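The error names only `ollama`, which suggests that the part of the model reference before the first slash is used as the provider id. A minimal sketch of that split follows; `splitModelRef` is a hypothetical helper, not a function taken from OpenClaw, and the real parsing rules may differ.

```typescript
// Hypothetical helper: split "provider/model" at the first slash.
// The model part may itself contain ":" tags such as "235b-cloud".
function splitModelRef(ref: string): { provider: string; model: string } {
  const slash = ref.indexOf("/");
  if (slash <= 0 || slash === ref.length - 1) {
    throw new Error(`Expected "provider/model", got "${ref}"`);
  }
  return { provider: ref.slice(0, slash), model: ref.slice(slash + 1) };
}

const parsedRef = splitModelRef("ollama/qwen3-vl:235b-cloud");
console.log(parsedRef.provider, parsedRef.model);
```

Under this assumption, the provider part (`ollama`) is what gets looked up among the media-understanding providers, independent of which vision model follows the slash.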

Root Cause Analysis

OpenClaw has two separate systems:

  1. Model API Types (api: "ollama" | "openai-completions" | etc.) - Define how to call text models
  2. Media-Understanding Providers - Separate plugins that implement describeImage, transcribeAudio, etc.

The value api: "ollama" is valid for the schema, but no ollamaMediaUnderstandingProvider is registered, so the image tool has nothing to dispatch to.

Registered Media-Understanding Providers

Provider ID   File
google        media-understanding-provider-DllGbg-v.js
anthropic     media-understanding-provider-DKEqPSf8.js
openai        media-understanding-provider-D4Ek4YHa.js
minimax       media-understanding-provider-CxBraAgD.js
moonshot      media-understanding-provider-BDq4rFL-.js
mistral       media-understanding-provider-DGRgnApZ.js
zai           media-understanding-provider--kzrfOQX.js

Ollama is missing.

Code Reference

From src/media-understanding/provider-registry.ts:

const PROVIDERS = [groqMediaUnderstandingProvider, deepgramMediaUnderstandingProvider];

function buildMediaUnderstandingRegistry(overrides, cfg) {
  const registry = new Map();
  for (const provider of PROVIDERS) mergeProviderIntoRegistry(registry, provider);
  // Load plugins...
  for (const entry of pluginRegistry?.mediaUnderstandingProviders ?? []) 
    mergeProviderIntoRegistry(registry, entry.provider);
}

function getMediaUnderstandingProvider(id, registry) {
  return registry.get(normalizeMediaProviderId(id));
  // Returns undefined for 'ollama' → throws error
}
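The failure mode can be reproduced in isolation. The sketch below is a simplified stand-in for the registry, not OpenClaw's actual code: the `MediaProvider` shape, `normalizeId`, `buildRegistry`, and `requireProvider` are hypothetical names, and the `"audio"` capability given to the two built-in providers is an assumption.

```typescript
// Hypothetical minimal provider shape; the real type is not shown in the issue.
interface MediaProvider {
  id: string;
  capabilities: string[];
}

// Assumed normalization: trim and lowercase the provider id.
function normalizeId(id: string): string {
  return id.trim().toLowerCase();
}

function buildRegistry(providers: MediaProvider[]): Map<string, MediaProvider> {
  const registry = new Map<string, MediaProvider>();
  for (const provider of providers) {
    registry.set(normalizeId(provider.id), provider);
  }
  return registry;
}

// Map.get returns undefined on a miss; this wrapper turns the miss into the reported error.
function requireProvider(id: string, registry: Map<string, MediaProvider>): MediaProvider {
  const provider = registry.get(normalizeId(id));
  if (!provider) {
    throw new Error(`No media-understanding provider registered for ${id}`);
  }
  return provider;
}

const registry = buildRegistry([
  { id: "groq", capabilities: ["audio"] },     // capability is an assumption
  { id: "deepgram", capabilities: ["audio"] }, // capability is an assumption
]);
```

Calling `requireProvider("ollama", registry)` throws here because nothing with that id was ever merged into the map, which matches the message the image tool reports.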

Proposed Solution

Add an ollamaMediaUnderstandingProvider that uses Ollama's /api/chat or /api/generate endpoint with vision models.

Example Implementation

// extensions/ollama/media-understanding-provider.ts

const DEFAULT_OLLAMA_BASE_URL = "http://127.0.0.1:11434";
const DEFAULT_OLLAMA_VISION_MODEL = "llava";

async function describeOllamaImage(params: DescribeImageParams) {
  const baseUrl = normalizeBaseUrl(params.baseUrl, DEFAULT_OLLAMA_BASE_URL);
  const model = params.model?.trim() || DEFAULT_OLLAMA_VISION_MODEL;
  
  const response = await fetch(`${baseUrl}/api/chat`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model,
      stream: false, // ask Ollama for a single JSON reply rather than a streamed one
      messages: [{
        role: "user",
        content: params.prompt || "Describe this image",
        images: [params.buffer.toString("base64")]
      }]
    })
  });
  
  // Parse response...
}

const ollamaMediaUnderstandingProvider = {
  id: "ollama",
  capabilities: ["image"],
  describeImage: describeOllamaImage,
  describeImages: describeOllamaImages
};
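The example leaves response parsing open. One possible way to fill that gap is sketched below, assuming a non-streaming call to `/api/chat`, where Ollama returns a single JSON object whose `message.content` holds the reply. `buildOllamaChatRequest` and `extractDescription` are hypothetical helper names, and the image is passed in already base64-encoded (for example via `params.buffer.toString("base64")`, as in the example above).

```typescript
interface OllamaChatRequest {
  model: string;
  stream: false;
  messages: { role: "user"; content: string; images: string[] }[];
}

// Build the request body; falls back to the defaults used in the example above.
function buildOllamaChatRequest(
  imageBase64: string,
  prompt?: string,
  model?: string,
): OllamaChatRequest {
  return {
    model: model?.trim() || "llava",
    stream: false, // without this, /api/chat streams newline-delimited JSON chunks
    messages: [
      {
        role: "user",
        content: prompt?.trim() || "Describe this image",
        images: [imageBase64],
      },
    ],
  };
}

// Pull the assistant text out of a non-streaming /api/chat response.
function extractDescription(payload: unknown): string {
  const message = (payload as { message?: { content?: unknown } } | null)?.message;
  const content = message?.content;
  if (typeof content !== "string" || content.trim() === "") {
    throw new Error("Ollama response did not contain message.content");
  }
  return content.trim();
}
```

Keeping the request builder and the parser free of network calls makes both easy to unit test; the `fetch` call from the example would sit between them.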

Benefits

  1. Full Ollama support - Text + Vision in one provider
  2. Local vision - No API keys needed for local models like llava, moondream, qwen3-vl
  3. Cloud vision - Works with qwen3-vl:235b-cloud, minimax-m2.5:cloud via Ollama cloud
  4. Consistency - Same provider for text and vision

Environment

  • OpenClaw version: 2026.3.23-2 / 2026.3.24
  • Node.js: v24.14.0
  • OS: Windows 10

Workaround (Current)

Use Browser + Gemini web for OCR of handwritten content, or use a different provider (Google, Anthropic) for vision when API quota is available.

extent analysis

Fix Plan

To resolve the issue, we need to add an ollamaMediaUnderstandingProvider that uses Ollama's API with vision models. Here are the steps:

  • Create a new file media-understanding-provider.ts in the extensions/ollama directory with the following code:
// extensions/ollama/media-understanding-provider.ts

const DEFAULT_OLLAMA_BASE_URL = "http://127.0.0.1:11434";
const DEFAULT_OLLAMA_VISION_MODEL = "llava";

async function describeOllamaImage(params: DescribeImageParams) {
  const baseUrl = normalizeBaseUrl(params.baseUrl, DEFAULT_OLLAMA_BASE_URL);
  const model = params.model?.trim() || DEFAULT_OLLAMA_VISION_MODEL;
  
  const response = await fetch(`${baseUrl}/api/chat`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model,
      stream: false, // ask Ollama for a single JSON reply rather than a streamed one
      messages: [{
        role: "user",
        content: params.prompt || "Describe this image",
        images: [params.buffer.toString("base64")]
      }]
    })
  });
  
  // Parse response...
}

const ollamaMediaUnderstandingProvider = {
  id: "ollama",
  capabilities: ["image"],
  describeImage: describeOllamaImage,
  describeImages: describeOllamaImages
};

export default ollamaMediaUnderstandingProvider;
  • Register the new provider in src/media-understanding/provider-registry.ts by importing it and appending it to PROVIDERS (the relative path below assumes extensions/ sits at the repository root; adjust it to the actual layout):
import ollamaMediaUnderstandingProvider from '../../extensions/ollama/media-understanding-provider';
const PROVIDERS = [groqMediaUnderstandingProvider, deepgramMediaUnderstandingProvider, ollamaMediaUnderstandingProvider];
  • No change to buildMediaUnderstandingRegistry should be needed, because it already merges every entry in PROVIDERS:
function buildMediaUnderstandingRegistry(overrides, cfg) {
  const registry = new Map();
  for (const provider of PROVIDERS) mergeProviderIntoRegistry(registry, provider);
  // Load plugins...
  for (const entry of pluginRegistry?.mediaUnderstandingProviders ?? []) 
    mergeProviderIntoRegistry(registry, entry.provider);
}
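The provider object in the example references `describeOllamaImages`, which the issue never defines, so the file would not compile as written. A minimal sketch of one way to supply it is a generic helper that applies a single-image function to each image in turn. `describeEach` is a hypothetical name, and running the calls sequentially is a design assumption intended to avoid sending several concurrent requests to a local Ollama server.

```typescript
// Apply a single-item async function to each item, one at a time, preserving order.
async function describeEach<I, O>(
  images: I[],
  describeOne: (image: I) => Promise<O>,
): Promise<O[]> {
  const results: O[] = [];
  for (const image of images) {
    results.push(await describeOne(image));
  }
  return results;
}
```

With this helper, `describeOllamaImages` could be written as a thin wrapper that maps each input image onto `describeOllamaImage`; the exact parameter shape depends on OpenClaw types that the issue does not show.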

Verification

To verify that the fix worked, try using the image tool with an Ollama vision model:

{
  "defaults": {
    "imageModel": {
      "primary": "ollama/qwen3-vl:235b-cloud",
      "fallbacks": []
    }
  }
}

The image tool should no longer fail with the "No media-understanding provider registered for ollama" error.
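If the image tool still fails after the provider is registered, it may help to rule out the Ollama side first. Ollama's `GET /api/tags` endpoint lists the models available on the server; the sketch below checks such a payload for a given model name. `hasModel` is a hypothetical helper, and it assumes that a name without a tag means `:latest`, which is Ollama's default tag.

```typescript
// Subset of the /api/tags response that this check needs.
interface OllamaTagsPayload {
  models?: { name?: string }[];
}

// Returns true when the wanted model appears in the /api/tags payload.
function hasModel(payload: OllamaTagsPayload, wanted: string): boolean {
  // A bare name such as "llava" is treated as "llava:latest".
  const withTag = (name: string) => (name.includes(":") ? name : `${name}:latest`);
  const target = withTag(wanted.trim());
  return (payload.models ?? []).some(
    (m) => typeof m.name === "string" && withTag(m.name) === target,
  );
}
```

A typical use would be to fetch `${baseUrl}/api/tags`, parse the JSON, and pass the result to `hasModel` together with the model part of the configured `imageModel`.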

Extra Tips

  • Handle errors from the Ollama API in ollamaMediaUnderstandingProvider, such as an unreachable server, a non-2xx status, or a model that has not been pulled.
  • Consider adding logging or debug statements to help diagnose failures.
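As a sketch of the first tip, the helper below turns a failed HTTP response into a readable error message. It assumes that Ollama reports failures as a JSON body of the form `{ "error": "..." }` and falls back to the raw text otherwise; `describeOllamaFailure` is a hypothetical name.

```typescript
// Build an error message from an HTTP status and the response body text.
function describeOllamaFailure(status: number, bodyText: string): string {
  let detail = bodyText.trim();
  try {
    const parsed = JSON.parse(bodyText) as { error?: unknown } | null;
    if (typeof parsed?.error === "string") {
      detail = parsed.error;
    }
  } catch {
    // Body was not JSON; keep the raw text as the detail.
  }
  return `Ollama request failed (HTTP ${status})${detail ? `: ${detail}` : ""}`;
}
```

In the provider, this would be called when `response.ok` is false, with `response.status` and `await response.text()`, and the result thrown as an `Error` so that the image tool can report it or move on to a fallback model.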
