ollama - 💡(How to fix) Fix Anthropic compatibility: image content blocks are dropped when forwarded to vision-capable cloud models (v0.21.0) [2 comments, 2 participants]

GitHub stats: ollama/ollama#15727, fetched 2026-04-22 07:43:46
Comments: 2 · Participants: 2 · Timeline events: 2 (commented ×2) · Reactions: 1

Bug description

When using Ollama's Anthropic-compatible endpoint (POST /v1/messages) with a vision-capable cloud model such as kimi-k2.6:cloud, image content blocks in the request are silently dropped before being forwarded to the model. The model only receives the surrounding text, so it responds as if no image was attached.

This breaks Claude Code (and any other Anthropic-API client) when combined with tools that send screenshots — e.g. the computer-use MCP server — since the model has no idea an image was ever sent.

Environment

  • Ollama: v0.21.0 (confirmed via /api/version)
  • Model: kimi-k2.6:cloud (ollama show reports Capabilities: vision, thinking, completion, tools)
  • OS: Windows 11
  • Client: curl (identical behavior seen via Claude Code CLI)

Reproduction

curl -s http://localhost:11434/v1/messages \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "kimi-k2.6:cloud",
    "max_tokens": 200,
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe this tiny test image in one sentence."},
        {"type": "image", "source": {"type": "base64", "media_type": "image/png",
         "data": "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAQAAAC1HAwCAAAAC0lEQVR42mNkYAAAAAYAAjCB0C8AAAAASUVORK5CYII="}}
      ]
    }]
  }'
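
Before suspecting the translator, it can help to rule out a malformed payload. The sketch below decodes the base64 string from the request above and checks the PNG signature and the IHDR dimensions. The helper name png_dimensions is hypothetical, and only the Python standard library is used.

```python
import base64
import struct

# Base64 payload copied from the reproduction request.
TEST_IMAGE_B64 = (
    "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAQAAAC1HAwCAAAAC0lEQVR42mNkYAAAAAYAAjCB0C8AAAAASUVORK5CYII="
)

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(image_b64: str) -> tuple[int, int]:
    """Return (width, height) of a base64-encoded PNG, or raise ValueError."""
    raw = base64.b64decode(image_b64, validate=True)
    if raw[:8] != PNG_SIGNATURE:
        raise ValueError("payload does not start with the PNG signature")
    if raw[12:16] != b"IHDR":
        raise ValueError("first chunk is not IHDR")
    # IHDR data starts at byte 16: 4-byte big-endian width, then height.
    width, height = struct.unpack(">II", raw[16:24])
    return width, height


if __name__ == "__main__":
    print(png_dimensions(TEST_IMAGE_B64))
```

If this raises, the request body itself is at fault; if it returns a width and height, the payload is a well-formed PNG and the loss happens later in the pipeline.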

Expected

The model receives the image bytes and describes the image (or at minimum reports that an image was received).

Actual

The model's thinking output says:

"looking at the input, I don't actually see an image - I see the text [Image: but no actual image content loaded"

usage.input_tokens is 23 — far below what a real image would add — confirming the base64 payload never reaches the model. The image block appears to be replaced by a text placeholder during the Anthropic → Ollama format conversion.

Notes

  • The same Kimi model works fine with images when used through Moonshot's own Anthropic endpoint, so the model itself is not the issue.
  • Ollama's own OpenAI-compatibility layer (/v1/chat/completions) handles images correctly for other vision models, which suggests the Anthropic layer's request translator is just missing the image → images[] mapping.
  • DeepWiki's Anthropic compatibility layer doc lists image blocks as a recognized type in MessagesRequest but does not indicate whether they are actively processed — this report confirms they are not.
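
To cross-check against the OpenAI-compatible endpoint mentioned above, the same image can be expressed as an OpenAI-style image_url content part carrying a data URI. This is a minimal sketch: the helper name anthropic_image_to_openai_part is hypothetical, and this report does not confirm whether kimi-k2.6:cloud accepts images through /v1/chat/completions.

```python
from typing import Any


def anthropic_image_to_openai_part(block: dict[str, Any]) -> dict[str, Any]:
    """Convert an Anthropic base64 image block to an OpenAI-style content part."""
    if block.get("type") != "image":
        raise ValueError("not an image content block")
    source = block["source"]
    if source.get("type") != "base64":
        raise ValueError("only base64 image sources are handled in this sketch")
    # OpenAI-style chat requests carry inline images as data URIs.
    data_uri = f"data:{source['media_type']};base64,{source['data']}"
    return {"type": "image_url", "image_url": {"url": data_uri}}
```

Sending the converted part through /v1/chat/completions with the same prompt would show whether the image loss is specific to the Anthropic layer.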

Suggested fix

In the Anthropic compatibility translator, when a user message contains an image content block (type: "image") whose source has type: "base64", the base64 data should be appended to the images array of the corresponding Ollama /api/chat message (or stored in the multimodal buffer for vision models), instead of being stringified into a [Image: placeholder.
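
The intended mapping can be sketched in a few lines. This is illustrative Python, not Ollama's actual translator code: the function name anthropic_to_ollama_message is hypothetical, only text and base64 image blocks are handled, and joining multiple text blocks with newlines is an assumption.

```python
from typing import Any


def anthropic_to_ollama_message(message: dict[str, Any]) -> dict[str, Any]:
    """Translate one Anthropic-style message into an Ollama /api/chat message."""
    content = message["content"]
    if isinstance(content, str):
        # Plain string content needs no block handling.
        return {"role": message["role"], "content": content}

    text_parts: list[str] = []
    images: list[str] = []
    for block in content:
        kind = block.get("type")
        if kind == "text":
            text_parts.append(block["text"])
        elif kind == "image":
            source = block.get("source", {})
            if source.get("type") != "base64":
                raise ValueError("only base64 image sources are handled in this sketch")
            # Forward the raw base64 string instead of a text placeholder.
            images.append(source["data"])
        else:
            raise ValueError(f"unsupported block type in this sketch: {kind!r}")

    translated: dict[str, Any] = {
        "role": message["role"],
        "content": "\n".join(text_parts),
    }
    if images:
        translated["images"] = images
    return translated
```

The key point is that the image data ends up in the message's images list as a bare base64 string, while the text blocks alone form the content string.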

extent analysis

TL;DR

Modify the Anthropic compatibility translator in Ollama to properly handle image content blocks by appending base64 data to the images array.

Guidance

  • Review the Anthropic compatibility translator code to identify where the image content block is being stringified into a [Image: placeholder.
  • Update the translator to append the base64 data to the images array of the corresponding Ollama /api/chat message.
  • Verify that the model receives the image bytes by checking usage.input_tokens, which should rise noticeably above the 23 tokens observed when the image is dropped.
  • Test the updated translator with the provided curl command to ensure that the model correctly describes the image.
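
The token check above can be scripted by sending the same prompt with and without the image and comparing usage.input_tokens. The following is a rough sketch under stated assumptions: it uses the endpoint, headers, and model name from the report, the function names are hypothetical, and it leaves the judgment of what counts as a large enough difference to the reader.

```python
import json
import urllib.request
from typing import Any, Optional

MESSAGES_URL = "http://localhost:11434/v1/messages"  # endpoint from the report


def build_payload(prompt: str, image_b64: Optional[str] = None) -> dict[str, Any]:
    """Build an Anthropic-style request, optionally with one PNG image block."""
    content: list[dict[str, Any]] = [{"type": "text", "text": prompt}]
    if image_b64 is not None:
        content.append({
            "type": "image",
            "source": {"type": "base64", "media_type": "image/png", "data": image_b64},
        })
    return {
        "model": "kimi-k2.6:cloud",
        "max_tokens": 200,
        "messages": [{"role": "user", "content": content}],
    }


def fetch_input_tokens(payload: dict[str, Any]) -> int:
    """POST the payload and return usage.input_tokens from the response."""
    request = urllib.request.Request(
        MESSAGES_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "anthropic-version": "2023-06-01"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        body = json.load(response)
    return int(body["usage"]["input_tokens"])


def compare_token_counts(prompt: str, image_b64: str) -> tuple[int, int]:
    """Return (text_only_tokens, with_image_tokens) for the same prompt."""
    text_only = fetch_input_tokens(build_payload(prompt))
    with_image = fetch_input_tokens(build_payload(prompt, image_b64))
    return text_only, with_image
```

If the two counts are nearly identical, the image is most likely still being replaced by a short text placeholder before it reaches the model.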

Example

// Sketch of the translated Ollama /api/chat request: the base64 data moves into the message's images array
// (illustrative only; other fields such as max_tokens are omitted)
{
  "model": "kimi-k2.6:cloud",
  "messages": [{
    "role": "user",
    "content": "Describe this tiny test image in one sentence.",
    "images": [
      "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAQAAAC1HAwCAAAAC0lEQVR42mNkYAAAAAYAAjCB0C8AAAAASUVORK5CYII="
    ]
  }]
}

Notes

The provided curl command and expected output can be used to test the updated translator. The usage.input_tokens value can be used to verify that the image bytes are being received by the model.

Recommendation

Apply the suggested fix to the Anthropic compatibility translator to properly handle image content blocks and ensure that the model receives the image bytes. This should resolve the issue and allow the model to correctly describe the image.
