openclaw - 💡(How to fix) Fix [Bug]: Telegram image understanding intermittently fails for ollama/qwen3.5:cloud, including false image hallucinations [1 comments, 2 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
openclaw/openclaw#70103Fetched 2026-04-23 07:29:10
View on GitHub
Comments
1
Participants
2
Timeline
3
Reactions
0
Timeline (top)
labeled ×2commented ×1

Telegram images sent to an Ollama cloud vision model are intermittently routed as text/fetch paths instead of multimodal vision content blocks, causing the agent to either deny image capability or hallucinate confident but false image descriptions.

Root Cause

Telegram images sent to an Ollama cloud vision model are intermittently routed as text/fetch paths instead of multimodal vision content blocks, causing the agent to either deny image capability or hallucinate confident but false image descriptions.

RAW_BUFFERClick to expand / collapse

Bug type

Regression (worked before, now fails)

Beta release blocker

No

Summary

Telegram images sent to an Ollama cloud vision model are intermittently routed as text/fetch paths instead of multimodal vision content blocks, causing the agent to either deny image capability or hallucinate confident but false image descriptions.

Steps to reproduce

  1. Configure an OpenClaw agent with ollama/qwen3.5:cloud as primary model (declared input: text, image)
  2. Set agents.defaults.imageModel.primary to openai-codex/gpt-5.4
  3. Start a fresh session (/new) with that agent via Telegram
  4. Send one Telegram image (e.g. a photo)
  5. Ask the agent: "Describe this image"
  6. Observe: agent either claims it cannot view images, or produces a confident but false description unrelated to the actual image content

Expected behavior

When I send an image in Telegram to an agent using ollama/qwen3.5:cloud, the model should receive actual image input and either:

• describe the image correctly, or • clearly state that image input was unavailable

It should not fabricate image contents.

Actual behavior

In fresh sessions, sending a Telegram image and asking “describe this image” produces intermittent failures:

  1. Sometimes the agent says it cannot view images at all
  2. Sometimes it hallucinates a false description of the image

Example false result:

• Actual image: photo of a young woman in a winery • Returned description: monochrome historical architectural scene / ruins / archaeological site

OpenClaw version

OpenClaw: 2026.4.15

Operating system

macOS arm64, Mac Mini M4 16GB

Install method

No response

Model

ollama/qwen3.5:cloud

Provider / routing chain

Telegram Bot API → OpenClaw gateway → Ollama provider → ollama/qwen3.5:cloud (remote) Image media path (observed failure): Telegram inbound → OpenClaw media download/resize → file saved to local disk → incorrectly routed as text/fetch path (web_fetch with file:/// URI) instead of multimodal vision content block

Additional provider/model setup details

• Agent: ollama/qwen3.5:cloud primary, ollama/qwen3.5:9b local fallback • Global image model override: openai-codex/gpt-5.4 • OpenClaw version: 2026.4.15 (041266a) • Host: macOS arm64, Mac Mini M4 16GB • qwen3.5:cloud is declared as input: ["text", "image"] in both global and per-agent model configs • Same setup produced working vision results in prior sessions, then failed again in fresh sessions • No per-agent image model override set; only the global agents.defaults.imageModel.primary

Logs, screenshots, and evidence

Impact and severity

• Affected users: Any OpenClaw user routing Telegram images to an Ollama cloud model for vision • Severity: Data risk. The model does not simply fail safe — it fabricates confident but false visual descriptions, which is worse than a silent failure • Frequency: Intermittent. Vision worked correctly in some sessions, then failed in fresh sessions under the same configuration • Consequence: Unreliable Telegram-based image interpretation; agents can hallucinate image contents instead of admitting missing input, creating a trust and safety problem for any workflow relying on vision

Additional information

No response

extent analysis

TL;DR

The issue can be resolved by ensuring that Telegram images are correctly routed as multimodal vision content blocks instead of text/fetch paths to the Ollama cloud model.

Guidance

  • Verify the configuration of the OpenClaw agent and the Ollama provider to ensure that images are declared as input and routed correctly.
  • Check the logs to see if there are any errors or warnings related to image processing or routing.
  • Test the setup with a different image model or provider to isolate the issue.
  • Review the OpenClaw version and provider/model setup details to ensure that they are compatible and up-to-date.

Example

No code snippet is provided as the issue seems to be related to configuration and routing rather than code.

Notes

The issue is intermittent and seems to be related to the routing of images from Telegram to the Ollama cloud model. The fact that it worked in prior sessions and then failed in fresh sessions suggests that there might be a configuration or compatibility issue.

Recommendation

Apply a workaround by verifying and adjusting the configuration of the OpenClaw agent and the Ollama provider to ensure correct image routing. This is recommended because the issue is intermittent and seems to be related to configuration rather than a version-specific bug.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

FAQ

Expected behavior

When I send an image in Telegram to an agent using ollama/qwen3.5:cloud, the model should receive actual image input and either:

• describe the image correctly, or • clearly state that image input was unavailable

It should not fabricate image contents.

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

openclaw - 💡(How to fix) Fix [Bug]: Telegram image understanding intermittently fails for ollama/qwen3.5:cloud, including false image hallucinations [1 comments, 2 participants]