openclaw - ✅(Solved) Fix Bug: Sending photos via Telegram crashes session with ollama/glm-5.1 (text-only primary model) — provider rejects format error [1 pull requests, 1 participants]

GitHub stats
openclaw/openclaw#67606 · Fetched 2026-04-17 08:30:05
Comments: 0 · Participants: 1 · Timeline events: 2 (cross-referenced ×1, referenced ×1) · Reactions: 0

Error Message

When sending a photo via Telegram to an agent whose primary model is text-only (e.g., ollama/glm-5.1:cloud), the entire session crashes with a provider format error. The user receives no response and the session becomes unresponsive until reset.

Error Log

2026-04-16T08:59:53.351Z warn agent/embedded auth_profile_failure_state_updated provider="ollama" reason="format" errorCount=1 cooldownUntil=1776330023332

  • Provider rejects with format error

Root Cause

The image is downloaded and resized successfully, but then the request to the primary model (GLM-5.1) fails because GLM-5.1 is text-only. The auth profile goes into cooldown, killing the session.

Fix Action

Workaround

Currently, the only workaround is to send images as files (not inline photos) and use the image tool to analyze them separately.

PR fix notes

PR #67634: fix: describe images via imageModel when primary model is text-only

Description (problem / solution / changelog)

Problem

When the primary model does not support images (e.g. ollama/glm-5.1:cloud, ollama/glm-4:cloud), sending a photo via Telegram crashes the session and blocks all subsequent messages, even pure text.

Root cause: the text-only model rejects the image with an HTTP 400 → classified as "format" failure → auth profile cooldown activated (30s → 5min escalating) → all messages to that profile are rejected during cooldown, including normal text messages.

Fixes #67606

Related: #51392, #59943, #66095, #39690

Root Cause

Three issues stack:

  1. parseMessageWithAttachments (src/gateway/chat-attachments.ts): When supportsImages=false, all attachments are dropped with a log warning. The images never reach the media store, so no downstream pipeline can process them.

  2. No imageModel fallback in Gateway path: The media-understanding pipeline (runCapability in runner.ts) already knows how to describe images using imageModel when the primary model is text-only. However, this pipeline runs only in the getReplyFromConfig (auto-reply) path. The Gateway chat path never calls it, so the images are silently lost.

  3. Format error triggers profile-wide cooldown: When the image is (incorrectly) sent to a text-only model, the provider rejects with HTTP 400 → classified as "format" → auth profile cooldown → ALL subsequent messages fail, even pure text. This is what makes the session "stuck" — one bad image blocks everything.
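
The third point is easier to see in code. The following is a minimal sketch of a profile-wide cooldown gate, not OpenClaw's actual implementation: the names ProfileState, cooldownMs, recordFailure, and canSend are hypothetical, and the doubling schedule between the 30s and 5min values quoted in the PR is an assumption.

```typescript
// Hypothetical sketch: the type names and the doubling schedule are assumptions,
// not OpenClaw's actual implementation.
interface ProfileState {
  errorCount: number;
  cooldownUntil: number; // epoch milliseconds
}

const BASE_COOLDOWN_MS = 30 * 1000; // 30s first step (from the PR description)
const MAX_COOLDOWN_MS = 5 * 60 * 1000; // 5min ceiling (from the PR description)

// Assumed schedule: double per consecutive failure, capped at the ceiling.
function cooldownMs(errorCount: number): number {
  return Math.min(BASE_COOLDOWN_MS * 2 ** (errorCount - 1), MAX_COOLDOWN_MS);
}

// Any failure, including a "format" rejection caused by one image, arms the cooldown.
function recordFailure(state: ProfileState, now: number): ProfileState {
  const errorCount = state.errorCount + 1;
  return { errorCount, cooldownUntil: now + cooldownMs(errorCount) };
}

// The gate is per profile, not per message, so plain text is blocked too.
function canSend(state: ProfileState, now: number): boolean {
  return now >= state.cooldownUntil;
}
```

Because canSend only looks at the profile state, a text message sent a few seconds after the rejected photo is refused just like the photo was, which matches the "stuck" behavior described above.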

Fix

1. Offload instead of drop (chat-attachments.ts)

When supportsImages=false, parseMessageWithAttachments now offloads all image attachments to the media store and injects media:// markers into the prompt. The images array is empty (no inline image blocks), but offloadedRefs and imageOrder are populated.
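
As a rough illustration of the offload behavior, here is a sketch under stated assumptions. The function parseWithOffload, the Attachment and ParsedMessage types, the in-memory Map used as a media store, and the shape of imageOrder are all hypothetical; only the marker format and the field names images, offloadedRefs, and imageOrder come from the PR description.

```typescript
// Hypothetical sketch: types, function name, and the in-memory store are
// illustrative assumptions, not the real parseMessageWithAttachments.
interface Attachment {
  id: string;
  mimeType: string;
  data: Uint8Array;
}

interface ParsedMessage {
  message: string;
  images: Attachment[]; // inline image blocks sent to the model
  offloadedRefs: string[]; // media:// references kept in the media store
  imageOrder: Array<"inline" | "offloaded">; // assumed shape
}

function parseWithOffload(
  text: string,
  attachments: Attachment[],
  supportsImages: boolean,
  mediaStore: Map<string, Uint8Array>,
): ParsedMessage {
  const parsed: ParsedMessage = { message: text, images: [], offloadedRefs: [], imageOrder: [] };
  for (const att of attachments) {
    if (!att.mimeType.startsWith("image/")) continue; // non-image handling omitted
    if (supportsImages) {
      parsed.images.push(att);
      parsed.imageOrder.push("inline");
      continue;
    }
    // Text-only model: store the bytes and leave a marker in the prompt.
    mediaStore.set(att.id, att.data);
    const ref = `media://inbound/${att.id}`;
    parsed.offloadedRefs.push(ref);
    parsed.imageOrder.push("offloaded");
    parsed.message += `\n[media attached: ${ref}]`;
  }
  return parsed;
}
```

With supportsImages=false the returned images array stays empty, so no image block can reach the text-only provider, while the stored bytes remain available to a later description step.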

2. Describe offloaded images via imageModel (describeOffloadedImagesForTextOnlyModel)

New exported function in chat-attachments.ts that:

  1. Resolves the imageModel from config using resolveAutoImageModel
  2. For each offloaded ref, resolves the physical file path from the media store
  3. Calls describeImageFileWithModel to describe the image via the vision-capable imageModel
  4. Replaces [media attached: media://inbound/<id>] markers with [attached image: <description>]

If no imageModel is configured, or if description fails, the original media:// marker is preserved (graceful fallback).
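
The marker replacement and its fallback can be sketched as a pure function. This is an illustration only: replaceMarkers is a hypothetical name, and the descriptions map stands in for the results of the real (asynchronous) calls to the vision-capable imageModel, with a missing or empty entry modeling "no imageModel configured" or a failed description.

```typescript
// Hypothetical sketch: `descriptions` stands in for the results of calling the
// vision-capable imageModel; a missing or empty entry models a failed description.
function replaceMarkers(
  message: string,
  offloadedRefs: string[],
  descriptions: Map<string, string>,
): string {
  let result = message;
  for (const ref of offloadedRefs) {
    const description = (descriptions.get(ref) ?? "").trim();
    if (description === "") continue; // graceful fallback: keep the media:// marker
    const marker = `[media attached: ${ref}]`;
    result = result.split(marker).join(`[attached image: ${description}]`);
  }
  return result;
}
```

Keeping the original marker on failure means the text-only model still receives a well-formed text prompt, so a failed description cannot turn into a provider format error.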

3. Call sites updated

All three parseMessageWithAttachments call sites now call describeOffloadedImagesForTextOnlyModel when supportsImages=false and there are offloaded refs.

Example

Before: Send photo → image dropped OR leaks through → HTTP 400 "format" → profile cooldown → session stuck, ALL messages fail

After: Send photo → offloaded → described via imageModel → [attached image: A cat sitting on a windowsill] → text-only model reasons about content → no format error → session works normally

Checklist

  • Code compiles without new TypeScript errors
  • Tests added for new behavior
  • Graceful fallback when imageModel is not configured
  • Graceful fallback when image description fails (original marker preserved)
  • No changes to behavior when supportsImages=true

Changed files

  • src/gateway/chat-attachments.test.ts (modified, +72/-0)
  • src/gateway/chat-attachments.ts (modified, +136/-23)
  • src/gateway/media-understanding-describe.runtime.ts (added, +11/-0)
  • src/gateway/server-methods/agent.ts (modified, +25/-5)
  • src/gateway/server-methods/chat.ts (modified, +22/-5)
  • src/gateway/server-node-events.runtime.ts (modified, +2/-2)
  • src/gateway/server-node-events.ts (modified, +20/-3)

Code Example

2026-04-16T08:59:52.948Z info agents/tool-images Image resized to fit limits: 1280x960px 175.3KB -> 172.3KB (-1.7%)
2026-04-16T08:59:53.351Z warn agent/embedded auth_profile_failure_state_updated provider="ollama" reason="format" errorCount=1 cooldownUntil=1776330023332

Bug Description

When sending a photo via Telegram to an agent whose primary model is text-only (e.g., ollama/glm-5.1:cloud), the entire session crashes with a provider format error. The user receives no response and the session becomes unresponsive until reset.

This is a specific, user-facing manifestation of two known upstream bugs combining:

  1. #62292 — Telegram inbound images use the describer-only pipeline instead of native vision blocks. Images never reach the model as actual image data.
  2. #39690 / #66253 — User-configured input: ["text", "image"] is ignored by the hardcoded model catalog, so even imageModel fallback fails for non-catalog models.

Reproduction

  1. Configure primary model as text-only: ollama/glm-5.1:cloud (input: ["text"])
  2. Configure imageModel: ollama/kimi-k2.5:cloud (input: ["text", "image"])
  3. Send a photo via Telegram
  4. Session crashes with format error

Error Log

2026-04-16T08:59:52.948Z info agents/tool-images Image resized to fit limits: 1280x960px 175.3KB -> 172.3KB (-1.7%)
2026-04-16T08:59:53.351Z warn agent/embedded auth_profile_failure_state_updated provider="ollama" reason="format" errorCount=1 cooldownUntil=1776330023332

The image is downloaded and resized successfully, but then the request to the primary model (GLM-5.1) fails because GLM-5.1 is text-only. The auth profile goes into cooldown, killing the session.
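
The cooldownUntil value in the log is an epoch timestamp in milliseconds. The small sketch below (parseFields is a hypothetical helper written for this note, not part of OpenClaw) decodes the warn line quoted above and shows how long the cooldown lasts.

```typescript
// The warn line from the Error Log, copied verbatim.
const line =
  '2026-04-16T08:59:53.351Z warn agent/embedded auth_profile_failure_state_updated provider="ollama" reason="format" errorCount=1 cooldownUntil=1776330023332';

// Hypothetical helper: collect key=value and key="value" pairs from a log line.
function parseFields(logLine: string): Record<string, string> {
  const fields: Record<string, string> = {};
  const pattern = /(\w+)=(?:"([^"]*)"|(\S+))/g;
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(logLine)) !== null) {
    fields[m[1]] = m[2] ?? m[3];
  }
  return fields;
}

const fields = parseFields(line);
const loggedAt = Date.parse(line.split(" ")[0]); // ISO timestamp at line start
const cooldownUntil = Number(fields["cooldownUntil"]);

const cooldownEndsIso = new Date(cooldownUntil).toISOString();
const cooldownLengthMs = cooldownUntil - loggedAt;

console.log(fields["reason"]); // "format"
console.log(cooldownEndsIso); // "2026-04-16T09:00:23.332Z"
console.log(cooldownLengthMs); // 29981, i.e. roughly 30 seconds
```

The roughly 30-second window agrees with the first cooldown step mentioned in the PR description (30s → 5min escalating).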

Expected Behavior

When the primary model is text-only:

  • Images from Telegram should be routed to the configured imageModel (kimi-k2.5:cloud)
  • The session should NOT crash — a graceful fallback should occur
  • At minimum, the image should be described and the description passed to the primary model as text
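
The expected routing can be expressed as a small decision function. This is a sketch of the requested behavior, not existing OpenClaw code: ModelConfig, ImageStrategy, and planImageHandling are hypothetical names, and only the input: ["text", "image"] capability list and the model identifiers come from the issue.

```typescript
// Hypothetical sketch: ModelConfig and the strategy names are illustrative assumptions.
type InputKind = "text" | "image";

interface ModelConfig {
  id: string;
  input: InputKind[];
}

type ImageStrategy =
  | { kind: "native"; model: string } // primary accepts image blocks directly
  | { kind: "describe"; model: string } // imageModel turns the image into text
  | { kind: "keep-marker" }; // nothing can see the image; keep the reference only

function planImageHandling(primary: ModelConfig, imageModel?: ModelConfig): ImageStrategy {
  if (primary.input.includes("image")) return { kind: "native", model: primary.id };
  if (imageModel && imageModel.input.includes("image")) {
    return { kind: "describe", model: imageModel.id };
  }
  return { kind: "keep-marker" };
}

const glm: ModelConfig = { id: "ollama/glm-5.1:cloud", input: ["text"] };
const kimi: ModelConfig = { id: "ollama/kimi-k2.5:cloud", input: ["text", "image"] };
console.log(planImageHandling(glm, kimi)); // { kind: "describe", model: "ollama/kimi-k2.5:cloud" }
```

None of the three branches sends image data to a text-only model, which is the property the issue asks for.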

Actual Behavior

  • Image is sent to the text-only primary model
  • Provider rejects with format error
  • Auth profile enters cooldown
  • Session becomes unresponsive
  • User gets no response and has to start a new session with /new

Workaround

Currently, the only workaround is to send images as files (not inline photos) and use the image tool to analyze them separately.

Environment

  • OpenClaw: v2026.4.x (current)
  • Channel: Telegram
  • Primary model: ollama/glm-5.1:cloud (text-only)
  • Image model: ollama/kimi-k2.5:cloud (vision-capable)
  • OS: Linux (Raspberry Pi 5, ARM64)

Related Issues

  • #62292 — Telegram images use describer-only path, never native vision blocks
  • #39690 — Multimodal messages ignored for custom Ollama models (catalog whitelist overrides user config)
  • #66253 — parseMessageWithAttachments drops images for models despite input: ["text", "image"] config
  • #59943 — image tool "Unknown model" for configured Ollama models
  • #51392 — GLM provider does not register media-understanding capability
  • #65211 — Fix PR (closed without merge) to include user-configured models in gateway catalog

extent analysis

TL;DR

To fix the session crash when sending photos to a text-only primary model, ensure that images are routed to the configured imageModel instead of the primary model.

Guidance

  • Verify that the imageModel is correctly configured and vision-capable, such as ollama/kimi-k2.5:cloud.
  • Check the model catalog to ensure that user-configured models are not ignored, as indicated by issues #39690 and #66253.
  • Consider using the workaround of sending images as files instead of inline photos and analyzing them separately with the image tool.
  • Review the error log to confirm that the issue is related to the primary model being text-only and the image not being routed to the imageModel.

Example

No code snippet is provided as the issue is related to model configuration and routing.

Notes

The fix may require addressing the upstream bugs #62292, #39690, and #66253, which are related to Telegram image handling and model configuration.

Recommendation

Until a fix is available in your installed version, apply the workaround of sending images as files instead of inline photos. A permanent fix may also require resolving the upstream bugs and updating the model catalog.
