openclaw - ✅(Solved) Fix Bug: Sending photos via Telegram crashes session with ollama/glm-5.1 (text-only primary model) — provider rejects format error [1 pull requests, 1 participants]

Joel-Claw · 2026-04-16T09:45:43Z



GitHub stats

openclaw/openclaw#67606•Fetched 2026-04-17 08:30:05

Author

Joel-Claw

Participants

Joel-Claw

Timeline (top)

cross-referenced ×1, referenced ×1

Error Message

When sending a photo via Telegram to an agent whose primary model is text-only (e.g., ollama/glm-5.1:cloud), the entire session crashes with a provider format error. The user receives no response and the session becomes unresponsive until reset.

Error Log

2026-04-16T08:59:53.351Z warn agent/embedded auth_profile_failure_state_updated provider="ollama" reason="format" errorCount=1 cooldownUntil=1776330023332

Provider rejects with format error

Root Cause

The image is downloaded and resized successfully, but then the request to the primary model (GLM-5.1) fails because GLM-5.1 is text-only. The auth profile goes into cooldown, killing the session.

Fix Action

Workaround

Currently, the only workaround is to send images as files (not inline photos) and use the image tool to analyze them separately.

PR fix notes

PR #67634: fix: describe images via imageModel when primary model is text-only

Repository: openclaw/openclaw
Author: Joel-Claw
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/67634

Description (problem / solution / changelog)

Problem

When the primary model does not support images (e.g. ollama/glm-5.1:cloud, ollama/glm-4:cloud), sending a photo via Telegram crashes the session and blocks all subsequent messages, even pure text.

Root cause: the text-only model rejects the image with an HTTP 400 → classified as "format" failure → auth profile cooldown activated (30s → 5min escalating) → all messages to that profile are rejected during cooldown, including normal text messages.

Fixes #67606

Related: #51392, #59943, #66095, #39690

Root Cause

Three issues stack:

1. parseMessageWithAttachments (src/gateway/chat-attachments.ts): When supportsImages=false, all attachments are dropped with a log warning. The images never reach the media store, so no downstream pipeline can process them.
2. No imageModel fallback in Gateway path: The media-understanding pipeline (runCapability in runner.ts) already knows how to describe images using imageModel when the primary model is text-only. But this pipeline runs only in the getReplyFromConfig (auto-reply) path. The Gateway chat path never calls it, so the images are lost.
3. Format error triggers profile-wide cooldown: When the image is (incorrectly) sent to a text-only model, the provider rejects it with HTTP 400 → classified as "format" → auth profile cooldown → ALL subsequent messages fail, even pure text. This is what makes the session "stuck": one bad image blocks everything.
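
The third point is easier to see with a small model of the cooldown gate. This is only a sketch: the `ProfileState` shape, the function names, and the two-step schedule are assumptions made for illustration (the PR text only says "30s → 5min escalating"), not OpenClaw's actual implementation.

```typescript
// Hypothetical model of a per-profile cooldown; names and schedule are assumptions.
interface ProfileState {
  errorCount: number;
  cooldownUntil: number; // epoch milliseconds; 0 means no cooldown is active
}

// Only the 30s and 5min endpoints come from the PR text; other steps are unknown.
const COOLDOWN_STEPS_MS = [30_000, 300_000];

function recordFailure(state: ProfileState, now: number): ProfileState {
  const errorCount = state.errorCount + 1;
  const step = Math.min(errorCount, COOLDOWN_STEPS_MS.length) - 1;
  const delay = COOLDOWN_STEPS_MS[step] ?? 300_000;
  return { errorCount, cooldownUntil: now + delay };
}

// The gate looks only at the profile, never at the message content,
// which is why plain text is rejected as well.
function canSend(state: ProfileState, now: number): boolean {
  return now >= state.cooldownUntil;
}

const fresh: ProfileState = { errorCount: 0, cooldownUntil: 0 };
const afterPhoto = recordFailure(fresh, 0);        // photo rejected with HTTP 400 at t=0
const textBlocked = !canSend(afterPhoto, 10_000);  // a text message 10s later is still blocked
const textAllowed = canSend(afterPhoto, 30_000);   // allowed again once the cooldown has passed
```

Because the gate is keyed on the profile rather than the request, avoiding the format error in the first place (as the fix below does) is what keeps text messages flowing.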

Fix

1. Offload instead of drop (`chat-attachments.ts`)

When supportsImages=false, parseMessageWithAttachments now offloads all image attachments to the media store and injects media:// markers into the prompt. The images array is empty (no inline image blocks), but offloadedRefs and imageOrder are populated.
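
A minimal sketch of the offload behavior described above, assuming simplified types. `Attachment`, `ParsedMessage`, `parseSketch`, and the in-memory `mediaStore` are hypothetical stand-ins; the real `parseMessageWithAttachments` signature is not shown in the PR text, and `imageOrder` is omitted because its shape is not described. Appending the marker to the end of the prompt is also an assumption.

```typescript
// Hypothetical shapes; the real types in chat-attachments.ts may differ.
interface Attachment {
  id: string;
  mimeType: string;
  data: Uint8Array;
}

interface ParsedMessage {
  message: string;
  images: Attachment[];    // inline image blocks, used only by vision-capable models
  offloadedRefs: string[]; // media:// references to images kept in the media store
}

// In-memory stand-in for the media store.
const mediaStore = new Map<string, Uint8Array>();

function parseSketch(
  text: string,
  attachments: Attachment[],
  supportsImages: boolean,
): ParsedMessage {
  const images = attachments.filter((a) => a.mimeType.startsWith("image/"));
  if (supportsImages) {
    // Behavior for vision-capable models is unchanged: images stay inline.
    return { message: text, images, offloadedRefs: [] };
  }
  const offloadedRefs: string[] = [];
  let message = text;
  for (const image of images) {
    // Offload instead of drop: store the bytes and leave a marker in the prompt.
    mediaStore.set(image.id, image.data);
    const ref = `media://inbound/${image.id}`;
    offloadedRefs.push(ref);
    message += `\n[media attached: ${ref}]`;
  }
  return { message, images: [], offloadedRefs };
}
```

The point of the design is that a text-only model never receives an image block, while the image itself remains available for a later description step.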

2. Describe offloaded images via imageModel (`describeOffloadedImagesForTextOnlyModel`)

New exported function in chat-attachments.ts that:

1. Resolves the imageModel from config using resolveAutoImageModel
2. For each offloaded ref, resolves the physical file path from the media store
3. Calls describeImageFileWithModel to describe the image via the vision-capable imageModel
4. Replaces [media attached: media://inbound/<id>] markers with [attached image: <description>]

If no imageModel is configured, or if description fails, the original media:// marker is preserved (graceful fallback).
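
The marker replacement and its fallback can be sketched as follows. The `Describer` type and `describeMarkers` function are hypothetical; the describer stands in for the combination of resolving the file path and calling `describeImageFileWithModel`, whose real signature is not given in the PR text.

```typescript
// Hypothetical describer: takes a media id, resolves to a text description.
type Describer = (mediaId: string) => Promise<string>;

async function describeMarkers(message: string, describe?: Describer): Promise<string> {
  // No imageModel configured: keep the original media:// markers.
  if (!describe) return message;

  // Collect every marker first so the async calls do not interfere with matching.
  const pattern = /\[media attached: media:\/\/inbound\/([^\]\s]+)\]/g;
  const found: { marker: string; id: string }[] = [];
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(message)) !== null) {
    found.push({ marker: m[0], id: m[1] });
  }

  let result = message;
  for (const { marker, id } of found) {
    try {
      const description = await describe(id);
      // A function replacement avoids special handling of "$" in the description.
      result = result.replace(marker, () => `[attached image: ${description}]`);
    } catch {
      // Description failed: leave this marker untouched (graceful fallback).
    }
  }
  return result;
}
```

With a describer that returns "A cat sitting on a windowsill" for a given id, the prompt ends up containing `[attached image: A cat sitting on a windowsill]`; with no describer, or one that throws, the message comes back unchanged.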

3. Call sites updated

All three parseMessageWithAttachments call sites now call describeOffloadedImagesForTextOnlyModel when supportsImages=false and there are offloaded refs.

Example

Before: Send photo → image dropped OR leaks through → HTTP 400 "format" → profile cooldown → session stuck, ALL messages fail

After: Send photo → offloaded → described via imageModel → [attached image: A cat sitting on a windowsill] → text-only model reasons about content → no format error → session works normally

Checklist

[x] Code compiles without new TypeScript errors
[x] Tests added for new behavior
[x] Graceful fallback when imageModel is not configured
[x] Graceful fallback when image description fails (original marker preserved)
[x] No changes to behavior when supportsImages=true

Changed files

src/gateway/chat-attachments.test.ts (modified, +72/-0)
src/gateway/chat-attachments.ts (modified, +136/-23)
src/gateway/media-understanding-describe.runtime.ts (added, +11/-0)
src/gateway/server-methods/agent.ts (modified, +25/-5)
src/gateway/server-methods/chat.ts (modified, +22/-5)
src/gateway/server-node-events.runtime.ts (modified, +2/-2)
src/gateway/server-node-events.ts (modified, +20/-3)

Code Example

2026-04-16T08:59:52.948Z info agents/tool-images Image resized to fit limits: 1280x960px 175.3KB -> 172.3KB (-1.7%)
2026-04-16T08:59:53.351Z warn agent/embedded auth_profile_failure_state_updated provider="ollama" reason="format" errorCount=1 cooldownUntil=1776330023332


Bug Description

This is a specific, user-facing manifestation of two known upstream bugs combining:

1. #62292: Telegram inbound images use the describer-only pipeline instead of native vision blocks. Images never reach the model as actual image data.
2. #39690 / #66253: User-configured input: ["text", "image"] is ignored by the hardcoded model catalog, so even imageModel fallback fails for non-catalog models.
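
To illustrate the second point, here is a sketch of a capability lookup in which user configuration takes precedence over a built-in catalog, which is the behavior those issues ask for. Everything here (`ModelEntry`, `supportsImages`, the two maps) is hypothetical and only mirrors the `input: [...]` field quoted in the issue; it is not OpenClaw's catalog code.

```typescript
type InputKind = "text" | "image";

interface ModelEntry {
  input: InputKind[];
}

// Hypothetical built-in catalog that knows only some models.
const builtinCatalog = new Map<string, ModelEntry>([
  ["ollama/glm-5.1:cloud", { input: ["text"] }],
]);

// Hypothetical user configuration, mirroring the reproduction setup.
const userModels = new Map<string, ModelEntry>([
  ["ollama/kimi-k2.5:cloud", { input: ["text", "image"] }],
]);

// User config is consulted first; the catalog is only a fallback.
// Unknown models are treated as text-only.
function supportsImages(
  modelId: string,
  user: Map<string, ModelEntry>,
  catalog: Map<string, ModelEntry>,
): boolean {
  const entry = user.get(modelId) ?? catalog.get(modelId);
  return entry !== undefined && entry.input.indexOf("image") !== -1;
}
```

If the lookup consulted only `builtinCatalog`, `ollama/kimi-k2.5:cloud` would be treated as text-only despite the user's configuration, which matches the failure the issues describe.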

Reproduction

1. Configure primary model as text-only: ollama/glm-5.1:cloud (input: ["text"])
2. Configure imageModel: ollama/kimi-k2.5:cloud (input: ["text", "image"])
3. Send a photo via Telegram
4. Session crashes with format error

Error Log

2026-04-16T08:59:52.948Z info agents/tool-images Image resized to fit limits: 1280x960px 175.3KB -> 172.3KB (-1.7%)
2026-04-16T08:59:53.351Z warn agent/embedded auth_profile_failure_state_updated provider="ollama" reason="format" errorCount=1 cooldownUntil=1776330023332

The image is downloaded and resized successfully, but then the request to the primary model (GLM-5.1) fails because GLM-5.1 is text-only. The auth profile goes into cooldown, killing the session.
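
The `cooldownUntil` value in the log can be decoded to check how long the first cooldown lasts. This assumes the field is a Unix timestamp in milliseconds, which its magnitude suggests but the log does not state.

```typescript
// Values copied from the warn line in the error log above.
const loggedAt = Date.parse("2026-04-16T08:59:53.351Z");
const cooldownUntil = 1776330023332; // assumed to be epoch milliseconds

const cooldownEnds = new Date(cooldownUntil).toISOString();
const remainingMs = cooldownUntil - loggedAt;

console.log(cooldownEnds); // 2026-04-16T09:00:23.332Z
console.log(remainingMs);  // 29981
```

Under that assumption the cooldown ends roughly 30 seconds after the failure was logged, which is consistent with the "30s" first step mentioned in the PR description.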

Expected Behavior

When the primary model is text-only:

Images from Telegram should be routed to the configured imageModel (kimi-k2.5:cloud)
The session should NOT crash — a graceful fallback should occur
At minimum, the image should be described and the description passed to the primary model as text

Actual Behavior

Image is sent to the text-only primary model
Provider rejects with format error
Auth profile enters cooldown
Session becomes unresponsive
User gets no response and has to /new

Workaround

Currently, the only workaround is to send images as files (not inline photos) and use the image tool to analyze them separately.

Environment

OpenClaw: v2026.4.x (current)
Channel: Telegram
Primary model: ollama/glm-5.1:cloud (text-only)
Image model: ollama/kimi-k2.5:cloud (vision-capable)
OS: Linux (Raspberry Pi 5, ARM64)

Related Issues

#62292 — Telegram images use describer-only path, never native vision blocks
#39690 — Multimodal messages ignored for custom Ollama models (catalog whitelist overrides user config)
#66253 — parseMessageWithAttachments drops images for models despite input: ["text", "image"] config
#59943 — image tool "Unknown model" for configured Ollama models
#51392 — GLM provider does not register media-understanding capability
#65211 — Fix PR (closed without merge) to include user-configured models in gateway catalog

extent analysis

TL;DR

To fix the session crash when sending photos to a text-only primary model, ensure that images are routed to the configured imageModel instead of the primary model.

Guidance

Verify that the imageModel is correctly configured and vision-capable, such as ollama/kimi-k2.5:cloud.
Check the model catalog to ensure that user-configured models are not ignored, as indicated by issues #39690 and #66253.
Consider using the workaround of sending images as files instead of inline photos and analyzing them separately with the image tool.
Review the error log to confirm that the issue is related to the primary model being text-only and the image not being routed to the imageModel.

Example

No code snippet is provided as the issue is related to model configuration and routing.

Notes

The fix may require addressing the upstream bugs #62292, #39690, and #66253, which are related to Telegram image handling and model configuration.

Recommendation

Apply the workaround of sending images as files instead of inline photos, as a permanent fix may require resolving the upstream bugs and updating the model catalog.

