openclaw - 💡(How to fix) Fix Telegram image attachments stored but not passed to LLM as vision input [1 comments, 2 participants]

openclaw 2026-04-21 18:28:14


openclaw/openclaw#69808 • Fetched 2026-04-22 07:48:02


Author: cjagwani

Participants: cjagwani, rafiki270

Timeline (top): commented ×1, cross-referenced ×1

When a Telegram user sends an image to a bot running on OpenClaw, the image file is downloaded and written to /sandbox/.openclaw-data/media/inbound/ correctly, but the agent loop does not include the image content in the LLM request. The model receives only the text portion of the message, and responds with generic disclaimers like "I can't analyze images directly" or "I don't have image recognition or computer vision abilities built in."


Environment

OpenClaw: 2026.4.2 (d74a122)
Node.js: v22.22.2
npm: 10.9.7
Running inside NVIDIA NemoClaw sandbox (v0.0.18) on OpenShell CLI 0.0.26
Host: Brev (Linux, Ubuntu)
Channel: Telegram

Reproduction Steps

Configure OpenClaw with Telegram enabled and a running bot (any standard onboarding).
From Telegram, send the bot a JPEG or PNG image.
Wait up to 30 seconds for the reply.

Inspect the media path:

ls /sandbox/.openclaw-data/media/inbound/
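To go one step beyond `ls`, a small script can confirm that the stored files are real JPEG/PNG images rather than empty or truncated downloads. This is a minimal sketch in Python, assuming the directory path reported in the issue; it identifies formats by their leading signature bytes, and `inspect_inbound` and `sniff_mime` are hypothetical helper names, not part of OpenClaw.

```python
from __future__ import annotations

from pathlib import Path

# Directory reported in the issue; adjust if your sandbox differs (assumption).
INBOUND_DIR = Path("/sandbox/.openclaw-data/media/inbound")

# Leading "magic" bytes for the two formats used in the reproduction steps.
SIGNATURES = {
    b"\xff\xd8\xff": "image/jpeg",
    b"\x89PNG\r\n\x1a\n": "image/png",
}


def sniff_mime(path: Path) -> str | None:
    """Hypothetical helper: return the MIME type implied by the file's first bytes."""
    with path.open("rb") as fh:
        head = fh.read(8)
    for magic, mime in SIGNATURES.items():
        if head.startswith(magic):
            return mime
    return None


def inspect_inbound(directory: Path = INBOUND_DIR) -> list[tuple[str, int, str | None]]:
    """Hypothetical helper: list (name, size in bytes, sniffed MIME) per file."""
    if not directory.is_dir():
        return []
    return [
        (p.name, p.stat().st_size, sniff_mime(p))
        for p in sorted(directory.iterdir())
        if p.is_file()
    ]


if __name__ == "__main__":
    for name, size, mime in inspect_inbound():
        print(f"{name}\t{size} bytes\t{mime or 'unknown'}")
```

If every file reports `image/jpeg` or `image/png` with a plausible size, the download and storage steps are fine and the problem lies further down the pipeline.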

Actual Result

Agent text reply: "I can't analyze images directly" / "I don't have image recognition or computer vision abilities built in."

Image files are present on disk:

file_0---d2f7df5c-bade-41cd-a3f8-83456fb7e98f.jpg
file_1---1981cc25-ae97-422e-a38f-971126e0b69b.jpg

No MediaFetchError or network-policy errors in logs (I confirmed the sandbox's egress policy allows /file/bot*/** so download works).

Expected Result

If a vision-capable inference model is configured, the agent should include the downloaded image as a vision content block in the chat completion request — e.g.:

{
  "role": "user",
  "content": [
    {"type": "text", "text": "..."},
    {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}}
  ]
}
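A sketch of how such a message could be assembled from the files in the inbound directory, written in Python for illustration only (OpenClaw runs on Node.js, so a real fix would live in its own agent loop). It assumes the OpenAI-style chat completion format shown above and that the file extension reflects the image type; `image_part` and `build_user_message` are hypothetical names, not OpenClaw APIs.

```python
from __future__ import annotations

import base64
import mimetypes
from pathlib import Path


def image_part(path: str | Path) -> dict:
    """Hypothetical helper: encode an image file as an image_url part (data URL)."""
    p = Path(path)
    # Assumption: the extension (.jpg/.png) identifies the image type.
    mime, _ = mimetypes.guess_type(p.name)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not a recognised image file: {p.name}")
    encoded = base64.b64encode(p.read_bytes()).decode("ascii")
    return {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}}


def build_user_message(text: str, image_paths: list[str | Path]) -> dict:
    """Hypothetical helper: combine message text and downloaded images in one user turn."""
    content: list[dict] = [{"type": "text", "text": text}]
    content.extend(image_part(path) for path in image_paths)
    return {"role": "user", "content": content}
```

Because the whole file is base64-encoded inline, the request grows by roughly a third of the image size, so a real implementation may need to respect the provider's payload limits.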

If the active model is text-only, the agent should return a clear "the current model doesn't support vision input" error rather than hallucinating generic disclaimers.
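For the text-only case, the requested behaviour amounts to a capability check before the request is sent. The sketch below assumes a per-model `supports_vision` flag; that flag, `ModelInfo`, and `VisionNotSupportedError` are hypothetical and do not correspond to any documented OpenClaw configuration.

```python
from dataclasses import dataclass


class VisionNotSupportedError(Exception):
    """Hypothetical error: images arrived but the active model cannot accept them."""


@dataclass
class ModelInfo:
    name: str
    supports_vision: bool  # hypothetical capability flag, not an OpenClaw setting


def check_vision_support(model: ModelInfo, attachment_count: int) -> None:
    """Fail loudly instead of silently dropping images for a text-only model."""
    if attachment_count > 0 and not model.supports_vision:
        raise VisionNotSupportedError(
            f"The current model ({model.name}) doesn't support vision input; "
            f"{attachment_count} image attachment(s) were not sent."
        )
```

The channel layer could catch this error and relay its message to the Telegram user, so the user learns why the image was ignored instead of receiving a generic disclaimer from the model.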

Boundary analysis (why this is an OpenClaw issue, not NemoClaw)

I work on NVIDIA NemoClaw. This was originally filed against us at NVIDIA/NemoClaw#2009. I traced the chain:

Step	Location	Status
Sandbox receives Telegram event	OpenClaw channel code	✓ works
Download attachment via `api.telegram.org/file/bot/*`	OpenClaw → host auth proxy	✓ works (NemoClaw's `telegram.yaml` policy allows it)
Write to `/sandbox/.openclaw-data/media/inbound/`	OpenClaw	✓ works (NemoClaw configures writable dir)
Pass image to LLM as vision input	OpenClaw agent loop	✗ missing

I grep'd NemoClaw and confirmed there is zero code in NemoClaw that touches image/vision/multimodal handling. All the failing logic lives in OpenClaw's agent pipeline.

Tracking

Upstream: NVIDIA/NemoClaw#2009 (closing there with a pointer here)

Extent analysis

TL;DR

The OpenClaw agent loop likely needs to be updated to include the downloaded image content in the LLM request as a vision content block.

Guidance

Confirm that the image files are downloaded and written to the expected directory (/sandbox/.openclaw-data/media/inbound/); the reporter has already verified this step.
Trace the OpenClaw agent loop to the point where an inbound Telegram message is turned into the LLM request, and check whether the image attachments are dropped there instead of being added as vision content blocks.
Confirm that the configured inference model actually accepts image input.
Test the agent loop with a text-only model to verify that it returns a clear error message indicating that the model does not support vision input.
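Checking whether the agent loop includes images in the outgoing request can be automated by inspecting the payload sent to the inference endpoint. This sketch assumes an OpenAI-style chat completion body with a `messages` array and that a test can capture that body (for example through a stub endpoint); `count_image_parts` is a hypothetical helper.

```python
def count_image_parts(request: dict) -> int:
    """Hypothetical helper: count image_url parts across all user messages."""
    total = 0
    for message in request.get("messages", []):
        if message.get("role") != "user":
            continue
        content = message.get("content")
        # Plain-string content is text only and carries no image parts.
        if isinstance(content, list):
            total += sum(1 for part in content if part.get("type") == "image_url")
    return total
```

A regression test would send an image through the Telegram path and assert `count_image_parts(request) >= 1` when a vision-capable model is configured.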

Example

{
  "role": "user",
  "content": [
    {"type": "text", "text": "..."},
    {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}}
  ]
}

This example shows the expected shape: a single user message whose content array carries both a text part and the image, encoded as a base64 data URL.

Notes

The issue appears specific to the OpenClaw agent loop rather than NemoClaw. Because the image files are downloaded and written to the expected directory, the failure most likely lies in how the agent loop handles those files afterward.

Recommendation

Fix the OpenClaw agent loop so that it includes the image content in the LLM request. This will likely involve changing the code that builds the request to read the downloaded image files and attach them as vision content blocks, and to return a clear error when the active model is text-only.
