hermes: Discord mixed attachments can send non-image data URLs as input_image, causing Responses 400 invalid image [2 pull requests]

Fix Action

Fixed

Bug Description

When a Discord message contains multiple valid image attachments plus a non-image document attachment, Hermes can serialize the document as a Responses input_image part if the attachment is classified as image-like before provider submission.

That produces an inline data:image/* URL whose declared MIME type says image, but whose decoded bytes are not an image.

Responses-compatible providers reject the whole request with HTTP 400, for example:

The image data you provided does not represent a valid image. Please check your input and try again with one of the supported image formats: ['image/jpeg', 'image/png', 'image/gif', 'image/webp'].

In the failing payload, the valid screenshots were encoded correctly. One additional input_image part was declared as:

data:image/jpeg;base64,...

but the decoded bytes started with a ZIP/DOCX signature:

50 4b 03 04 ... word/numbering.xml

That single invalid image part caused the entire multimodal request to fail.

Steps to Reproduce

  1. Use the Discord gateway with a Responses-compatible vision-capable provider.
  2. Send a Discord message with:
    • at least one valid image attachment, such as JPEG or PNG
    • one document attachment, such as a .docx file
  3. Ensure the document attachment is routed into the multimodal content list as an input_image / image_url data URL with an image/* MIME type.
  4. Send the request to the provider.

A minimal failing content part looks like this:

{
  "type": "input_image",
  "image_url": "data:image/jpeg;base64,UEsDBBQACAgIA..."
}

The base64 prefix UEsDB... decodes to PK\x03\x04 (hex 50 4B 03 04), the ZIP local file header signature that DOCX files share, not the FF D8 FF that opens a JPEG.
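The mismatch can be confirmed locally with a few lines of Python. This is only an illustrative check: it uses the truncated base64 prefix shown in the payload above, trimmed to twelve characters so that it decodes cleanly, and the variable names are arbitrary.

```python
import base64

# Base64 prefix copied from the failing payload, trimmed to a multiple of
# four characters so it decodes without a padding error.
prefix = "UEsDBBQACAgI"
head = base64.b64decode(prefix)

# The first four decoded bytes are the ZIP local file header signature.
print(head[:4])           # b'PK\x03\x04'
print(head[:4].hex(" "))  # 50 4b 03 04

# A real JPEG would start with FF D8 FF instead.
print(head.startswith(b"\xff\xd8\xff"))  # False
```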

Expected Behavior

Hermes should validate inline data:image/* payloads before sending them to Responses-compatible providers.

If a payload is declared as image/jpeg, image/png, image/gif, or image/webp, but the decoded bytes do not match that format, Hermes should not send it as an input_image.

It should either:

  • skip the invalid image part and log a warning, or
  • route the attachment through the document handling path if available.

Valid image attachments in the same message should still be sent.
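A minimal sketch of that behavior, assuming content parts are plain dicts shaped like the failing example above. The function name split_image_parts and the is_valid_image_url callback are hypothetical and not part of Hermes; the callback stands in for whatever byte-level check is used.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def split_image_parts(
    parts: list[dict],
    is_valid_image_url: Callable[[str], bool],
) -> tuple[list[dict], list[dict]]:
    """Separate sendable parts from input_image parts that fail validation.

    Hypothetical helper: `parts` is assumed to be a list of dicts such as
    {"type": "input_image", "image_url": "data:image/jpeg;base64,..."}.
    """
    kept: list[dict] = []
    rejected: list[dict] = []
    for part in parts:
        is_image = part.get("type") == "input_image"
        if is_image and not is_valid_image_url(part.get("image_url", "")):
            # Log without echoing the payload, which may contain private data.
            logger.warning("Skipping input_image part that failed validation")
            rejected.append(part)
        else:
            # Text parts and valid images pass through in their original order.
            kept.append(part)
    return kept, rejected
```

The rejected list is returned rather than dropped so that a caller could hand those attachments to a document handling path where one exists, while the kept list goes to the provider unchanged.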

Actual Behavior

Hermes sends the invalid data:image/jpeg payload as an input_image part.

The provider rejects the entire request with HTTP 400 invalid_value, so the agent cannot answer the Discord message even though the other image attachments are valid.

Impact

A single misclassified non-image attachment can break an otherwise valid Discord multimodal request.

This affects workflows where users send screenshots together with supporting documents in the same Discord message.

Suggested Fix

Add a preflight guard before building or sending Responses input_image parts.

For inline data:image/* URLs:

  1. Base64-decode the payload.
  2. Validate the decoded bytes against the declared MIME type:
    • JPEG: FF D8 FF
    • PNG: 89 50 4E 47 0D 0A 1A 0A
    • GIF: GIF87a or GIF89a
    • WebP: RIFF at bytes 0-3 and WEBP at bytes 8-11 (the four bytes in between hold the chunk size)
  3. If validation fails, skip the image part and log a warning.
  4. Leave remote HTTP(S) image URLs unchanged unless there is already a fetch/validation layer.
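The steps above can be sketched as a small standalone validator. This is a sketch under stated assumptions rather than the project's implementation: the function names are hypothetical, only the four MIME types from the provider error message are accepted, and any data URL that is not base64-encoded is treated as invalid.

```python
import base64
import binascii


def _matches_declared_type(mime: str, data: bytes) -> bool:
    """Compare the leading bytes with the signature for the declared MIME type."""
    if mime == "image/jpeg":
        return data.startswith(b"\xff\xd8\xff")
    if mime == "image/png":
        return data.startswith(b"\x89PNG\r\n\x1a\n")
    if mime == "image/gif":
        return data.startswith((b"GIF87a", b"GIF89a"))
    if mime == "image/webp":
        return data[:4] == b"RIFF" and data[8:12] == b"WEBP"
    return False  # any other declared type is not a supported image format


def is_valid_inline_image(url: str) -> bool:
    """Return True if `url` may be sent as an input_image (hypothetical helper)."""
    if not url.startswith("data:"):
        return True  # step 4: remote HTTP(S) URLs are left unchanged
    header, sep, payload = url[len("data:"):].partition(",")
    if not sep:
        return False  # malformed data URL without a comma
    mime, *params = (field.strip().lower() for field in header.split(";"))
    if "base64" not in params:
        return False  # assumption: only base64 payloads are supported
    try:
        data = base64.b64decode(payload, validate=True)  # step 1
    except (binascii.Error, ValueError):
        return False
    return _matches_declared_type(mime, data)  # step 2
```

With this check, the failing part from the report (declared image/jpeg, bytes starting with PK\x03\x04) returns False, so the caller can skip it and log a warning as in step 3. Decoding the whole payload keeps the sketch simple; decoding only a short prefix would be enough for the signature comparison.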

A good central location appears to be the Responses content conversion path, for example agent/codex_responses_adapter.py around _chat_content_to_responses_parts(), so the guard protects all callers that pass inline data URLs to Responses-compatible providers.

Environment

  • Platform: Discord gateway
  • Provider type: Responses-compatible multimodal provider
  • Attachment mix: valid image attachments plus a document attachment

Privacy Note

This report intentionally omits private message contents, local paths, identifiers, and real attachment names.

The reproducible signal is only the MIME/magic-byte mismatch:

data:image/jpeg

wrapping bytes that start with:

PK\x03\x04
