langchain - ✅(Solved) Fix core: _convert_openai_format_to_data_block hard-codes mime_type="application/pdf" on base64 file blocks [1 pull requests, 1 comments, 2 participants]

langchain · 2026-04-22 06:49:44


GitHub stats

langchain-ai/langchain#36938 • Fetched 2026-04-23 07:23:11


Author

anmolg1997

Participants

anmolg1997

langchain-automated-triage[bot]

Timeline (top)

closed ×1, commented ×1, cross-referenced ×1, labeled ×1

In libs/core/langchain_core/messages/block_translators/openai.py, _convert_openai_format_to_data_block has two symmetrical base64 branches: one for images (image_url) and one for files.

The image branch correctly reads the MIME type from the parsed data URI:

# base64-style image block
if (block["type"] == "image_url") and (
    parsed := _parse_data_uri(block["image_url"]["url"])
):
    ...
    return types.create_image_block(
        base64=parsed["data"],
        mime_type=parsed["mime_type"],   # from data URI
        **all_extras,
    )

The file branch hard-codes PDF, discarding the parsed value:

# base64-style file block
if (block["type"] == "file") and (
    parsed := _parse_data_uri(block["file"]["file_data"])
):
    ...
    return types.create_file_block(
        base64=parsed["data"],
        mime_type="application/pdf",     # hard-coded
        filename=filename,
        **all_extras,
    )

Effect: any non-PDF file delivered via the OpenAI base64 file block shape (CSV, plain text, spreadsheets, office docs, etc.) is silently relabeled as application/pdf on the way into v1 content blocks. Since _normalize_messages calls this translator on every chat model's _astream, the wrong MIME type propagates to every downstream integration that consumes content_blocks.

_parse_data_uri already guarantees mime_type is non-empty whenever parsed is truthy (returns None otherwise), so the fix is a one-line change: use parsed["mime_type"] like the image branch does. No extra None check needed.
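To make that guarantee concrete, here is a minimal sketch of a parser with the same contract. It is a hypothetical stand-in, not the actual private _parse_data_uri in langchain-core; the name parse_data_uri and the regular expression are assumptions for illustration only.

```python
import re
from typing import Optional, TypedDict


class ParsedDataUri(TypedDict):
    mime_type: str
    data: str


# Hypothetical stand-in for langchain-core's private _parse_data_uri helper.
# The "+" quantifier on the MIME group means an empty MIME type never matches.
# Extra parameters such as ";charset=utf-8" are deliberately not handled here.
_DATA_URI = re.compile(
    r"^data:(?P<mime_type>[^;,]+);base64,(?P<data>.+)$", re.DOTALL
)


def parse_data_uri(uri: str) -> Optional[ParsedDataUri]:
    """Return MIME type and base64 payload, or None for anything else."""
    match = _DATA_URI.match(uri)
    if match is None:
        return None
    return {"mime_type": match.group("mime_type"), "data": match.group("data")}
```

With a contract like this, any caller that has already checked the result for truthiness can read the "mime_type" key without a further None check, which is the property the one-line fix relies on.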

I've opened PR #36937 with the one-line fix and a regression test covering CSV and plain-text base64 file blocks. The PR was auto-closed for the missing-issue-link check. Happy to reopen once this issue is approved and assigned.

Found while tracing a separate production crash (PDF attachments through langchain-litellm to Anthropic-via-Vertex). This is a secondary correctness bug caught in passing.

Error Message

No exception is raised. The translated block (see the output under Code Example) simply carries mime_type 'application/pdf' for a CSV file.

Fix Action

Proposed fix: PR core[patch]: preserve MIME type on base64 file blocks in openai translator (https://github.com/langchain-ai/langchain/pull/36937). At fetch time the PR was closed and not merged (see the PR fix notes below).

PR fix notes

PR #36937: core[patch]: preserve MIME type on base64 file blocks in openai translator

Repository: langchain-ai/langchain
Author: anmolg1997
State: closed | merged: False
Link: https://github.com/langchain-ai/langchain/pull/36937

Description (problem / solution / changelog)

Fixes #36939.

The file branch of _convert_openai_format_to_data_block hard-codes mime_type="application/pdf", while the image branch right above it uses parsed["mime_type"] from the data URI. So a CSV sent via the OpenAI file block shape comes out with mime_type="application/pdf" in the v1 content block.

One-line change to read it off the parsed data URI, same as the image branch. _parse_data_uri returns None when the mime_type is missing, so parsed["mime_type"] is always set inside this branch.

Test added with a CSV and a text/plain data URI. Existing tests still pass since they use data:application/pdf;....
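The shape of such a regression check can be sketched without langchain-core installed. Everything below is a hypothetical stand-in: split_base64_data_uri and translate_file_block are illustrative names that mimic the fixed file branch, not the code or the test from the PR, and the output dict omits the generated 'id' field.

```python
from typing import Any, Optional


def split_base64_data_uri(uri: str) -> Optional[tuple[str, str]]:
    """Split 'data:<mime>;base64,<payload>' into (mime, payload), else None."""
    prefix, sep, data = uri.partition(";base64,")
    if not sep or not prefix.startswith("data:") or not data:
        return None
    mime_type = prefix[len("data:"):]
    if not mime_type:
        return None
    return mime_type, data


def translate_file_block(block: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Hypothetical stand-in for the fixed base64 file branch."""
    if block.get("type") != "file":
        return None
    parsed = split_base64_data_uri(block["file"]["file_data"])
    if parsed is None:
        return None
    mime_type, data = parsed
    return {
        "type": "file",
        "base64": data,
        "mime_type": mime_type,  # taken from the data URI, not hard-coded
        "extras": {"filename": block["file"].get("filename")},
    }


def check_mime_type_preserved() -> None:
    """Assert that each input MIME type survives translation unchanged."""
    cases = {
        "text/csv": "sheet.csv",
        "text/plain": "notes.txt",
        "application/pdf": "doc.pdf",  # the previously working case
    }
    for mime_type, filename in cases.items():
        block = {
            "type": "file",
            "file": {
                "filename": filename,
                "file_data": f"data:{mime_type};base64,aGVsbG8=",
            },
        }
        result = translate_file_block(block)
        assert result is not None
        assert result["mime_type"] == mime_type


check_mime_type_preserved()
```

Keeping application/pdf in the case table guards the behavior that the existing tests already cover, while the CSV and text/plain cases would fail against a hard-coded MIME type.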

Changed files

libs/core/langchain_core/messages/block_translators/openai.py (modified, +1/-1)
libs/core/tests/unit_tests/messages/block_translators/test_openai.py (modified, +43/-0)

Code Example

from langchain_core.messages import HumanMessage

# CSV attached via the OpenAI Chat Completions file block shape
# (same shape OpenAI docs prescribe for file inputs)
msg = HumanMessage(content=[
    {
        "type": "file",
        "file": {
            "filename": "sheet.csv",
            "file_data": "data:text/csv;base64,aGVsbG8=",
        },
    },
])

print(msg.content_blocks)

---

[{'type': 'file', 'id': 'lc_...', 'base64': 'aGVsbG8=',
  'mime_type': 'application/pdf',   <-- silently wrong
  'extras': {'filename': 'sheet.csv'}}]
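Until a fix is available, downstream code could detect this particular mislabeling with a heuristic. The sketch below is an assumption, not part of langchain-core: looks_mislabeled_as_pdf is a hypothetical helper that decodes the payload and checks for the "%PDF-" header that PDF files start with.

```python
import base64
from typing import Any


def looks_mislabeled_as_pdf(block: dict[str, Any]) -> bool:
    """True when a block claims application/pdf but lacks the %PDF- header.

    Hypothetical helper for illustration; it only inspects dict-shaped
    content blocks that carry 'mime_type' and 'base64' keys.
    """
    if block.get("mime_type") != "application/pdf" or "base64" not in block:
        return False
    payload = base64.b64decode(block["base64"])
    return not payload.startswith(b"%PDF-")


# The block from the output above: the payload decodes to b"hello".
suspect = {
    "type": "file",
    "base64": "aGVsbG8=",
    "mime_type": "application/pdf",
    "extras": {"filename": "sheet.csv"},
}
print(looks_mislabeled_as_pdf(suspect))
```

This is only a heuristic: it flags blocks labeled as PDF whose bytes are clearly not a PDF, but it cannot recover the original MIME type.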

Checked other resources

This is a bug, not a usage question. For questions, please use GitHub Discussions.
I added a very descriptive title to this issue.
I searched the LangChain documentation and API reference with the integrated search.
I used the GitHub search to find a similar question and didn't find it.
I am sure that this is a bug in LangChain rather than my code.

System Info

langchain-core: master (reproduced on 1.2.7 as well)
platform: macOS
python version: 3.13

extent analysis

TL;DR

The most likely fix is to change the create_file_block call inside _convert_openai_format_to_data_block so that it passes the MIME type parsed from the data URI instead of the hard-coded "application/pdf".

Guidance

Review the _convert_openai_format_to_data_block function in libs/core/langchain_core/messages/block_translators/openai.py to ensure it correctly handles file blocks with different MIME types.
In the file branch of that function, pass parsed["mime_type"] to create_file_block instead of the hard-coded "application/pdf"; this stops the silent relabeling of non-PDF files.
Verify the fix with different file types, such as CSV and plain text, and confirm that the correct MIME type reaches downstream integrations.
Consider adding regression tests for further file types so the behavior stays covered.

Example

# base64-style file block
if (block["type"] == "file") and (
    parsed := _parse_data_uri(block["file"]["file_data"])
):
    ...
    return types.create_file_block(
        base64=parsed["data"],
        mime_type=parsed["mime_type"],  # use parsed MIME type
        filename=filename,
        **all_extras,
    )

Notes

The fix is a one-line change. _parse_data_uri returns None when the MIME type is missing, so mime_type is always non-empty whenever parsed is truthy and no extra None check is needed.

Recommendation

Apply the fix by passing the parsed MIME type in the create_file_block call. This ends the silent relabeling of non-PDF files and lets the correct MIME type propagate to downstream integrations.
