langchain - ✅(Solved) Fix core: _convert_openai_format_to_data_block hard-codes mime_type on base64 file blocks [2 pull requests, 2 comments, 1 participant]

langchain • 2026-04-22 07:13:11

GitHub stats

langchain-ai/langchain#36939 • Fetched 2026-04-23 07:23:09

Author

anmolg1997

Participants

anmolg1997

Timeline (top)

labeled ×3, cross-referenced ×2, commented ×1, issue_type_added ×1

In langchain_core/messages/block_translators/openai.py, _convert_openai_format_to_data_block has two base64 branches that look symmetrical: one for image_url, one for file.

The image branch reads the MIME type from the parsed data URI (parsed["mime_type"]). The file branch hard-codes "application/pdf".

The repro passes a CSV via the OpenAI base64 file block shape that the OpenAI docs prescribe. The resulting v1 content block has mime_type="application/pdf" instead of "text/csv", even though the data URI explicitly says text/csv. Any non-PDF file attached this way (CSV, plain text, spreadsheets, office docs) gets silently relabeled the same way.

Since _normalize_messages calls this translator on every chat model's input path, the wrong MIME type propagates to downstream integrations that consume content_blocks.

Expected: mime_type matches the data URI (text/csv in the example). Actual: mime_type is always application/pdf.

_parse_data_uri already returns None if the MIME type is missing, so the fix is to use parsed["mime_type"], as the image branch does. No extra None check is needed.
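The reasoning about the missing None check can be sketched in isolation. The snippet below is a minimal stand-in, not the library's code: `parse_data_uri`, `file_mime_type`, and the regular expression are hypothetical names and choices made for illustration. It only shows why a parser that returns None on a missing MIME type makes `parsed["mime_type"]` safe inside a guarded branch.

```python
import re
from typing import Optional, TypedDict


class ParsedDataUri(TypedDict):
    mime_type: str
    data: str


# Hypothetical pattern: requires a non-empty MIME type before ";base64,".
_DATA_URI = re.compile(r"^data:(?P<mime_type>[^;,]+);base64,(?P<data>.+)$")


def parse_data_uri(uri: str) -> Optional[ParsedDataUri]:
    """Return the MIME type and payload, or None if the URI does not match."""
    match = _DATA_URI.match(uri)
    if match is None:
        return None
    return {"mime_type": match["mime_type"], "data": match["data"]}


def file_mime_type(file_data: str) -> Optional[str]:
    # The walrus guard only enters the branch when parsing succeeded,
    # so parsed["mime_type"] is always present inside it.
    if parsed := parse_data_uri(file_data):
        return parsed["mime_type"]
    return None


print(file_mime_type("data:text/csv;base64,aGVsbG8="))  # text/csv
print(file_mime_type("data:;base64,aGVsbG8="))          # None
```

Because a failed parse never reaches the branch body, a second check on the MIME type inside it would be redundant under these assumptions.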

PR fix notes

PR #36937: core[patch]: preserve MIME type on base64 file blocks in openai translator

Repository: langchain-ai/langchain
Author: anmolg1997
State: closed | merged: False
Link: https://github.com/langchain-ai/langchain/pull/36937

Description (problem / solution / changelog)

Fixes #36939.

The file branch of _convert_openai_format_to_data_block hard-codes mime_type="application/pdf", while the image branch right above it uses parsed["mime_type"] from the data URI. So a CSV sent via the OpenAI file block shape comes out with mime_type="application/pdf" in the v1 content block.

One-line change to read it off the parsed data URI, same as the image branch. _parse_data_uri returns None when the mime_type is missing, so parsed["mime_type"] is always set inside this branch.

Test added with a CSV and a text/plain data URI. Existing tests still pass since they use data:application/pdf;....

Changed files

libs/core/langchain_core/messages/block_translators/openai.py (modified, +1/-1)
libs/core/tests/unit_tests/messages/block_translators/test_openai.py (modified, +43/-0)

PR #36940: core[patch]: use parsed mime_type for base64 file blocks in openai translator

Repository: langchain-ai/langchain
Author: anmolg1997
State: closed | merged: False
Link: https://github.com/langchain-ai/langchain/pull/36940

Description (problem / solution / changelog)

Fixes #36939.

Test added with a CSV and a text/plain data URI. Existing tests still pass since they use data:application/pdf;....

Changed files

libs/core/langchain_core/messages/block_translators/openai.py (modified, +1/-1)
libs/core/tests/unit_tests/messages/block_translators/test_openai.py (modified, +37/-0)

Code Example

from langchain_core.messages import HumanMessage

msg = HumanMessage(content=[
    {
        "type": "file",
        "file": {
            "filename": "sheet.csv",
            "file_data": "data:text/csv;base64,aGVsbG8=",
        },
    },
])

for block in msg.content_blocks:
    print(block)
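To make the mismatch in the repro concrete, the data URI can be inspected with the standard library alone. The snippet below is only an illustration of what the input declares and contains; it does not call langchain_core, and the variable names are chosen for this example.

```python
import base64

file_data = "data:text/csv;base64,aGVsbG8="

# Split the declared MIME type from the base64 payload.
header, _, payload = file_data.partition(";base64,")
declared_mime = header.removeprefix("data:")
raw = base64.b64decode(payload)

print(declared_mime)              # text/csv
print(raw)                        # b'hello'
print(raw.startswith(b"%PDF-"))   # False: the payload is not a PDF
```

The input declares text/csv and carries plain text, so a resulting block labelled application/pdf disagrees with both the header and the payload.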

Submission checklist

This is a bug, not a usage question.
I added a clear and descriptive title that summarizes this issue.
I used the GitHub search to find a similar question and didn't find it.
I am sure that this is a bug in LangChain rather than my code.
The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
This is not related to the langchain-community package.
I posted a self-contained, minimal, reproducible example. A maintainer can copy it and run it AS IS.

Related Issues / PRs

No response

System Info

System Information

OS: Darwin
OS Version: Darwin Kernel Version 25.4.0: Thu Mar 19 19:33:25 PDT 2026; root:xnu-12377.101.15~1/RELEASE_ARM64_T6041
Python Version: 3.14.2 (main, Dec 5 2025, 16:49:16) [Clang 17.0.0 (clang-1700.4.4.1)]

Package Information

langchain_core: 1.3.0
langchain_community: 0.4.1
langsmith: 0.7.31
langchain_classic: 1.0.3
langchain_text_splitters: 1.1.1
langgraph_sdk: 0.3.13

Optional packages not installed

deepagents
deepagents-cli

Other Dependencies

aiohttp: 3.13.5
dataclasses-json: 0.6.7
google-adk: 1.30.0
httpx: 0.28.1
httpx-sse: 0.4.3
jsonpatch: 1.33
numpy: 2.4.4
opentelemetry-api: 1.38.0
opentelemetry-exporter-otlp-proto-http: 1.38.0
opentelemetry-sdk: 1.38.0
orjson: 3.11.8
packaging: 26.1
pydantic: 2.12.5
pydantic-settings: 2.13.1
pytest: 9.0.3
PyYAML: 6.0.3
pyyaml: 6.0.3
requests: 2.33.1
requests-toolbelt: 1.0.0
rich: 15.0.0
SQLAlchemy: 2.0.49
sqlalchemy: 2.0.49
tenacity: 9.1.4
typing-extensions: 4.15.0
uuid-utils: 0.14.1
vcrpy: 8.1.1
websockets: 15.0.1
wrapt: 1.17.3
xxhash: 3.6.0
zstandard: 0.25.0

extent analysis

TL;DR

The issue can be fixed by using the parsed MIME type from the data URI in the file branch of _convert_openai_format_to_data_block instead of hard-coding "application/pdf".

Guidance

Identify the _convert_openai_format_to_data_block function in langchain_core/messages/block_translators/openai.py and locate the file branch.
Replace the hard-coded "application/pdf" with parsed["mime_type"] to use the MIME type from the data URI.
Verify that the mime_type in the resulting content_blocks matches the expected type (e.g., "text/csv" for a CSV file).
Test the change with different file types to ensure the correct MIME type is propagated.

Example

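# In _convert_openai_format_to_data_block, inside the base64 file branch.
# Sketch only: the surrounding code is elided, and `parsed` is the
# non-None result of _parse_data_uri for the block's file_data.
# ...
mime_type = parsed["mime_type"]  # previously hard-coded to "application/pdf"
# ...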

Notes

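This fix relies on parsed["mime_type"] reflecting what the data URI declares. Per the issue, _parse_data_uri returns None when the MIME type is missing, so the value is always set inside the branch. The declared type is not checked against the file contents, so additional validation may be necessary if callers can send mislabeled data URIs.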
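If a caller does want extra handling for inputs where no MIME type can be read from the data URI, one option is a layered fallback. The sketch below is not part of the proposed fix: `resolve_mime_type` and `DEFAULT_MIME` are hypothetical names, and falling back to the filename via the standard `mimetypes` module is an assumption made for illustration.

```python
import mimetypes
from typing import Optional

DEFAULT_MIME = "application/octet-stream"


def resolve_mime_type(file_data: str, filename: Optional[str] = None) -> str:
    """Prefer the data URI's MIME type, then the filename, then a generic default."""
    header, sep, _ = file_data.partition(";base64,")
    if sep and header.startswith("data:") and len(header) > len("data:"):
        # The data URI declares a MIME type, so trust it as-is.
        return header[len("data:"):]
    if filename:
        # guess_type maps the file extension to a MIME type when it knows one.
        guessed, _encoding = mimetypes.guess_type(filename)
        if guessed:
            return guessed
    return DEFAULT_MIME
```

The declared type wins whenever it is present, so this helper never overrides an explicit data URI header; the filename and the generic default only apply when the header is absent or malformed.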

Recommendation

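Change the file branch of _convert_openai_format_to_data_block to use the parsed MIME type instead of the hard-coded value. This should stop non-PDF files from being relabeled as application/pdf. Both linked PRs (#36937 and #36940) are listed as closed and unmerged, so confirm the behavior against the langchain_core version you have installed.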
