langchain - anthropic: base64 file blocks with non-PDF text mime types fail with media_type validation error

  • I am sending a standard type: "file" content block with mime_type: "text/csv" and a base64 source in a HumanMessage to ChatAnthropic.
  • I expect the request to succeed — the same content blocks work when routed through langchain-aws / Bedrock Converse, which accepts csv, doc, docx, html, md, txt, xls, xlsx, pdf as document source types.
  • Instead, the Anthropic Messages API rejects the request because ChatAnthropic forwards mime_type verbatim into the document.source.base64.media_type field, and Anthropic's API only accepts application/pdf there.

The relevant code path is the elif block["type"] == "file": branch in libs/partners/anthropic/langchain_anthropic/chat_models.py at approximately line 366:

elif "base64" in block or block.get("source_type") == "base64":
    formatted_block = {
        "type": "document",
        "source": {
            "type": "base64",
            "media_type": block.get("mime_type") or "application/pdf",
            "data": block.get("base64") or block.get("data", ""),
        },
    }

mime_type is forwarded with no check against the Anthropic-accepted set.
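
To make the failure concrete, here is a minimal sketch that copies the quoted branch into a standalone function and applies it to a CSV file block. The name `format_base64_file_block` is hypothetical and used only for illustration; it is not a function in langchain-anthropic. The resulting payload carries `text/csv` in `media_type`, which is the field named in the API error.

```python
import base64


def format_base64_file_block(block: dict) -> dict:
    """Hypothetical standalone copy of the branch quoted above."""
    return {
        "type": "document",
        "source": {
            "type": "base64",
            # mime_type passes through unchanged; only a missing value defaults to PDF.
            "media_type": block.get("mime_type") or "application/pdf",
            "data": block.get("base64") or block.get("data", ""),
        },
    }


csv_b64 = base64.b64encode(b"a,b,c\n1,2,3\n").decode()
csv_block = {"type": "file", "base64": csv_b64, "mime_type": "text/csv"}
formatted = format_base64_file_block(csv_block)
print(formatted["source"]["media_type"])  # text/csv
```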

Anthropic's Files API documentation is explicit that non-PDF formats such as .csv, .txt, .md, .docx, and .xlsx must be converted to plain text and inlined: https://platform.claude.com/docs/en/docs/build-with-claude/files (section "Working with other file formats").
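
Until the integration handles this, the conversion can be done on the caller side before the message reaches ChatAnthropic. The sketch below is one possible workaround, not an API of either library: `inline_text_files` is a hypothetical helper, and the "Contents of ..." prefix is an arbitrary choice for illustration. It replaces any base64 `text/*` file block with an ordinary text block and leaves everything else alone.

```python
import base64
import binascii


def inline_text_files(content: list[dict]) -> list[dict]:
    """Replace base64 text/* file blocks with plain text blocks (workaround sketch)."""
    converted = []
    for block in content:
        mime_type = block.get("mime_type") or ""
        if block.get("type") == "file" and mime_type.startswith("text/"):
            data = block.get("base64") or block.get("data", "")
            try:
                text = base64.b64decode(data).decode("utf-8")
            except (binascii.Error, UnicodeDecodeError):
                converted.append(block)  # leave undecodable blocks untouched
                continue
            filename = (block.get("metadata") or {}).get("filename", "file")
            converted.append({"type": "text", "text": f"Contents of {filename}:\n{text}"})
        else:
            converted.append(block)
    return converted


b64 = base64.b64encode("a,b,c\n1,2,3\n".encode()).decode()
content = [
    {"type": "text", "text": "summarize"},
    {"type": "file", "data": b64, "mime_type": "text/csv", "metadata": {"filename": "data.csv"}},
]
result = inline_text_files(content)
print(result[1]["text"])
```

The returned list can be passed as the `content` of a HumanMessage in place of the original blocks.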

Suggested fix: in the same branch, when the mime type is text/*, decode the base64 to UTF-8 and emit a text-source document block (which Anthropic does accept) instead of base64:

elif "base64" in block or block.get("source_type") == "base64":
    # Needs `import base64` and `import binascii` at module level.
    mime_type = block.get("mime_type") or "application/pdf"
    data = block.get("base64") or block.get("data", "")
    if mime_type.startswith("text/"):
        try:
            # Text files are sent as decoded plain text rather than base64.
            text = base64.b64decode(data).decode("utf-8")
            formatted_block = {
                "type": "document",
                "source": {"type": "text", "media_type": "text/plain", "data": text},
            }
        except (binascii.Error, UnicodeDecodeError):
            # Payload is not valid base64 or not UTF-8: keep the original form.
            formatted_block = {
                "type": "document",
                "source": {"type": "base64", "media_type": mime_type, "data": data},
            }
    else:
        # PDFs and any other non-text type are forwarded as before.
        formatted_block = {
            "type": "document",
            "source": {"type": "base64", "media_type": mime_type, "data": data},
        }

PDF behavior is unchanged. Happy to submit a PR if this approach is acceptable.
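
For anyone who wants to try the proposed logic outside the library, the following sketch wraps it in a standalone function and runs it on three inputs: a CSV, a PDF-like payload, and a `text/csv` block whose bytes are not valid UTF-8. `format_file_block` and `b64` are hypothetical names chosen for this example and are not part of langchain-anthropic.

```python
import base64
import binascii


def format_file_block(block: dict) -> dict:
    """Hypothetical wrapper around the proposed branch, for local experimentation."""
    mime_type = block.get("mime_type") or "application/pdf"
    data = block.get("base64") or block.get("data", "")
    if mime_type.startswith("text/"):
        try:
            text = base64.b64decode(data).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            pass  # fall through to the base64 form, as in the proposal
        else:
            return {
                "type": "document",
                "source": {"type": "text", "media_type": "text/plain", "data": text},
            }
    return {
        "type": "document",
        "source": {"type": "base64", "media_type": mime_type, "data": data},
    }


def b64(raw: bytes) -> str:
    """Encode raw bytes as a base64 string."""
    return base64.b64encode(raw).decode()


csv_out = format_file_block({"type": "file", "base64": b64(b"a,b,c\n1,2,3\n"), "mime_type": "text/csv"})
pdf_out = format_file_block({"type": "file", "base64": b64(b"%PDF-1.4"), "mime_type": "application/pdf"})
bad_out = format_file_block({"type": "file", "base64": b64(b"\xff\xfe"), "mime_type": "text/csv"})

for name, out in [("csv", csv_out), ("pdf", pdf_out), ("bad", bad_out)]:
    print(name, out["source"]["type"], out["source"]["media_type"])
```

In the third case the fallback still sends `text/csv` in a base64 source, so judging by the error message in this report that request would presumably still be rejected. Raising a descriptive error at that point may be worth considering in the PR.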

Error Message

anthropic.BadRequestError: Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': "messages.0.content.1.document.source.base64.media_type: Input should be 'application/pdf'"}}

Submission checklist

  • This is a bug, not a usage question.
  • I added a clear and descriptive title that summarizes this issue.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
  • This is not related to the langchain-community package.
  • I posted a self-contained, minimal, reproducible example. A maintainer can copy it and run it AS IS.

Code Example

import base64
import os

from langchain_anthropic import ChatAnthropic
from langchain_core.messages import HumanMessage

assert os.environ.get("ANTHROPIC_API_KEY")

csv = "a,b,c\n1,2,3\n"
b64 = base64.b64encode(csv.encode()).decode()

model = ChatAnthropic(model="claude-sonnet-4-5")
model.invoke(
    [
        HumanMessage(
            content=[
                {"type": "text", "text": "summarize"},
                {
                    "type": "file",
                    "data": b64,
                    "mime_type": "text/csv",
                    "metadata": {"filename": "data.csv"},
                },
            ]
        )
    ]
)

System Information
------------------
> OS:  Darwin
> OS Version:  Darwin Kernel Version 25.2.0: Tue Nov 18 21:09:45 PST 2025; root:xnu-12377.61.12~1/RELEASE_ARM64_T6030
> Python Version:  3.11.0 | packaged by conda-forge | (main, Jan 14 2023, 12:25:12) [Clang 14.0.6 ]

Package Information
-------------------
> langchain_core: 1.2.17
> langchain: 1.2.15
> langchain_anthropic: 1.3.4
> langchain_aws: 1.0.0
> langchain_classic: 1.0.1
> langsmith: 0.7.6
> langgraph: 1.1.6
> langgraph-checkpoint: 4.0.0

Package (Required)

  • langchain-anthropic

Related Issues / PRs

  • #36939 — related core-level bug (_convert_openai_format_to_data_block hard-codes mime_type="application/pdf"). This issue is distinct: it lives in langchain-anthropic and concerns the outbound translation of standard file blocks to Anthropic document blocks.
