langchain - 💡(How to fix) Fix core: AIMessageChunk validator escapes non-ASCII in tool_call_chunks[].args

langchain · 2026-05-25 20:14:27


Buggy location: libs/core/langchain_core/messages/ai.py:509

# libs/core/langchain_core/messages/ai.py — AIMessageChunk.init_tool_calls
if not self.tool_call_chunks:
    if self.tool_calls:
        self.tool_call_chunks = [
            create_tool_call_chunk(
                name=tc["name"],
                args=json.dumps(tc["args"]),                # ← bug: defaults to ensure_ascii=True
                id=tc["id"],
                index=None,
            )
            for tc in self.tool_calls
        ]

AIMessageChunk.init_tool_calls is a @model_validator(mode="after") that back-fills tool_call_chunks from tool_calls when only the latter is supplied at construction. It uses json.dumps(...) without ensure_ascii=False, so every non-ASCII character in tool-call arguments — CJK, emoji, accented Latin — is escaped to \uXXXX in the resulting chunk's args string.
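To see the difference outside LangChain, the standard-library sketch below serializes the same arguments both ways. It uses only the json module; the args dict is an illustrative assumption, not taken from LangChain. Only the escaping differs, and both strings decode back to the same dict.

```python
import json

# Illustrative tool-call arguments (not taken from LangChain itself).
args = {"text": "你好", "emoji": "🚀", "name": "café"}

escaped = json.dumps(args)                       # ensure_ascii defaults to True
readable = json.dumps(args, ensure_ascii=False)  # non-ASCII characters kept as-is

print(escaped)   # {"text": "\u4f60\u597d", "emoji": "\ud83d\ude80", "name": "caf\u00e9"}
print(readable)  # {"text": "你好", "emoji": "🚀", "name": "café"}

# Both are valid JSON and decode to the same Python object.
assert json.loads(escaped) == json.loads(readable) == args
```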

Why it matters

Inconsistency on the same object: tool_calls[i]["args"] is a readable dict ({'text': '你好'}), but tool_call_chunks[i]["args"] is an escaped string ('{"text": "\\u4f60\\u597d"}'). Both decode to the same data, yet consumers that read the raw string see different text depending on which field they read.
Persistence layers (DB JSON columns, log aggregators, LangGraph checkpointers, LangSmith traces) round-trip through tool_call_chunks, so non-English tool arguments land in storage as escape sequences and are hard to read when inspected.
Cross-package convention already exists: libs/core/langchain_core/messages/utils.py:1810 passes ensure_ascii=False, and langchain-openai follows the same convention in langchain_openai/chat_models/base.py. This site is an outlier.
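Because the escaped and unescaped forms decode to the same data, values that were already persisted with escapes can be normalized by decoding and re-encoding them with ensure_ascii=False. The helper below is a hypothetical sketch (unescape_json_text is not a LangChain API). Re-encoding uses json.dumps' default separators, so whitespace in compactly serialized input may change.

```python
import json


def unescape_json_text(raw: str) -> str:
    """Hypothetical helper: rewrite a JSON document so non-ASCII text is stored literally."""
    # json.loads turns \uXXXX escapes (including surrogate pairs) back into characters;
    # json.dumps with ensure_ascii=False then writes them out unescaped.
    return json.dumps(json.loads(raw), ensure_ascii=False)


stored = '{"text": "\\u4f60\\u597d", "emoji": "\\ud83d\\ude80"}'
print(unescape_json_text(stored))  # {"text": "你好", "emoji": "🚀"}
```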

Proposed fix (one keyword)

args=json.dumps(tc["args"], ensure_ascii=False),
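The patched back-fill can be sketched in isolation. This is a stdlib-only approximation, not the actual validator: backfill_tool_call_chunks is a hypothetical stand-in, and the dict it builds only assumes the shape of create_tool_call_chunk's output (name, args, id, index, type). The assertions mirror the regression check described in this issue.

```python
import json
from typing import Any


def backfill_tool_call_chunks(tool_calls: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Hypothetical stand-in for the back-fill in AIMessageChunk.init_tool_calls, with the fix applied."""
    return [
        {
            "name": tc["name"],
            # The proposed fix: keep non-ASCII characters instead of escaping them.
            "args": json.dumps(tc["args"], ensure_ascii=False),
            "id": tc["id"],
            "index": None,
            "type": "tool_call_chunk",  # assumed to match create_tool_call_chunk's output
        }
        for tc in tool_calls
    ]


tool_calls = [
    {
        "name": "echo",
        "args": {"text": "你好", "emoji": "🚀"},
        "id": "call_1",
        "type": "tool_call",
    }
]
chunks = backfill_tool_call_chunks(tool_calls)

# Regression-style checks: the raw string keeps CJK/emoji and still round-trips.
assert "你好" in chunks[0]["args"] and "🚀" in chunks[0]["args"]
assert json.loads(chunks[0]["args"]) == tool_calls[0]["args"]
print(chunks[0]["args"])  # {"text": "你好", "emoji": "🚀"}
```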

Related sites with the same anti-pattern in this repo (out of scope for this issue, can be addressed in follow-up PRs):

libs/core/langchain_core/messages/utils.py:1880 — Bedrock/Converse "text" block
libs/partners/openai/langchain_openai/chat_models/_compat.py:358, 436
libs/partners/anthropic/langchain_anthropic/chat_models.py:1519
libs/partners/anthropic/langchain_anthropic/experimental.py:130
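Candidates like these can be located with a small ast-based scan. This is a rough review aid, not a lint rule the repo uses: find_ascii_escaping_dumps is hypothetical, it only matches the literal json.dumps(...) spelling (aliases and from-imports are not resolved), and calls that forward **kwargs are flagged even if they pass ensure_ascii=False indirectly. Not every hit is a bug, since ASCII-only output is sometimes intended.

```python
import ast


def find_ascii_escaping_dumps(source: str, filename: str = "<string>") -> list[int]:
    """Hypothetical helper: line numbers of json.dumps(...) calls lacking ensure_ascii=False."""
    hits: list[int] = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Only the literal `json.dumps(...)` spelling is matched.
        is_json_dumps = (
            isinstance(func, ast.Attribute)
            and func.attr == "dumps"
            and isinstance(func.value, ast.Name)
            and func.value.id == "json"
        )
        if not is_json_dumps:
            continue
        value = next((kw.value for kw in node.keywords if kw.arg == "ensure_ascii"), None)
        # Flag unless the keyword is present and is the literal False.
        if not (isinstance(value, ast.Constant) and value.value is False):
            hits.append(node.lineno)
    return sorted(hits)


sample = (
    "import json\n"
    "a = json.dumps(x)\n"
    "b = json.dumps(x, ensure_ascii=False)\n"
    "c = json.dumps(x, ensure_ascii=True)\n"
)
print(find_ascii_escaping_dumps(sample))  # [2, 4]
```

To scan a real file, pass pathlib.Path(path).read_text(encoding="utf-8") as the source argument.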




Code Example

from langchain_core.messages import AIMessageChunk

chunk = AIMessageChunk(
    content="",
    tool_calls=[
        {
            "name": "echo",
            "args": {"text": "你好", "emoji": "🚀"},
            "id": "call_1",
            "type": "tool_call",
        }
    ],
)

print("tool_calls[0]['args']:        ", chunk.tool_calls[0]["args"])
print("tool_call_chunks[0]['args']:  ", repr(chunk.tool_call_chunks[0]["args"]))

assert "你好" in chunk.tool_call_chunks[0]["args"], "CJK was escaped to \\uXXXX"

Error Message and Stack Trace

tool_calls[0]['args']:         {'text': '你好', 'emoji': '🚀'}
tool_call_chunks[0]['args']:   '{"text": "\\u4f60\\u597d", "emoji": "\\ud83d\\ude80"}'
Traceback (most recent call last):
  File "...", line N, in <module>
    assert "你好" in chunk.tool_call_chunks[0]["args"], "CJK was escaped to \\uXXXX"
AssertionError: CJK was escaped to \uXXXX
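The emoji shows up as two escapes because json.dumps, with ensure_ascii=True, writes a character outside the Basic Multilingual Plane (above U+FFFF) as a UTF-16 surrogate pair. The arithmetic can be checked with a short sketch; utf16_surrogates is an illustrative helper, not part of any library.

```python
import json


def utf16_surrogates(code_point: int) -> tuple[int, int]:
    """Split a code point above U+FFFF into its UTF-16 high/low surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only supplementary-plane code points use surrogate pairs")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


high, low = utf16_surrogates(ord("🚀"))  # U+1F680
print(f"\\u{high:04x}\\u{low:04x}")      # \ud83d\ude80
print(json.dumps("🚀"))                   # "\ud83d\ude80"
```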



Submission checklist

This is a bug, not a usage question.
I added a clear and descriptive title that summarizes this issue.
I used the GitHub search to find a similar question and didn't find it.
I am sure that this is a bug in LangChain rather than my code.
The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
This is not related to the langchain-community package.
I posted a self-contained, minimal, reproducible example. A maintainer can copy it and run it AS IS.


Related Issues / PRs

Same class of bug as langchain-ai/langchain-google#1789 (recently fixed in langchain-google-genai's _parse_response_candidate). Issue #34005 ("PydanticOutputParser formats error with ensure_ascii=True") concerned a different call site and was closed earlier. The convention is already established elsewhere in this repo; for example, libs/core/langchain_core/messages/utils.py:1810 already passes ensure_ascii=False.



System Info

> Python Version:  3.12
> langchain-core: 1.4.0 (master @ 33875fdd...)

If this is acceptable, I have the fix and a regression test ready locally. Could you please assign this issue to me so I can open a PR? The fix is a one-keyword change in ai.py plus a regression test in test_messages.py asserting that tool_call_chunks[0]["args"] preserves the CJK and emoji characters from tool_calls.

Disclaimer: this report was prepared with the assistance of an AI agent (Claude Code); the bug was independently reproduced and reviewed by me before submission.
