langchain - core: AIMessageChunk validator escapes non-ASCII in tool_call_chunks[].args

Buggy location: libs/core/langchain_core/messages/ai.py:509

# libs/core/langchain_core/messages/ai.py — AIMessageChunk.init_tool_calls
if not self.tool_call_chunks:
    if self.tool_calls:
        self.tool_call_chunks = [
            create_tool_call_chunk(
                name=tc["name"],
                args=json.dumps(tc["args"]),                # ← bug: defaults to ensure_ascii=True
                id=tc["id"],
                index=None,
            )
            for tc in self.tool_calls
        ]

AIMessageChunk.init_tool_calls is a @model_validator(mode="after") that back-fills tool_call_chunks from tool_calls when only the latter is supplied at construction. It uses json.dumps(...) without ensure_ascii=False, so every non-ASCII character in tool-call arguments — CJK, emoji, accented Latin — is escaped to \uXXXX in the resulting chunk's args string.
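The escaping itself is ordinary standard-library behaviour and can be observed without LangChain. A minimal sketch using only `json`, with a sample dict that mirrors the reproduction in this issue:

```python
import json

args = {"text": "你好", "emoji": "🚀"}

# Default (ensure_ascii=True): every non-ASCII character becomes a \uXXXX
# escape; characters outside the BMP, such as the emoji, become a surrogate pair.
escaped = json.dumps(args)
# escaped  -> {"text": "\u4f60\u597d", "emoji": "\ud83d\ude80"}

# With ensure_ascii=False the characters are emitted unchanged.
readable = json.dumps(args, ensure_ascii=False)
# readable -> {"text": "你好", "emoji": "🚀"}

# Both strings decode to the same dict; only the serialized text differs.
assert json.loads(escaped) == json.loads(readable) == args
```

Note that the escaped form is still valid JSON and loses no data. The problem reported here is readability and consistency of the stored string, not corruption.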

Why it matters

  1. Inconsistency on the same object: tool_calls[i]["args"] is a clean dict ({'text': '你好'}), but tool_call_chunks[i]["args"] is an escaped string ('{"text": "\\u4f60\\u597d"}'). Consumers see different content depending on which field they read.
  2. Persistence layers (DB JSON columns, log aggregators, LangGraph checkpointers, LangSmith traces) round-trip through tool_call_chunks, so non-ASCII tool args land in storage as escape sequences and are hard to read on inspection.
  3. Cross-package convention already exists: libs/core/langchain_core/messages/utils.py:1810 passes ensure_ascii=False, and langchain-openai follows the same convention in langchain_openai/chat_models/base.py. This site is an outlier.

Proposed fix (one keyword)

args=json.dumps(tc["args"], ensure_ascii=False),
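To see the effect of the one-keyword change in isolation, here is a minimal sketch that mimics the back-fill with plain dicts. `backfill_chunks` is a hypothetical stand-in written for this illustration; it is not LangChain's `create_tool_call_chunk` or the actual validator, and the `ensure_ascii` parameter exists only to compare the two behaviours side by side.

```python
import json
from typing import Any


def backfill_chunks(
    tool_calls: list[dict[str, Any]], *, ensure_ascii: bool
) -> list[dict[str, Any]]:
    """Hypothetical stand-in for the back-fill in init_tool_calls (plain dicts only)."""
    return [
        {
            "name": tc["name"],
            # The only line that matters: how args is serialized to a string.
            "args": json.dumps(tc["args"], ensure_ascii=ensure_ascii),
            "id": tc["id"],
            "index": None,
        }
        for tc in tool_calls
    ]


tool_calls = [{"name": "echo", "args": {"text": "你好", "emoji": "🚀"}, "id": "call_1"}]

before = backfill_chunks(tool_calls, ensure_ascii=True)   # behaviour reported in this issue
after = backfill_chunks(tool_calls, ensure_ascii=False)   # behaviour with the proposed keyword

assert "你好" not in before[0]["args"]
assert "你好" in after[0]["args"]
# The parsed arguments are identical either way.
assert json.loads(before[0]["args"]) == json.loads(after[0]["args"])
```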

Related sites with the same anti-pattern in this repo (out of scope for this issue, can be addressed in follow-up PRs):

  • libs/core/langchain_core/messages/utils.py:1880 — Bedrock/Converse "text" block
  • libs/partners/openai/langchain_openai/chat_models/_compat.py:358, 436
  • libs/partners/anthropic/langchain_anthropic/chat_models.py:1519
  • libs/partners/anthropic/langchain_anthropic/experimental.py:130
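For the follow-up work, call sites like the ones listed above could be located mechanically. The sketch below is one possible approach, not something taken from the issue: `find_ascii_escaping_dumps` is a hypothetical helper that uses the standard `ast` module to flag `json.dumps(...)` calls with no `ensure_ascii` keyword. It only recognizes the literal `json.dumps` spelling (not `from json import dumps` or aliases), and a hit is a candidate for review rather than a confirmed bug, since some call sites may want ASCII-only output.

```python
import ast


def find_ascii_escaping_dumps(source: str) -> list[int]:
    """Return line numbers of json.dumps(...) calls with no ensure_ascii keyword."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        is_json_dumps = (
            isinstance(func, ast.Attribute)
            and func.attr == "dumps"
            and isinstance(func.value, ast.Name)
            and func.value.id == "json"
        )
        has_keyword = any(kw.arg == "ensure_ascii" for kw in node.keywords)
        if is_json_dumps and not has_keyword:
            hits.append(node.lineno)
    return sorted(hits)


sample = '''import json
a = json.dumps({"k": 1})
b = json.dumps({"k": 1}, ensure_ascii=False)
c = json.dumps({"k": 1}, indent=2)
'''
print(find_ascii_escaping_dumps(sample))  # [2, 4]
```

To scan a real file, read its text and pass it to the function, for example `find_ascii_escaping_dumps(pathlib.Path(path).read_text(encoding="utf-8"))`.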

Error Message

tool_calls[0]['args']:         {'text': '你好', 'emoji': '🚀'}
tool_call_chunks[0]['args']:   '{"text": "\\u4f60\\u597d", "emoji": "\\ud83d\\ude80"}'
Traceback (most recent call last):
  File "...", line N, in <module>
    assert "你好" in chunk.tool_call_chunks[0]["args"], "CJK was escaped to \\uXXXX"
AssertionError: CJK was escaped to \uXXXX

Code Example

from langchain_core.messages import AIMessageChunk

chunk = AIMessageChunk(
    content="",
    tool_calls=[
        {
            "name": "echo",
            "args": {"text": "你好", "emoji": "🚀"},
            "id": "call_1",
            "type": "tool_call",
        }
    ],
)

print("tool_calls[0]['args']:        ", chunk.tool_calls[0]["args"])
print("tool_call_chunks[0]['args']:  ", repr(chunk.tool_call_chunks[0]["args"]))

assert "你好" in chunk.tool_call_chunks[0]["args"], "CJK was escaped to \\uXXXX"
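Until a fix lands, a consumer that needs readable text can re-serialize the chunk's `args` string itself. The sketch below is an assumption about how one might do that, not a workaround proposed in the issue. `unescape_json_args` is a hypothetical helper; it returns partial fragments from streamed tool calls unchanged because they cannot be parsed yet, and re-serializing may normalize whitespace in the original string.

```python
import json
from typing import Optional


def unescape_json_args(args: Optional[str]) -> Optional[str]:
    """Re-serialize a complete JSON string without \\uXXXX escapes.

    Hypothetical helper: returns the input unchanged if it is None or is not
    valid JSON (for example a partial fragment from a streamed tool call).
    """
    if args is None:
        return None
    try:
        parsed = json.loads(args)
    except json.JSONDecodeError:
        return args
    return json.dumps(parsed, ensure_ascii=False)


# The escaped string reported in this issue, written as a Python literal.
escaped = '{"text": "\\u4f60\\u597d", "emoji": "\\ud83d\\ude80"}'
assert unescape_json_args(escaped) == '{"text": "你好", "emoji": "🚀"}'

# A partial streaming fragment is passed through untouched.
assert unescape_json_args('{"text": "par') == '{"text": "par'
```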

Submission checklist

  • This is a bug, not a usage question.
  • I added a clear and descriptive title that summarizes this issue.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
  • This is not related to the langchain-community package.
  • I posted a self-contained, minimal, reproducible example. A maintainer can copy it and run it AS IS.

Package (Required)

  • langchain
  • langchain-openai
  • langchain-anthropic
  • langchain-classic
  • langchain-core
  • langchain-model-profiles
  • langchain-tests
  • langchain-text-splitters
  • langchain-chroma
  • langchain-deepseek
  • langchain-exa
  • langchain-fireworks
  • langchain-groq
  • langchain-huggingface
  • langchain-mistralai
  • langchain-nomic
  • langchain-ollama
  • langchain-openrouter
  • langchain-perplexity
  • langchain-qdrant
  • langchain-xai
  • Other / not sure / general

Related Issues / PRs

Same class of bug as langchain-ai/langchain-google#1789 (just-shipped fix in langchain-google-genai's _parse_response_candidate). Issue #34005 ("PydanticOutputParser formats error with ensure_ascii=True") was a different site closed earlier. The convention is already established elsewhere in this repo — e.g. libs/core/langchain_core/messages/utils.py:1810 already passes ensure_ascii=False.

System Info

> Python Version:  3.12
> langchain-core: 1.4.0 (master @ 33875fdd...)

If acceptable, I have the fix and a regression test ready locally — could you please assign this to me so I can open a PR? The fix is a one-keyword change in ai.py plus a test_messages.py regression test asserting that tool_call_chunks[0]["args"] preserves CJK / emoji from tool_calls.

Disclaimer: this report was prepared with the assistance of an AI agent (Claude Code); the bug was independently reproduced and reviewed by me before submission.
