transformers - ✅ (Solved) Fix chat template inconsistencies in tool-calling support [1 pull request, 1 participant]

huggingface/transformers#45419 · Fetched 2026-04-15 06:19:38

Fix Action

Fixed

PR fix notes

PR #45422: Drop content=None from messages in apply_chat_template

Description (problem / solution / changelog)

In apply_chat_template, drop the content key from messages when its value is None before passing to the Jinja template.
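The normalization can be sketched as follows, assuming messages are plain dicts. The helper name `drop_none_content` is hypothetical and this is not the code from the PR; it only illustrates removing the key before the template sees it.

```python
from typing import Any


def drop_none_content(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return copies of `messages` with `content` removed wherever it is None.

    Hypothetical helper for illustration; not the implementation in the PR.
    """
    return [
        {key: value for key, value in message.items() if not (key == "content" and value is None)}
        for message in messages
    ]


messages = [
    {"role": "user", "content": "hi"},
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [{"type": "function", "function": {"name": "f", "arguments": {"a": 1}}}],
    },
]
cleaned = drop_none_content(messages)
print("content" in cleaned[1])  # False
print(messages[1]["content"])   # None (the input list is left untouched)
```

Returning shallow copies rather than mutating in place keeps the caller's conversation history unchanged, which matters when the same messages are reused elsewhere.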

Why this is a bug fix, not a breaking change

content=None means "there is no content"; it is semantically identical to the key being absent. No caller sets content=None expecting the literal string "None" to appear in the output, or expecting the output to differ from what the template produces when the key is absent.

Yet today, several templates crash or misbehave:

After this change, out of the 20 tool-calling models tested:

  • 12 are unaffected
  • 6 are fixed (crashes or literal "None" → correct output)
  • 1 edge case (LFM2.5-VL — separate template bug)
  • 1 would regress (DeepSeek-R1 — accepts content=None but crashes on absent key, which IMO is itself a template bug to fix on the Hub repo).

Why not just leave it to template authors?

Because content=None is a real-world input (it's what the OpenAI API returns for tool-call-only messages)

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather for a city",
            "parameters": {"type": "object", "properties": {"city": {"type": "string"}}},
        },
    }],
)

msg = response.choices[0].message
msg_dict = msg.model_dump()
print(f"\nmsg_dict = {msg_dict}")
Output:

msg_dict = {'content': None, 'refusal': None, 'role': 'assistant', 'annotations': [], 'audio': None, 'function_call': None, 'tool_calls': [{'id': 'call_Im08OArU1YY2mKi7ntPxdm22', 'function': {'arguments': '{"city":"Paris"}', 'name': 'get_weather'}, 'type': 'function'}]}

and expecting every template author to handle it correctly is a losing game — 8 out of 20 already don't. One normalization line in apply_chat_template fixes all of them at once and prevents future templates from hitting the same issue.
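On the caller side, a dump like the one above can be adapted before it reaches a chat template. This is a minimal sketch, assuming the `model_dump()` shape shown above; `to_chat_message` is a hypothetical helper, not part of transformers or the OpenAI client.

```python
import json
from typing import Any


def to_chat_message(msg_dict: dict[str, Any]) -> dict[str, Any]:
    """Convert an OpenAI-style assistant message dict into a chat-template message.

    Hypothetical helper: keeps the role, drops `content` when it is None, and
    parses JSON-string `arguments` into a dict.
    """
    message: dict[str, Any] = {"role": msg_dict["role"]}
    if msg_dict.get("content") is not None:
        message["content"] = msg_dict["content"]
    tool_calls = []
    for call in msg_dict.get("tool_calls") or []:
        arguments = call["function"]["arguments"]
        if isinstance(arguments, str):
            arguments = json.loads(arguments)  # JSON string -> dict
        tool_calls.append(
            {"type": "function", "function": {"name": call["function"]["name"], "arguments": arguments}}
        )
    if tool_calls:
        message["tool_calls"] = tool_calls
    return message


msg_dict = {
    "content": None,
    "role": "assistant",
    "tool_calls": [
        {
            "id": "call_Im08OArU1YY2mKi7ntPxdm22",
            "type": "function",
            "function": {"arguments": '{"city":"Paris"}', "name": "get_weather"},
        }
    ],
}
print(to_chat_message(msg_dict))
```

The sketch deliberately discards fields such as `id`, `refusal`, and `audio`; whether a given template needs any of them is outside what this example covers.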

Part of the broader discussion in https://github.com/huggingface/transformers/issues/45419.

Changed files

  • src/transformers/processing_utils.py (modified, +8/-0)
  • src/transformers/tokenization_utils_base.py (modified, +8/-0)
  • tests/test_processing_common.py (modified, +15/-0)
  • tests/test_tokenization_common.py (modified, +27/-0)


Chat templates across model families handle tool-calling messages inconsistently. This creates fragility for any library (like TRL) that needs to construct tool-calling conversations programmatically, since there's no single "safe" way to build an assistant message with tool_calls.

I ran a systematic check across all tool-calling-capable templates. Two axes were tested:

  1. Argument format: arguments as a dict vs a JSON str
  2. Content field: content=None, content="", or omitting the content key entirely

Results

Argument format (dict vs JSON str)

| Model | `dict` | `str` |
|---|---|---|
| DeepSeekV3 | :x: | :white_check_mark: |
| DeepSeekV3-0528 | :x: | :white_check_mark: |
| EXAONE-4.5 | :white_check_mark: | :white_check_mark: |
| Gemma4 | :white_check_mark: | :warning: double-escaped |
| GLM-5.1 | :white_check_mark: | :x: |
| GLM4MOE | :white_check_mark: | :x: |
| GptOSS | :white_check_mark: | :warning: double-escaped |
| Holo3 | :white_check_mark: | :x: |
| LFM2.5-VL | :white_check_mark: | :x: |
| Llama3.1 | :white_check_mark: | :warning: double-escaped |
| Llama3.2 | :white_check_mark: | :warning: double-escaped |
| Marco-Mini | :white_check_mark: | :white_check_mark: |
| MiniMax-M2.5 | :white_check_mark: | :x: |
| MiniMax-M2.7 | :white_check_mark: | :x: |
| Qwen2.5 | :white_check_mark: | :warning: double-escaped |
| Qwen3 | :white_check_mark: | :white_check_mark: |
| Qwen3-Coder | :white_check_mark: | :x: |
| Qwen3MOE | :white_check_mark: | :white_check_mark: |
| Qwen3VL | :white_check_mark: | :white_check_mark: |
| Qwen3.5 | :white_check_mark: | :x: |

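:white_check_mark: = renders as expected. :x: = the template raises an error. :warning: = no error, but produces wrong output (e.g. "{\\"city\\": \\"Paris\\"}" instead of {"city": "Paris"}). Only Qwen3 (including Qwen3MOE and Qwen3VL), EXAONE-4.5, and Marco-Mini normalize both formats to the same output.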
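The double-escaping can be illustrated without loading any model. The sketch below assumes the template serializes `arguments` with a JSON filter that behaves like Python's `json.dumps`; that is an assumption for illustration, and individual templates may differ.

```python
import json

as_dict = {"city": "Paris"}
as_str = json.dumps(as_dict)  # what a caller passes when arguments is already a JSON string

rendered_from_dict = json.dumps(as_dict)
rendered_from_str = json.dumps(as_str)  # serializing a string quotes and escapes it again

print(rendered_from_dict)  # {"city": "Paris"}
print(rendered_from_str)   # "{\"city\": \"Paris\"}"

# Decoding once yields a dict in the first case but only a string in the second.
print(type(json.loads(rendered_from_dict)).__name__)  # dict
print(type(json.loads(rendered_from_str)).__name__)   # str
```

Templates that pass both checks in the table presumably test the type of `arguments` before serializing, so a string is emitted as-is rather than encoded a second time.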

Content field alongside tool_calls

| Model | `content=None` | `content=""` | No `content` key |
|---|---|---|---|
| DeepSeekV3 | :white_check_mark: | :white_check_mark: | :x: |
| DeepSeekV3-0528 | :x: | :white_check_mark: | :x: |
| EXAONE-4.5 | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| Gemma4 | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| GLM-5.1 | :warning: literal "None" | :white_check_mark: | :white_check_mark: |
| GLM4MOE | :warning: literal "None" | :white_check_mark: | :white_check_mark: |
| GptOSS | :x: | :white_check_mark: | :white_check_mark: |
| Holo3 | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| LFM2.5-VL | :x: | :white_check_mark: | :warning: missing trailing `<\|im_end\|>\n` |
| Llama3.1 | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| Llama3.2 | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| Marco-Mini | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| MiniMax-M2.5 | :warning: literal "None" | :white_check_mark: | :white_check_mark: |
| MiniMax-M2.7 | :warning: literal "None" | :white_check_mark: | :white_check_mark: |
| Qwen2.5 | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| Qwen3 | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| Qwen3-Coder | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| Qwen3MOE | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| Qwen3VL | :x: | :white_check_mark: | :white_check_mark: |
| Qwen3.5 | :white_check_mark: | :white_check_mark: | :white_check_mark: |

content="" is the only universally safe option today.
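Until templates converge, a caller can normalize toward that safe form. This is a minimal sketch, assuming dict messages; `with_empty_content` is a hypothetical name, and whether an empty string is the right long-term convention is exactly what the discussion below questions.

```python
from typing import Any


def with_empty_content(message: dict[str, Any]) -> dict[str, Any]:
    """Return a copy whose `content` is "" when it is None or missing.

    Only assistant messages carrying `tool_calls` are changed.
    """
    is_tool_call = message.get("role") == "assistant" and bool(message.get("tool_calls"))
    if is_tool_call and message.get("content") is None:
        return {**message, "content": ""}
    return dict(message)


tool_calls = [{"type": "function", "function": {"name": "f", "arguments": {"a": 1}}}]
variants = [
    {"role": "assistant", "tool_calls": tool_calls, "content": None},
    {"role": "assistant", "tool_calls": tool_calls, "content": ""},
    {"role": "assistant", "tool_calls": tool_calls},
]
normalized = [with_empty_content(message) for message in variants]
print(all(message["content"] == "" for message in normalized))  # True
```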

Reproduction

<details> <summary>Script to reproduce both tables</summary>
from jinja2 import TemplateError
from transformers import AutoTokenizer

models = {
    "DeepSeekV3": "deepseek-ai/DeepSeek-R1",
    "DeepSeekV3-0528": "deepseek-ai/DeepSeek-R1-0528",
    "EXAONE-4.5": "LGAI-EXAONE/EXAONE-4.5-33B",
    "Gemma4": "google/gemma-4-E2B-it",
    "GLM-5.1": "zai-org/GLM-5.1",
    "GLM4MOE": "zai-org/GLM-4.5",
    "GptOSS": "openai/gpt-oss-20b",
    "Holo3": "Hcompany/Holo3-35B-A3B",
    "LFM2.5-VL": "LiquidAI/LFM2.5-VL-450M",
    "Llama3.1": "meta-llama/Llama-3.1-8B-Instruct",
    "Llama3.2": "meta-llama/Llama-3.2-1B-Instruct",
    "Marco-Mini": "AIDC-AI/Marco-Mini-Instruct",
    "MiniMax-M2.5": "MiniMaxAI/MiniMax-M2.5",
    "MiniMax-M2.7": "MiniMaxAI/MiniMax-M2.7",
    "Qwen2.5": "Qwen/Qwen2.5-32B-Instruct",
    "Qwen3": "Qwen/Qwen3-8B",
    "Qwen3-Coder": "Qwen/Qwen3-Coder-Next",
    "Qwen3MOE": "Qwen/Qwen3-30B-A3B",
    "Qwen3VL": "Qwen/Qwen3-VL-2B-Instruct",
    "Qwen3.5": "Qwen/Qwen3.5-0.8B",
}

tc_dict = [{"type": "function", "function": {"name": "f", "arguments": {"a": 1}}}]
tc_str = [{"type": "function", "function": {"name": "f", "arguments": '{"a": 1}'}}]
MISSING = object()


def render(tokenizer, tool_calls, content=MISSING):
    msg = {"role": "assistant", "tool_calls": tool_calls}
    if content is not MISSING:
        msg["content"] = content
    try:
        return tokenizer.apply_chat_template([{"role": "user", "content": "hi"}, msg], tokenize=False)
    except (TemplateError, TypeError, KeyError):
        return None


def cell(result, reference=None):
    if result is None:
        return ":x:"
    if reference is not None and result != reference:
        return ":warning:"
    return ":white_check_mark:"


# ── Table 1: argument format ─────────────────────────────────────────────────
print("| Model | `dict` | `str` |")
print("|---|---|---|")
for name, model_id in models.items():
    tok = AutoTokenizer.from_pretrained(model_id)
    r_dict = render(tok, tc_dict, "")
    r_str = render(tok, tc_str, "")
    # Reference is the dict output when available, else str
    ref = r_dict if r_dict is not None else r_str
    print(f"| {name} | {cell(r_dict)} | {cell(r_str, ref)} |")

print()

# ── Table 2: content field ────────────────────────────────────────────────────
print("| Model | `content=None` | `content=\"\"` | No `content` key |")
print("|---|---|---|---|")
for name, model_id in models.items():
    tok = AutoTokenizer.from_pretrained(model_id)
    # Use each model's working arg format
    tc = tc_dict if render(tok, tc_dict, "") is not None else tc_str
    ref = render(tok, tc, "")
    r_none = render(tok, tc, None)
    r_missing = render(tok, tc, MISSING)
    print(f"| {name} | {cell(r_none, ref)} | {cell(ref)} | {cell(r_missing, ref)} |")
</details>

Discussion

arguments format: template bug

The transformers docs already specify that arguments should be a dict. So templates that reject dict args (DeepSeekV3) can be considered "broken".

[!TIP] Recommendation 1: fix templates that reject dict args via PRs on the Hub model repos (e.g. deepseek-ai/DeepSeek-R1).

Templates that silently double-escape str args (Llama, Gemma4, Qwen2.5, GptOSS) are a softer issue: the caller is passing the wrong type, but the template produces wrong output instead of failing loudly.

[!TIP] Recommendation 2: apply_chat_template could emit a warning when arguments is a string, to catch silent double-escaping early.
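Such a warning could look roughly like the sketch below. `warn_on_str_arguments` is a hypothetical function written for illustration; it is not an existing transformers API, and the message wording is an assumption.

```python
import warnings
from typing import Any


def warn_on_str_arguments(messages: list[dict[str, Any]]) -> None:
    """Emit a UserWarning for every tool call whose `arguments` is a string."""
    for index, message in enumerate(messages):
        for call in message.get("tool_calls") or []:
            function = call.get("function", {})
            if isinstance(function.get("arguments"), str):
                warnings.warn(
                    f"messages[{index}]: tool call {function.get('name')!r} has string `arguments`; "
                    "pass a dict to avoid double-escaping.",
                    UserWarning,
                    stacklevel=2,
                )


messages = [
    {"role": "assistant", "tool_calls": [{"type": "function", "function": {"name": "f", "arguments": '{"a": 1}'}}]}
]
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is not filtered out
    warn_on_str_arguments(messages)
print(len(caught))  # 1
```

A warning rather than an exception keeps existing callers working while still surfacing the type mismatch that leads to double-escaped output.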

content field: no spec exists

There is no documented contract for content in assistant messages with tool_calls. All three variants are legitimate (None from the OpenAI API, "" from downstream normalization, absent key from the transformers docs examples), yet 7/20 templates break or misbehave on at least one.

[!TIP] Recommendation 3: define the contract — in assistant messages with tool_calls, the content key MAY be absent. Templates must handle this case.

[!TIP] Recommendation 4: apply_chat_template should drop content from assistant messages when it is None, so that templates only need to handle two cases instead of three: the key being absent, or a string value (the two could technically mean different things). See PR #45422.

[!TIP] Recommendation 5: build a Space that analyzes any model's chat template and checks compliance with these contracts (dict args support, absent content key handling, etc.). Since templates live in Hub repos and not in transformers, a central linting tool would help model authors catch issues before users hit them.
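The core of such a linter could be a check like the one below. This is a sketch under stated assumptions: `check_content_contract`, `naive_render`, and `strict_render` are hypothetical names, and the two renderers are plain-Python stand-ins for templates. In practice `render` might wrap `tokenizer.apply_chat_template(..., tokenize=False)` as the reproduction script does.

```python
from typing import Any, Callable

TOOL_CALLS = [{"type": "function", "function": {"name": "f", "arguments": {"a": 1}}}]


def check_content_contract(render: Callable[[dict[str, Any]], str]) -> dict[str, str]:
    """Classify how `render` treats content=None and an absent key against content=""."""
    base = {"role": "assistant", "tool_calls": TOOL_CALLS}
    reference = render({**base, "content": ""})  # content="" is taken as the baseline
    cases = {"content=None": {**base, "content": None}, "no content key": dict(base)}
    report = {}
    for label, message in cases.items():
        try:
            output = render(message)
        except Exception as error:  # a real linter would narrow this to template errors
            report[label] = f"error: {type(error).__name__}"
            continue
        report[label] = "ok" if output == reference else "differs"
    return report


def naive_render(message: dict[str, Any]) -> str:
    # Stand-in for a template that interpolates content without a None check.
    return f"<assistant>{message.get('content', '')}</assistant>"


def strict_render(message: dict[str, Any]) -> str:
    # Stand-in for a template that indexes the key directly.
    return f"<assistant>{message['content'] or ''}</assistant>"


print(check_content_contract(naive_render))   # {'content=None': 'differs', 'no content key': 'ok'}
print(check_content_contract(strict_render))  # {'content=None': 'ok', 'no content key': 'error: KeyError'}
```

The two stand-ins reproduce the failure modes seen in the content table: a literal "None" in the output, and a crash when the key is missing.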

What do you think?

extent analysis

TL;DR

To address the inconsistencies in handling tool-calling messages across different models, it is recommended to standardize the arguments format to a dict and define a contract for the content field, allowing it to be absent in assistant messages with tool_calls.

Guidance

  1. Fix templates that reject dict args: Update templates like DeepSeekV3 to accept dict arguments as per the transformers documentation.
  2. Implement warnings for incorrect arguments type: Modify apply_chat_template to emit a warning when arguments is a string to catch silent double-escaping early.
  3. Define the contract for the content field: Specify that the content key may be absent in assistant messages with tool_calls and ensure templates handle this case.
  4. Simplify content handling in apply_chat_template: Drop the content key from assistant messages when it is None to reduce the number of cases templates need to handle.
  5. Develop a linting tool for chat template compliance: Create a Space to analyze models' chat templates for compliance with established contracts to help model authors identify and fix issues before they affect users.

Example

No specific code example is provided as the issue requires updates to various templates and the apply_chat_template function, which would involve modifying existing codebases.

Notes

The provided guidance assumes that the transformers documentation and the behavior of apply_chat_template are authoritative. Implementing these recommendations may require coordination with model authors and maintainers of the transformers library.

Recommendation

Apply the workaround by standardizing the arguments format and defining a contract for the content field, as this approach addresses the inconsistencies across models without requiring a specific version upgrade.
