transformers - ✅(Solved) Fix LlamaTokenizer in v5 overrides tokenizer.json's ByteLevel pre-tokenizer with Metaspace, silently breaks DeepSeek V3/R1 family [1 pull requests, 1 comments, 2 participants]

transformers2026-04-17 08:50:48

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

huggingface/transformers#45488•Fetched 2026-04-18 05:51:50

View on GitHub

Comments

Participants

Timeline

Reactions

Author

dc3671

Participants

dc3671

Rocketknight1

Timeline (top)

mentioned ×3subscribed ×3commented ×1cross-referenced ×1

Root Cause

DeepSeek's tokenizer_config.json sets "tokenizer_class": "LlamaTokenizerFast", but tokenizer.json is ByteLevel BPE (GPT-2 style, not SentencePiece). In v5, LlamaTokenizer.__init__ unconditionally installs a Metaspace pre-tokenizer, overwriting whatever tokenizer.json declared. Since the vocab has no ▁-prefixed tokens, the metaspace marker drops and spaces vanish.

Loading the same file via tokenizers.Tokenizer.from_file(...) or PreTrainedTokenizerFast.from_pretrained(...) works — the bug is specifically in LlamaTokenizer's __init__.

Fix Action

Workaround

from transformers import PreTrainedTokenizerFast
tok = PreTrainedTokenizerFast.from_pretrained("deepseek-ai/DeepSeek-V3")

PR fix notes

PR #4: [https://nvbugs/6074784][fix] reload ByteLevel tokenizer via PreTrainedTokenizerFast for LlamaTokenizer

Repository: longlee0622/TensorRT-LLM
Author: dc3671
State: open | merged: False
Link: https://github.com/longlee0622/TensorRT-LLM/pull/4

Description (problem / solution / changelog)

Description

Fixes the DeepSeek-V3-Lite accuracy failures (25%→64% on GSM8K) in the Transformers 5.3 upgrade effort (DeepSeekV3Lite_accuracy, 35 failures).

Root cause (tokenizer, not model config): DeepSeek-V3-Lite's tokenizer_config.json declares tokenizer_class: LlamaTokenizer, but its tokenizer.json is actually a ByteLevel BPE (GPT-style, not SentencePiece). In Transformers 5.x, LlamaTokenizer.__init__ unconditionally installs a Metaspace pre-tokenizer, silently replacing the ByteLevel one from tokenizer.json. Spaces then get stripped from every prompt ("hello world" → "helloworld"), the model sees garbage, and GSM8K collapses from ~63.7% to ~26%.

Direct verification:

AutoTokenizer.from_pretrained(..., trust_remote_code=True).decode(encode("hello world")) → "helloworld" (broken)
PreTrainedTokenizerFast.from_pretrained(...).decode(encode("hello world")) → "hello world" (correct)

This is distinct from the earlier DeepSeek V3 rope_scaling fixes (9e38e95a2 / a647dbe35 / f207585ff). The accuracy number being similar (25% vs 64%) was a red herring — the RoPE path is already correct; the tokenizer was the actual culprit.

The entire DeepSeek V3 / R1 family ships the same pattern and is affected. Filed upstream as huggingface/transformers#45488.

Fix: After AutoTokenizer.from_pretrained, detect the mismatch — loaded tokenizer has Metaspace pre-tokenizer but tokenizer.json declares ByteLevel — and reload via PreTrainedTokenizerFast, which respects tokenizer.json verbatim. Generic and targeted: triggers only on this bug scenario; Llama 3.x and genuine SentencePiece tokenizers are unaffected.

Test Coverage

Verified locally on GB300 (SM 10.3):

Test	Accuracy (before)	Accuracy (after)	Threshold	Status
`TestDeepSeekV3Lite::test_nvfp4[moe_backend=CUTLASS-mtp_nextn=0-fp8kv=False-attention_dp=False-cuda_graph=False-overlap_scheduler=False-torch_compile=False]`	26.1%	>60%	60.51%	✅ PASS
`TestDeepSeekV3Lite::test_bfloat16[mtp_nextn=0-attention_dp=False-cuda_graph=False-overlap_scheduler=False-torch_compile=False-enable_chunked_prefill=False-v2_kv_cache=True]`	(broken)	65.087%	61.54%	✅ PASS

The fix should clear all 35 DeepSeekV3Lite_accuracy failures listed in the nvbug since they all load the same DeepSeek tokenizer through the same code path. Sanity-checked Llama-3.1-8B-Instruct still loads unchanged (TokenizersBackend, Sequence pre-tokenizer, round-trip OK) — fix has no effect on correctly-configured tokenizers.

PR Checklist

PR description clearly explains what and why.
PR Follows TRT-LLM CODING GUIDELINES to the best of my knowledge.
Test cases: relied on existing accuracy tests; no new test added since the fix is a narrow runtime workaround for an upstream regression (tracked at huggingface/transformers#45488).
No new dependencies.

Changed files

tensorrt_llm/tokenizer/tokenizer.py (modified, +63/-0)

Code Example

from transformers import AutoTokenizer
tok = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V3")
print(repr(tok.decode(tok.encode("hello world", add_special_tokens=False))))
# 'helloworld'  — expected 'hello world'

---

from transformers import PreTrainedTokenizerFast
tok = PreTrainedTokenizerFast.from_pretrained("deepseek-ai/DeepSeek-V3")

RAW_BUFFERClick to expand / collapse

System info

transformers: 5.3.0
tokenizers: 0.22.2
Python: 3.12 / Linux

Who can help?

@ArthurZucker

Reproduction

Tokenizer-only, ~7 MB download:

from transformers import AutoTokenizer
tok = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V3")
print(repr(tok.decode(tok.encode("hello world", add_special_tokens=False))))
# 'helloworld'  — expected 'hello world'

Expected vs actual

	v4.x / raw `tokenizers.Tokenizer.from_file`	v5.x `AutoTokenizer`
round-trip of `"hello world"`	`"hello world"`	`"helloworld"`
`backend_tokenizer.pre_tokenizer`	`Sequence([..., ByteLevel])` (from `tokenizer.json`)	`Metaspace(replacement="▁", split=False)`

Root cause

Loading the same file via tokenizers.Tokenizer.from_file(...) or PreTrainedTokenizerFast.from_pretrained(...) works — the bug is specifically in LlamaTokenizer's __init__.

Affected models

All models shipping tokenizer_class: Llama*Tokenizer + a ByteLevel tokenizer.json. Confirmed broken on all 5 DeepSeek V3/R1 upstream repos: DeepSeek-V3, DeepSeek-V3.1, DeepSeek-V3.2-Exp, DeepSeek-R1, DeepSeek-R1-0528.

Note: LlamaTokenizer is LlamaTokenizerFast in v5, so either class name in tokenizer_config.json triggers the bug.

Downstream impact

Catastrophic silent accuracy loss on any HF-based inference stack. For example, GSM8K on DeepSeek-V3-Lite drops ~63.7% → ~26.1% purely from this tokenizer regression.

Suggested fix

In LlamaTokenizer.__init__, only install the default Metaspace pre-tokenizer when tokenizer.json is absent (the "train from scratch" path). When loading an existing tokenizer.json, trust its pre_tokenizer.

Workaround

from transformers import PreTrainedTokenizerFast
tok = PreTrainedTokenizerFast.from_pretrained("deepseek-ai/DeepSeek-V3")

extent analysis

TL;DR

The most likely fix is to modify the LlamaTokenizer.__init__ method to only install the default Metaspace pre-tokenizer when tokenizer.json is absent.

Guidance

Verify the issue by checking if the model's tokenizer_config.json sets "tokenizer_class": "LlamaTokenizerFast" and if the tokenizer.json is ByteLevel BPE.
To mitigate the issue, use PreTrainedTokenizerFast.from_pretrained instead of AutoTokenizer.from_pretrained as a temporary workaround.
Check if the model is one of the affected models listed, which include DeepSeek-V3, DeepSeek-V3.1, DeepSeek-V3.2-Exp, DeepSeek-R1, and DeepSeek-R1-0528.
Be aware that this bug can cause catastrophic silent accuracy loss on any HF-based inference stack.

Example

from transformers import PreTrainedTokenizerFast
tok = PreTrainedTokenizerFast.from_pretrained("deepseek-ai/DeepSeek-V3")
print(repr(tok.decode(tok.encode("hello world", add_special_tokens=False))))
# Expected output: 'hello world'

Notes

This issue is specific to the LlamaTokenizer class in version 5 of the transformers library and only affects models with a ByteLevel tokenizer.json. The suggested fix requires modifying the LlamaTokenizer.__init__ method.

Recommendation

Apply the workaround by using PreTrainedTokenizerFast.from_pretrained instead of AutoTokenizer.from_pretrained until the LlamaTokenizer.__init__ method is modified to correctly handle existing tokenizer.json files.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#output truncation #response parsing #generation error #database connection #vector store

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

transformers - ✅(Solved) Fix LlamaTokenizer in v5 overrides tokenizer.json's ByteLevel pre-tokenizer with Metaspace, silently breaks DeepSeek V3/R1 family [1 pull requests, 1 comments, 2 participants]

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Fix Action

Workaround

PR fix notes

PR #4: [https://nvbugs/6074784][fix] reload ByteLevel tokenizer via PreTrainedTokenizerFast for LlamaTokenizer

Description (problem / solution / changelog)

Description

Test Coverage

PR Checklist

Changed files

Code Example

System info

Who can help?

Reproduction

Expected vs actual

Root cause

Affected models

Downstream impact

Suggested fix

Workaround

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

RELATED_DISCOVERY

TRENDING