transformers - ✅ (Solved) Fix PreTrainedTokenizer.convert_ids_to_tokens(skip_special_tokens=True) rebuilds all_special_ids on every iteration of the per-id loop [1 pull request, 2 comments, 3 participants]

GitHub stats
huggingface/transformers#45715 (fetched 2026-05-01 05:33:11)

Root Cause

The slow tokenizer's convert_ids_to_tokens in src/transformers/tokenization_python.py evaluates self.all_special_ids inside the per-id loop, and all_special_ids is an @property that rebuilds the list on every access. With skip_special_tokens=True the call therefore costs O(N_tokens × cost_to_rebuild_special_ids) instead of O(N_tokens). The fast subclass, TokenizersBackend, already hoists the lookup out of the loop; the slow base class was not updated when that fix landed.

Fix Action

Fixed

PR fix notes

PR #45728: PythonBackend slow tokenizer convert_ids_to_tokens fix

Description (problem / solution / changelog)

What does this PR do?

Fixed the issue where PreTrainedTokenizer.convert_ids_to_tokens(skip_special_tokens=True) rebuilds all_special_ids on every iteration

Performance difference

Before the fix

512 ids skip=True : 41356 us/call
512 ids skip=False: 65 us/call

After the fix

512 ids skip=True : 130 us/call
512 ids skip=False: 67 us/call

Benchmark script

import time, random
from transformers import AutoTokenizer

MODEL = "nvidia/Kimi-K2.5-NVFP4"

tok = AutoTokenizer.from_pretrained(MODEL, trust_remote_code=True)
print(f"is_fast={tok.is_fast}, len(special_ids)={len(tok.all_special_ids)}")

random.seed(0)
ids = [random.randint(0, 100_000) for _ in range(512)]

t0 = time.time()
for _ in range(100):
    tok.convert_ids_to_tokens(ids, skip_special_tokens=True)
print(f"512 ids skip=True : {(time.time()-t0)/100*1e6:.0f} us/call")

t0 = time.time()
for _ in range(100):
    tok.convert_ids_to_tokens(ids, skip_special_tokens=False)
print(f"512 ids skip=False: {(time.time()-t0)/100*1e6:.0f} us/call")

Fixes #45715

  • I confirm that this is not a pure code agent PR.

Before submitting

Who can review?

@longlee0622 @Rocketknight1 @ArthurZucker @itazap

Changed files

  • src/transformers/tokenization_python.py (modified, +3/-1)


System Info

  • transformers version: 5.3.0 (also reproduces on main)
  • Python: 3.12
  • OS: Linux
  • Affected class: transformers.tokenization_python.PreTrainedTokenizer (renamed to PythonBackend on main) — the slow-tokenizer base class
  • Not affected: TokenizersBackend (the fast subclass) — it already has the fix

Who can help?

Tokenizers: @ArthurZucker

Reproduction

In src/transformers/tokenization_python.py, the slow tokenizer's convert_ids_to_tokens evaluates self.all_special_ids inside the per-id loop:

# tokenization_python.py (current)
def convert_ids_to_tokens(self, ids, skip_special_tokens=False):
    ...
    tokens = []
    for index in ids:
        index = int(index)
        if skip_special_tokens and index in self.all_special_ids:   # ← per-iter property access
            continue
        tokens.append(...)
    return tokens

all_special_ids is an @property that rebuilds the list on every access:

# tokenization_utils_base.py
@property
def all_special_ids(self) -> list[int]:
    return self.convert_tokens_to_ids(self.all_special_tokens)

So the loop is effectively O(N_tokens × cost_to_rebuild_special_ids) instead of O(N_tokens).
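The effect can be reproduced without transformers. The sketch below is a toy stand-in, not the library's code: `ToyTokenizer`, `convert_in_loop`, `convert_hoisted`, and the `property_calls` counter are hypothetical names introduced only to count how often the property body runs under each pattern.

```python
class ToyTokenizer:
    """Toy stand-in for a slow tokenizer; not the transformers class."""

    def __init__(self, special_ids):
        self._special_ids = list(special_ids)
        self.property_calls = 0  # how many times the property body ran

    @property
    def all_special_ids(self):
        # Rebuilds a fresh list on every access, like the property in the issue.
        self.property_calls += 1
        return list(self._special_ids)

    def convert_in_loop(self, ids):
        # Original pattern: the property is evaluated once per id.
        return [i for i in ids if i not in self.all_special_ids]

    def convert_hoisted(self, ids):
        # Hoisted pattern: the property is evaluated once per call.
        ids_to_skip = set(self.all_special_ids)
        return [i for i in ids if i not in ids_to_skip]


ids = list(range(512))
tok = ToyTokenizer(special_ids=[0, 1, 2])

in_loop = tok.convert_in_loop(ids)
calls_in_loop = tok.property_calls

tok.property_calls = 0
hoisted = tok.convert_hoisted(ids)
calls_hoisted = tok.property_calls

print(calls_in_loop, calls_hoisted)  # 512 1
```

Both patterns return the same result; only the number of property evaluations differs, which is where the per-call cost in the real tokenizer comes from.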

The fast subclass — TokenizersBackend.convert_ids_to_tokens in tokenization_utils_tokenizers.py — already hoists this out of the loop (and a comment in that file explicitly documents the problem):

# tokenization_utils_tokenizers.py:735
tokens = []
# self.all_special_ids is an @property which may be slow, so only compute it once before the loop
ids_to_skip = set(self.all_special_ids) if skip_special_tokens else set()
for index in ids:
    index = int(index)
    if index in ids_to_skip:
        continue
    ...

The slow base class wasn't updated when the fast subclass got this fix.

Minimal repro on Kimi-K2.5 (TikTokenTokenizer, slow, loaded via trust_remote_code=True)

import time, random
from transformers import AutoTokenizer

# Any local checkout of moonshotai/Kimi-K2-* with tokenization_kimi.py
MODEL = "/path/to/Kimi-K2.5-NVFP4"

tok = AutoTokenizer.from_pretrained(MODEL, trust_remote_code=True)
print(f"is_fast={tok.is_fast}, len(special_ids)={len(tok.all_special_ids)}")
# is_fast=False, len(special_ids)=11

random.seed(0)
ids = [random.randint(0, 100_000) for _ in range(512)]

t0 = time.time()
for _ in range(100):
    tok.convert_ids_to_tokens(ids, skip_special_tokens=True)
print(f"512 ids skip=True : {(time.time()-t0)/100*1e6:.0f} us/call")

t0 = time.time()
for _ in range(100):
    tok.convert_ids_to_tokens(ids, skip_special_tokens=False)
print(f"512 ids skip=False: {(time.time()-t0)/100*1e6:.0f} us/call")

Output (transformers 5.3.0, unmodified)

512 ids skip=True : 48009 us/call
512 ids skip=False:   105 us/call

A single boolean kwarg flips the call cost by ~450×, even though the tokenizer only has 11 special tokens. The cost scales with len(ids) × cost_to_evaluate_self.all_special_ids, not with the number of special tokens.
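As a rough sanity check on those numbers, the arithmetic below uses only the two figures printed above. It suggests an extra cost of about 94 us per id, which appears consistent with the ~95 us single-id skip=True measurement reported further down in the issue (one property access per call). This is an interpretation of the reported timings, not a separate measurement.

```python
# Figures copied from the issue's benchmark output (microseconds per call).
skip_true_us = 48009
skip_false_us = 105
n_ids = 512

ratio = skip_true_us / skip_false_us
extra_per_id_us = (skip_true_us - skip_false_us) / n_ids

print(f"slowdown: ~{ratio:.0f}x")                        # slowdown: ~457x
print(f"extra cost per id: ~{extra_per_id_us:.1f} us")   # extra cost per id: ~93.6 us
```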

This isn't specific to Kimi — any model that ships a slow custom tokenizer (e.g. via trust_remote_code=True) and is used for streaming detokenization (which calls convert_ids_to_tokens(..., skip_special_tokens=True) per generated token batch) will hit it. We discovered this when a TensorRT-LLM streaming benchmark on Kimi-K2.5 showed an 80% TPOT regression on the transformers 4.57 → 5.3 upgrade — DeepSeek-V3 family models (which use a fast tokenizer) on the same upgrade were unaffected.

Expected behavior

convert_ids_to_tokens(ids, skip_special_tokens=True) should be ~the same cost as skip_special_tokens=False, plus a single set-membership check per id.

Suggested fix

Mirror the fix already in TokenizersBackend.convert_ids_to_tokens — it's the same pattern:

         tokens = []
+        # self.all_special_ids is an @property which may be slow, so only compute it once before the loop
+        ids_to_skip = set(self.all_special_ids) if skip_special_tokens else set()
         for index in ids:
             index = int(index)
-            if skip_special_tokens and index in self.all_special_ids:
+            if index in ids_to_skip:
                 continue
             tokens.append(
                 self._added_tokens_decoder[index].content
                 if index in self._added_tokens_decoder
                 else self._convert_id_to_token(index)
             )
         return tokens

After this fix, on the same Kimi-K2.5 repro

512 ids skip=True :   227 us/call   (was 48009)  →  ~211× faster
512 ids skip=False:   129 us/call   (unchanged)
1   id  skip=True :    95 us/call   (unchanged — see below)

The single-id case is not improved by this change because computing set(self.all_special_ids) itself still pays one property access per call. A follow-up that caches the property output on the tokenizer instance (with appropriate invalidation on add_special_tokens / add_tokens) would address that, but it has a wider blast radius and seemed worth keeping out of this minimal fix. Happy to file a separate issue for that if it's of interest.
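For illustration only, the caching idea could look something like the sketch below. Everything here is hypothetical: `CachedSpecialIds`, `ids_to_skip`, `add_special_token_id`, and the `rebuilds` counter are invented names, and the class does not reflect how transformers stores special tokens. It only shows the shape of "cache lazily, invalidate on mutation".

```python
class CachedSpecialIds:
    """Hypothetical sketch of the follow-up idea; not transformers code."""

    def __init__(self, special_ids):
        self._special_ids = list(special_ids)
        self._skip_cache = None  # frozenset, built lazily on first use
        self.rebuilds = 0        # counts expensive recomputations

    def _compute_special_ids(self):
        # Stand-in for the expensive all_special_ids property.
        self.rebuilds += 1
        return list(self._special_ids)

    @property
    def ids_to_skip(self):
        if self._skip_cache is None:
            self._skip_cache = frozenset(self._compute_special_ids())
        return self._skip_cache

    def add_special_token_id(self, token_id):
        # Any mutation of the special-token set must drop the cache.
        self._special_ids.append(token_id)
        self._skip_cache = None


tok = CachedSpecialIds([0, 1])
for _ in range(1000):
    _ = 5 in tok.ids_to_skip
rebuilds_before_add = tok.rebuilds   # one rebuild for 1000 lookups

tok.add_special_token_id(5)
skips_five = 5 in tok.ids_to_skip    # cache was invalidated, so 5 is now seen
rebuilds_after_add = tok.rebuilds
```

The wider blast radius mentioned above shows up here as the invalidation requirement: every code path that changes the special-token set would need to clear the cache, otherwise stale ids are skipped or missed.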

Thanks!

extent analysis

TL;DR

To fix the performance issue in convert_ids_to_tokens, hoist the computation of self.all_special_ids outside the loop by storing it in a set before the loop starts.

Guidance

  • Identify the performance bottleneck: The convert_ids_to_tokens method is slow due to the repeated computation of self.all_special_ids inside the loop.
  • Apply the suggested fix: Mirror the fix already in TokenizersBackend.convert_ids_to_tokens by computing self.all_special_ids once before the loop and storing it in a set.
  • Verify the fix: Run the provided minimal repro code to measure the performance improvement.
  • Consider a follow-up fix: Cache the property output on the tokenizer instance to improve performance for single-id cases.
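For the verification step, a small timing helper can make before/after comparisons less noisy than hand-rolled `time.time()` loops. The sketch below is an assumption-laden example: `us_per_call` is a hypothetical helper, and the two toy functions merely stand in for the in-loop and hoisted code paths so the snippet runs without a tokenizer checkout.

```python
import time


def us_per_call(fn, repeats=100):
    """Return the mean wall-clock cost of fn() in microseconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats * 1e6


# Toy workloads standing in for the two code paths.
special = list(range(11))
ids = list(range(512))


def in_loop():
    # Rebuilds the special-id list for every id, like the unfixed loop.
    return [i for i in ids if i not in list(special)]


def hoisted():
    # Builds the skip set once per call, like the fixed loop.
    skip = set(special)
    return [i for i in ids if i not in skip]


# The two paths must agree on output before their timings are compared.
assert in_loop() == hoisted()
print(f"in-loop: {us_per_call(in_loop):.0f} us/call")
print(f"hoisted: {us_per_call(hoisted):.0f} us/call")
```

With a real tokenizer, the same helper could wrap `lambda: tok.convert_ids_to_tokens(ids, skip_special_tokens=True)`; absolute numbers will vary by machine and model.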

Example

# tokenization_python.py (fixed)
def convert_ids_to_tokens(self, ids, skip_special_tokens=False):
    ...
    tokens = []
    ids_to_skip = set(self.all_special_ids) if skip_special_tokens else set()
    for index in ids:
        index = int(index)
        if index in ids_to_skip:
            continue
        tokens.append(...)
    return tokens

Notes

The provided fix only improves calls that convert many ids at once. A call with a single id still pays one all_special_ids property access, so its cost is unchanged; a separate issue may be filed to address that case.

Recommendation

Apply the fix by mirroring the pattern already in TokenizersBackend.convert_ids_to_tokens; on the reported Kimi-K2.5 repro it brought the 512-id skip_special_tokens=True call from 48009 us to 227 us.
