transformers - ✅(Solved) Fix PreTrainedTokenizer.convert_ids_to_tokens(skip_special_tokens=True) rebuilds all_special_ids on every iteration of the per-id loop [1 pull requests, 2 comments, 3 participants]

Q: Expected behavior

`convert_ids_to_tokens(ids, skip_special_tokens=True)` should be ~the same cost as `skip_special_tokens=False`, plus a single set-membership check per id.

transformers • 2026-04-30 09:08:07

GitHub stats

huggingface/transformers#45715 • Fetched 2026-05-01 05:33:11

Timeline (top)

subscribed ×3, commented ×2, mentioned ×2, cross-referenced ×1

Root Cause

The single-id case is not improved by this change because computing set(self.all_special_ids) itself still pays one property access per call. A follow-up that caches the property output on the tokenizer instance (with appropriate invalidation on add_special_tokens / add_tokens) would address that, but it has a wider blast radius and seemed worth keeping out of this minimal fix. Happy to file a separate issue for that if it's of interest.
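The follow-up described above (cache the property output and invalidate it when special tokens change) could look roughly like the sketch below. Everything here is hypothetical: `CachingToyTokenizer`, its attributes, and its simplified `add_special_tokens` are illustrative stand-ins, not the transformers implementation, and the toy property returns a frozenset where the real property returns a list.

```python
class CachingToyTokenizer:
    """Hypothetical toy class; names do not mirror transformers internals."""

    def __init__(self, special_tokens, vocab):
        self._vocab = dict(vocab)
        self._special_tokens = list(special_tokens)
        self._special_ids_cache = None  # filled lazily on first access
        self.rebuilds = 0               # counts how often the cache is rebuilt

    @property
    def all_special_ids(self):
        # Rebuild only when the cache is empty (first use or after invalidation).
        if self._special_ids_cache is None:
            self.rebuilds += 1
            self._special_ids_cache = frozenset(
                self._vocab[token] for token in self._special_tokens
            )
        return self._special_ids_cache

    def add_special_tokens(self, tokens):
        # Simplified stand-in: register new tokens, then invalidate the cache.
        for token in tokens:
            if token not in self._vocab:
                self._vocab[token] = len(self._vocab)
            if token not in self._special_tokens:
                self._special_tokens.append(token)
        self._special_ids_cache = None


tok = CachingToyTokenizer(["<s>", "</s>"], {"<s>": 0, "</s>": 1, "hello": 2})
first = tok.all_special_ids
second = tok.all_special_ids           # served from the cache, no rebuild
rebuilds_before_add = tok.rebuilds
tok.add_special_tokens(["<pad>"])      # invalidates the cache
ids_after_add = tok.all_special_ids
rebuilds_after_add = tok.rebuilds
```

The wider blast radius mentioned above shows up even in this toy: every method that mutates the special-token set has to remember to reset the cache, otherwise callers see stale ids.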

Fix Action

Fixed

Proposed fix: PR "PythonBackend slow tokenizer convert_ids_to_tokens fix" (https://github.com/huggingface/transformers/pull/45728), open and not yet merged when this page was fetched

PR fix notes

PR #45728: PythonBackend slow tokenizer convert_ids_to_tokens fix

Repository: huggingface/transformers
Author: i3hz
State: open | merged: False
Link: https://github.com/huggingface/transformers/pull/45728

Description (problem / solution / changelog)

What does this PR do?

Fixes the issue where PreTrainedTokenizer.convert_ids_to_tokens(skip_special_tokens=True) rebuilds all_special_ids on every iteration of the per-id loop.

Performance difference

Before the fix

512 ids skip=True : 41356 us/call
512 ids skip=False: 65 us/call

After the fix

512 ids skip=True : 130 us/call
512 ids skip=False: 67 us/call

Benchmark script

import time, random
from transformers import AutoTokenizer

MODEL = "nvidia/Kimi-K2.5-NVFP4"

tok = AutoTokenizer.from_pretrained(MODEL, trust_remote_code=True)
print(f"is_fast={tok.is_fast}, len(special_ids)={len(tok.all_special_ids)}")

random.seed(0)
ids = [random.randint(0, 100_000) for _ in range(512)]

t0 = time.time()
for _ in range(100):
    tok.convert_ids_to_tokens(ids, skip_special_tokens=True)
print(f"512 ids skip=True : {(time.time()-t0)/100*1e6:.0f} us/call")

t0 = time.time()
for _ in range(100):
    tok.convert_ids_to_tokens(ids, skip_special_tokens=False)
print(f"512 ids skip=False: {(time.time()-t0)/100*1e6:.0f} us/call")

Fixes #45715

I confirm that this is not a pure code agent PR.

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case. https://github.com/huggingface/transformers/issues/45715
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

@longlee0622 @Rocketknight1 @ArthurZucker @itazap

Changed files

src/transformers/tokenization_python.py (modified, +3/-1)

System Info

transformers version: 5.3.0 (also reproduces on main)
Python: 3.12
OS: Linux
Affected class: transformers.tokenization_python.PreTrainedTokenizer (renamed to PythonBackend on main) — the slow-tokenizer base class
Not affected: TokenizersBackend (the fast subclass) — it already has the fix

Who can help?

Tokenizers: @ArthurZucker

Reproduction

In src/transformers/tokenization_python.py, the slow tokenizer's convert_ids_to_tokens evaluates self.all_special_ids inside the per-id loop:

# tokenization_python.py (current)
def convert_ids_to_tokens(self, ids, skip_special_tokens=False):
    ...
    tokens = []
    for index in ids:
        index = int(index)
        if skip_special_tokens and index in self.all_special_ids:   # ← per-iter property access
            continue
        tokens.append(...)
    return tokens

all_special_ids is an @property that rebuilds the list on every access:

# tokenization_utils_base.py
@property
def all_special_ids(self) -> list[int]:
    return self.convert_tokens_to_ids(self.all_special_tokens)

So the loop is effectively O(N_tokens × cost_to_rebuild_special_ids) instead of O(N_tokens).
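To make the complexity claim concrete, here is a minimal sketch that counts how often the property is evaluated under each loop shape. `ToyTokenizer` and its methods are hypothetical stand-ins written for this illustration; only the two loop patterns are taken from the snippets in this issue.

```python
class ToyTokenizer:
    """Hypothetical stand-in for a slow tokenizer; not the transformers class."""

    def __init__(self, special_ids):
        self._special_ids = list(special_ids)
        self.property_calls = 0  # how many times all_special_ids was rebuilt

    @property
    def all_special_ids(self):
        self.property_calls += 1
        return list(self._special_ids)  # rebuilt on every access

    def convert_in_loop(self, ids, skip_special_tokens=False):
        # Pattern from the slow tokenizer: property access inside the loop.
        tokens = []
        for index in ids:
            if skip_special_tokens and index in self.all_special_ids:
                continue
            tokens.append(f"tok{index}")
        return tokens

    def convert_hoisted(self, ids, skip_special_tokens=False):
        # Pattern from the fast tokenizer: evaluate the property once.
        ids_to_skip = set(self.all_special_ids) if skip_special_tokens else set()
        tokens = []
        for index in ids:
            if index in ids_to_skip:
                continue
            tokens.append(f"tok{index}")
        return tokens


def count_property_calls(method_name, n_ids=512):
    tok = ToyTokenizer(special_ids=[0, 1, 2])
    getattr(tok, method_name)(list(range(n_ids)), skip_special_tokens=True)
    return tok.property_calls


loop_calls = count_property_calls("convert_in_loop")
hoisted_calls = count_property_calls("convert_hoisted")
print(f"in-loop: {loop_calls} property evaluations, hoisted: {hoisted_calls}")
```

In the toy, the in-loop variant evaluates the property once per id while the hoisted variant evaluates it once per call, which is the difference the O(...) expression above describes.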

The fast subclass — TokenizersBackend.convert_ids_to_tokens in tokenization_utils_tokenizers.py — already hoists this out of the loop (and a comment in that file explicitly documents the problem):

# tokenization_utils_tokenizers.py:735
tokens = []
# self.all_special_ids is an @property which may be slow, so only compute it once before the loop
ids_to_skip = set(self.all_special_ids) if skip_special_tokens else set()
for index in ids:
    index = int(index)
    if index in ids_to_skip:
        continue
    ...

The slow base class wasn't updated when the fast subclass got this fix.

Minimal repro on Kimi-K2.5 (TikTokenTokenizer, slow, loaded via `trust_remote_code=True`)

import time, random
from transformers import AutoTokenizer

# Any local checkout of moonshotai/Kimi-K2-* with tokenization_kimi.py
MODEL = "/path/to/Kimi-K2.5-NVFP4"

tok = AutoTokenizer.from_pretrained(MODEL, trust_remote_code=True)
print(f"is_fast={tok.is_fast}, len(special_ids)={len(tok.all_special_ids)}")
# is_fast=False, len(special_ids)=11

random.seed(0)
ids = [random.randint(0, 100_000) for _ in range(512)]

t0 = time.time()
for _ in range(100):
    tok.convert_ids_to_tokens(ids, skip_special_tokens=True)
print(f"512 ids skip=True : {(time.time()-t0)/100*1e6:.0f} us/call")

t0 = time.time()
for _ in range(100):
    tok.convert_ids_to_tokens(ids, skip_special_tokens=False)
print(f"512 ids skip=False: {(time.time()-t0)/100*1e6:.0f} us/call")

Output (transformers 5.3.0, unmodified)

512 ids skip=True : 48009 us/call
512 ids skip=False:   105 us/call

A single boolean kwarg flips the call cost by ~450×, even though the tokenizer only has 11 special tokens. The cost scales with len(ids) × cost_to_evaluate_self.all_special_ids, not with the number of special tokens.

This isn't specific to Kimi — any model that ships a slow custom tokenizer (e.g. via trust_remote_code=True) and is used for streaming detokenization (which calls convert_ids_to_tokens(..., skip_special_tokens=True) per generated token batch) will hit it. We discovered this when a TensorRT-LLM streaming benchmark on Kimi-K2.5 showed an 80% TPOT regression on the transformers 4.57 → 5.3 upgrade — DeepSeek-V3 family models (which use a fast tokenizer) on the same upgrade were unaffected.

Expected behavior

convert_ids_to_tokens(ids, skip_special_tokens=True) should be ~the same cost as skip_special_tokens=False, plus a single set-membership check per id.

Suggested fix

Mirror the fix already in TokenizersBackend.convert_ids_to_tokens — it's the same pattern:

         tokens = []
+        # self.all_special_ids is an @property which may be slow, so only compute it once before the loop
+        ids_to_skip = set(self.all_special_ids) if skip_special_tokens else set()
         for index in ids:
             index = int(index)
-            if skip_special_tokens and index in self.all_special_ids:
+            if index in ids_to_skip:
                 continue
             tokens.append(
                 self._added_tokens_decoder[index].content
                 if index in self._added_tokens_decoder
                 else self._convert_id_to_token(index)
             )
         return tokens

After this fix, on the same Kimi-K2.5 repro

512 ids skip=True :   227 us/call   (was 48009)  →  ~211× faster
512 ids skip=False:   129 us/call   (unchanged)
1   id  skip=True :    95 us/call   (unchanged; see Root Cause above)
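The ratios quoted in this issue can be recomputed from the reported timings. The sketch below uses only the figures above; the interpretation in the final comment is an inference from those figures, not an additional measurement.

```python
# Timings reported in the issue, in microseconds per call.
before_skip_us = 48009    # 512 ids, skip=True, unmodified
before_noskip_us = 105    # 512 ids, skip=False, unmodified
after_skip_us = 227       # 512 ids, skip=True, with the fix
after_noskip_us = 129     # 512 ids, skip=False, with the fix
single_id_us = 95         # 1 id, skip=True, with the fix
n_ids = 512

slowdown_before = before_skip_us / before_noskip_us  # cost of flipping the kwarg
speedup = before_skip_us / after_skip_us             # gain from the fix
per_id_before = before_skip_us / n_ids               # cost per loop iteration
overhead_after = after_skip_us - after_noskip_us     # remaining skip=True overhead

print(f"slowdown before fix : ~{slowdown_before:.0f}x")
print(f"speedup from fix    : ~{speedup:.0f}x")
print(f"per-id cost before  : ~{per_id_before:.1f} us")
print(f"overhead after fix  : {overhead_after} us")
# per_id_before, overhead_after and single_id_us all land near 95 us, which is
# consistent with one all_special_ids evaluation costing roughly that much.
```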

Thanks!

extent analysis

TL;DR

To fix the performance issue in convert_ids_to_tokens, hoist the computation of self.all_special_ids outside the loop by storing it in a set before the loop starts.

Guidance

Identify the performance bottleneck: The convert_ids_to_tokens method is slow due to the repeated computation of self.all_special_ids inside the loop.
Apply the suggested fix: Mirror the fix already in TokenizersBackend.convert_ids_to_tokens by computing self.all_special_ids once before the loop and storing it in a set.
Verify the fix: Run the provided minimal repro code to measure the performance improvement.
Consider a follow-up fix: Cache the property output on the tokenizer instance to improve performance for single-id cases.
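Alongside timing, the "verify the fix" step can include a behavioral check that hoisting does not change the output. This is a minimal sketch on toy functions: `SPECIAL_IDS`, `convert_reference`, and `convert_fixed` are hypothetical names, and the token strings are placeholders rather than real vocabulary entries.

```python
import random

SPECIAL_IDS = [0, 1, 2]  # hypothetical special-token ids


def convert_reference(ids, skip_special_tokens=False):
    # Original shape: membership test against the list inside the loop.
    tokens = []
    for index in ids:
        index = int(index)
        if skip_special_tokens and index in SPECIAL_IDS:
            continue
        tokens.append(f"tok{index}")
    return tokens


def convert_fixed(ids, skip_special_tokens=False):
    # Fixed shape: build the skip set once, then test membership per id.
    ids_to_skip = set(SPECIAL_IDS) if skip_special_tokens else set()
    tokens = []
    for index in ids:
        index = int(index)
        if index in ids_to_skip:
            continue
        tokens.append(f"tok{index}")
    return tokens


random.seed(0)
ids = [random.randint(0, 10) for _ in range(512)]
results_match = all(
    convert_reference(ids, skip_special_tokens=flag)
    == convert_fixed(ids, skip_special_tokens=flag)
    for flag in (True, False)
)
print(f"outputs identical for both flag values: {results_match}")
```

Against a real tokenizer, the same comparison would be run between the patched and unpatched method on the same id list.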

Example

# tokenization_python.py (fixed)
def convert_ids_to_tokens(self, ids, skip_special_tokens=False):
    ...
    tokens = []
    ids_to_skip = set(self.all_special_ids) if skip_special_tokens else set()
    for index in ids:
        index = int(index)
        if index in ids_to_skip:
            continue
        tokens.append(...)
    return tokens

Notes

The provided fix only improves calls that convert many ids at once. A single-id call still pays for one all_special_ids evaluation, and a separate issue may be filed to address that case.

Recommendation

Apply the fix by mirroring the pattern already in TokenizersBackend.convert_ids_to_tokens; on the reported repro it makes the 512-id skip_special_tokens=True call about 211× faster.
