transformers - ✅(Solved) Fix _patch_mistral_regex crashes with AttributeError: 'tokenizers.Tokenizer' object has no attribute 'backend_tokenizer' when loading Mistral tokenizer with fix_mistral_regex=True [2 pull requests, 1 participants]

Q: Expected behavior

`fix_mistral_regex=True` should successfully replace the incorrect pre-tokenizer regex pattern in the Mistral tokenizer without raising any error. --- **Root cause analysis and suggested fix** In `tokenization_utils_tokenizers.py`, `_patch_mistral_regex` is called from `__init__` as: ```python # line ~477 self._tokenizer = self._patch_mistral_regex( self._tokenizer, # <-- this is a raw tokenizers.Tokenizer (Rust object) ... ) ``` Inside `_patch_mistral_regex`, line 1363 then does: ```python current_pretokenizer = tokenizer.backend_tokenizer.pre_tokenizer # BUG ``` But `tokenizer` here is already `self._tokenizer` — the raw Rust `tokenizers.Tokenizer` object. The `.backend_tokenizer` property exists on the Python-level `PreTrainedTokenizerFast` / `TokenizersBackend` wrapper, **not** on the underlying Rust object itself. **Fix:** access `.pre_tokenizer` directly, since the argument is already the backend tokenizer: ```python # line 1363 — change: current_pretokenizer = tokenizer.backend_tokenizer.pre_tokenizer # to: current_pretokenizer = tokenizer.pre_tokenizer ``` And correspondingly update any write-back in the same method that goes through `.backend_tokenizer` to use the object directly.

transformers2026-03-28 13:20:17

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

huggingface/transformers#45081•Fetched 2026-04-08 01:45:21

View on GitHub

Comments

Participants

Timeline

Reactions

Author

kruthtom0

Participants

kruthtom0

Timeline (top)

mentioned ×2subscribed ×2cross-referenced ×1labeled ×1

Error Message

Traceback (most recent call last): File "/mnt/RAPID/tmp/repro_mistral_regex_bug.py", line 23, in <module> tokenizer = AutoTokenizer.from_pretrained( "mistralai/Mistral-Nemo-Instruct-2407", trust_remote_code=True, fix_mistral_regex=True, ) File "/root/miniconda3/envs/myconda/lib/python3.13/site-packages/transformers/models/auto/tokenization_auto.py", line 723, in from_pretrained return tokenizer_class_from_name(tokenizer_config_class).from_pretrained( ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^ pretrained_model_name_or_path, *inputs, **kwargs ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ) ^ File "/root/miniconda3/envs/myconda/lib/python3.13/site-packages/transformers/tokenization_utils_base.py", line 1721, in from_pretrained return cls._from_pretrained( ~~~~~~~~~~~~~~~~~~~~^ resolved_vocab_files, ^^^^^^^^^^^^^^^^^^^^^ ...<9 lines>... **kwargs, ^^^^^^^^^ ) ^ File "/root/miniconda3/envs/myconda/lib/python3.13/site-packages/transformers/tokenization_utils_base.py", line 1910, in _from_pretrained tokenizer = cls(*init_inputs, **init_kwargs) File "/root/miniconda3/envs/myconda/lib/python3.13/site-packages/transformers/tokenization_utils_tokenizers.py", line 477, in init self._tokenizer = self._patch_mistral_regex( ~~~~~~~~~~~~~~~~~~~~~~~~~^ self._tokenizer, ^^^^^^^^^^^^^^^^ ...<3 lines>... **kwargs, ^^^^^^^^^ ) ^ File "/root/miniconda3/envs/myconda/lib/python3.13/site-packages/transformers/tokenization_utils_tokenizers.py", line 1363, in _patch_mistral_regex current_pretokenizer = tokenizer.backend_tokenizer.pre_tokenizer ^^^^^^^^^^^^^^^^^^^^^^^^^^^ AttributeError: 'tokenizers.Tokenizer' object has no attribute 'backend_tokenizer'

Root Cause

Root cause analysis and suggested fix

Fix Action

Fix / Workaround

tokenizer = AutoTokenizer.from_pretrained( "mistralai/Mistral-Nemo-Instruct-2407", trust_remote_code=True, fix_mistral_regex=True, )


In `tokenization_utils_tokenizers.py`, `_patch_mistral_regex` is called from `__init__` as:

```python
# line ~477
self._tokenizer = self._patch_mistral_regex(
    self._tokenizer,   # <-- this is a raw tokenizers.Tokenizer (Rust object)
    ...
)

PR fix notes

PR #45086: fix AttributeError in _patch_mistral_regex

Repository: huggingface/transformers
Author: knQzx
State: open | merged: False
Link: https://github.com/huggingface/transformers/pull/45086

Description (problem / solution / changelog)

the function accesses backend_tokenizer.pre_tokenizer but the tokenizer passed is already the raw rust object, so it should be pre_tokenizer directly. fixes #45081

Changed files

src/transformers/tokenization_utils_tokenizers.py (modified, +3/-3)

PR #45317: Fix AttributeError in _patch_mistral_regex when fix_mistral_regex=True

Repository: huggingface/transformers
Author: mohdfaour03
State: open | merged: False
Link: https://github.com/huggingface/transformers/pull/45317

Description (problem / solution / changelog)

Fixes #45081

Problem

Loading a Mistral tokenizer with fix_mistral_regex=True crashes because _patch_mistral_regex receives a raw tokenizers.Tokenizer but tries to access .backend_tokenizer.pre_tokenizer on it — that attribute only exists on PreTrainedTokenizerFast.

Fix

Removed the .backend_tokenizer indirection (3 lines) since the tokenizer passed in is already the backend tokenizer.

AI tools helped locate the bug. I reviewed and understood the fix.

Before submitting

Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case. Answer : Issue #45081

Who can review?

@ArthurZucker @itazap (tokenizers)

Changed files

src/transformers/tokenization_utils_tokenizers.py (modified, +3/-3)
tests/models/auto/test_tokenization_auto.py (modified, +21/-0)

Code Example

import transformers

print(f"transformers version: {transformers.__version__}")
print("Loading Mistral tokenizer with fix_mistral_regex=True ...")

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "mistralai/Mistral-Nemo-Instruct-2407",
    trust_remote_code=True,
    fix_mistral_regex=True,
)

---

Traceback (most recent call last):
  File "/mnt/RAPID/tmp/repro_mistral_regex_bug.py", line 23, in <module>
    tokenizer = AutoTokenizer.from_pretrained(
        "mistralai/Mistral-Nemo-Instruct-2407",
        trust_remote_code=True,
        fix_mistral_regex=True,
    )
  File "/root/miniconda3/envs/myconda/lib/python3.13/site-packages/transformers/models/auto/tokenization_auto.py", line 723, in from_pretrained
    return tokenizer_class_from_name(tokenizer_config_class).from_pretrained(
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        pretrained_model_name_or_path, *inputs, **kwargs
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/root/miniconda3/envs/myconda/lib/python3.13/site-packages/transformers/tokenization_utils_base.py", line 1721, in from_pretrained
    return cls._from_pretrained(
           ~~~~~~~~~~~~~~~~~~~~^
        resolved_vocab_files,
        ^^^^^^^^^^^^^^^^^^^^^
    ...<9 lines>...
        **kwargs,
        ^^^^^^^^^
    )
    ^
  File "/root/miniconda3/envs/myconda/lib/python3.13/site-packages/transformers/tokenization_utils_base.py", line 1910, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/root/miniconda3/envs/myconda/lib/python3.13/site-packages/transformers/tokenization_utils_tokenizers.py", line 477, in __init__
    self._tokenizer = self._patch_mistral_regex(
                      ~~~~~~~~~~~~~~~~~~~~~~~~~^
        self._tokenizer,
        ^^^^^^^^^^^^^^^^
    ...<3 lines>...
        **kwargs,
        ^^^^^^^^^
    )
    ^
  File "/root/miniconda3/envs/myconda/lib/python3.13/site-packages/transformers/tokenization_utils_tokenizers.py", line 1363, in _patch_mistral_regex
    current_pretokenizer = tokenizer.backend_tokenizer.pre_tokenizer
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'tokenizers.Tokenizer' object has no attribute 'backend_tokenizer'

---

# line ~477
self._tokenizer = self._patch_mistral_regex(
    self._tokenizer,   # <-- this is a raw tokenizers.Tokenizer (Rust object)
    ...
)

---

current_pretokenizer = tokenizer.backend_tokenizer.pre_tokenizer  # BUG

---

# line 1363 — change:
current_pretokenizer = tokenizer.backend_tokenizer.pre_tokenizer
# to:
current_pretokenizer = tokenizer.pre_tokenizer

RAW_BUFFERClick to expand / collapse

System Info

transformers version: 5.4.0
Platform: Linux-5.15.0-56-generic-x86_64-with-glibc2.35
Python version: 3.13.5
Huggingface_hub version: 1.8.0
Safetensors version: 0.7.0
Accelerate version: 1.13.0
Accelerate config: not found
DeepSpeed version: not installed
PyTorch version (accelerator?): 2.8.0+cu128 (CUDA)
Using distributed or parallel set-up in script?: <fill in>
Using GPU in script?: <fill in>
GPU type: NVIDIA A100-PCIE-40GB

Who can help?

@ArthurZucker @itazap

Information

The official example scripts
My own modified scripts

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)

Reproduction

import transformers

print(f"transformers version: {transformers.__version__}")
print("Loading Mistral tokenizer with fix_mistral_regex=True ...")

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "mistralai/Mistral-Nemo-Instruct-2407",
    trust_remote_code=True,
    fix_mistral_regex=True,
)

Traceback (most recent call last):
  File "/mnt/RAPID/tmp/repro_mistral_regex_bug.py", line 23, in <module>
    tokenizer = AutoTokenizer.from_pretrained(
        "mistralai/Mistral-Nemo-Instruct-2407",
        trust_remote_code=True,
        fix_mistral_regex=True,
    )
  File "/root/miniconda3/envs/myconda/lib/python3.13/site-packages/transformers/models/auto/tokenization_auto.py", line 723, in from_pretrained
    return tokenizer_class_from_name(tokenizer_config_class).from_pretrained(
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        pretrained_model_name_or_path, *inputs, **kwargs
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/root/miniconda3/envs/myconda/lib/python3.13/site-packages/transformers/tokenization_utils_base.py", line 1721, in from_pretrained
    return cls._from_pretrained(
           ~~~~~~~~~~~~~~~~~~~~^
        resolved_vocab_files,
        ^^^^^^^^^^^^^^^^^^^^^
    ...<9 lines>...
        **kwargs,
        ^^^^^^^^^
    )
    ^
  File "/root/miniconda3/envs/myconda/lib/python3.13/site-packages/transformers/tokenization_utils_base.py", line 1910, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/root/miniconda3/envs/myconda/lib/python3.13/site-packages/transformers/tokenization_utils_tokenizers.py", line 477, in __init__
    self._tokenizer = self._patch_mistral_regex(
                      ~~~~~~~~~~~~~~~~~~~~~~~~~^
        self._tokenizer,
        ^^^^^^^^^^^^^^^^
    ...<3 lines>...
        **kwargs,
        ^^^^^^^^^
    )
    ^
  File "/root/miniconda3/envs/myconda/lib/python3.13/site-packages/transformers/tokenization_utils_tokenizers.py", line 1363, in _patch_mistral_regex
    current_pretokenizer = tokenizer.backend_tokenizer.pre_tokenizer
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'tokenizers.Tokenizer' object has no attribute 'backend_tokenizer'

Expected behavior

fix_mistral_regex=True should successfully replace the incorrect pre-tokenizer regex pattern in the Mistral tokenizer without raising any error.

Root cause analysis and suggested fix

In tokenization_utils_tokenizers.py, _patch_mistral_regex is called from __init__ as:

# line ~477
self._tokenizer = self._patch_mistral_regex(
    self._tokenizer,   # <-- this is a raw tokenizers.Tokenizer (Rust object)
    ...
)

Inside _patch_mistral_regex, line 1363 then does:

current_pretokenizer = tokenizer.backend_tokenizer.pre_tokenizer  # BUG

But tokenizer here is already self._tokenizer — the raw Rust tokenizers.Tokenizer object. The .backend_tokenizer property exists on the Python-level PreTrainedTokenizerFast / TokenizersBackend wrapper, not on the underlying Rust object itself.

Fix: access .pre_tokenizer directly, since the argument is already the backend tokenizer:

# line 1363 — change:
current_pretokenizer = tokenizer.backend_tokenizer.pre_tokenizer
# to:
current_pretokenizer = tokenizer.pre_tokenizer

And correspondingly update any write-back in the same method that goes through .backend_tokenizer to use the object directly.

extent analysis

Fix Plan

To fix the issue, you need to update the tokenization_utils_tokenizers.py file. Here are the steps:

Open the tokenization_utils_tokenizers.py file located in your transformers library installation.
Find the _patch_mistral_regex method.
Update the line current_pretokenizer = tokenizer.backend_tokenizer.pre_tokenizer to current_pretokenizer = tokenizer.pre_tokenizer.
Update any corresponding write-back lines that use .backend_tokenizer to use the object directly.

Example code snippet:

# Before
current_pretokenizer = tokenizer.backend_tokenizer.pre_tokenizer

# After
current_pretokenizer = tokenizer.pre_tokenizer

Make sure to update all occurrences of .backend_tokenizer in the _patch_mistral_regex method.

Verification

To verify that the fix worked, run your original code again:

import transformers

print(f"transformers version: {transformers.__version__}")
print("Loading Mistral tokenizer with fix_mistral_regex=True ...")

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "mistralai/Mistral-Nemo-Instruct-2407",
    trust_remote_code=True,
    fix_mistral_regex=True,
)

If the fix is successful, the code should run without raising any errors.

Extra Tips

Make sure to update the transformers library to the latest version if possible.
If you are using a virtual environment, ensure that the changes are made in the correct environment.
If you are using a package manager like pip, you may need to reinstall the transformers library after making the changes.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

FAQ

Expected behavior

fix_mistral_regex=True should successfully replace the incorrect pre-tokenizer regex pattern in the Mistral tokenizer without raising any error.

Root cause analysis and suggested fix

In tokenization_utils_tokenizers.py, _patch_mistral_regex is called from __init__ as:

# line ~477
self._tokenizer = self._patch_mistral_regex(
    self._tokenizer,   # <-- this is a raw tokenizers.Tokenizer (Rust object)
    ...
)

Inside _patch_mistral_regex, line 1363 then does:

current_pretokenizer = tokenizer.backend_tokenizer.pre_tokenizer  # BUG

Fix: access .pre_tokenizer directly, since the argument is already the backend tokenizer:

# line 1363 — change:
current_pretokenizer = tokenizer.backend_tokenizer.pre_tokenizer
# to:
current_pretokenizer = tokenizer.pre_tokenizer

And correspondingly update any write-back in the same method that goes through .backend_tokenizer to use the object directly.

#api #agent execution #callback error #memory management #API rate limit

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

transformers - ✅(Solved) Fix _patch_mistral_regex crashes with AttributeError: 'tokenizers.Tokenizer' object has no attribute 'backend_tokenizer' when loading Mistral tokenizer with fix_mistral_regex=True [2 pull requests, 1 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Root Cause

Fix Action

Fix / Workaround

PR fix notes

PR #45086: fix AttributeError in _patch_mistral_regex

Description (problem / solution / changelog)

Changed files

PR #45317: Fix AttributeError in _patch_mistral_regex when fix_mistral_regex=True

Description (problem / solution / changelog)

Problem

Fix

Before submitting

Who can review?

Changed files

Code Example

System Info

Who can help?

Information

Tasks

Reproduction

Expected behavior

extent analysis

Fix Plan

Verification

Extra Tips

FAQ

Expected behavior

Still need to ship something?

RELATED_DISCOVERY

TRENDING