transformers - 💡(How to fix) Fix transformers >= 5.0.0 fails loading tokenizer for EMBEDDIA/est-roberta [6 comments, 3 participants]

transformers · 2026-03-25 10:36:01

huggingface/transformers#44991•Fetched 2026-04-08 01:26:13

Timeline: commented ×6, closed ×1, labeled ×1, mentioned ×1

Error Message

>>> from transformers import AutoTokenizer, AutoModelForMaskedLM
>>> tokenizer = AutoTokenizer.from_pretrained("EMBEDDIA/est-roberta")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Programmid\Miniconda3\envs\py312_transformers_problem\Lib\site-packages\transformers\models\auto\tokenization_auto.py", line 749, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "C:\Programmid\Miniconda3\envs\py312_transformers_problem\Lib\site-packages\transformers\tokenization_utils_base.py", line 1721, in from_pretrained
    return cls._from_pretrained(
  File "C:\Programmid\Miniconda3\envs\py312_transformers_problem\Lib\site-packages\transformers\tokenization_utils_base.py", line 1910, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "C:\Programmid\Miniconda3\envs\py312_transformers_problem\Lib\site-packages\transformers\models\camembert\tokenization_camembert.py", line 118, in __init__
    unk_index = next((i for i, (tok, _) in enumerate(self._vocab) if tok == str(unk_token)), 0)
  File "C:\Programmid\Miniconda3\envs\py312_transformers_problem\Lib\site-packages\transformers\models\camembert\tokenization_camembert.py", line 118, in <genexpr>
    unk_index = next((i for i, (tok, _) in enumerate(self._vocab) if tok == str(unk_token)), 0)
ValueError: too many values to unpack (expected 2)

System Info

transformers version: 5.3.0
Platform: Windows-11-10.0.26200-SP0
Python version: 3.12.13
Huggingface_hub version: 1.7.2
Safetensors version: 0.7.0
Accelerate version: not installed
Accelerate config: not found
DeepSpeed version: not installed
PyTorch version (accelerator?): 2.11.0+cpu (NA)
Using distributed or parallel set-up in script?: no

Who can help?

No response

Information

The official example scripts
My own modified scripts

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)

Reproduction

AutoTokenizer fails to load the tokenizer of the model "EMBEDDIA/est-roberta". The problem seems to be related to the tokenizer API changes introduced in Transformers v5, as loading works fine in v4 (I tested it on transformers 4.57.6).

>>> from transformers import AutoTokenizer, AutoModelForMaskedLM
>>> tokenizer = AutoTokenizer.from_pretrained("EMBEDDIA/est-roberta")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Programmid\Miniconda3\envs\py312_transformers_problem\Lib\site-packages\transformers\models\auto\tokenization_auto.py", line 749, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Programmid\Miniconda3\envs\py312_transformers_problem\Lib\site-packages\transformers\tokenization_utils_base.py", line 1721, in from_pretrained
    return cls._from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Programmid\Miniconda3\envs\py312_transformers_problem\Lib\site-packages\transformers\tokenization_utils_base.py", line 1910, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Programmid\Miniconda3\envs\py312_transformers_problem\Lib\site-packages\transformers\models\camembert\tokenization_camembert.py", line 118, in __init__
    unk_index = next((i for i, (tok, _) in enumerate(self._vocab) if tok == str(unk_token)), 0)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Programmid\Miniconda3\envs\py312_transformers_problem\Lib\site-packages\transformers\models\camembert\tokenization_camembert.py", line 118, in <genexpr>
    unk_index = next((i for i, (tok, _) in enumerate(self._vocab) if tok == str(unk_token)), 0)
                               ^^^^^^^^
ValueError: too many values to unpack (expected 2)
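
The failing expression can be reproduced without transformers. The sketch below copies the generator expression from the last traceback frame into a hypothetical helper, `find_unk_index`, and feeds it hand-made vocabularies. The "bad" shapes are assumptions chosen only to trigger the same error; the issue does not state what `self._vocab` actually contains for this model.

```python
def find_unk_index(vocab, unk_token="<unk>"):
    """Same expression as line 118 in the traceback, applied to a plain list."""
    return next((i for i, (tok, _) in enumerate(vocab) if tok == str(unk_token)), 0)


# The shape the expression expects: (token, score) pairs.
pairs = [("<s>", 0.0), ("<pad>", 0.0), ("<unk>", 0.0)]
print(find_unk_index(pairs))  # 2

# Assumed shapes that break the two-value unpacking (illustrative only):
triples = [("<s>", 0.0, 0), ("<unk>", 0.0, 1)]  # three fields per entry
tokens_only = ["<s>", "<pad>", "<unk>"]         # bare strings unpack per character

for bad_vocab in (triples, tokens_only):
    try:
        find_unk_index(bad_vocab)
    except ValueError as exc:
        print(type(exc).__name__, exc)
```

Either shape raises a `ValueError` that begins with "too many values to unpack", which is consistent with the traceback but does not tell the two cases apart.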

Expected behavior

loading the tokenizer succeeds gracefully :)

extent analysis

Fix Plan

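The traceback ends in tokenization_camembert.py, line 118: the tokenizer constructor iterates over self._vocab and unpacks every entry as a (token, score) pair, and for EMBEDDIA/est-roberta at least one entry does not unpack into two values. The traceback alone does not show what shape the vocabulary actually has, so the steps below are workarounds to try, not a confirmed fix. The only configuration the reporter confirmed as working is transformers 4.57.6.

Check the model's documentation to find the tokenizer class it expects.
Try AutoTokenizer with the use_fast parameter set to False. This is untested here, and because the tokenizer API changed in v5, the flag may not select a different code path.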

from transformers import AutoTokenizer

# Load the tokenizer with use_fast=False
tokenizer = AutoTokenizer.from_pretrained("EMBEDDIA/est-roberta", use_fast=False)

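Alternatively, you can try loading RobertaTokenizer directly. This is also untested: it can only work if the model repository ships the vocabulary files that RobertaTokenizer expects.

from transformers import RobertaTokenizer

# Load the tokenizer with an explicit class instead of AutoTokenizer
tokenizer = RobertaTokenizer.from_pretrained("EMBEDDIA/est-roberta")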
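
Since none of the workarounds above is confirmed, it can help to try them in order and record why each one failed. The following is a minimal sketch of that idea; `first_that_loads` is a hypothetical helper, not part of transformers, and it takes plain callables so that it makes no assumption about which loader succeeds.

```python
from typing import Any, Callable, Iterable


def first_that_loads(candidates: Iterable[tuple[str, Callable[[], Any]]]):
    """Call each loader in order and return the first result that does not raise.

    Returns (name, loaded_object, failures), where failures lists the
    (name, error text) of every candidate tried before the successful one.
    """
    failures: list[tuple[str, str]] = []
    for name, load in candidates:
        try:
            return name, load(), failures
        except Exception as exc:  # broad on purpose: any load error means "try the next one"
            failures.append((name, f"{type(exc).__name__}: {exc}"))
    details = "; ".join(f"{name} -> {error}" for name, error in failures)
    raise RuntimeError(f"no candidate loaded: {details}")
```

With transformers installed, the candidates could be lambdas such as `("auto", lambda: AutoTokenizer.from_pretrained("EMBEDDIA/est-roberta"))` and `("roberta", lambda: RobertaTokenizer.from_pretrained("EMBEDDIA/est-roberta"))`. The collected failure messages are useful to attach to the issue if every candidate fails.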

Verification

To verify that the fix worked, you can try to load the tokenizer and use it to encode a sentence:

input_text = "This is a test sentence."
inputs = tokenizer(input_text, return_tensors="pt")

print(inputs)

If the tokenizer loads successfully and encodes the sentence without errors, the fix has worked.
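
A tokenizer can load without errors and still be the wrong one, for instance when a mismatched vocabulary maps most of the text to the unknown token. The sketch below adds two cheap signals to the check above: the share of unknown-token ids and the decoded round trip. `smoke_test` is a hypothetical helper; it assumes only that the object is callable on a string, returns a mapping with "input_ids", and offers `decode` and `unk_token_id`, as tokenizers in transformers do.

```python
def smoke_test(tokenizer, text):
    """Encode and decode `text`, returning simple health indicators."""
    ids = list(tokenizer(text)["input_ids"])
    if not ids:
        raise ValueError("tokenizer produced no input ids")

    # Fraction of ids that are the unknown token; a high value suggests a vocabulary mismatch.
    unk_id = getattr(tokenizer, "unk_token_id", None)
    unk_count = sum(1 for token_id in ids if token_id == unk_id) if unk_id is not None else 0

    return {
        "num_tokens": len(ids),
        "unk_ratio": unk_count / len(ids),
        "decoded": tokenizer.decode(ids, skip_special_tokens=True),
    }
```

Because EMBEDDIA/est-roberta is an Estonian model, an Estonian sentence is likely a more telling input than an English one. What counts as an acceptable `unk_ratio` is a judgment call; comparing the result against the same call under transformers 4.57.6, which the reporter confirmed as working, gives a concrete baseline.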

Extra Tips

Make sure to check the model's documentation for the recommended tokenizer and configuration.
If you're using a custom model, ensure that the tokenizer is compatible with the model's architecture.
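The issue has since been closed, so it is worth checking whether the latest transformers release loads the tokenizer. If it does not, staying on a v4 release such as 4.57.6, which the reporter confirmed as working, is the remaining fallback.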
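
Before applying any workaround, it is worth confirming that the environment is actually on an affected release. The sketch below reads the installed version through the standard library and compares its major component against 5, the boundary named in the issue title. The helper names are hypothetical, and the parsing assumes a plain dotted version string such as "5.3.0".

```python
from importlib import metadata


def is_v5_or_later(version: str) -> bool:
    """True when the major component of a dotted version string is at least 5."""
    return int(version.split(".")[0]) >= 5


def installed_transformers_version():
    """Return the installed transformers version string, or None if it is not installed."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


version = installed_transformers_version()
if version is None:
    print("transformers is not installed")
elif is_v5_or_later(version):
    print(f"transformers {version}: in the range reported as failing")
else:
    print(f"transformers {version}: a v4 release, reported as working")
```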

#api #pipeline error #runtime error #dependency conflict #environment setup
