transformers - 💡(How to fix) Fix Latest version cannot load "vesteinn/ScandiBERT" [4 comments, 2 participants]

GitHub stats: huggingface/transformers#44451 (fetched 2026-04-08 00:28:26)
Comments: 4 · Participants: 2 · Reactions: 0
Timeline: 12 events (commented ×4, mentioned ×3, subscribed ×3, closed ×1)

Code Example

>>> from transformers import AutoTokenizer
>>> bert_tokenizer = AutoTokenizer.from_pretrained("vesteinn/ScandiBERT")

Traceback (most recent call last):
  File "<python-input-2>", line 1, in <module>
    bert_tokenizer = AutoTokenizer.from_pretrained("vesteinn/ScandiBERT")
  File "/nlp/scr/horatio/miniconda3/lib/python3.13/site-packages/transformers/models/auto/tokenization_auto.py", line 712, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nlp/scr/horatio/miniconda3/lib/python3.13/site-packages/transformers/tokenization_utils_base.py", line 1712, in from_pretrained
    return cls._from_pretrained(
           ~~~~~~~~~~~~~~~~~~~~^
        resolved_vocab_files,
        ^^^^^^^^^^^^^^^^^^^^^
    ...<9 lines>...
        **kwargs,
        ^^^^^^^^^
    )
    ^
  File "/nlp/scr/horatio/miniconda3/lib/python3.13/site-packages/transformers/tokenization_utils_base.py", line 1897, in _from_pretrained
    init_kwargs = cls.convert_to_native_format(**init_kwargs)
  File "/nlp/scr/horatio/miniconda3/lib/python3.13/site-packages/transformers/tokenization_utils_tokenizers.py", line 127, in convert_to_native_format
    if vocab and isinstance(vocab[0], (list, tuple)):
                            ~~~~~^^^
KeyError: 0
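
The last frame of the traceback can be reproduced in isolation. The sketch below is not the library's code: `first_entry_is_pair` is a hypothetical helper that copies only the expression shown on the failing line, and it assumes the vocabulary arrived as a token-to-id dict. Under that assumption, indexing with `0` is a key lookup, so it raises `KeyError` instead of returning the first entry.

```python
# Illustrative only: mirrors the expression from the failing line of the
# traceback. `first_entry_is_pair` is a hypothetical name, not a library API.
def first_entry_is_pair(vocab):
    return bool(vocab) and isinstance(vocab[0], (list, tuple))


# A list of (token, score) pairs: indexing with 0 returns the first pair.
list_vocab = [("hello", -1.0), ("world", -2.0)]
print(first_entry_is_pair(list_vocab))  # True

# Assumed shape of the problematic input: a token -> id mapping.
# Here vocab[0] is a key lookup, and 0 is not a key.
dict_vocab = {"hello": 0, "world": 1}
try:
    first_entry_is_pair(dict_vocab)
except KeyError as exc:
    print(f"KeyError: {exc}")  # KeyError: 0
```

This only shows why the expression fails for a dict. It does not establish why the vocabulary has that shape for this model.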

System Info

broken config:

Python 3.13.5
tokenizers 0.22.2
transformers 5.2.0
torch 2.7.1+cu118

working config:

Python 3.13.5
tokenizers 0.22.1
transformers 4.57.1
torch 2.8.0+cu129

Who can help?

@ArthurZucker @Cyrilvallez

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Expected behavior

Loading without an exception would be better!

extent analysis

Fix Plan

Downgrade Tokenizers

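The issue reports a failing setup (transformers 5.2.0, tokenizers 0.22.2) and a working one (transformers 4.57.1, tokenizers 0.22.1), so both packages differ between the two. The traceback ends inside transformers' own convert_to_native_format, where vocab[0] raises KeyError: 0, which suggests that vocab is a dict rather than a list. Downgrading tokenizers to 0.22.1 is worth trying first, but it is not confirmed to resolve the error on its own.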

Steps

  1. Downgrade tokenizers:

pip install --force-reinstall tokenizers==0.22.1

  2. Verify the downgrade by checking the installed version of tokenizers:

pip show tokenizers

The output should include Version: 0.22.1.
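
The same check can be done from Python for both packages at once. This is a minimal sketch using the standard library's `importlib.metadata`; the `KNOWN_GOOD` table, `installed_versions`, and `report` are illustrative names, and the pinned versions are simply the working config reported in the issue.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions the issue reports as working together (taken from System Info).
KNOWN_GOOD = {"transformers": "4.57.1", "tokenizers": "0.22.1"}


def installed_versions(packages):
    """Return {package: installed version, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def report(found, expected):
    """Build one human-readable line per expected package."""
    lines = []
    for name, want in expected.items():
        have = found.get(name)
        status = "matches" if have == want else "differs from"
        lines.append(f"{name}: installed={have}, {status} known-good {want}")
    return lines


if __name__ == "__main__":
    for line in report(installed_versions(KNOWN_GOOD), KNOWN_GOOD):
        print(line)
```

A "differs from" line does not prove that version is broken. It only flags a departure from the one combination the issue reports as working.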

Alternative Solution

If downgrading tokenizers alone does not help, or is not feasible, downgrade transformers as well. The only combination the issue reports as working is transformers 4.57.1 with tokenizers 0.22.1. The issue does not confirm any transformers version that works with tokenizers 0.22.2.

Example Use Case

Once the environment matches a working configuration, the original snippet should load the tokenizer without raising an exception:

>>> from transformers import AutoTokenizer
>>> bert_tokenizer = AutoTokenizer.from_pretrained("vesteinn/ScandiBERT")
