transformers - 💡(How to fix) Fix Latest version cannot load "vesteinn/ScandiBERT" [4 comments, 2 participants]

transformers · 2026-03-05 03:02:13

GitHub stats

huggingface/transformers#44451•Fetched 2026-04-08 00:28:26

Author

AngledLuffa

Participants

AngledLuffa

Cyrilvallez

Timeline (top)

commented ×4 · mentioned ×3 · subscribed ×3 · closed ×1

Code Example

>>> from transformers import AutoTokenizer
>>> bert_tokenizer = AutoTokenizer.from_pretrained("vesteinn/ScandiBERT")

Traceback (most recent call last):
  File "<python-input-2>", line 1, in <module>
    bert_tokenizer = AutoTokenizer.from_pretrained("vesteinn/ScandiBERT")
  File "/nlp/scr/horatio/miniconda3/lib/python3.13/site-packages/transformers/models/auto/tokenization_auto.py", line 712, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nlp/scr/horatio/miniconda3/lib/python3.13/site-packages/transformers/tokenization_utils_base.py", line 1712, in from_pretrained
    return cls._from_pretrained(
           ~~~~~~~~~~~~~~~~~~~~^
        resolved_vocab_files,
        ^^^^^^^^^^^^^^^^^^^^^
    ...<9 lines>...
        **kwargs,
        ^^^^^^^^^
    )
    ^
  File "/nlp/scr/horatio/miniconda3/lib/python3.13/site-packages/transformers/tokenization_utils_base.py", line 1897, in _from_pretrained
    init_kwargs = cls.convert_to_native_format(**init_kwargs)
  File "/nlp/scr/horatio/miniconda3/lib/python3.13/site-packages/transformers/tokenization_utils_tokenizers.py", line 127, in convert_to_native_format
    if vocab and isinstance(vocab[0], (list, tuple)):
                            ~~~~~^^^
KeyError: 0
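
The last frame evaluates `vocab[0]`. On a list that returns the first element, but on a dict it is a key lookup, so `KeyError: 0` is what Python raises when `vocab` is a mapping that has no key `0`. The sketch below reproduces only that Python behavior with made-up toy vocabularies. That the real `vocab` was a token-to-id dict is an assumption inferred from the traceback, and `first_entry_is_pair` is a hypothetical helper for illustration, not the fix applied in transformers.

```python
def first_entry_is_pair(vocab):
    """Return True only for a non-empty list/tuple whose first item is itself a list/tuple.

    Hypothetical guarded version of the check in the traceback: it tests the
    container type before indexing, so a dict vocab never reaches vocab[0].
    """
    if not isinstance(vocab, (list, tuple)) or not vocab:
        return False
    return isinstance(vocab[0], (list, tuple))


# Toy data, not the real ScandiBERT vocabulary.
dict_vocab = {"<s>": 0, "hello": 1}            # token -> id mapping
list_vocab = [("<s>", 0.0), ("hello", -1.5)]   # list of (token, score) pairs

# The unguarded expression from the traceback, applied to the dict.
try:
    isinstance(dict_vocab[0], (list, tuple))
    unguarded_error = None
except KeyError as exc:
    unguarded_error = repr(exc)

print(unguarded_error)                  # KeyError(0)
print(first_entry_is_pair(dict_vocab))  # False
print(first_entry_is_pair(list_vocab))  # True
```

The point is only that the failing line assumes a sequence. Which vocabulary format the ScandiBERT files actually produce is not shown in the issue.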

System Info

broken config:

Python 3.13.5
tokenizers 0.22.2
transformers 5.2.0
torch 2.7.1+cu118

working config:

Python 3.13.5
tokenizers 0.22.1
transformers 4.57.1
torch 2.8.0+cu129
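
To compare your own environment against the two configurations above, the installed versions can be read with the standard library. This is a minimal sketch. `installed_versions` is a hypothetical helper name, and the package list simply mirrors the ones named in the issue.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    report = installed_versions(["transformers", "tokenizers", "torch"])
    for name, version in report.items():
        print(f"{name} {version or 'not installed'}")
```

Run it inside the same interpreter or virtual environment that raises the error, since `pip` on the shell path may belong to a different Python.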

Who can help?

@ArthurZucker @Cyrilvallez

Information

The official example scripts
My own modified scripts

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)

Expected behavior

Loading without an exception would be better!

extent analysis

Fix Plan

Downgrade Tokenizers

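The issue reports a failure with transformers 5.2.0 and tokenizers 0.22.2, while transformers 4.57.1 with tokenizers 0.22.1 works. Both packages differ between the two configurations, and the traceback ends inside transformers (convert_to_native_format in tokenization_utils_tokenizers.py), so the issue does not establish that tokenizers alone is responsible. Downgrading tokenizers to 0.22.1 is a quick thing to try, but it may not resolve the problem on its own.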

Steps

1. Downgrade Tokenizers:

   pip install --force-reinstall tokenizers==0.22.1

2. Verify Downgrade: check the installed version of tokenizers:

   pip show tokenizers

   It should show version 0.22.1.

Alternative Solution

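If downgrading tokenizers is not feasible or does not help, downgrade transformers as well. The only configuration the issue reports as working is transformers 4.57.1 with tokenizers 0.22.1, which can be installed with pip install transformers==4.57.1 tokenizers==0.22.1. Whether transformers 4.57.1 also works with tokenizers 0.22.2 is not stated in the issue.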

Example Use Case

With a working configuration, the tokenizer should load without an exception:

>>> from transformers import AutoTokenizer
>>> bert_tokenizer = AutoTokenizer.from_pretrained("vesteinn/ScandiBERT")
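
When trying different version combinations, it can help to wrap the load in a small smoke test that reports the exception type instead of a full traceback. The following is a sketch under assumptions: `try_load` and `load_scandibert` are hypothetical helper names, and `load_scandibert` needs transformers installed plus network access or a local cache of the model.

```python
def try_load(loader):
    """Call loader() and return (result, None) on success or (None, message) on failure."""
    try:
        return loader(), None
    except Exception as exc:  # broad on purpose: this is a smoke test
        return None, f"{type(exc).__name__}: {exc}"


def load_scandibert():
    """Load the tokenizer from the issue; transformers is imported lazily."""
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained("vesteinn/ScandiBERT")
```

Calling `try_load(load_scandibert)` returns the tokenizer and `None` when loading succeeds. With the failure from the issue, the second element would be the string `KeyError: 0`.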
