transformers: [Bug] extra_special_tokens as list in tokenizer_config.json causes AttributeError when loading gemma4 (and other v5-format models) with transformers v4

huggingface/transformers#45376 (fetched 2026-04-12 13:24:01; 1 comment, 2 participants)

Error Message

File .../transformers/tokenization_utils_base.py:1181
    self.SPECIAL_TOKENS_ATTRIBUTES = self.SPECIAL_TOKENS_ATTRIBUTES + list(special_tokens.keys())
AttributeError: 'list' object has no attribute 'keys'

Root Cause

The model's tokenizer_config.json stores extra_special_tokens as a list (the v5 format):

"extra_special_tokens": ["<token_a>", "<token_b>"]

But _set_model_specific_special_tokens() in tokenization_utils_base.py unconditionally calls .keys() on this value, assuming it is always a dict. This works in v5 (which added list support) but crashes in all v4 releases.
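The failure mode can be reproduced without downloading a model. The sketch below uses a hypothetical stand-in function, `register_special_tokens`, that mirrors only the `.keys()` call from the traceback; it is not the library's code, and the `image_token` key is an illustrative name.

```python
def register_special_tokens(attributes, special_tokens):
    """Hypothetical stand-in that mirrors the failing line from the traceback."""
    # Assumes a mapping of attribute name -> token, as the v4 code path does.
    return attributes + list(special_tokens.keys())


base = ["bos_token", "eos_token"]

# Dict form: the keys become extra attribute names.
print(register_special_tokens(base, {"image_token": "<token_a>"}))

# List form (as in the affected tokenizer_config.json): lists have no .keys().
try:
    register_special_tokens(base, ["<token_a>", "<token_b>"])
except AttributeError as err:
    print(f"AttributeError: {err}")
```

The second call raises the same AttributeError shown in the error message above, because a Python list has no `keys` method.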


System Info

- transformers version: 4.x (any release before v5)
- Platform: macOS (Apple Silicon) / Linux
- Python version: 3.12 / 3.13
- PyTorch version: latest
- Model: google/gemma-4-E4B-it

Who can reproduce?

Anyone loading google/gemma-4-E4B-it (or any model whose tokenizer_config.json uses the v5-style extra_special_tokens list format) with transformers v4.

Steps to reproduce

pip install "transformers<5"  # pin a v4.x release

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("google/gemma-4-E4B-it")

Expected behavior

Tokenizer loads successfully.

Actual behavior

File .../transformers/tokenization_utils_base.py:1181
    self.SPECIAL_TOKENS_ATTRIBUTES = self.SPECIAL_TOKENS_ATTRIBUTES + list(special_tokens.keys())
AttributeError: 'list' object has no attribute 'keys'

Root cause

The model's tokenizer_config.json stores extra_special_tokens as a list (the v5 format):

"extra_special_tokens": ["<token_a>", "<token_b>"]

But _set_model_specific_special_tokens() in tokenization_utils_base.py unconditionally calls .keys() on this value, assuming it is always a dict. This works in v5 (which added list support) but crashes in all v4 releases.

Suggested fix

Add a type check in _set_model_specific_special_tokens() to handle both formats gracefully:

def _set_model_specific_special_tokens(self, special_tokens):
    if isinstance(special_tokens, list):
        # v5-style list format — nothing to register, skip
        return
    self.SPECIAL_TOKENS_ATTRIBUTES = self.SPECIAL_TOKENS_ATTRIBUTES + list(special_tokens.keys())
    for key, value in special_tokens.items():
        ...
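To see the guard in isolation, here is a minimal sketch using a hypothetical `ToyTokenizer` class. It imitates only the attribute bookkeeping of the method above; the loop body (a plain `setattr`) is a placeholder, since the issue elides the real one.

```python
class ToyTokenizer:
    """Hypothetical minimal class; not the transformers implementation."""

    SPECIAL_TOKENS_ATTRIBUTES = ["bos_token", "eos_token"]

    def _set_model_specific_special_tokens(self, special_tokens):
        if isinstance(special_tokens, list):
            # List format: no attribute names to register, so skip.
            return
        self.SPECIAL_TOKENS_ATTRIBUTES = self.SPECIAL_TOKENS_ATTRIBUTES + list(special_tokens.keys())
        for key, value in special_tokens.items():
            # Placeholder body for illustration only.
            setattr(self, key, value)


tok = ToyTokenizer()
tok._set_model_specific_special_tokens(["<token_a>", "<token_b>"])  # no longer raises
tok._set_model_specific_special_tokens({"image_token": "<token_a>"})
print(tok.SPECIAL_TOKENS_ATTRIBUTES)
```

With the early return, the list form becomes a no-op while the dict form keeps its existing behavior.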

Workaround (for users)

import json
from pathlib import Path

from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

# Download (or reuse) the cached tokenizer_config.json for the model.
config_path = Path(hf_hub_download("google/gemma-4-E4B-it", "tokenizer_config.json"))
with open(config_path) as f:
    config = json.load(f)

# Replace the v5-style list with an empty dict so v4 can call .keys() on it.
if isinstance(config.get("extra_special_tokens"), list):
    config["extra_special_tokens"] = {}
    with open(config_path, "w") as f:
        json.dump(config, f)

# from_pretrained should now read the patched file from the local cache.
tokenizer = AutoTokenizer.from_pretrained("google/gemma-4-E4B-it")
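The same patch can be wrapped in a small helper that works on any tokenizer_config.json path, for example inside a local copy of the model directory rather than the shared cache. This is a sketch only; `patch_extra_special_tokens` is a hypothetical name, not a transformers or huggingface_hub API.

```python
import json
from pathlib import Path


def patch_extra_special_tokens(config_path):
    """Rewrite a list-valued extra_special_tokens as {}; return True if changed."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    if not isinstance(config.get("extra_special_tokens"), list):
        # Already a dict, or the key is absent: leave the file untouched.
        return False
    config["extra_special_tokens"] = {}
    path.write_text(json.dumps(config, indent=2, ensure_ascii=False), encoding="utf-8")
    return True
```

After patching a local model directory this way, passing that directory path to `AutoTokenizer.from_pretrained` should load the modified config. The helper is idempotent, so running it twice is harmless.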

Who can help?

@ArthurZucker / @itazap can you please look into this?

extent analysis

TL;DR

Either add a type check in _set_model_specific_special_tokens() so that it accepts both the dict and list forms of extra_special_tokens, or apply the workaround: replace the extra_special_tokens list with an empty dict in the cached tokenizer_config.json.

Guidance

  • The error occurs because the model's tokenizer_config.json stores extra_special_tokens as a list, while transformers 4.x expects a dict.
  • To confirm the cause, check that extra_special_tokens in tokenizer_config.json is a list and that the installed transformers version is 4.x.
  • The suggested fix adds a type check to _set_model_specific_special_tokens() so that it handles both formats.
  • Alternatively, the workaround replaces the extra_special_tokens list with an empty dict in the tokenizer_config.json file.
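The verification step in the list above can be sketched as a short check. `is_affected` is a hypothetical helper; it takes the version as a string so it can be tried without transformers installed, and it assumes the major version is the first dot-separated component.

```python
import json
from pathlib import Path


def is_affected(config_path, transformers_version):
    """Return True if a pre-v5 install would meet a list-valued extra_special_tokens."""
    # Assumes a version string such as "4.x.y"; only the major number is used.
    major = int(transformers_version.split(".")[0])
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    return major < 5 and isinstance(config.get("extra_special_tokens"), list)
```

In practice the second argument would typically be `transformers.__version__`, and the first the path to the model's tokenizer_config.json.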

Notes

  • This issue is specific to transformers version 4.x and models with extra_special_tokens in a list format.
  • The suggested fix requires modifying the transformers library code, while the workaround modifies the model's configuration file.

Recommendation

For most users the workaround is the simpler option: replacing the extra_special_tokens list with an empty dict in tokenizer_config.json does not require modifying the installed transformers library code.
