transformers - 💡(How to fix) Fix [Bug] extra_special_tokens as list in tokenizer_config.json causes AttributeError when loading gemma4 (and other v5-format models) with transformers v4 [1 comments, 2 participants]

transformers · 2026-04-11 12:57:06


GitHub stats

huggingface/transformers#45376 · Fetched 2026-04-12 13:24:01


Author: HARISH-CS-01

Participants: HARISH-CS-01, hijingsong

Timeline (top): mentioned ×2, subscribed ×2, commented ×1, labeled ×1

Error Message

File .../transformers/tokenization_utils_base.py:1181
    self.SPECIAL_TOKENS_ATTRIBUTES = self.SPECIAL_TOKENS_ATTRIBUTES + list(special_tokens.keys())
AttributeError: 'list' object has no attribute 'keys'

Root Cause

The model's tokenizer_config.json stores extra_special_tokens as a list (the v5 format):

"extra_special_tokens": ["<token_a>", "<token_b>"]

But _set_model_specific_special_tokens() in tokenization_utils_base.py unconditionally calls .keys() on this value, assuming it is always a dict. Transformers v5 added support for the list format, so loading works there, but all v4 releases crash.
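To see why the list form crashes, here is a minimal sketch that reproduces the failing line in isolation. `ToyTokenizerV4` is a hypothetical stand-in that copies only the line from the traceback (plus a simple registration loop); it is not the real transformers class, and the attribute names in its default list are illustrative.

```python
# Hypothetical, stripped-down stand-in for the v4 tokenizer code path.
# Only the failing line from the traceback is reproduced faithfully.
class ToyTokenizerV4:
    SPECIAL_TOKENS_ATTRIBUTES = ["bos_token", "eos_token", "unk_token", "pad_token"]

    def __init__(self, extra_special_tokens):
        self._special_tokens_map = {}
        self._set_model_specific_special_tokens(extra_special_tokens)

    def _set_model_specific_special_tokens(self, special_tokens):
        # Same assumption as the failing line: special_tokens must be a dict.
        self.SPECIAL_TOKENS_ATTRIBUTES = self.SPECIAL_TOKENS_ATTRIBUTES + list(special_tokens.keys())
        for key, value in special_tokens.items():
            self._special_tokens_map[key] = value


# Dict format (attribute name -> token string): loads fine.
ok = ToyTokenizerV4({"image_token": "<image>"})
print(ok.SPECIAL_TOKENS_ATTRIBUTES[-1])  # image_token

# List format (token strings only, no attribute names): crashes.
try:
    ToyTokenizerV4(["<token_a>", "<token_b>"])
except AttributeError as exc:
    print(exc)  # 'list' object has no attribute 'keys'
```

The dict form gives each token an attribute name to register; the list form has no names, so there is nothing for `.keys()` to return.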


Original issue

System Info


- transformers version: 4.x (any release before v5)
- Platform: macOS (Apple Silicon) / Linux
- Python version: 3.12 / 3.13
- PyTorch version: latest
- Model: google/gemma-4-E4B-it

Who can reproduce?

Anyone loading google/gemma-4-E4B-it (or any model whose tokenizer_config.json uses the v5-style extra_special_tokens list format) with transformers v4.
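To check whether a particular model is affected, a sketch like the following reads a local tokenizer_config.json and reports whether extra_special_tokens uses the list format. The helper name `uses_list_extra_special_tokens` is hypothetical.

```python
import json


def uses_list_extra_special_tokens(config_path):
    """Return True if tokenizer_config.json stores extra_special_tokens as a list
    (the v5 format that transformers v4 cannot load)."""
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    # A missing key or a dict value is fine for v4; only a list triggers the crash.
    return isinstance(config.get("extra_special_tokens"), list)
```

For example, pass it the path returned by `hf_hub_download("google/gemma-4-E4B-it", "tokenizer_config.json")`; a return value of True means transformers v4 will raise the AttributeError shown in this issue.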

Steps to reproduce

pip install "transformers<5"  # installs the latest v4.x release

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("google/gemma-4-E4B-it")

Expected behavior

Tokenizer loads successfully.

Actual behavior

File .../transformers/tokenization_utils_base.py:1181
    self.SPECIAL_TOKENS_ATTRIBUTES = self.SPECIAL_TOKENS_ATTRIBUTES + list(special_tokens.keys())
AttributeError: 'list' object has no attribute 'keys'

Root cause

The model's tokenizer_config.json stores extra_special_tokens as a list (the v5 format):

"extra_special_tokens": ["<token_a>", "<token_b>"]

Suggested fix

Add a type check in _set_model_specific_special_tokens() to handle both formats gracefully:

def _set_model_specific_special_tokens(self, special_tokens):
    if isinstance(special_tokens, list):
        # v5-style list format — nothing to register, skip
        return
    self.SPECIAL_TOKENS_ATTRIBUTES = self.SPECIAL_TOKENS_ATTRIBUTES + list(special_tokens.keys())
    for key, value in special_tokens.items():
        ...
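If editing the installed library is not an option, the same type check can be applied at runtime by wrapping the method before the tokenizer is loaded. This is a sketch under assumptions: `tolerate_list_extra_special_tokens` is a hypothetical helper, and it assumes the installed v4 release calls `_set_model_specific_special_tokens` with a single `special_tokens` argument, as the traceback suggests.

```python
import functools


def tolerate_list_extra_special_tokens(cls):
    """Wrap cls._set_model_specific_special_tokens so a v5-style list is skipped
    instead of crashing, mirroring the suggested fix without editing library files."""
    original = cls._set_model_specific_special_tokens

    @functools.wraps(original)
    def patched(self, special_tokens):
        if isinstance(special_tokens, list):
            # v5-style list: no attribute names to register, so skip it.
            return None
        return original(self, special_tokens)

    # Setting the attribute on cls shadows any inherited definition.
    cls._set_model_specific_special_tokens = patched
    return cls
```

Assuming the method is reachable through `transformers.tokenization_utils_base.PreTrainedTokenizerBase` in your release, call `tolerate_list_extra_special_tokens(PreTrainedTokenizerBase)` once before `AutoTokenizer.from_pretrained(...)`. Like the suggested fix, it simply skips the list; if another code path in your release also expects a dict, the config-file workaround is the safer option.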

Workaround (for users)

import json
from pathlib import Path
from huggingface_hub import hf_hub_download

# Note: this edits tokenizer_config.json inside the local Hugging Face cache.
config_path = Path(hf_hub_download("google/gemma-4-E4B-it", "tokenizer_config.json"))
with open(config_path, encoding="utf-8") as f:
    config = json.load(f)
if isinstance(config.get("extra_special_tokens"), list):
    # Replace the v5-style list with the empty dict that v4 can iterate over.
    config["extra_special_tokens"] = {}
    with open(config_path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2, ensure_ascii=False)

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("google/gemma-4-E4B-it")
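The workaround above writes into the tokenizer_config.json inside the local Hugging Face cache, so every project sharing that cache sees the edit. A variant that patches a separate local copy instead is sketched here; `patch_tokenizer_config` is a hypothetical helper that performs the same list-to-empty-dict replacement on any file path.

```python
import json
from pathlib import Path


def patch_tokenizer_config(config_path):
    """Replace a list-valued extra_special_tokens with an empty dict, in place.

    Returns True if the file was changed, False if it was already v4-compatible.
    """
    config_path = Path(config_path)
    config = json.loads(config_path.read_text(encoding="utf-8"))
    if not isinstance(config.get("extra_special_tokens"), list):
        return False
    config["extra_special_tokens"] = {}
    # ensure_ascii=False keeps non-ASCII token strings readable in the file.
    config_path.write_text(json.dumps(config, indent=2, ensure_ascii=False), encoding="utf-8")
    return True
```

For example, download the tokenizer files with `huggingface_hub.snapshot_download("google/gemma-4-E4B-it", local_dir="gemma-4-tokenizer", allow_patterns=["*.json", "*.model", "*.jinja"])`, call `patch_tokenizer_config("gemma-4-tokenizer/tokenizer_config.json")`, then load with `AutoTokenizer.from_pretrained("gemma-4-tokenizer")`. The local directory name and the file patterns are assumptions about what the repo ships; adjust them as needed.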

Who can help?

@ArthurZucker / @itazap can you please look into this?



extent analysis

TL;DR

Either add a type check in _set_model_specific_special_tokens() so it accepts both the dict and the list format of extra_special_tokens, or apply the workaround: replace the extra_special_tokens list in tokenizer_config.json with an empty dictionary.

Guidance

The error occurs because extra_special_tokens in the model's tokenizer_config.json uses the list format, which transformers 4.x cannot handle.
To confirm you are affected, check that extra_special_tokens in tokenizer_config.json is a list and that the installed transformers version is 4.x.
Apply the suggested fix by adding a type check in _set_model_specific_special_tokens() so it handles both formats.
Alternatively, use the workaround to replace the extra_special_tokens list with an empty dictionary in tokenizer_config.json.
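The two conditions from the verification step can be combined into a single check. `is_affected` is a hypothetical helper: it takes the transformers version string (for example from `importlib.metadata.version("transformers")`) and the `extra_special_tokens` value from the parsed config, and follows the issue's "any release before v5" scope.

```python
def is_affected(transformers_version, extra_special_tokens):
    """Return True when both conditions from the issue hold: a pre-v5 release
    of transformers and a list-valued extra_special_tokens entry."""
    # Only the major version matters here, so suffixes like ".dev0" are ignored.
    major = int(transformers_version.split(".")[0])
    return major < 5 and isinstance(extra_special_tokens, list)
```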


Notes

This issue affects only transformers 4.x combined with models whose extra_special_tokens is a list; v5 accepts the list format.
The suggested fix requires modifying the transformers library code, while the workaround modifies the model's configuration file.

Recommendation

Apply the workaround (replace the extra_special_tokens list in tokenizer_config.json with an empty dictionary), since it needs no changes to the transformers library code. Because it edits the cached copy of the file, it has to be reapplied if the file is downloaded again.
