transformers - ✅(Solved) Fix [t5gemma] resize_token_embeddings does not effect to decoder.embed_tokens [1 pull requests]

Q: Expected behavior

All of `e.num_embeddings`, `f.num_embeddings`, and `g.out_features` should increase to 256001.

transformers · 2026-04-09 08:11:47

Fix Action

Fixed

Fixed by PR: Gemma4 resizing per layer inputs (https://github.com/huggingface/transformers/pull/45324)

PR fix notes

PR #45324: Gemma4 resizing per layer inputs

Repository: huggingface/transformers
Author: zucchini-nlp
State: open | merged: False
Link: https://github.com/huggingface/transformers/pull/45324

Description (problem / solution / changelog)

What does this PR do?

Fixes https://github.com/huggingface/transformers/issues/45276 and https://github.com/huggingface/transformers/issues/45335

In Gemma4, per-layer inputs have to be resized as long as they aren't part of the soft multimodal tokens.

Repro for T5 gemma:

from transformers import T5GemmaForConditionalGeneration

model = T5GemmaForConditionalGeneration.from_pretrained("harshaljanjani/tiny-t5gemma-test")
encoder = model.resize_token_embeddings(model.vocab_size + 10)
decoder = model.model.decoder.embed_tokens
head = model.get_output_embeddings()
print(encoder.weight.shape, decoder.weight.shape, head.weight.shape)
# LM head resize is reverted back because of tying. Decoder is never resized
# Output: torch.Size([256010, 512]) torch.Size([256000, 512]) torch.Size([256000, 512])

Gemma3n has soft multimodal (mm) tokens, and the current state of 3n is not good. I see unused vocab entries in mm_projection 😢 and if we simply apply resizing to the per-layer inputs, we'll get even more unused entries.

It could be done correctly if we filter out the mm tokens, but I'd prefer to leave it for now.
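
The comment in the repro ("LM head resize is reverted back because of tying") can be illustrated with plain PyTorch modules. This is a toy sketch, not the transformers tying code: `decoder_embed`, `lm_head`, and the sizes are hypothetical stand-ins, and the real tying logic may differ between versions.

```python
import torch
from torch import nn

hidden, old_vocab, new_vocab = 8, 16, 20

# Toy stand-ins (hypothetical names) for a decoder embedding and an LM head
# that share a single weight tensor.
decoder_embed = nn.Embedding(old_vocab, hidden)
lm_head = nn.Linear(hidden, old_vocab, bias=False)
lm_head.weight = decoder_embed.weight  # tie: both modules use the same Parameter

# Resize only the head, keeping the existing rows.
resized_head = nn.Linear(hidden, new_vocab, bias=False)
with torch.no_grad():
    resized_head.weight[:old_vocab] = lm_head.weight
size_after_resize = tuple(resized_head.weight.shape)

# Re-tying to the decoder embedding, which was never resized,
# brings the head weight back to the old vocabulary size.
resized_head.weight = decoder_embed.weight
size_after_retie = tuple(resized_head.weight.shape)

print(size_after_resize, size_after_retie)
```

In this toy setup the head only keeps its new size if the module it is tied to is resized first, which matches the shapes printed by the repro.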

Changed files

src/transformers/models/gemma3/modeling_gemma3.py (modified, +0/-6)
src/transformers/models/gemma3n/configuration_gemma3n.py (modified, +0/-1)
src/transformers/models/gemma3n/modeling_gemma3n.py (modified, +55/-6)
src/transformers/models/gemma3n/modular_gemma3n.py (modified, +61/-1)
src/transformers/models/gemma4/modeling_gemma4.py (modified, +68/-12)
src/transformers/models/gemma4/modular_gemma4.py (modified, +22/-10)
src/transformers/models/paligemma/modeling_paligemma.py (modified, +0/-6)
src/transformers/models/t5gemma/modeling_t5gemma.py (modified, +6/-0)
src/transformers/models/t5gemma/modular_t5gemma.py (modified, +6/-0)
tests/models/blip/test_modeling_blip.py (modified, +2/-2)
tests/models/colmodernvbert/test_modeling_colmodernvbert.py (modified, +6/-4)
tests/models/lfm2_vl/test_modeling_lfm2_vl.py (modified, +1/-1)
tests/models/qwen3_vl/test_modeling_qwen3_vl.py (modified, +2/-2)
tests/models/qwen3_vl_moe/test_modeling_qwen3_vl_moe.py (modified, +2/-2)
tests/test_modeling_common.py (modified, +3/-0)

System Info

transformers version: 5.5.1
Platform: Linux-6.6.113+-x86_64-with-glibc2.35
Python version: 3.12.13
Huggingface_hub version: 1.9.2
Safetensors version: 0.7.0
Accelerate version: 1.13.0
Accelerate config: not found
DeepSpeed version: not installed
PyTorch version (accelerator?): 2.10.0+cpu (NA)
Using distributed or parallel set-up in script?: <fill in>

(Google Colaboratory)

Who can help?

@zucchini-nlp

Information

The official example scripts
My own modified scripts

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)

Reproduction

Quick reproduce:

from transformers import T5GemmaForConditionalGeneration
mdl = T5GemmaForConditionalGeneration.from_pretrained("harshaljanjani/tiny-t5gemma-test")
e = mdl.get_input_embeddings()
f = mdl.model.decoder.embed_tokens
g = mdl.get_output_embeddings()
print(e.num_embeddings, f.num_embeddings, g.out_features)
assert e.num_embeddings == f.num_embeddings == g.out_features
e = mdl.resize_token_embeddings(e.num_embeddings + 1)
f = mdl.model.decoder.embed_tokens
g = mdl.get_output_embeddings()
print(e.num_embeddings, f.num_embeddings, g.out_features)
assert e.num_embeddings == f.num_embeddings == g.out_features

Expected behavior

All of e.num_embeddings, f.num_embeddings, and g.out_features should increase to 256001.

extent analysis

TL;DR

resize_token_embeddings resizes only the input embeddings of T5Gemma. The issue is resolved once the method also resizes the decoder's embed_tokens and the output embeddings.

Guidance

The reproduction shows that resize_token_embeddings resizes only the input (encoder) embeddings; the decoder's embed_tokens and the output embeddings keep the old vocabulary size.
Per the PR notes, the LM head resize is reverted because of weight tying, and the decoder is never resized. The fix is for resize_token_embeddings to resize the decoder's embed_tokens and the output embeddings as well.
To verify the fix, check e.num_embeddings, f.num_embeddings, and g.out_features after calling resize_token_embeddings.
All three values should equal the new size, 256001 in this case.
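
The verification step above can be wrapped in a small helper. This is a minimal sketch: `stale_vocab_modules` is a hypothetical name, not a transformers API, and it reads `weight.shape[0]` (as the PR repro does) instead of `num_embeddings` / `out_features`. The demo uses `SimpleNamespace` stand-ins so it runs without downloading a model.

```python
from types import SimpleNamespace


def stale_vocab_modules(model, new_size):
    """Return {module name: vocab rows} for modules that do not have new_size rows."""
    sizes = {
        "input_embeddings": model.get_input_embeddings().weight.shape[0],
        "decoder_embed_tokens": model.model.decoder.embed_tokens.weight.shape[0],
        "output_embeddings": model.get_output_embeddings().weight.shape[0],
    }
    return {name: rows for name, rows in sizes.items() if rows != new_size}


def _stub_module(rows, dim=4):
    # Stand-in for nn.Embedding / nn.Linear: only weight.shape is needed here.
    return SimpleNamespace(weight=SimpleNamespace(shape=(rows, dim)))


# Stand-in model mimicking the reported state: only the input embeddings grew.
enc, dec, head = _stub_module(11), _stub_module(10), _stub_module(10)
stub_model = SimpleNamespace(
    get_input_embeddings=lambda: enc,
    get_output_embeddings=lambda: head,
    model=SimpleNamespace(decoder=SimpleNamespace(embed_tokens=dec)),
)

stale = stale_vocab_modules(stub_model, 11)
print(stale)
```

With a real model, `stale_vocab_modules(mdl, new_size)` would be called right after `mdl.resize_token_embeddings(new_size)`; an empty dict means all three modules were resized.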

Example

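import torch
from transformers import T5GemmaForConditionalGeneration

mdl = T5GemmaForConditionalGeneration.from_pretrained("harshaljanjani/tiny-t5gemma-test")
new_size = mdl.get_input_embeddings().num_embeddings + 1

# Resize; on affected versions only the input (encoder) embeddings grow
e = mdl.resize_token_embeddings(new_size)

# Manually grow the decoder's embed_tokens, keeping the existing rows
old_f = mdl.model.decoder.embed_tokens
f = torch.nn.Embedding(new_size, old_f.embedding_dim, padding_idx=old_f.padding_idx)
f = f.to(device=old_f.weight.device, dtype=old_f.weight.dtype)
with torch.no_grad():
    f.weight[: old_f.num_embeddings] = old_f.weight
mdl.model.decoder.embed_tokens = f

# Manually grow the output embeddings and share the decoder weight
# (assumption: the head is tied to the decoder embedding and has no bias)
g = torch.nn.Linear(f.embedding_dim, new_size, bias=False)
g.weight = f.weight
mdl.set_output_embeddings(g)

print(e.num_embeddings, f.num_embeddings, g.out_features)
assert e.num_embeddings == f.num_embeddings == g.out_features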

Notes

The provided code snippet is specific to the T5GemmaForConditionalGeneration model, and the fix may not apply to other models.

Recommendation

Apply workaround: until the fix from PR #45324 is available in your installed version, manually resize the decoder's embed_tokens and the output embeddings after calling resize_token_embeddings, following the example code snippet. Keep the existing weight rows when doing so; re-initializing the modules would discard the pretrained weights.
