transformers - ✅(Solved) Fix [t5gemma] resize_token_embeddings does not effect to decoder.embed_tokens [1 pull requests]


Fix Action

Fixed

PR fix notes

PR #45324: Gemma4 resizing per layer inputs

Description (problem / solution / changelog)

What does this PR do?

Fixes https://github.com/huggingface/transformers/issues/45276 and https://github.com/huggingface/transformers/issues/45335

In Gemma4, per-layer inputs have to be resized as long as they aren't part of the soft multimodal tokens.

Repro for T5 gemma:

from transformers import T5GemmaForConditionalGeneration

model = T5GemmaForConditionalGeneration.from_pretrained("harshaljanjani/tiny-t5gemma-test")
encoder = model.resize_token_embeddings(model.vocab_size + 10)
decoder = model.model.decoder.embed_tokens
head = model.get_output_embeddings()
print(encoder.weight.shape, decoder.weight.shape, head.weight.shape)
# The LM head resize is reverted because of weight tying; the decoder is never resized.
# Output: torch.Size([256010, 512]) torch.Size([256000, 512]) torch.Size([256000, 512])
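
The output pattern above (encoder resized, decoder and head left at the old size) can be reproduced with a toy model in plain Python. This is only an illustrative sketch, not the transformers implementation: `ToyTable`, `ToySeq2Seq`, `buggy_resize`, and `fixed_resize` are hypothetical names, and it assumes, as the comment in the repro suggests, that the head is tied to the decoder embedding.

```python
class ToyTable:
    """Stand-in for an embedding or head weight: a list of rows."""

    def __init__(self, rows, dim):
        self.weight = [[0.0] * dim for _ in range(rows)]

    @property
    def rows(self):
        return len(self.weight)


class ToySeq2Seq:
    """Toy encoder-decoder whose head shares storage with the decoder embedding."""

    def __init__(self, vocab, dim):
        self.dim = dim
        self.encoder_embed = ToyTable(vocab, dim)
        self.decoder_embed = ToyTable(vocab, dim)
        self.head = ToyTable(vocab, dim)
        self.tie_weights()

    def tie_weights(self):
        # Tying (assumed direction): the head reuses the decoder embedding's rows.
        self.head.weight = self.decoder_embed.weight

    def buggy_resize(self, new_vocab):
        # Only the encoder embedding and the head are resized...
        self.encoder_embed = ToyTable(new_vocab, self.dim)
        self.head = ToyTable(new_vocab, self.dim)
        # ...and re-tying points the head back at the old-size decoder table.
        self.tie_weights()

    def fixed_resize(self, new_vocab):
        # Resize the decoder embedding as well, then tie.
        self.encoder_embed = ToyTable(new_vocab, self.dim)
        self.decoder_embed = ToyTable(new_vocab, self.dim)
        self.head = ToyTable(new_vocab, self.dim)
        self.tie_weights()


def sizes(model):
    return (model.encoder_embed.rows, model.decoder_embed.rows, model.head.rows)


buggy = ToySeq2Seq(vocab=8, dim=2)
buggy.buggy_resize(10)
print(sizes(buggy))  # (10, 8, 8)

fixed = ToySeq2Seq(vocab=8, dim=2)
fixed.fixed_resize(10)
print(sizes(fixed))  # (10, 10, 10)
```

In the toy, the head only looks "reverted" because tying makes it follow whatever size the decoder table has, which is why resizing the decoder embedding is the step that matters.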

Gemma3n has soft mm tokens, and the current state of 3N is not good. I see unused vocab entries in mm_projection 😢 and if we simply apply resizing to the per-layer input, we'll get even more unused entries.

This could be done correctly if we filter out mm-tokens, but I'd prefer to leave it for now.

Changed files

  • src/transformers/models/gemma3/modeling_gemma3.py (modified, +0/-6)
  • src/transformers/models/gemma3n/configuration_gemma3n.py (modified, +0/-1)
  • src/transformers/models/gemma3n/modeling_gemma3n.py (modified, +55/-6)
  • src/transformers/models/gemma3n/modular_gemma3n.py (modified, +61/-1)
  • src/transformers/models/gemma4/modeling_gemma4.py (modified, +68/-12)
  • src/transformers/models/gemma4/modular_gemma4.py (modified, +22/-10)
  • src/transformers/models/paligemma/modeling_paligemma.py (modified, +0/-6)
  • src/transformers/models/t5gemma/modeling_t5gemma.py (modified, +6/-0)
  • src/transformers/models/t5gemma/modular_t5gemma.py (modified, +6/-0)
  • tests/models/blip/test_modeling_blip.py (modified, +2/-2)
  • tests/models/colmodernvbert/test_modeling_colmodernvbert.py (modified, +6/-4)
  • tests/models/lfm2_vl/test_modeling_lfm2_vl.py (modified, +1/-1)
  • tests/models/qwen3_vl/test_modeling_qwen3_vl.py (modified, +2/-2)
  • tests/models/qwen3_vl_moe/test_modeling_qwen3_vl_moe.py (modified, +2/-2)
  • tests/test_modeling_common.py (modified, +3/-0)


System Info

  • transformers version: 5.5.1
  • Platform: Linux-6.6.113+-x86_64-with-glibc2.35
  • Python version: 3.12.13
  • Huggingface_hub version: 1.9.2
  • Safetensors version: 0.7.0
  • Accelerate version: 1.13.0
  • Accelerate config: not found
  • DeepSpeed version: not installed
  • PyTorch version (accelerator?): 2.10.0+cpu (NA)
  • Using distributed or parallel set-up in script?: <fill in>

(Google Colaboratory)

Who can help?

@zucchini-nlp

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Quick reproduce:

from transformers import T5GemmaForConditionalGeneration
mdl = T5GemmaForConditionalGeneration.from_pretrained("harshaljanjani/tiny-t5gemma-test")
e = mdl.get_input_embeddings()
f = mdl.model.decoder.embed_tokens
g = mdl.get_output_embeddings()
print(e.num_embeddings, f.num_embeddings, g.out_features)
assert e.num_embeddings == f.num_embeddings == g.out_features
e = mdl.resize_token_embeddings(e.num_embeddings + 1)
f = mdl.model.decoder.embed_tokens
g = mdl.get_output_embeddings()
print(e.num_embeddings, f.num_embeddings, g.out_features)
assert e.num_embeddings == f.num_embeddings == g.out_features

Expected behavior

e.num_embeddings, f.num_embeddings, and g.out_features should all increase to 256001.

extent analysis

TL;DR

The issue is resolved once resize_token_embeddings also resizes the decoder's embed_tokens and the output embeddings, not only the encoder-side input embeddings.

Guidance

  • The provided reproduction code snippet suggests that the resize_token_embeddings method only updates the input embeddings, but not the decoder's embed tokens and the output embeddings.
  • To fix this, you need to update the resize_token_embeddings method to also resize the decoder's embed tokens and the output embeddings.
  • You can verify the fix by checking the values of e.num_embeddings, f.num_embeddings, and g.out_features after calling resize_token_embeddings.
  • The expected behavior is that all these values should be increased to the new size, in this case, 256001.
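
The verification step above can be wrapped in a small helper. This is a minimal sketch: `vocab_sizes` and `assert_vocab_in_sync` are hypothetical names, and the only assumption about the modules is that they expose `num_embeddings` and `out_features`, as the reproduction script relies on. The `SimpleNamespace` objects are stand-ins so the snippet runs without downloading a model.

```python
from types import SimpleNamespace


def vocab_sizes(encoder_embed, decoder_embed, head):
    """Collect the three vocabulary-sized dimensions into one dict."""
    return {
        "encoder": encoder_embed.num_embeddings,
        "decoder": decoder_embed.num_embeddings,
        "head": head.out_features,
    }


def assert_vocab_in_sync(encoder_embed, decoder_embed, head, expected=None):
    """Raise ValueError naming every module whose size differs from the target."""
    sizes = vocab_sizes(encoder_embed, decoder_embed, head)
    target = sizes["encoder"] if expected is None else expected
    stale = {name: size for name, size in sizes.items() if size != target}
    if stale:
        raise ValueError(f"vocab size mismatch, expected {target}: {stale}")
    return sizes


# Stand-ins mimicking the state reported in the issue after resizing by one.
enc = SimpleNamespace(num_embeddings=256001)
dec = SimpleNamespace(num_embeddings=256000)
head = SimpleNamespace(out_features=256000)

try:
    assert_vocab_in_sync(enc, dec, head, expected=256001)
except ValueError as err:
    print(err)
```

With a real model, the same call would take mdl.get_input_embeddings(), mdl.model.decoder.embed_tokens, and mdl.get_output_embeddings(), fetched again after the resize.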

Example

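import torch
from transformers import T5GemmaForConditionalGeneration

mdl = T5GemmaForConditionalGeneration.from_pretrained("harshaljanjani/tiny-t5gemma-test")
old_f = mdl.model.decoder.embed_tokens

# Resize; on affected versions only the encoder input embeddings change.
# Use the returned module instead of a handle taken before the call.
e = mdl.resize_token_embeddings(mdl.get_input_embeddings().num_embeddings + 1)

# Rebuild the decoder embedding at the new size and copy the trained rows,
# so existing token vectors are kept (a fresh nn.Embedding alone would discard them).
f = torch.nn.Embedding(e.num_embeddings, old_f.embedding_dim, padding_idx=old_f.padding_idx)
f = f.to(device=old_f.weight.device, dtype=old_f.weight.dtype)
with torch.no_grad():
    n = min(old_f.num_embeddings, f.num_embeddings)
    f.weight[:n] = old_f.weight[:n]
mdl.model.decoder.embed_tokens = f

# Assumption: the output head is tied to the decoder embedding and has no bias,
# so it can share the resized weight. Check this against your model before relying on it.
g = mdl.get_output_embeddings()
g.weight = f.weight
g.out_features = f.num_embeddings

print(e.num_embeddings, f.num_embeddings, g.out_features)
assert e.num_embeddings == f.num_embeddings == g.out_features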

Notes

The provided code snippet is specific to the T5GemmaForConditionalGeneration model, and the fix may not apply to other models.

Recommendation

Prefer a transformers build that includes PR #45324, which fixes this in the library. On versions without the fix, apply the workaround: manually update the decoder's embed tokens and the output embeddings after calling resize_token_embeddings, as shown in the example code snippet, so that all three sizes match.
