transformers - ✅(Solved) Fix [gemma4] resize_token_embeddings does not effect to embed_tokens_per_layer or output_embeddings [1 pull requests, 4 comments, 2 participants]

GitHub stats
huggingface/transformers#45276 (fetched 2026-04-08 03:00:51)

  • Comments: 4
  • Participants: 2
  • Timeline: 12
  • Reactions: 0
  • Timeline (top): commented ×4, mentioned ×3, subscribed ×3, cross-referenced ×1

PR fix notes

PR #45324: Gemma4 resizing per layer inputs

Description (problem / solution / changelog)

What does this PR do?

Fixes https://github.com/huggingface/transformers/issues/45276 and https://github.com/huggingface/transformers/issues/45335

In Gemma4, the per-layer inputs have to be resized as well, as long as they aren't part of the soft multimodal tokens.

Repro for T5Gemma:

from transformers import T5GemmaForConditionalGeneration

model = T5GemmaForConditionalGeneration.from_pretrained("harshaljanjani/tiny-t5gemma-test")
encoder = model.resize_token_embeddings(model.vocab_size + 10)
decoder = model.model.decoder.embed_tokens
head = model.get_output_embeddings()
print(encoder.weight.shape, decoder.weight.shape, head.weight.shape)
# Output: torch.Size([256010, 512]) torch.Size([256000, 512]) torch.Size([256000, 512])
# The LM head resize is reverted because of weight tying; the decoder embeddings are never resized.

Gemma3n has soft multimodal tokens, and the current state of 3n is not good. I see unused vocab entries in mm_projection 😢, and if we simply apply resizing to the per-layer inputs, we'll get even more unused entries.

It could be done correctly if we filtered out the multimodal tokens, but I'd prefer to leave it for now.

Changed files

  • src/transformers/modeling_utils.py (modified, +2/-0)
  • src/transformers/models/gemma3/modeling_gemma3.py (modified, +0/-6)
  • src/transformers/models/gemma3n/configuration_gemma3n.py (modified, +0/-1)
  • src/transformers/models/gemma3n/modeling_gemma3n.py (modified, +55/-6)
  • src/transformers/models/gemma3n/modular_gemma3n.py (modified, +61/-1)
  • src/transformers/models/gemma4/modeling_gemma4.py (modified, +68/-12)
  • src/transformers/models/gemma4/modular_gemma4.py (modified, +22/-10)
  • src/transformers/models/paligemma/modeling_paligemma.py (modified, +0/-6)
  • src/transformers/models/t5gemma/modeling_t5gemma.py (modified, +6/-0)
  • src/transformers/models/t5gemma/modular_t5gemma.py (modified, +6/-0)
  • tests/models/blip/test_modeling_blip.py (modified, +2/-2)
  • tests/models/colmodernvbert/test_modeling_colmodernvbert.py (modified, +6/-4)
  • tests/models/lfm2_vl/test_modeling_lfm2_vl.py (modified, +1/-1)
  • tests/models/qwen3_vl/test_modeling_qwen3_vl.py (modified, +2/-2)
  • tests/models/qwen3_vl_moe/test_modeling_qwen3_vl_moe.py (modified, +2/-2)
  • tests/test_modeling_common.py (modified, +3/-0)


System Info

  • transformers version: 5.5.0
  • Platform: Linux-6.6.113+-x86_64-with-glibc2.35
  • Python version: 3.12.13
  • Huggingface_hub version: 1.8.0
  • Safetensors version: 0.7.0
  • Accelerate version: 1.13.0
  • Accelerate config: not found
  • DeepSpeed version: not installed
  • PyTorch version (accelerator?): 2.10.0+cu128 (CUDA)
  • Using distributed or parallel set-up in script?: <fill in>
  • Using GPU in script?: <fill in>
  • GPU type: Tesla T4

(Google Colaboratory GPU)

Who can help?

@zucchini-nlp @Cyrilvallez

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Reproduce the behavior:

from transformers import Gemma4ForConditionalGeneration

mdl = Gemma4ForConditionalGeneration.from_pretrained("google/gemma-4-E2B-it")

# Before resizing: input, per-layer and output embeddings report the same vocabulary size.
e = mdl.get_input_embeddings()
f = mdl.model.language_model.embed_tokens_per_layer
g = mdl.get_output_embeddings()
print(e.num_embeddings, f.num_embeddings, g.out_features)
assert e.num_embeddings == f.num_embeddings == g.out_features

# Grow the vocabulary by one token, then re-read the other two modules.
e = mdl.resize_token_embeddings(e.num_embeddings + 1)
f = mdl.model.language_model.embed_tokens_per_layer
g = mdl.get_output_embeddings()
print(e.num_embeddings, f.num_embeddings, g.out_features)
# The issue reports that the sizes no longer match here on transformers 5.5.0.
assert e.num_embeddings == f.num_embeddings == g.out_features

Expected behavior

All of e.num_embeddings, f.num_embeddings and g.out_features should increase to 262145.

extent analysis

TL;DR

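On Gemma4, resize_token_embeddings does not propagate the new vocabulary size to embed_tokens_per_layer or to the output embeddings. The issue is resolved when the method resizes all three consistently, which is what PR #45324 targets.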

Guidance

  • Verify that the resize_token_embeddings method is correctly updating the input embeddings by checking the num_embeddings attribute of e after resizing.
  • Check whether the embed_tokens_per_layer module and the module returned by get_output_embeddings() report the updated size after resizing (num_embeddings and out_features respectively).
  • Investigate if there are any inconsistencies in the embedding sizes between the input, per-layer, and output embeddings.
  • Consider checking the documentation of the Gemma4ForConditionalGeneration model to see if there are any specific requirements or limitations for resizing token embeddings.
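
The checks in the list above can be collected into a small helper. This is only a sketch: `vocab_sizes` and `mismatched` are hypothetical names, not part of transformers, and the stand-in objects below merely mimic the sizes described in the issue (262144 before the resize is inferred from the expected 262145) so the snippet runs without downloading the model.

```python
from types import SimpleNamespace


def vocab_sizes(input_emb, per_layer_emb, output_emb):
    """Return the vocabulary size reported by each of the three modules."""
    return {
        "input": input_emb.num_embeddings,
        "per_layer": per_layer_emb.num_embeddings,
        "output": output_emb.out_features,
    }


def mismatched(sizes, expected):
    """Return the names of the modules whose size differs from `expected`."""
    return sorted(name for name, size in sizes.items() if size != expected)


# Stand-ins (illustrative values) for the state described in the issue after
# resize_token_embeddings(old_size + 1): only the input embeddings grew.
e = SimpleNamespace(num_embeddings=262145)
f = SimpleNamespace(num_embeddings=262144)
g = SimpleNamespace(out_features=262144)

sizes = vocab_sizes(e, f, g)
print(mismatched(sizes, expected=262145))
```

With a real model, the same helper could be called as `vocab_sizes(mdl.get_input_embeddings(), mdl.model.language_model.embed_tokens_per_layer, mdl.get_output_embeddings())`, using the attribute path from the reproduction.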

Example

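# After resizing the input embeddings, inspect the per-layer embeddings
f = mdl.model.language_model.embed_tokens_per_layer
# Caution: assigning f.num_embeddings = e.num_embeddings changes only the attribute;
# the weight matrix keeps its old number of rows (assuming a standard embedding module).
print(f.num_embeddings, f.weight.shape[0], e.num_embeddings)

Note: This example is speculative. Overwriting num_embeddings alone does not resize the per-layer embedding matrix; the weight tensor itself has to be replaced with a larger one.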

Notes

The provided code snippet suggests that the issue is related to the inconsistent updating of embeddings after resizing. However, without more information about the model's internal implementation, it is difficult to provide a definitive solution.

Recommendation

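Prefer the upstream fix: the issue is marked as solved via PR #45324, so use a transformers version that includes that change. On affected versions such as 5.5.0, any manual workaround must replace the weight tensor of the per-layer embeddings with a larger one; updating only the num_embeddings attribute is not enough.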
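
To illustrate why a manual update has to touch the weights, here is a minimal sketch using a plain torch.nn.Embedding. It is not the code from PR #45324: `grow_embedding` is a hypothetical helper, and model-specific embedding classes may carry extra behavior (for example scaling) that a plain nn.Embedding would not preserve.

```python
import torch
from torch import nn


def grow_embedding(old: nn.Embedding, new_num_embeddings: int) -> nn.Embedding:
    """Return a larger nn.Embedding whose first rows are copied from `old`."""
    new = nn.Embedding(
        new_num_embeddings,
        old.embedding_dim,
        padding_idx=old.padding_idx,
        device=old.weight.device,
        dtype=old.weight.dtype,
    )
    with torch.no_grad():
        # Keep the trained rows; the extra rows keep their fresh initialization.
        new.weight[: old.num_embeddings].copy_(old.weight)
    return new


old = nn.Embedding(8, 4)

# Changing only the attribute leaves the weight matrix at its old shape.
old.num_embeddings = 9
print(old.weight.shape)  # still 8 rows
old.num_embeddings = 8  # restore the truthful value

new = grow_embedding(old, 9)
print(new.weight.shape)
```

On a real model, the returned module would still have to be assigned back to the attribute it came from, and the output projection would need the same treatment, which is why relying on the library's own resize logic is the safer route.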
