transformers - ✅(Solved) Fix [gemma4] resize_token_embeddings does not affect embed_tokens_per_layer or output_embeddings [1 pull request, 4 comments, 2 participants]

Q: Expected behavior

All `e.num_embeddings`, `f.num_embeddings` and `g.out_features` should be increased to 262145.

transformers · 2026-04-07 02:55:03


GitHub stats

huggingface/transformers#45276 • Fetched 2026-04-08 03:00:51

Author: KoichiYasuoka
Participants: KoichiYasuoka, zucchini-nlp
Timeline (top): commented ×4, mentioned ×3, subscribed ×3, cross-referenced ×1

PR fix notes

PR #45324: Gemma4 resizing per layer inputs

Repository: huggingface/transformers
Author: zucchini-nlp
State: closed | merged: True
Link: https://github.com/huggingface/transformers/pull/45324

Description (problem / solution / changelog)

What does this PR do?

Fixes https://github.com/huggingface/transformers/issues/45276 and https://github.com/huggingface/transformers/issues/45335

In gemma4, per-layer inputs have to be resized as long as they aren't part of soft multimodal tokens.

Repro for T5 gemma:

from transformers import T5GemmaForConditionalGeneration

model = T5GemmaForConditionalGeneration.from_pretrained("harshaljanjani/tiny-t5gemma-test")
encoder = model.resize_token_embeddings(model.vocab_size + 10)
decoder = model.model.decoder.embed_tokens
head = model.get_output_embeddings()
print(encoder.weight.shape, decoder.weight.shape, head.weight.shape)
# LM head resize is reverted because of tying. The decoder is never resized.
# Output: torch.Size([256010, 512]) torch.Size([256000, 512]) torch.Size([256000, 512])
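
The comment in the repro attributes the head's unchanged size to tying. One possible reading, sketched here in plain PyTorch with toy modules (this is not the T5Gemma code; the sizes and variable names are made up for illustration), is that a resized head which is tied again to an embedding that was never resized ends up with the smaller weight:

```python
from torch import nn

vocab, dim = 8, 4  # toy sizes, made up for illustration

# Hypothetical stand-ins: an embedding that was not resized,
# and an output head that was resized by two extra rows.
decoder_emb = nn.Embedding(vocab, dim)
head = nn.Linear(dim, vocab + 2, bias=False)
rows_before = head.weight.shape[0]

# Tying makes the head share the embedding's Parameter, so the head's
# weight takes on the embedding's (smaller) shape again.
head.weight = decoder_emb.weight
rows_after = head.weight.shape[0]

print(rows_before, rows_after)
```

Whether transformers re-ties in exactly this order is an assumption; the sketch only shows that sharing one parameter forces both modules to the same number of rows.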

Gemma3n has soft multimodal (mm) tokens, and the current state of 3N is not good. I see unused vocab entries in mm_projection 😢, and if we simply apply resizing to the per-layer input, we'll get even more unused entries.

Could be done in the correct way if we filter out mm-tokens, but I'd prefer to leave it for now

Changed files

src/transformers/modeling_utils.py (modified, +2/-0)
src/transformers/models/gemma3/modeling_gemma3.py (modified, +0/-6)
src/transformers/models/gemma3n/configuration_gemma3n.py (modified, +0/-1)
src/transformers/models/gemma3n/modeling_gemma3n.py (modified, +55/-6)
src/transformers/models/gemma3n/modular_gemma3n.py (modified, +61/-1)
src/transformers/models/gemma4/modeling_gemma4.py (modified, +68/-12)
src/transformers/models/gemma4/modular_gemma4.py (modified, +22/-10)
src/transformers/models/paligemma/modeling_paligemma.py (modified, +0/-6)
src/transformers/models/t5gemma/modeling_t5gemma.py (modified, +6/-0)
src/transformers/models/t5gemma/modular_t5gemma.py (modified, +6/-0)
tests/models/blip/test_modeling_blip.py (modified, +2/-2)
tests/models/colmodernvbert/test_modeling_colmodernvbert.py (modified, +6/-4)
tests/models/lfm2_vl/test_modeling_lfm2_vl.py (modified, +1/-1)
tests/models/qwen3_vl/test_modeling_qwen3_vl.py (modified, +2/-2)
tests/models/qwen3_vl_moe/test_modeling_qwen3_vl_moe.py (modified, +2/-2)
tests/test_modeling_common.py (modified, +3/-0)



System Info

transformers version: 5.5.0
Platform: Linux-6.6.113+-x86_64-with-glibc2.35
Python version: 3.12.13
Huggingface_hub version: 1.8.0
Safetensors version: 0.7.0
Accelerate version: 1.13.0
Accelerate config: not found
DeepSpeed version: not installed
PyTorch version (accelerator?): 2.10.0+cu128 (CUDA)
Using distributed or parallel set-up in script?: <fill in>
Using GPU in script?: <fill in>
GPU type: Tesla T4

(Google Colaboratory GPU)

Who can help?

@zucchini-nlp @Cyrilvallez

Information

The official example scripts
My own modified scripts

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)

Reproduction

Reproduce the behavior:

from transformers import Gemma4ForConditionalGeneration
mdl = Gemma4ForConditionalGeneration.from_pretrained("google/gemma-4-E2B-it")
e = mdl.get_input_embeddings()
f = mdl.model.language_model.embed_tokens_per_layer
g = mdl.get_output_embeddings()
print(e.num_embeddings, f.num_embeddings, g.out_features)
assert e.num_embeddings == f.num_embeddings == g.out_features
e = mdl.resize_token_embeddings(e.num_embeddings + 1)
f = mdl.model.language_model.embed_tokens_per_layer
g = mdl.get_output_embeddings()
print(e.num_embeddings, f.num_embeddings, g.out_features)
assert e.num_embeddings == f.num_embeddings == g.out_features

Expected behavior

All e.num_embeddings, f.num_embeddings and g.out_features should be increased to 262145.
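
The expected behavior amounts to all three modules covering the same number of vocabulary entries. A minimal sketch of such a check, using toy PyTorch modules in place of the real model (the helper names `vocab_rows` and `is_consistent` and the toy sizes are assumptions, not part of transformers):

```python
from torch import nn


def vocab_rows(*modules):
    """Return the number of weight rows (vocabulary entries) of each module."""
    return tuple(m.weight.shape[0] for m in modules)


def is_consistent(*modules):
    """True when every module covers the same number of vocabulary entries."""
    return len(set(vocab_rows(*modules))) == 1


# Toy stand-ins for e, f and g; the sizes are made up for illustration.
e = nn.Embedding(11, 4)            # input embeddings, already resized
f = nn.Embedding(10, 6)            # per-layer table, not resized
g = nn.Linear(4, 10, bias=False)   # output head, not resized

rows = vocab_rows(e, f, g)
print(rows)                  # (11, 10, 10)
print(is_consistent(e, f, g))  # False
```

The helpers read `weight.shape[0]` rather than `num_embeddings` or `out_features`, so a stale attribute cannot hide a mismatch. With the model from the reproduction, the same call would be `vocab_rows(e, f, g)`.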

extent analysis

TL;DR

On transformers 5.5.0, resize_token_embeddings on Gemma4ForConditionalGeneration does not resize embed_tokens_per_layer or the output embeddings. The merged PR #45324 addresses this.

Guidance

Verify that resize_token_embeddings updates the input embeddings by checking e.num_embeddings after resizing.
Check whether the embed_tokens_per_layer attribute and the module returned by get_output_embeddings() have the new size after resizing.
Compare the sizes of the input, per-layer, and output embeddings; all three should match.
Consider checking the documentation of Gemma4ForConditionalGeneration for any model-specific requirements or limitations when resizing token embeddings.

Example

# Assigning f.num_embeddings changes only the attribute; the weight keeps its old shape.
# Compare the actual weight shapes to see whether the per-layer table was resized.
f = mdl.model.language_model.embed_tokens_per_layer
print(e.weight.shape[0], f.weight.shape[0])

Note: This only checks the sizes. It is not a fix; the merged PR #45324 listed above changes the resizing behavior itself.
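
Assigning to `num_embeddings` changes an attribute only; the weight tensor keeps its shape. A rough sketch in plain PyTorch of what growing an embedding involves, assuming a vanilla `nn.Embedding` (the helper `grow_embedding` is hypothetical; Gemma4's per-layer embedding may be a subclass with extra state, so this is not a drop-in workaround):

```python
import torch
from torch import nn


def grow_embedding(old: nn.Embedding, new_num: int) -> nn.Embedding:
    """Return a new embedding with new_num rows that reuses the old rows."""
    new = nn.Embedding(new_num, old.embedding_dim, padding_idx=old.padding_idx)
    new = new.to(device=old.weight.device, dtype=old.weight.dtype)
    n = min(old.weight.shape[0], new_num)  # trust the tensor, not the attribute
    with torch.no_grad():
        new.weight[:n] = old.weight[:n]    # extra rows keep their random init
    return new


# Setting the attribute does not touch the weight.
toy = nn.Embedding(10, 4)
toy.num_embeddings = 11
stale_shape = tuple(toy.weight.shape)      # still (10, 4)

# Building a larger module and copying the rows does.
old = nn.Embedding(10, 4)
new = grow_embedding(old, 11)
print(stale_shape, tuple(new.weight.shape))
```

A real model would also need its config and any tied weights kept in step, which is what the library's own resizing is meant to handle.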

Notes

The reproduction shows that resizing changes the input embeddings but leaves the per-layer embeddings and the output embeddings at their old size. According to the PR description, in gemma4 the per-layer inputs have to be resized as long as they aren't part of soft multimodal tokens.

Recommendation

Use a transformers build that includes the merged PR #45324, which addresses this issue. Do not rely on assigning num_embeddings by hand: that changes the attribute but does not resize the underlying weight.
