transformers - ✅(Solved) Fix [Gemma 4] mm_token_type_ids required for text-only fine-tuning - should default to zeros [2 pull requests, 3 comments, 3 participants]

GitHub stats

huggingface/transformers#45200 (fetched 2026-04-08 02:33:17)

  • Comments: 3
  • Participants: 3
  • Timeline events: 12
  • Reactions: 0
  • Timeline (top): commented ×3, mentioned ×3, subscribed ×3, cross-referenced ×1

Error Message

Traceback:

File ".../transformers/models/gemma4/modeling_gemma4.py", line 931, in forward
    raise ValueError("`mm_token_type_ids` is required as a model input when training")
ValueError: `mm_token_type_ids` is required as a model input when training

Fix / Workaround

Passing mm_token_type_ids as an all-zeros tensor avoids the error.

PR fix notes

PR #45222: fix(gemma3, gemma4): default token_type_ids to zeros for text-only training

Description (problem / solution / changelog)

Summary

When using Gemma 3 or Gemma 4 for text-only supervised fine-tuning (no images), the forward pass raises a ValueError because token_type_ids / mm_token_type_ids is not provided. This happens because AutoTokenizer does not produce these fields; only the multimodal Processor does.

The fix defaults token_type_ids / mm_token_type_ids to all zeros when it is None during training, instead of raising. With all zeros, is_vision is entirely False, so the bidirectional vision mask branch is skipped and a standard causal mask is produced, which is exactly correct for text-only input.
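
To see why all-zero type IDs reduce to ordinary causal attention, here is a toy, pure-Python sketch. It is not the transformers implementation: the helper names `build_attention_mask` and `causal_mask`, and the rule that vision tokens in the same contiguous run attend to each other in both directions, are simplifying assumptions made for illustration.

```python
def build_attention_mask(mm_token_type_ids):
    """Toy mask builder (hypothetical helper, not the library's code).

    mask[i][j] is True when query position i may attend to key position j.
    Assumption: a non-zero type ID marks a vision token, and vision tokens
    in the same contiguous run attend to each other in both directions.
    """
    n = len(mm_token_type_ids)
    is_vision = [t != 0 for t in mm_token_type_ids]

    # Give each contiguous run of vision tokens its own group ID (-1 = text).
    group = [-1] * n
    current = -1
    for i in range(n):
        if is_vision[i]:
            if i == 0 or not is_vision[i - 1]:
                current += 1
            group[i] = current

    mask = []
    for i in range(n):
        row = []
        for j in range(n):
            causal = j <= i
            same_vision_run = is_vision[i] and is_vision[j] and group[i] == group[j]
            row.append(causal or same_vision_run)
        mask.append(row)
    return mask


def causal_mask(n):
    """Plain lower-triangular causal mask."""
    return [[j <= i for j in range(n)] for i in range(n)]


# Text-only input: all zeros, so no position is a vision token
# and the result is exactly the causal mask.
assert build_attention_mask([0, 0, 0, 0]) == causal_mask(4)

# Mixed input: positions 1 and 2 form one vision run and see each other.
mixed = build_attention_mask([0, 1, 1, 0])
assert mixed[1][2] is True   # bidirectional inside the vision run
assert mixed[0][1] is False  # text tokens stay causal
```

In this toy model the bidirectional branch can only fire when `is_vision` is True for both positions, so an all-zeros input never reaches it.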

Changes

  • modeling_gemma4.py / modular_gemma4.py: default mm_token_type_ids to torch.zeros(...) instead of raising ValueError
  • modeling_gemma3.py / modular_gemma3.py: same fix for token_type_ids (same root cause)

Fixes #45200

Changed files

  • src/transformers/models/gemma3/modeling_gemma3.py (modified, +3/-1)
  • src/transformers/models/gemma3/modular_gemma3.py (modified, +3/-1)
  • src/transformers/models/gemma4/modeling_gemma4.py (modified, +3/-1)
  • src/transformers/models/gemma4/modular_gemma4.py (modified, +3/-1)

PR #45454: Gemma4 training with text-only samples

Description (problem / solution / changelog)

What does this PR do?

Fixes https://github.com/huggingface/transformers/issues/45200

As per the title: this error was actually needed only in PG (presumably PaliGemma). Other models have no such prefix/suffix separation when training.

Changed files

  • src/transformers/models/gemma3/modeling_gemma3.py (modified, +9/-21)
  • src/transformers/models/gemma3/modular_gemma3.py (modified, +9/-21)
  • src/transformers/models/gemma4/modeling_gemma4.py (modified, +9/-21)
  • src/transformers/models/gemma4/modular_gemma4.py (modified, +9/-21)
  • src/transformers/models/git/modeling_git.py (modified, +6/-20)
  • tests/models/gemma3/test_modeling_gemma3.py (modified, +18/-0)
  • tests/models/gemma4/test_modeling_gemma4.py (modified, +21/-0)

System Info

  • transformers: 5.5.0.dev0 (installed from source)
  • torch: 2.8.0+cu128
  • trl: 1.0.0
  • peft: 0.18.2.dev0
  • Python: 3.12
  • OS: Linux (RunPod, Ubuntu 24.04)
  • GPU: NVIDIA B200 (192GB)

Who can help?

@zucchini-nlp @ArthurZucker

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Steps to reproduce the behavior:

  1. Load google/gemma-4-31B with 4-bit quantization
  2. Tokenize any text input
  3. Call model.train() and run a forward pass with labels

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

bnb = BitsAndBytesConfig(
    load_in_4bit=True, bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16, bnb_4bit_use_double_quant=True,
)
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-4-31B", quantization_config=bnb,
    device_map="auto", torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained("google/gemma-4-31B")

inputs = tokenizer("What is CMMC?", return_tensors="pt").to("cuda")
model.train()
outputs = model(**inputs, labels=inputs["input_ids"])
# ValueError: `mm_token_type_ids` is required as a model input when training

Traceback:


File ".../transformers/models/gemma4/modeling_gemma4.py", line 931, in forward
    raise ValueError("`mm_token_type_ids` is required as a model input when training")
ValueError: `mm_token_type_ids` is required as a model input when training

Workaround: adding mm_token_type_ids as zeros works:

inputs["token_type_ids"] = torch.zeros_like(inputs["input_ids"])
inputs["mm_token_type_ids"] = torch.zeros_like(inputs["input_ids"])
model.train()
outputs = model(**inputs, labels=inputs["input_ids"])  # Works

For SFT with TRL, this requires a custom data collator and remove_unused_columns=False.
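
A custom collator of the kind mentioned above could look roughly like the following. This is a minimal sketch, not code from the issue or from TRL: `ZeroTokenTypeCollator` and `stack_collator` are hypothetical names, and the stand-in base collator assumes equal-length examples so that it needs no padding logic.

```python
import torch


class ZeroTokenTypeCollator:
    """Wraps a base collator and adds all-zero type-ID tensors (hypothetical helper)."""

    def __init__(self, base_collator, keys=("token_type_ids", "mm_token_type_ids")):
        self.base_collator = base_collator
        self.keys = keys

    def __call__(self, features):
        batch = self.base_collator(features)
        for key in self.keys:
            # Only fill in fields the base collator did not already produce.
            if key not in batch:
                batch[key] = torch.zeros_like(batch["input_ids"])
        return batch


def stack_collator(features):
    """Stand-in base collator for equal-length examples (illustration only)."""
    input_ids = torch.tensor([f["input_ids"] for f in features])
    return {"input_ids": input_ids, "labels": input_ids.clone()}


collator = ZeroTokenTypeCollator(stack_collator)
batch = collator([{"input_ids": [2, 10, 11]}, {"input_ids": [2, 12, 13]}])
print({name: tuple(t.shape) for name, t in batch.items()})
```

In a real run the wrapper would go around whichever collator the training setup already uses and be passed to the trainer as its data collator, together with remove_unused_columns=False as the issue describes.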

Expected behavior

For text-only training (no images or audio), mm_token_type_ids should default to zeros when not provided, rather than raising a ValueError.

Gemma 3 had a similar pattern with token_type_ids. Gemma 4 adds mm_token_type_ids on top of that. Both are required even for text-only fine-tuning.

Suggestion: either (1) default to zeros when not provided, (2) auto-generate in the tokenizer, or (3) document as required in the model card.

extent analysis

TL;DR

On versions that raise this error, pass mm_token_type_ids as an all-zeros tensor when training on text-only input. PR #45222 proposes making zeros the default inside the model so that no extra input is needed.

Guidance

  • When training the model, ensure that mm_token_type_ids is included in the input dictionary, as it is required for training.
  • To mitigate the issue, you can add mm_token_type_ids as zeros to the input dictionary, as shown in the provided workaround.
  • Consider modifying the tokenizer to auto-generate mm_token_type_ids or defaulting to zeros when not provided.
  • When using SFT with TRL, create a custom data collator and set remove_unused_columns=False to accommodate the required input.

Example

inputs["mm_token_type_ids"] = torch.zeros_like(inputs["input_ids"])

Notes

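The issue was reported with google/gemma-4-31B under 4-bit quantization, but the ValueError is raised in the model's forward pass whenever mm_token_type_ids is missing during training. PR #45222 describes the same root cause for token_type_ids in Gemma 3.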

Recommendation

On a transformers version that still raises the error, apply the workaround of adding mm_token_type_ids as zeros to the input dictionary. The linked PRs (#45222 and #45454) address the problem in the model code itself.
