transformers - ✅(Solved) Fix [Gemma 4] mm_token_type_ids required for text-only fine-tuning - should default to zeros [2 pull requests, 3 comments, 3 participants]

transformers • 2026-04-02 20:26:24

GitHub stats

huggingface/transformers#45200 • Fetched 2026-04-08 02:33:17

Timeline (top)

commented ×3, mentioned ×3, subscribed ×3, cross-referenced ×1

Error Message

Traceback:

File ".../transformers/models/gemma4/modeling_gemma4.py", line 931, in forward
    raise ValueError("mm_token_type_ids is required as a model input when training")
ValueError: mm_token_type_ids is required as a model input when training

Fix Action

Fix / Workaround

Adding mm_token_type_ids as zeros works.

PR fix notes

PR #45222: fix(gemma3, gemma4): default token_type_ids to zeros for text-only training

Repository: huggingface/transformers
Author: jashshah999
State: closed | merged: False
Link: https://github.com/huggingface/transformers/pull/45222

Description (problem / solution / changelog)

Summary

When using Gemma 3 or Gemma 4 for text-only supervised fine-tuning (no images), the forward pass raises a ValueError because token_type_ids / mm_token_type_ids is not provided. This happens because AutoTokenizer does not produce these fields; only the multimodal Processor does.

The fix defaults to all zeros when token_type_ids / mm_token_type_ids is None during training, instead of raising. When the tensor is all zeros, is_vision is entirely False, so the bidirectional vision mask branch is skipped and a standard causal mask is produced, which is exactly correct for text-only input.
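The masking argument can be illustrated with a minimal sketch. This is not the library's implementation: it uses plain Python lists instead of tensors, handles a single sequence, assumes that any non-zero entry marks a vision token, and lets every vision token attend to every other vision token (the real code may restrict this further). The name build_mask is hypothetical.

```python
def build_mask(mm_token_type_ids):
    """Return a seq x seq boolean mask; True means query i may attend to key j."""
    n = len(mm_token_type_ids)
    # Assumption for this sketch: a non-zero type id marks a vision token.
    is_vision = [t != 0 for t in mm_token_type_ids]
    mask = []
    for i in range(n):
        row = []
        for j in range(n):
            causal = j <= i                               # standard causal rule
            bidirectional = is_vision[i] and is_vision[j]  # vision-to-vision rule
            row.append(causal or bidirectional)
        mask.append(row)
    return mask


# Text-only input: every type id is zero, so only the causal rule ever fires.
text_only = build_mask([0, 0, 0, 0])

# Mixed input: positions 1 and 2 are vision tokens and can see each other.
with_image = build_mask([0, 1, 1, 0])
```

With all zeros, the bidirectional term is False everywhere and the result is the ordinary lower-triangular causal mask, which is why defaulting to zeros is safe for text-only batches.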

Changes

modeling_gemma4.py / modular_gemma4.py: default mm_token_type_ids to torch.zeros(...) instead of raising ValueError
modeling_gemma3.py / modular_gemma3.py: same fix for token_type_ids (same root cause)

Fixes #45200

Changed files

src/transformers/models/gemma3/modeling_gemma3.py (modified, +3/-1)
src/transformers/models/gemma3/modular_gemma3.py (modified, +3/-1)
src/transformers/models/gemma4/modeling_gemma4.py (modified, +3/-1)
src/transformers/models/gemma4/modular_gemma4.py (modified, +3/-1)

PR #45454: Gemma4 training with text-only samples

Repository: huggingface/transformers
Author: zucchini-nlp
State: open | merged: False
Link: https://github.com/huggingface/transformers/pull/45454

Description (problem / solution / changelog)

What does this PR do?

Fixes https://github.com/huggingface/transformers/issues/45200

As per the title, this error was actually needed only in PG. Other models don't have such prefix/suffix separation when training.

Changed files

src/transformers/models/gemma3/modeling_gemma3.py (modified, +9/-21)
src/transformers/models/gemma3/modular_gemma3.py (modified, +9/-21)
src/transformers/models/gemma4/modeling_gemma4.py (modified, +9/-21)
src/transformers/models/gemma4/modular_gemma4.py (modified, +9/-21)
src/transformers/models/git/modeling_git.py (modified, +6/-20)
tests/models/gemma3/test_modeling_gemma3.py (modified, +18/-0)
tests/models/gemma4/test_modeling_gemma4.py (modified, +21/-0)

System Info

transformers: 5.5.0.dev0 (installed from source)
torch: 2.8.0+cu128
trl: 1.0.0
peft: 0.18.2.dev0
Python: 3.12
OS: Linux (RunPod, Ubuntu 24.04)
GPU: NVIDIA B200 (192GB)

Who can help?

@zucchini-nlp @ArthurZucker

Information

The official example scripts
My own modified scripts

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)

Reproduction

Steps to reproduce the behavior:

Load google/gemma-4-31B with 4-bit quantization
Tokenize any text input
Call model.train() and run a forward pass with labels

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

bnb = BitsAndBytesConfig(
    load_in_4bit=True, bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16, bnb_4bit_use_double_quant=True,
)
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-4-31B", quantization_config=bnb,
    device_map="auto", torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained("google/gemma-4-31B")

inputs = tokenizer("What is CMMC?", return_tensors="pt").to("cuda")
model.train()
outputs = model(**inputs, labels=inputs["input_ids"])
# ValueError: `mm_token_type_ids` is required as a model input when training

Traceback:

File ".../transformers/models/gemma4/modeling_gemma4.py", line 931, in forward
    raise ValueError("`mm_token_type_ids` is required as a model input when training")
ValueError: `mm_token_type_ids` is required as a model input when training

Workaround: adding mm_token_type_ids as zeros works:

inputs["token_type_ids"] = torch.zeros_like(inputs["input_ids"])
inputs["mm_token_type_ids"] = torch.zeros_like(inputs["input_ids"])
model.train()
outputs = model(**inputs, labels=inputs["input_ids"])  # Works

For SFT with TRL, this requires a custom data collator and remove_unused_columns=False.
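One way such a collator could look is sketched below. It is an illustration under stated assumptions, not code from the issue or from TRL: ZeroTokenTypeCollator and stack_collator are hypothetical names, and the stand-in base collator assumes equal-length examples where a real one would pad.

```python
import torch


class ZeroTokenTypeCollator:
    """Wrap another collator and add all-zero token type tensors when missing."""

    def __init__(self, base_collator, keys=("token_type_ids", "mm_token_type_ids")):
        self.base_collator = base_collator
        self.keys = keys

    def __call__(self, features):
        batch = self.base_collator(features)
        for key in self.keys:
            if key not in batch:
                # Same shape, dtype and device as input_ids; zeros mean "text token".
                batch[key] = torch.zeros_like(batch["input_ids"])
        return batch


def stack_collator(features):
    # Hypothetical stand-in for a real padding collator (no padding is done here).
    return {"input_ids": torch.tensor([f["input_ids"] for f in features])}


collator = ZeroTokenTypeCollator(stack_collator)
batch = collator([{"input_ids": [5, 6, 7]}, {"input_ids": [8, 9, 10]}])
```

In a TRL setup, the wrapper would presumably be passed as the trainer's data_collator with remove_unused_columns=False, as the issue describes, wrapping whichever collator the training run already uses.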

Expected behavior

For text-only training (no images or audio), mm_token_type_ids should default to zeros when not provided, rather than raising a ValueError.

Gemma 3 had a similar pattern with token_type_ids. Gemma 4 adds mm_token_type_ids on top of that. Both are required even for text-only fine-tuning.

Suggestion: either (1) default to zeros when not provided, (2) auto-generate in the tokenizer, or (3) document as required in the model card.
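Option (1) amounts to replacing a raise with a fallback. The sketch below shows the pattern in isolation; resolve_mm_token_type_ids is a hypothetical helper written for illustration and is not the code in modeling_gemma4.py.

```python
import torch


def resolve_mm_token_type_ids(input_ids, mm_token_type_ids=None):
    """Hypothetical helper: treat a missing tensor as text-only instead of raising."""
    if mm_token_type_ids is None:
        # All zeros means no multimodal tokens anywhere in the batch.
        return torch.zeros_like(input_ids)
    return mm_token_type_ids


input_ids = torch.tensor([[1, 2, 3]])
defaulted = resolve_mm_token_type_ids(input_ids)
```

A caller that does supply the tensor gets it back unchanged, so multimodal batches would be unaffected by this kind of default.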

extent analysis

TL;DR

Until the library defaults it, pass mm_token_type_ids explicitly when training. For text-only input, an all-zeros tensor shaped like input_ids works.

Guidance

When training the model, ensure that mm_token_type_ids is included in the input dictionary, as it is required for training.
To mitigate the issue, you can add mm_token_type_ids as zeros to the input dictionary, as shown in the provided workaround.
Consider modifying the tokenizer to auto-generate mm_token_type_ids or defaulting to zeros when not provided.
When using SFT with TRL, create a custom data collator and set remove_unused_columns=False to accommodate the required input.

Example

inputs["mm_token_type_ids"] = torch.zeros_like(inputs["input_ids"])

Notes

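The issue was reproduced with google/gemma-4-31B in 4-bit quantization, but the ValueError is raised in the model's forward pass whenever mm_token_type_ids is missing during training. Per PR #45222, Gemma 3 has the same root cause with token_type_ids.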

Recommendation

Apply the workaround of adding mm_token_type_ids as zeros to the input dictionary. This is a temporary solution until the model or tokenizer handles this input by default.
