vllm - ✅(Solved) Fix [Bug]: Gemma4 vision encoder crashes with ValueError: Expected hidden_size to be 5376, but found: 72 [1 pull request, 1 participant]

vllm • 2026-04-06 06:59:18

GitHub stats

vllm-project/vllm#39061•Fetched 2026-04-08 02:52:42

Author

ohsono

Participants

ohsono

Error Message

ValueError: Expected hidden_size to be 5376, but found: 72

Root Cause

When vLLM's TransformersModelBase._recursive_replace walks the model graph and replaces norm modules, it calls:

# base.py:442
elif child_module.__class__.__name__.endswith("RMSNorm"):
    new_module = replace_rms_norm_class(
        child_module, self.text_config.hidden_size  # ← always 5376 (LM hidden size)
    )

Inside replace_rms_norm_class (transformers/utils.py), the hidden_size argument is only overridden when the norm module has a weight parameter:

weight_meta = getattr(rms_norm, "weight", None)
if weight_meta is not None:
    kwargs["hidden_size"] = weight_meta.size(0)

Gemma4's vision encoder v_norm is Gemma4RMSNorm(head_dim=72, with_scale=False). With with_scale=False, no weight is registered, so hidden_size stays at 5376. The resulting RMSNorm(hidden_size=5376, has_weight=False) then validates x.shape[-1] == 5376 for every forward call, but the vision value states have shape [..., 72].
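The chain of events described above can be replayed without vLLM or PyTorch. The following is a minimal sketch, not the library's code: `ToyNorm`, `pick_hidden_size`, and `validate` are hypothetical stand-ins that mimic only the behaviour stated in this report (weight-based override, unconditional width check).

```python
from typing import Optional, Sequence


class ToyNorm:
    """Hypothetical stand-in for a Hugging Face RMSNorm module."""

    def __init__(self, dim: int, with_scale: bool = True):
        # As described above, a weight exists only when scaling is enabled.
        self.weight: Optional[list] = [1.0] * dim if with_scale else None


def pick_hidden_size(norm: ToyNorm, default_hidden_size: int) -> int:
    """Mirror the override logic: trust the weight shape, else keep the default."""
    hidden_size = default_hidden_size
    weight = getattr(norm, "weight", None)
    if weight is not None:
        hidden_size = len(weight)
    return hidden_size


def validate(x: Sequence[float], hidden_size: int) -> None:
    """The pre-fix check: always compare the input width to hidden_size."""
    if len(x) != hidden_size:
        raise ValueError(
            f"Expected hidden_size to be {hidden_size}, but found: {len(x)}"
        )


LM_HIDDEN_SIZE = 5376  # text model width from the error message
HEAD_DIM = 72          # vision head width from the error message

scaled = pick_hidden_size(ToyNorm(HEAD_DIM, with_scale=True), LM_HIDDEN_SIZE)
unscaled = pick_hidden_size(ToyNorm(HEAD_DIM, with_scale=False), LM_HIDDEN_SIZE)

validate([0.0] * HEAD_DIM, scaled)  # passes: the width was read from the weight
try:
    validate([0.0] * HEAD_DIM, unscaled)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

With a weight present the override picks up the real width; without one, the text model's width survives and the check rejects the 72-wide input with the same message as in the report.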

Fix Action

Fix

Fix 1 (primary) — vllm/model_executor/layers/layernorm.py: Only validate hidden_size when weight is not None. A weightless RMSNorm has no constraint on the input dimension.

# Before
if x.shape[-1] != hidden_size:
    raise ValueError(...)

# After
if weight is not None and x.shape[-1] != hidden_size:
    raise ValueError(...)
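To see why the guarded check is safe, here is a minimal pure-Python sketch of an RMSNorm over a single vector. The function `rms_norm` and its signature are assumptions made for illustration; vLLM's `forward_static` operates on tensors and is not reproduced here.

```python
import math
from typing import Optional, Sequence


def rms_norm(
    x: Sequence[float],
    hidden_size: int,
    weight: Optional[Sequence[float]] = None,
    eps: float = 1e-6,
) -> list:
    """Toy 1-D RMSNorm that checks hidden_size only when a weight exists."""
    if weight is not None and len(x) != hidden_size:
        raise ValueError(
            f"Expected hidden_size to be {hidden_size}, but found: {len(x)}"
        )
    # The mean runs over the actual input length, not over hidden_size.
    inv_rms = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + eps)
    out = [v * inv_rms for v in x]
    if weight is not None:
        # Elementwise scale; this is the step that needs matching widths.
        out = [o * w for o, w in zip(out, weight)]
    return out


# Weightless norm: a 72-wide input is accepted even though hidden_size is 5376.
y = rms_norm([1.0] * 72, hidden_size=5376)

# Weighted norm: the mismatch is still rejected.
try:
    rms_norm([1.0] * 72, hidden_size=5376, weight=[1.0] * 5376)
    rejected = False
except ValueError:
    rejected = True
```

The weightless path never reads `hidden_size`, so skipping the check there loses nothing, while the weighted path keeps its guard.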

Fix 2 (defensive) — vllm/model_executor/models/transformers/utils.py: In replace_rms_norm_class, for weightless norms, try to infer the correct dimension from module attributes (dim, hidden_size, normalized_shape) before falling back to the text model's hidden_size.

else:
    # No weight: try to infer the norm's dimension from other attributes
    inferred = getattr_iter(
        rms_norm, ("dim", "hidden_size", "normalized_shape"), None
    )
    if inferred is not None:
        if isinstance(inferred, (list, tuple)):
            inferred = inferred[-1]
        kwargs["hidden_size"] = int(inferred)

Note: Fix 2 alone does not fix the Gemma4 case because Gemma4RMSNorm with with_scale=False does not store dim as an attribute. Fix 1 is necessary.
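The limitation in the note can be illustrated with a small sketch. The `getattr_iter` defined here is a local stand-in written for this example (the real helper's signature is not shown on this page), and both norm classes are hypothetical.

```python
from typing import Any, Iterable


def getattr_iter(obj: Any, names: Iterable[str], default: Any) -> Any:
    """Local stand-in: return the first attribute in `names` that obj has."""
    for name in names:
        if hasattr(obj, name):
            return getattr(obj, name)
    return default


def infer_hidden_size(norm: Any, fallback: int) -> int:
    """Infer the norm width from common attribute names, else use fallback."""
    inferred = getattr_iter(norm, ("dim", "hidden_size", "normalized_shape"), None)
    if inferred is None:
        return fallback
    if isinstance(inferred, (list, tuple)):
        inferred = inferred[-1]
    return int(inferred)


class NormWithShape:
    """Hypothetical norm that records its shape, as torch.nn.LayerNorm does."""

    normalized_shape = (72,)


class NormWithoutDim:
    """Hypothetical norm that stores no dimension attribute at all."""


with_shape = infer_hidden_size(NormWithShape(), fallback=5376)
without_dim = infer_hidden_size(NormWithoutDim(), fallback=5376)
```

A module that exposes its shape yields 72, while one that stores nothing falls back to 5376, which is the situation the note describes for the weightless Gemma4 norm.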

PR fix notes

PR #39073: Fix RMSNorm hidden_size validation crash for weightless norms

Repository: vllm-project/vllm
Author: Chessing234
State: open | merged: False
Link: https://github.com/vllm-project/vllm/pull/39073

Description (problem / solution / changelog)

Summary

Fixes ValueError: Expected hidden_size to be 5376, but found: 72 when running Gemma4 vision models
When replace_rms_norm_class replaces RMSNorm modules, it passes the LM hidden_size even for vision encoder norms with a different dimension. For norms with with_scale=False (like Gemma4's v_norm), no weight is registered, so the hidden_size correction code (which reads weight.shape) is skipped, leaving the wrong value. The forward_static validation then raises a ValueError.
The fix skips the hidden_size validation when weight is None, since a weightless RMSNorm just computes x / sqrt(mean(x^2) + eps) and does not depend on hidden_size.
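Written out for an input vector x of length d, the weightless computation quoted above is as follows (the symbols d and ε are notation introduced here, not taken from the PR):

```latex
\[
  \operatorname{RMSNorm}(x)_i
    = \frac{x_i}{\sqrt{\dfrac{1}{d}\sum_{j=1}^{d} x_j^{2} + \epsilon}},
  \qquad i = 1, \dots, d .
\]
```

Here d is read from the input itself, so a configured hidden_size never enters the expression. A learned scale, when present, multiplies the result elementwise and must have length d, which is the only place a width check is meaningful.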

Fixes #39061

Why this is not a duplicate

No open PRs address issue #39061.

Test plan

Verify Gemma4 vision encoder no longer crashes on forward pass
Verify standard RMSNorm (with weight) still validates hidden_size correctly
Run pytest tests/models/multimodal/ -v -k gemma if Gemma4 tests exist

AI Assistance

This PR was created with AI assistance (Claude). All changes have been reviewed.

🤖 Generated with Claude Code

Co-authored-by: Claude Opus 4.6 (1M context)

Changed files

vllm/model_executor/layers/layernorm.py (modified, +31/-5)

RAW_BUFFER

Your current environment

vLLM version: main (commit approx. 2025-04-05)
Model: google/gemma-4-27b-it (or any Gemma4 multimodal model)
Python: 3.12
CUDA: 13.x
Transformers: installed via pip in .vllm venv

🐛 Describe the bug

Starting vLLM with a Gemma4 multimodal model (e.g. google/gemma-4-27b-it) fails during engine core initialization with:

ValueError: Expected hidden_size to be 5376, but found: 72

Full traceback (abbreviated):

File ".../vllm/v1/worker/gpu_model_runner.py", line 5761, in profile_run
    dummy_encoder_outputs = self.model.embed_multimodal(...)
File ".../vllm/model_executor/models/transformers/multimodal.py", line 350, in embed_multimodal
    vision_embeddings = self.model.get_image_features(...)
File ".../transformers/models/gemma4/modeling_gemma4.py", line 905, in forward
    value_states = self.v_norm(value_states)
File ".../vllm/model_executor/layers/layernorm.py", line 241, in forward_static
    raise ValueError(
ValueError: Expected hidden_size to be 5376, but found: 72

How to Reproduce

vllm serve google/gemma-4-27b-it --trust-remote-code

Engine crashes before serving any requests.

Expected behavior

Engine should start successfully and serve Gemma4 multimodal requests.

extent analysis

TL;DR

Apply the primary fix: in vllm/model_executor/layers/layernorm.py, validate hidden_size only when weight is not None.

Guidance

Open vllm/model_executor/layers/layernorm.py and locate the check that raises the ValueError (the traceback points to forward_static).
Change the condition so that hidden_size is validated only when weight is not None, as shown in the primary fix.
Optionally, also apply the defensive fix in vllm/model_executor/models/transformers/utils.py to infer the dimension for weightless norms. On its own it does not resolve the Gemma4 case, because Gemma4RMSNorm with with_scale=False does not store dim as an attribute.
Verify the fix by running vllm serve google/gemma-4-27b-it --trust-remote-code and confirming that the engine starts.

Example

# Modified code in vllm/model_executor/layers/layernorm.py
if weight is not None and x.shape[-1] != hidden_size:
    raise ValueError(f"Expected hidden_size to be {hidden_size}, but found: {x.shape[-1]}")

Notes

The primary fix is necessary to resolve the issue, while the defensive fix provides additional robustness. The fixes assume that the weight attribute is a reliable indicator of whether the norm module has a constraint on the input dimension.

Recommendation

Apply the primary fix, which removes the check that fails for weightless norms. The defensive fix can be added on top for norms that do expose their dimension as an attribute.
