vllm - ✅(Solved) Fix [Bug]: Gemma4 vision encoder crashes with ValueError: Expected hidden_size to be 5376, but found: 72 [1 pull requests, 1 participants]

GitHub stats

vllm-project/vllm#39061 · Fetched 2026-04-08 02:52:42
Comments: 0 · Participants: 1 · Timeline: 2 · Reactions: 0
Timeline (top): cross-referenced ×1, referenced ×1

Error Message

ValueError: Expected hidden_size to be 5376, but found: 72

Root Cause

When vLLM's TransformersModelBase._recursive_replace walks the model graph and replaces norm modules, it calls:

# base.py:442
elif child_module.__class__.__name__.endswith("RMSNorm"):
    new_module = replace_rms_norm_class(
        child_module, self.text_config.hidden_size  # ← always 5376 (LM hidden size)
    )

Inside replace_rms_norm_class (transformers/utils.py), the hidden_size argument is only overridden when the norm module has a weight parameter:

weight_meta = getattr(rms_norm, "weight", None)
if weight_meta is not None:
    kwargs["hidden_size"] = weight_meta.size(0)

Gemma4's vision encoder v_norm is Gemma4RMSNorm(head_dim=72, with_scale=False). Because with_scale=False registers no weight, the override is skipped and hidden_size stays at the text model's 5376. The resulting RMSNorm(hidden_size=5376, has_weight=False) then checks x.shape[-1] == 5376 on every forward call, but the vision value states have shape [..., 72], so the check fails.
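The failure mode can be reproduced in isolation with a minimal sketch. It does not use vLLM or Transformers: SourceNorm, ReplacedNorm, and replace_norm are hypothetical stand-ins written for illustration, and only the two sizes (5376 and 72) and the error text come from the issue.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

LM_HIDDEN_SIZE = 5376  # text model hidden size, from the error message
VISION_HEAD_DIM = 72   # vision head_dim, from the error message


@dataclass
class SourceNorm:
    """Hypothetical stand-in for the norm module being replaced."""
    weight: Optional[Sequence[float]] = None  # None models with_scale=False


@dataclass
class ReplacedNorm:
    """Hypothetical stand-in for the replacement RMSNorm."""
    hidden_size: int
    has_weight: bool

    def check_input(self, last_dim: int) -> None:
        # Unconditional validation, as in the "Before" behavior.
        if last_dim != self.hidden_size:
            raise ValueError(
                f"Expected hidden_size to be {self.hidden_size}, "
                f"but found: {last_dim}"
            )


def replace_norm(src: SourceNorm, default_hidden_size: int) -> ReplacedNorm:
    kwargs = {"hidden_size": default_hidden_size, "has_weight": False}
    if src.weight is not None:
        # The only path that corrects the size: read it from the weight.
        kwargs["hidden_size"] = len(src.weight)
        kwargs["has_weight"] = True
    return ReplacedNorm(**kwargs)


scaled = replace_norm(SourceNorm(weight=[1.0] * VISION_HEAD_DIM), LM_HIDDEN_SIZE)
weightless = replace_norm(SourceNorm(weight=None), LM_HIDDEN_SIZE)

scaled.check_input(VISION_HEAD_DIM)  # passes: size was taken from the weight
try:
    weightless.check_input(VISION_HEAD_DIM)
except ValueError as exc:
    message = str(exc)
    print(message)  # Expected hidden_size to be 5376, but found: 72
```

The weighted norm gets the right size for free, while the weightless one inherits the text model's size and trips the check.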

Fix Action

Fix 1 (primary), in vllm/model_executor/layers/layernorm.py: Only validate hidden_size when weight is not None. A weightless RMSNorm has no constraint on the input dimension.

# Before
if x.shape[-1] != hidden_size:
    raise ValueError(...)

# After
if weight is not None and x.shape[-1] != hidden_size:
    raise ValueError(...)
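To make the relaxed check concrete, here is a pure-Python sketch of an RMS norm over a single vector. It is not the vLLM implementation: the function name rms_norm, its signature, and the default eps are assumptions chosen for illustration. Only the guard condition mirrors the "After" snippet.

```python
import math
from typing import Optional, Sequence


def rms_norm(
    x: Sequence[float],
    hidden_size: int,
    weight: Optional[Sequence[float]] = None,
    eps: float = 1e-6,  # assumed default, for illustration only
) -> list[float]:
    # Relaxed check: the configured size only matters when a weight
    # has to line up element-wise with the input.
    if weight is not None and len(x) != hidden_size:
        raise ValueError(
            f"Expected hidden_size to be {hidden_size}, but found: {len(x)}"
        )
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    out = [v * inv_rms for v in x]
    if weight is not None:
        out = [o * w for o, w in zip(out, weight)]
    return out


# A 72-wide input with a stale hidden_size of 5376 and no weight
# is normalized without raising.
out = rms_norm([0.5] * 72, hidden_size=5376)
```

With a weight supplied, a size mismatch still raises, so the validation keeps its value for ordinary weighted norms.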

Fix 2 (defensive), in vllm/model_executor/models/transformers/utils.py: In replace_rms_norm_class, for weightless norms, try to infer the correct dimension from module attributes (dim, hidden_size, normalized_shape) before falling back to the text model's hidden_size.

else:
    # No weight: try to infer the norm's dimension from other attributes
    inferred = getattr_iter(
        rms_norm, ("dim", "hidden_size", "normalized_shape"), None
    )
    if inferred is not None:
        if isinstance(inferred, (list, tuple)):
            inferred = inferred[-1]
        kwargs["hidden_size"] = int(inferred)

Note: Fix 2 alone does not fix the Gemma4 case because Gemma4RMSNorm with with_scale=False does not store dim as an attribute. Fix 1 is necessary.
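The inference logic and its limitation can be sketched without vLLM. In the sketch below, first_attr is a hypothetical stand-in for getattr_iter (assumed here to return the first attribute that exists, else the default), and the two norm classes are made up to show one module that exposes its size and one that does not.

```python
from typing import Any, Iterable


def first_attr(obj: Any, names: Iterable[str], default: Any = None) -> Any:
    """Hypothetical stand-in for getattr_iter: first attribute that exists."""
    for name in names:
        if hasattr(obj, name):
            return getattr(obj, name)
    return default


def infer_norm_dim(norm: Any, fallback: int) -> int:
    inferred = first_attr(norm, ("dim", "hidden_size", "normalized_shape"))
    if inferred is None:
        return fallback  # nothing to infer from: keep the caller's value
    if isinstance(inferred, (list, tuple)):
        inferred = inferred[-1]  # normalized_shape may be a tuple
    return int(inferred)


class NormWithShape:
    """Made-up norm that exposes its size."""
    normalized_shape = (72,)


class NormWithoutDim:
    """Made-up norm that stores no size attribute at all."""


print(infer_norm_dim(NormWithShape(), fallback=5376))   # 72
print(infer_norm_dim(NormWithoutDim(), fallback=5376))  # 5376
```

The second call shows why the note matters: when the module stores none of the probed attributes, the fallback is returned unchanged and only the relaxed validation prevents the crash.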

PR fix notes

PR #39073: Fix RMSNorm hidden_size validation crash for weightless norms

Description (problem / solution / changelog)

Summary

  • Fixes ValueError: Expected hidden_size to be 5376, but found: 72 when running Gemma4 vision models
  • When replace_rms_norm_class replaces RMSNorm modules, it passes the LM hidden_size even for vision encoder norms with a different dimension. For norms with with_scale=False (like Gemma4's v_norm), no weight is registered, so the hidden_size correction code (which reads weight.shape) is skipped, leaving the wrong value. The forward_static validation then raises a ValueError.
  • The fix skips the hidden_size validation when weight is None, since a weightless RMSNorm just computes x / sqrt(mean(x^2) + eps) and does not depend on hidden_size.
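Written out, the formula in the last bullet looks as follows. The symbols d, w, and ε are notation introduced here, and the weighted form is the common generic variant, not a statement about Gemma4's exact scaling.

```latex
% Weightless RMSNorm: d is read from the input itself, so no configured size is needed.
\mathrm{RMSNorm}(x)_i = \frac{x_i}{\sqrt{\frac{1}{d}\sum_{j=1}^{d} x_j^{2} + \epsilon}},
\qquad d = \text{size of the last dimension of } x

% Weighted RMSNorm: the weight must match the input element-wise, so w \in \mathbb{R}^{d}.
\mathrm{RMSNorm}_w(x)_i = w_i \cdot \frac{x_i}{\sqrt{\frac{1}{d}\sum_{j=1}^{d} x_j^{2} + \epsilon}}
```

Only the second form ties the input to a fixed size through w, which is why validating hidden_size is meaningful there and not in the weightless case.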

Fixes #39061

Why this is not a duplicate

  • No open PRs address issue #39061.

Test plan

  • Verify Gemma4 vision encoder no longer crashes on forward pass
  • Verify standard RMSNorm (with weight) still validates hidden_size correctly
  • Run pytest tests/models/multimodal/ -v -k gemma if Gemma4 tests exist

AI Assistance

This PR was created with AI assistance (Claude). All changes have been reviewed.

🤖 Generated with Claude Code

Co-authored-by: Claude Opus 4.6 (1M context)

Changed files

  • vllm/model_executor/layers/layernorm.py (modified, +31/-5)

Your current environment

  • vLLM version: main (commit approx. 2025-04-05)
  • Model: google/gemma-4-27b-it (or any Gemma4 multimodal model)
  • Python: 3.12
  • CUDA: 13.x
  • Transformers: installed via pip in .vllm venv

🐛 Describe the bug

Starting vLLM with a Gemma4 multimodal model (e.g. google/gemma-4-27b-it) fails during engine core initialization with:

ValueError: Expected hidden_size to be 5376, but found: 72

Full traceback (abbreviated):

File ".../vllm/v1/worker/gpu_model_runner.py", line 5761, in profile_run
    dummy_encoder_outputs = self.model.embed_multimodal(...)
File ".../vllm/model_executor/models/transformers/multimodal.py", line 350, in embed_multimodal
    vision_embeddings = self.model.get_image_features(...)
File ".../transformers/models/gemma4/modeling_gemma4.py", line 905, in forward
    value_states = self.v_norm(value_states)
File ".../vllm/model_executor/layers/layernorm.py", line 241, in forward_static
    raise ValueError(
ValueError: Expected hidden_size to be 5376, but found: 72

How to Reproduce

vllm serve google/gemma-4-27b-it --trust-remote-code

Engine crashes before serving any requests.

Expected behavior

Engine should start successfully and serve Gemma4 multimodal requests.

extent analysis

TL;DR

Apply the primary fix by modifying the vllm/model_executor/layers/layernorm.py file to only validate hidden_size when weight is not None.

Guidance

  • Open vllm/model_executor/layers/layernorm.py and locate the check that raises the ValueError (forward_static in the traceback).
  • Change the condition so that hidden_size is validated only when weight is not None, as shown in the primary fix.
  • Optionally, also apply the defensive fix in vllm/model_executor/models/transformers/utils.py to infer the dimension of weightless norms. On its own it does not resolve the Gemma4 case, because Gemma4RMSNorm with with_scale=False does not store dim as an attribute.
  • Verify the fix by running vllm serve google/gemma-4-27b-it --trust-remote-code and checking that the engine starts successfully.

Example

# Modified code in vllm/model_executor/layers/layernorm.py
if weight is not None and x.shape[-1] != hidden_size:
    raise ValueError(f"Expected hidden_size to be {hidden_size}, but found: {x.shape[-1]}")

Notes

The primary fix is necessary to resolve the issue, while the defensive fix provides additional robustness. The fixes assume that the weight attribute is a reliable indicator of whether the norm module has a constraint on the input dimension.

Recommendation

Apply the primary fix, as it directly addresses the root cause of the issue. The defensive fix can be applied additionally to provide extra robustness.
