vllm - ✅(Solved) Fix [Bug]: Gemma3n concurrent audio requests crash EngineCore — missing dynamic_dims on audio sequence dimension [1 pull requests, 1 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
vllm-project/vllm#38297Fetched 2026-04-08 01:36:44
View on GitHub
Comments
0
Participants
1
Timeline
3
Reactions
0
Author
Participants
Timeline (top)
referenced ×2cross-referenced ×1

Error Message

File "vllm/utils/tensor_schema.py", in _validate_field
    ValueError: input_features_padded contains inconsistent shapes:
    torch.Size([496, 128]) (index 0) vs torch.Size([449, 128]) (index 1)

Root Cause

This is the same bug that #31219 reported for Qwen2Audio/AudioFlamingo3/MiniCPM-V, fixed in #31223 — but Gemma3n wasn't included in that fix.

Gemma3nAudioInputs in gemma3n_mm.py declares its shapes without dynamic_dims:

class Gemma3nAudioInputs(TensorSchema):
    input_features_padded: Annotated[torch.Tensor, TensorShape("bn", "s", "f")]
    input_features_mask: Annotated[torch.Tensor, TensorShape("bn", "s")]

The "s" dimension varies with audio length, so TensorSchema.validate() rejects batches where the two items have different sequence lengths.

The padding logic itself works fine (#24052 added it back in Sept 2025). The validator just doesn't know that "s" is allowed to vary across batch items.

Fix Action

Fixed

PR fix notes

PR #38305: [Bugfix] Fix Gemma3n concurrent audio requests crashing EngineCore

Description (problem / solution / changelog)

Summary

Gemma3nAudioInputs declares input_features_padded and input_features_mask with fixed sequence-length dimensions. When two concurrent requests carry audio clips of different durations, TensorSchema.validate() sees mismatched shapes and kills EngineCore:

ValueError: input_features_padded contains inconsistent shapes:
  torch.Size([496, 128]) (index 0) vs torch.Size([449, 128]) (index 1)

The fix adds dynamic_dims={"s"} to both fields, matching the pattern established by PR #31223 for MiniCPM-o, Qwen2Audio, and AudioFlamingo3.

Changes

  • vllm/model_executor/models/gemma3n_mm.py: Add dynamic_dims={"s"} to input_features_padded and input_features_mask in Gemma3nAudioInputs

Test Plan

  • Verify concurrent audio requests to Gemma3n no longer crash
  • Existing multimodal tests pass

Fixes #38297

Changed files

  • vllm/model_executor/models/gemma3n_mm.py (modified, +6/-2)

Code Example

ValueError: input_features_padded contains inconsistent shapes: torch.Size([496, 128]) (index 0) vs torch.Size([449, 128]) (index 1)

---

class Gemma3nAudioInputs(TensorSchema):
    input_features_padded: Annotated[torch.Tensor, TensorShape("bn", "s", "f")]
    input_features_mask: Annotated[torch.Tensor, TensorShape("bn", "s")]

---

File "vllm/utils/tensor_schema.py", in _validate_field
    ValueError: input_features_padded contains inconsistent shapes:
    torch.Size([496, 128]) (index 0) vs torch.Size([449, 128]) (index 1)

---

input_features_padded: Annotated[torch.Tensor, TensorShape("bn", "s", "f", dynamic_dims={"s"})]
input_features_mask: Annotated[torch.Tensor, TensorShape("bn", "s", dynamic_dims={"s"})]
RAW_BUFFERClick to expand / collapse

Your current environment

  • vLLM 0.17.1 (also checked current main @ ba2f0acc, same code)
  • transformers 4.57.6
  • model: google/gemma-3n-E4B-it
  • GPU: NVIDIA A10G (24GB)
  • API: /v1/chat/completions with input_audio items

Describe the bug

When two concurrent /v1/chat/completions requests with audio of different durations get batched together, EngineCore crashes:

ValueError: input_features_padded contains inconsistent shapes: torch.Size([496, 128]) (index 0) vs torch.Size([449, 128]) (index 1)

The first dimension (496 vs 449) is the audio sequence length, which naturally varies with duration. Each request has a single input_audio item — these are separate concurrent requests, not one request with multiple audios.

This kills EngineCore and every in-flight request gets a 500.

Root cause

This is the same bug that #31219 reported for Qwen2Audio/AudioFlamingo3/MiniCPM-V, fixed in #31223 — but Gemma3n wasn't included in that fix.

Gemma3nAudioInputs in gemma3n_mm.py declares its shapes without dynamic_dims:

class Gemma3nAudioInputs(TensorSchema):
    input_features_padded: Annotated[torch.Tensor, TensorShape("bn", "s", "f")]
    input_features_mask: Annotated[torch.Tensor, TensorShape("bn", "s")]

The "s" dimension varies with audio length, so TensorSchema.validate() rejects batches where the two items have different sequence lengths.

The padding logic itself works fine (#24052 added it back in Sept 2025). The validator just doesn't know that "s" is allowed to vary across batch items.

Stack trace

File "vllm/utils/tensor_schema.py", in _validate_field
    ValueError: input_features_padded contains inconsistent shapes:
    torch.Size([496, 128]) (index 0) vs torch.Size([449, 128]) (index 1)

Suggested fix

Same pattern as #31223 — add dynamic_dims={"s"}:

input_features_padded: Annotated[torch.Tensor, TensorShape("bn", "s", "f", dynamic_dims={"s"})]
input_features_mask: Annotated[torch.Tensor, TensorShape("bn", "s", dynamic_dims={"s"})]

I can put up a PR for this if it makes sense.

extent analysis

Fix Plan

To resolve the issue, we need to update the Gemma3nAudioInputs class to allow for dynamic sequence lengths.

  • Update gemma3n_mm.py with the following code:
class Gemma3nAudioInputs(TensorSchema):
    input_features_padded: Annotated[torch.Tensor, TensorShape("bn", "s", "f", dynamic_dims={"s"})]
    input_features_mask: Annotated[torch.Tensor, TensorShape("bn", "s", dynamic_dims={"s"})]
  • Verify that the dynamic_dims parameter is correctly applied to the TensorShape annotations.

Verification

To verify the fix, you can:

  • Send concurrent /v1/chat/completions requests with audio of different durations.
  • Check that the EngineCore no longer crashes and returns the expected responses.
  • Monitor the logs for any errors related to inconsistent shapes.

Extra Tips

  • Make sure to test the fix with various audio lengths to ensure that the dynamic_dims parameter is working as expected.
  • Consider adding additional tests to cover this scenario and prevent similar issues in the future.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING