vllm - ✅(Solved) Fix [Bug]: Gemma3n concurrent audio requests crash EngineCore — missing dynamic_dims on audio sequence dimension [1 pull requests, 1 participants]

vllm2026-03-27 00:28:01

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

vllm-project/vllm#38297•Fetched 2026-04-08 01:36:44

View on GitHub

Comments

Participants

Timeline

Reactions

Author

RushRed

Participants

RushRed

Timeline (top)

referenced ×2cross-referenced ×1

Error Message

File "vllm/utils/tensor_schema.py", in _validate_field
    ValueError: input_features_padded contains inconsistent shapes:
    torch.Size([496, 128]) (index 0) vs torch.Size([449, 128]) (index 1)

Root Cause

This is the same bug that #31219 reported for Qwen2Audio/AudioFlamingo3/MiniCPM-V, fixed in #31223 — but Gemma3n wasn't included in that fix.

Gemma3nAudioInputs in gemma3n_mm.py declares its shapes without dynamic_dims:

class Gemma3nAudioInputs(TensorSchema):
    input_features_padded: Annotated[torch.Tensor, TensorShape("bn", "s", "f")]
    input_features_mask: Annotated[torch.Tensor, TensorShape("bn", "s")]

The "s" dimension varies with audio length, so TensorSchema.validate() rejects batches where the two items have different sequence lengths.

The padding logic itself works fine (#24052 added it back in Sept 2025). The validator just doesn't know that "s" is allowed to vary across batch items.

Fix Action

Fixed

Fixed by PR: [Bugfix] Fix Gemma3n concurrent audio requests crashing EngineCore (https://github.com/vllm-project/vllm/pull/38305)

PR fix notes

PR #38305: [Bugfix] Fix Gemma3n concurrent audio requests crashing EngineCore

Repository: vllm-project/vllm
Author: he-yufeng
State: open | merged: False
Link: https://github.com/vllm-project/vllm/pull/38305

Description (problem / solution / changelog)

Summary

Gemma3nAudioInputs declares input_features_padded and input_features_mask with fixed sequence-length dimensions. When two concurrent requests carry audio clips of different durations, TensorSchema.validate() sees mismatched shapes and kills EngineCore:

ValueError: input_features_padded contains inconsistent shapes:
  torch.Size([496, 128]) (index 0) vs torch.Size([449, 128]) (index 1)

The fix adds dynamic_dims={"s"} to both fields, matching the pattern established by PR #31223 for MiniCPM-o, Qwen2Audio, and AudioFlamingo3.

Changes

vllm/model_executor/models/gemma3n_mm.py: Add dynamic_dims={"s"} to input_features_padded and input_features_mask in Gemma3nAudioInputs

Test Plan

Verify concurrent audio requests to Gemma3n no longer crash
Existing multimodal tests pass

Fixes #38297

Changed files

vllm/model_executor/models/gemma3n_mm.py (modified, +6/-2)

Code Example

ValueError: input_features_padded contains inconsistent shapes: torch.Size([496, 128]) (index 0) vs torch.Size([449, 128]) (index 1)

---

class Gemma3nAudioInputs(TensorSchema):
    input_features_padded: Annotated[torch.Tensor, TensorShape("bn", "s", "f")]
    input_features_mask: Annotated[torch.Tensor, TensorShape("bn", "s")]

---

File "vllm/utils/tensor_schema.py", in _validate_field
    ValueError: input_features_padded contains inconsistent shapes:
    torch.Size([496, 128]) (index 0) vs torch.Size([449, 128]) (index 1)

---

input_features_padded: Annotated[torch.Tensor, TensorShape("bn", "s", "f", dynamic_dims={"s"})]
input_features_mask: Annotated[torch.Tensor, TensorShape("bn", "s", dynamic_dims={"s"})]

RAW_BUFFERClick to expand / collapse

Your current environment

vLLM 0.17.1 (also checked current main @ ba2f0acc, same code)
transformers 4.57.6
model: google/gemma-3n-E4B-it
GPU: NVIDIA A10G (24GB)
API: /v1/chat/completions with input_audio items

Describe the bug

When two concurrent /v1/chat/completions requests with audio of different durations get batched together, EngineCore crashes:

ValueError: input_features_padded contains inconsistent shapes: torch.Size([496, 128]) (index 0) vs torch.Size([449, 128]) (index 1)

The first dimension (496 vs 449) is the audio sequence length, which naturally varies with duration. Each request has a single input_audio item — these are separate concurrent requests, not one request with multiple audios.

This kills EngineCore and every in-flight request gets a 500.

Root cause

This is the same bug that #31219 reported for Qwen2Audio/AudioFlamingo3/MiniCPM-V, fixed in #31223 — but Gemma3n wasn't included in that fix.

Gemma3nAudioInputs in gemma3n_mm.py declares its shapes without dynamic_dims:

class Gemma3nAudioInputs(TensorSchema):
    input_features_padded: Annotated[torch.Tensor, TensorShape("bn", "s", "f")]
    input_features_mask: Annotated[torch.Tensor, TensorShape("bn", "s")]

The "s" dimension varies with audio length, so TensorSchema.validate() rejects batches where the two items have different sequence lengths.

The padding logic itself works fine (#24052 added it back in Sept 2025). The validator just doesn't know that "s" is allowed to vary across batch items.

Stack trace

File "vllm/utils/tensor_schema.py", in _validate_field
    ValueError: input_features_padded contains inconsistent shapes:
    torch.Size([496, 128]) (index 0) vs torch.Size([449, 128]) (index 1)

Suggested fix

Same pattern as #31223 — add dynamic_dims={"s"}:

input_features_padded: Annotated[torch.Tensor, TensorShape("bn", "s", "f", dynamic_dims={"s"})]
input_features_mask: Annotated[torch.Tensor, TensorShape("bn", "s", dynamic_dims={"s"})]

I can put up a PR for this if it makes sense.

extent analysis

Fix Plan

To resolve the issue, we need to update the Gemma3nAudioInputs class to allow for dynamic sequence lengths.

Update gemma3n_mm.py with the following code:

class Gemma3nAudioInputs(TensorSchema):
    input_features_padded: Annotated[torch.Tensor, TensorShape("bn", "s", "f", dynamic_dims={"s"})]
    input_features_mask: Annotated[torch.Tensor, TensorShape("bn", "s", dynamic_dims={"s"})]

Verify that the dynamic_dims parameter is correctly applied to the TensorShape annotations.

Verification

To verify the fix, you can:

Send concurrent /v1/chat/completions requests with audio of different durations.
Check that the EngineCore no longer crashes and returns the expected responses.
Monitor the logs for any errors related to inconsistent shapes.

Extra Tips

Make sure to test the fix with various audio lengths to ensure that the dynamic_dims parameter is working as expected.
Consider adding additional tests to cover this scenario and prevent similar issues in the future.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#api #API routing #API middleware #SSR setup #ISR setup

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

vllm - ✅(Solved) Fix [Bug]: Gemma3n concurrent audio requests crash EngineCore — missing dynamic_dims on audio sequence dimension [1 pull requests, 1 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Root Cause

Fix Action

Fixed

PR fix notes

PR #38305: [Bugfix] Fix Gemma3n concurrent audio requests crashing EngineCore

Description (problem / solution / changelog)

Summary

Changes

Test Plan

Changed files

Code Example

Your current environment

Describe the bug

Root cause

Stack trace

Suggested fix

extent analysis

Fix Plan

Verification

Extra Tips

Still need to ship something?

TRENDING

vllm - ✅(Solved) Fix [Bug]: Gemma3n concurrent audio requests crash EngineCore — missing dynamic_dims on audio sequence dimension [1 pull requests, 1 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Root Cause

Fix Action

Fixed

PR fix notes

PR #38305: [Bugfix] Fix Gemma3n concurrent audio requests crashing EngineCore

Description (problem / solution / changelog)

Summary

Changes

Test Plan

Changed files

Code Example

Your current environment

Describe the bug

Root cause

Stack trace

Suggested fix

extent analysis

Fix Plan

Verification

Extra Tips

Still need to ship something?

RELATED_DISCOVERY

TRENDING