transformers - ✅(Solved) Fix Unexpected behaviour of helper function `_get_feat_extract_output_lengths` in qwen3_omni_moe [2 pull requests, 2 comments, 3 participants]

Q: Expected behavior

Current implementation is

```python
def _get_feat_extract_output_lengths(input_lengths):
    """
    Computes the output length of the convolutional layers and the output length of the audio encoder
    """

    input_lengths_leave = input_lengths % 100
    feat_lengths = (input_lengths_leave - 1) // 2 + 1
    output_lengths = ((feat_lengths - 1) // 2 + 1 - 1) // 2 + 1 + (input_lengths // 100) * 13
    return output_lengths
```

and the expected implementation is

```python
def _get_feat_extract_output_lengths(input_lengths):
    """
    Computes the output length of the convolutional layers and the output length of the audio encoder
    """

    feat_lengths = (input_lengths - 1) // 2 + 1
    output_lengths = ((feat_lengths - 1) // 2 + 1 - 1) // 2 + 1
    return output_lengths
```

transformers • 2026-03-28 14:16:29


GitHub stats

huggingface/transformers#45083 • Fetched 2026-04-08 01:45:20


Timeline (top)

mentioned ×3, subscribed ×3, commented ×2, cross-referenced ×2

Fix Action

Fix proposed (both PRs listed below are open and unmerged as of the fetch date)

Proposed fix PR: fix audio encoder output length formula in qwen3_omni_moe (https://github.com/huggingface/transformers/pull/45088)
Proposed fix PR: Fix _get_feat_extract_output_lengths in qwen3_omni_moe (https://github.com/huggingface/transformers/pull/45091)

PR fix notes

PR #45088: fix audio encoder output length formula in qwen3_omni_moe

Repository: huggingface/transformers
Author: knQzx
State: open | merged: False
Link: https://github.com/huggingface/transformers/pull/45088

Description (problem / solution / changelog)

Corrects the convolution output-length calculation in `_get_feat_extract_output_lengths`, which was computing wrong values for the audio encoder. Fixes #45083.

Changed files

src/transformers/models/qwen3_omni_moe/modeling_qwen3_omni_moe.py (modified, +2/-3)
src/transformers/models/qwen3_omni_moe/modular_qwen3_omni_moe.py (modified, +2/-3)
src/transformers/models/qwen3_omni_moe/processing_qwen3_omni_moe.py (modified, +2/-3)

PR #45091: Fix _get_feat_extract_output_lengths in qwen3_omni_moe

Repository: huggingface/transformers
Author: hkc5
State: open | merged: False
Link: https://github.com/huggingface/transformers/pull/45091

Description (problem / solution / changelog)

This PR fixes the unexpected behaviour of helper function _get_feat_extract_output_lengths in qwen3_omni_moe as reported in #45083.

Problem

The current implementation incorrectly calculates the output length of the convolutional layers by:

Taking modulo 100 of input lengths
Adding a correction factor of (input_lengths // 100) * 13

This does not align with the official PyTorch Conv2d formula.

Fix

Updated the function to correctly calculate the output length based on the PyTorch Conv2d formula:

For Conv2d with kernel_size=3, stride=2, padding=1: output = (input - 1) // 2 + 1
Applied sequentially for the 3 conv layers in the audio encoder
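
As a rough check of the simplification above, the sketch below evaluates the general output-size formula from the PyTorch Conv2d documentation along one axis and confirms that, for kernel_size=3, stride=2, padding=1, it reduces to `(input - 1) // 2 + 1`. The helper names `conv_output_length` and `stacked_output_length` are hypothetical and are not part of transformers; the layer count of 3 is taken from the PR description.

```python
def conv_output_length(length, kernel_size=3, stride=2, padding=1, dilation=1):
    """Output size along one axis, following the formula in the PyTorch Conv2d docs."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def stacked_output_length(length, num_layers=3):
    """Apply the per-layer formula once per convolution (3 layers, per the PR text)."""
    for _ in range(num_layers):
        length = conv_output_length(length)
    return length


# With kernel_size=3, stride=2, padding=1 the general formula is (n - 1) // 2 + 1.
assert all(conv_output_length(n) == (n - 1) // 2 + 1 for n in range(0, 1000))

print(stacked_output_length(100))  # 100 -> 50 -> 25 -> 13
```

This only checks the arithmetic of the formula quoted in the PR; it does not inspect how the audio encoder actually feeds its inputs to the convolutions.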

Files Changed

src/transformers/models/qwen3_omni_moe/modeling_qwen3_omni_moe.py
src/transformers/models/qwen3_omni_moe/modular_qwen3_omni_moe.py
src/transformers/models/qwen3_omni_moe/processing_qwen3_omni_moe.py

Fixes #45083

Changed files

src/transformers/models/qwen3_omni_moe/modeling_qwen3_omni_moe.py (modified, +2/-3)
src/transformers/models/qwen3_omni_moe/modular_qwen3_omni_moe.py (modified, +2/-3)
src/transformers/models/qwen3_omni_moe/processing_qwen3_omni_moe.py (modified, +2/-3)



System Info

transformers version: 5.0.0
Platform: Linux-6.6.113+-x86_64-with-glibc2.35
Python version: 3.12.13
Huggingface_hub version: 1.7.1
Safetensors version: 0.7.0
Accelerate version: 1.13.0
Accelerate config: not found
DeepSpeed version: not installed
PyTorch version (accelerator?): 2.10.0+cpu (NA)
Using distributed or parallel set-up in script?: No

Who can help?

No response

Information

The official example scripts
My own modified scripts

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)

Reproduction

https://github.com/huggingface/transformers/blob/9a9997fd73c5eb29fb3677d3c489f5d3cd0765f6/src/transformers/models/qwen3_omni_moe/modular_qwen3_omni_moe.py#L117 The function at the link above computes the output length of the audio encoder, but its formula does not match the output-size formula documented for PyTorch Conv2d. The audio encoder's convolutions are defined at https://github.com/huggingface/transformers/blob/9a9997fd73c5eb29fb3677d3c489f5d3cd0765f6/src/transformers/models/qwen3_omni_moe/modular_qwen3_omni_moe.py#L871

Expected behavior

Current implementation is

def _get_feat_extract_output_lengths(input_lengths):
    """
    Computes the output length of the convolutional layers and the output length of the audio encoder
    """

    input_lengths_leave = input_lengths % 100
    feat_lengths = (input_lengths_leave - 1) // 2 + 1
    output_lengths = ((feat_lengths - 1) // 2 + 1 - 1) // 2 + 1 + (input_lengths // 100) * 13
    return output_lengths

and the expected implementation is

def _get_feat_extract_output_lengths(input_lengths):
    """
    Computes the output length of the convolutional layers and the output length of the audio encoder
    """

    feat_lengths = (input_lengths - 1) // 2 + 1
    output_lengths = ((feat_lengths - 1) // 2 + 1 - 1) // 2 + 1
    return output_lengths
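
To make the difference between the two versions concrete, here is a minimal sketch that runs both formulas on plain integers and reports where they first disagree. The names `current_lengths` and `expected_lengths` are hypothetical labels for the two bodies quoted above, and using Python integers instead of tensors is an assumption made for illustration.

```python
def current_lengths(input_lengths):
    """Formula quoted as the current implementation."""
    input_lengths_leave = input_lengths % 100
    feat_lengths = (input_lengths_leave - 1) // 2 + 1
    return ((feat_lengths - 1) // 2 + 1 - 1) // 2 + 1 + (input_lengths // 100) * 13


def expected_lengths(input_lengths):
    """Formula quoted as the expected implementation."""
    feat_lengths = (input_lengths - 1) // 2 + 1
    return ((feat_lengths - 1) // 2 + 1 - 1) // 2 + 1


# Collect every length in a small range where the two formulas disagree.
mismatches = [n for n in range(0, 401) if current_lengths(n) != expected_lengths(n)]
first = mismatches[0]
print(first, current_lengths(first), expected_lengths(first))  # 101 14 13

for n in (100, 200, 300):
    print(n, current_lengths(n), expected_lengths(n))
```

The two formulas agree for lengths 0 through 100 and first differ at 101 (14 versus 13). For 200 and 300 the current formula gives 26 and 39, while the expected one gives 25 and 38. The current formula adds 13 for every full block of 100 frames, which is the result one would get if each block were convolved separately; this page does not establish whether the encoder does that.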

extent analysis

Fix Plan

To fix the issue, we need to update the _get_feat_extract_output_lengths function to correctly calculate the output length of the audio encoder.

Here are the steps:

Update the modular_qwen3_omni_moe.py file with the correct implementation of the _get_feat_extract_output_lengths function.
The corrected function should be:

def _get_feat_extract_output_lengths(input_lengths):
    """
    Computes the output length of the convolutional layers and the output length of the audio encoder
    """

    feat_lengths = (input_lengths - 1) // 2 + 1
    output_lengths = ((feat_lengths - 1) // 2 + 1 - 1) // 2 + 1 
    return output_lengths

Replace the existing function with the corrected one in modular_qwen3_omni_moe.py. Both PRs also apply the same change to modeling_qwen3_omni_moe.py and processing_qwen3_omni_moe.py.

Verification

To verify the fix, test the updated function on several input lengths and compare the results with values derived from the per-layer formula.

You can add test cases like this:

input_lengths = [100, 200, 300]
# Three stride-2 layers: 100 -> 50 -> 25 -> 13, 200 -> 100 -> 50 -> 25, 300 -> 150 -> 75 -> 38.
# The current (pre-fix) formula returns 13, 26, 39 for the same inputs.
expected_output_lengths = [13, 25, 38]

for input_length, expected_output_length in zip(input_lengths, expected_output_lengths):
    output_length = _get_feat_extract_output_lengths(input_length)
    assert output_length == expected_output_length, f"Expected output length {expected_output_length} but got {output_length}"

If all assertions pass, the function matches the three-layer formula for these lengths. Only the lengths above 100 distinguish the corrected function from the current one, since both return 13 for an input of 100.

Extra Tips

Both PRs are still open, so the fix is not yet in a release; once one is merged, upgrade transformers to a version that includes it. It is also good practice to write unit tests for critical functions like _get_feat_extract_output_lengths to catch regressions.
