transformers - ✅(Solved) Fix Whisper generation fails on empty transcription after align_special_tokens [1 pull requests, 1 participants]

transformers · 2026-04-22 17:23:03

GitHub stats

huggingface/transformers#45584 • Fetched 2026-04-23 07:22:58

Author

ronansgd

Participants

ronansgd

Timeline (top)

cross-referenced ×1, labeled ×1, mentioned ×1, subscribed ×1

Error Message

import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor
from transformers.trainer_utils import align_special_tokens


def main() -> None:
    model_name = "openai/whisper-tiny"
    processor = WhisperProcessor.from_pretrained(model_name)
    model = WhisperForConditionalGeneration.from_pretrained(model_name)

    # Clear begin_suppress_tokens and suppress every token except EOS, so generate() yields an empty transcription.
    model.generation_config.begin_suppress_tokens = None
    model.generation_config.suppress_tokens = [
        i for i in range(model.config.vocab_size) if i != model.config.eos_token_id
    ]

    # --- Without align_special_tokens: works fine ---
    features = processor(torch.zeros(16_000 * 10).numpy(), sampling_rate=16_000, return_tensors="pt").input_features

    out = model.generate(features)
    decoded = processor.batch_decode(out, skip_special_tokens=True)
    print(f"Before align_special_tokens: generate() on silence → {decoded!r}  (empty transcription, as expected)")

    # --- After align_special_tokens (called e.g. by Trainer.train()): crashes ---
    align_special_tokens(model, processor)

    print("\nAfter align_special_tokens: generate() on silence →", end=" ")
    try:
        model.generate(features)
        print("no crash")
    except IndexError as e:
        print(f"IndexError: {e}")


if __name__ == "__main__":
    main()

Fix Action

Fix proposed (PR open, not yet merged when this page was fetched)

Proposed fix: PR "Fix whisper long-form generation when eos_token_id is a list" (https://github.com/huggingface/transformers/pull/45570)

PR fix notes

PR #45570: Fix whisper long-form generation when eos_token_id is a list

Repository: huggingface/transformers
Author: ronansgd
State: open | merged: False
Link: https://github.com/huggingface/transformers/pull/45570

Description (problem / solution / changelog)

What does this PR do?

Fixes: #45584

Fixes a bug in the Whisper generation code that occurs when generation_config.eos_token_id is a list[int] rather than an int (for instance, after align_special_tokens is called in Trainer.train).

The fix normalizes eos_token_id to a list and uses membership checks instead of equality.
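
The idea described above can be illustrated with a minimal sketch. This is not the PR's actual diff: normalize_eos and ends_with_eos are hypothetical helper names, plain Python lists stand in for token tensors, and the token ids are used only for illustration. It shows why an equality test quietly stops matching once eos_token_id is a list, and how normalizing plus a membership check handles both shapes.

```python
from typing import List, Optional, Union

EosSpec = Optional[Union[int, List[int]]]


def normalize_eos(eos_token_id: EosSpec) -> List[int]:
    """Return eos_token_id as a list, whether it was None, an int, or a list."""
    if eos_token_id is None:
        return []
    if isinstance(eos_token_id, int):
        return [eos_token_id]
    return list(eos_token_id)


def ends_with_eos(tokens: List[int], eos_token_id: EosSpec) -> bool:
    """Membership check that is safe for empty sequences and list-valued EOS."""
    eos_ids = normalize_eos(eos_token_id)
    return len(tokens) > 0 and tokens[-1] in eos_ids


EOS = 50257  # illustrative id, an assumption for this sketch
tokens = [50258, 50363, EOS]

# Equality against a list never matches an int token.
print(tokens[-1] == [EOS])           # False
# Membership after normalization works for both shapes.
print(ends_with_eos(tokens, EOS))    # True
print(ends_with_eos(tokens, [EOS]))  # True
print(ends_with_eos([], [EOS]))      # False
```

Guarding on the sequence length before indexing is a design choice of this sketch; whether the real code needs the same guard is not stated in the PR description.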

Who can review?

cc @eustlb

Changed files

src/transformers/models/whisper/generation_whisper.py (modified, +6/-2)

System Info

transformers version: 5.6.0.dev0
Platform: Linux-6.17.0-1009-gcp-x86_64-with-glibc2.39
Python version: 3.12.3
Huggingface_hub version: 1.11.0
Safetensors version: 0.7.0
Accelerate version: 1.13.0
Accelerate config: not found
DeepSpeed version: not installed
PyTorch version (accelerator?): 2.11.0+cu130 (CUDA)
Using distributed or parallel set-up in script?: no
Using GPU in script?: no
GPU type: NVIDIA A100-SXM4-40GB

Who can help?

@eustlb

Reproduction

Basic reproduction script

import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor
from transformers.trainer_utils import align_special_tokens


def main() -> None:
    model_name = "openai/whisper-tiny"
    processor = WhisperProcessor.from_pretrained(model_name)
    model = WhisperForConditionalGeneration.from_pretrained(model_name)

    # Clear begin_suppress_tokens and suppress every token except EOS, so generate() yields an empty transcription.
    model.generation_config.begin_suppress_tokens = None
    model.generation_config.suppress_tokens = [
        i for i in range(model.config.vocab_size) if i != model.config.eos_token_id
    ]

    # --- Without align_special_tokens: works fine ---
    features = processor(torch.zeros(16_000 * 10).numpy(), sampling_rate=16_000, return_tensors="pt").input_features

    out = model.generate(features)
    decoded = processor.batch_decode(out, skip_special_tokens=True)
    print(f"Before align_special_tokens: generate() on silence → {decoded!r}  (empty transcription, as expected)")

    # --- After align_special_tokens (called e.g. by Trainer.train()): crashes ---
    align_special_tokens(model, processor)

    print("\nAfter align_special_tokens: generate() on silence →", end=" ")
    try:
        model.generate(features)
        print("no crash")
    except IndexError as e:
        print(f"IndexError: {e}")


if __name__ == "__main__":
    main()

Expected behavior

Generation should not fail; it should return an empty transcription.

extent analysis

TL;DR

The crash is not caused by suppress_tokens. According to PR #45570, Whisper generation fails when generation_config.eos_token_id is a list[int] rather than an int, which is the state align_special_tokens can leave it in. The PR normalizes the value to a list and uses membership checks instead of equality.

Guidance

The reproduction script sets suppress_tokens before the first generate() call, which succeeds; the same settings fail only after align_special_tokens runs, so suppress_tokens is not the variable that changed.
After calling align_special_tokens (or Trainer.train(), which calls it), inspect model.generation_config.eos_token_id to see whether it is now a list.
The fix lives in src/transformers/models/whisper/generation_whisper.py; the PR was open and unmerged when this page was fetched.
Until the fix is available in your installed version, one possible workaround is to set eos_token_id back to a single int before calling generate(). This is an assumption derived from the PR description and has not been verified against the reproduction script.

Example

align_special_tokens(model, processor)
# Unverified workaround: collapse a one-element EOS list back to a plain int.
eos = model.generation_config.eos_token_id
if isinstance(eos, list) and len(eos) == 1:
    model.generation_config.eos_token_id = eos[0]

Notes

The issue does not describe what align_special_tokens changes; the explanation above relies on the PR description and the behavior of the reproduction script.

Recommendation

Prefer the upstream fix: follow PR #45570 and upgrade once it is merged and released. Treat the workaround above as a stopgap to test locally, not as a confirmed solution.
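
Since the PR description ties the failure to the type of generation_config.eos_token_id, a quick diagnostic can confirm whether a model is in that state. The sketch below is only illustrative: eos_kind is a hypothetical helper, and a SimpleNamespace stands in for the real generation config so the snippet runs without downloading a model.

```python
from types import SimpleNamespace


def eos_kind(generation_config) -> str:
    """Classify eos_token_id as 'none', 'int', 'list', or another type name."""
    eos = getattr(generation_config, "eos_token_id", None)
    if eos is None:
        return "none"
    if isinstance(eos, int):
        return "int"
    if isinstance(eos, (list, tuple)):
        return "list"
    return type(eos).__name__


# Stand-ins for a config before and after align_special_tokens (ids are illustrative).
before = SimpleNamespace(eos_token_id=50257)
after = SimpleNamespace(eos_token_id=[50257])

print(eos_kind(before), eos_kind(after))  # int list
```

With a real model, the same helper could be called as eos_kind(model.generation_config) before and after align_special_tokens to see whether the value changed shape.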
