transformers - ✅ (Solved) Whisper generation fails on empty transcription after align_special_tokens [1 pull request, 1 participant]

GitHub stats
huggingface/transformers#45584 (fetched 2026-04-23 07:22:58)
Comments: 0 · Participants: 1 · Timeline events: 4 · Reactions: 0
Timeline (top): cross-referenced ×1, labeled ×1, mentioned ×1, subscribed ×1


Fix Action

Fixed

PR fix notes

PR #45570: Fix whisper long-form generation when eos_token_id is a list

Description (problem / solution / changelog)

What does this PR do?

Fixes: #45584

Fixes a bug in the Whisper generation code that occurs when generation_config.eos_token_id is a list[int] rather than an int (for instance, after align_special_tokens is called in Trainer.train).

The fix normalizes eos_token_id to a list and uses membership checks instead of equality comparisons.
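The idea described above can be illustrated with a minimal, library-free sketch. It is not the code from the PR: the helper names normalize_eos and ends_with_eos are hypothetical, and the token ids are illustrative values only. It shows why an equality check silently fails once eos_token_id is a list, and how a membership check against a normalized list handles both forms.

```python
from typing import List, Optional, Sequence, Union


def normalize_eos(eos_token_id: Union[int, Sequence[int], None]) -> List[int]:
    """Return eos_token_id as a list, whether it was None, an int, or a sequence."""
    if eos_token_id is None:
        return []
    if isinstance(eos_token_id, int):
        return [eos_token_id]
    return list(eos_token_id)


def ends_with_eos(token_ids: Sequence[int], eos_token_id: Optional[Union[int, Sequence[int]]]) -> bool:
    """Membership check: True if the last token is one of the EOS ids."""
    eos_ids = normalize_eos(eos_token_id)
    return len(token_ids) > 0 and token_ids[-1] in eos_ids


EOS = 50257                # illustrative id, an assumption for this sketch
tokens = [50258, EOS]      # hypothetical output that ends with EOS

print(tokens[-1] == [EOS])           # False: an int never equals a list
print(ends_with_eos(tokens, EOS))    # True: int form
print(ends_with_eos(tokens, [EOS]))  # True: list form
```

Normalizing once and then using `in` keeps a single code path for both the int and the list form, which is the approach the PR description names.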


Who can review?

cc @eustlb

Changed files

  • src/transformers/models/whisper/generation_whisper.py (modified, +6/-2)


System Info

  • transformers version: 5.6.0.dev0
  • Platform: Linux-6.17.0-1009-gcp-x86_64-with-glibc2.39
  • Python version: 3.12.3
  • Huggingface_hub version: 1.11.0
  • Safetensors version: 0.7.0
  • Accelerate version: 1.13.0
  • Accelerate config: not found
  • DeepSpeed version: not installed
  • PyTorch version (accelerator?): 2.11.0+cu130 (CUDA)
  • Using distributed or parallel set-up in script?: no
  • Using GPU in script?: no
  • GPU type: NVIDIA A100-SXM4-40GB

Who can help?

@eustlb


Reproduction

Basic reproduction script

import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor
from transformers.trainer_utils import align_special_tokens


def main() -> None:
    model_name = "openai/whisper-tiny"
    processor = WhisperProcessor.from_pretrained(model_name)
    model = WhisperForConditionalGeneration.from_pretrained(model_name)

    # Suppress every token except EOS so that generation on silence ends immediately (empty transcription).
    model.generation_config.begin_suppress_tokens = None
    model.generation_config.suppress_tokens = [
        i for i in range(model.config.vocab_size) if i != model.config.eos_token_id
    ]

    # --- Without align_special_tokens: works fine ---
    features = processor(torch.zeros(16_000 * 10).numpy(), sampling_rate=16_000, return_tensors="pt").input_features

    out = model.generate(features)
    decoded = processor.batch_decode(out, skip_special_tokens=True)
    print(f"Before align_special_tokens: generate() on silence → {decoded!r}  (empty transcription, as expected)")

    # --- After align_special_tokens (called e.g. by Trainer.train()): crashes ---
    align_special_tokens(model, processor)

    print("\nAfter align_special_tokens: generate() on silence →", end=" ")
    try:
        model.generate(features)
        print("no crash")
    except IndexError as e:
        print(f"IndexError: {e}")


if __name__ == "__main__":
    main()

Expected behavior

Generation should not fail and should return an empty transcription.

extent analysis

TL;DR

According to PR #45570, the crash occurs because generation_config.eos_token_id is a list[int] rather than an int after align_special_tokens runs, and the Whisper generation code compared tokens against it with equality. The suppress_tokens setting in the reproduction script only forces the model to emit EOS on silence; it exposes the bug but is not its cause. The PR normalizes eos_token_id to a list and uses membership checks.

Guidance

  • align_special_tokens (called, for instance, by Trainer.train()) can leave generation_config.eos_token_id as a list[int] instead of an int, per the PR description.
  • The reproduction script already excludes eos_token_id from suppress_tokens, so editing suppress_tokens again after align_special_tokens is unlikely to change the outcome.
  • To confirm you are hitting this bug, inspect generation_config.eos_token_id after calling align_special_tokens.
  • Use a transformers build that includes PR #45570, then re-run generate() on silence to check that it returns an empty transcription without raising IndexError.

Example

align_special_tokens(model, processor)
print(model.generation_config.eos_token_id)  # per PR #45570, this may now be a list[int] rather than an int

Notes

The issue does not specify exactly what align_special_tokens changes; the explanation above relies on the PR description and the reproduction script.

Recommendation

Apply the fix: upgrade to a transformers version that contains PR #45570 (a +6/-2 change to src/transformers/models/whisper/generation_whisper.py).
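Because the linked PR attributes the failure to the form of eos_token_id, a small diagnostic can help confirm which form a config currently holds. The sketch below is an assumption-laden illustration: eos_shape is a hypothetical helper, and SimpleNamespace objects stand in for model.generation_config so the snippet runs without downloading a model.

```python
from types import SimpleNamespace


def eos_shape(generation_config) -> str:
    """Describe the form of eos_token_id on a config-like object."""
    eos = getattr(generation_config, "eos_token_id", None)
    if eos is None:
        return "unset"
    if isinstance(eos, int):
        return "int"
    if isinstance(eos, (list, tuple)):
        return f"list of {len(eos)}"
    return type(eos).__name__


# Stand-ins for model.generation_config; the id 50257 is illustrative only.
before = SimpleNamespace(eos_token_id=50257)
after = SimpleNamespace(eos_token_id=[50257])

print(eos_shape(before))  # int
print(eos_shape(after))   # list of 1
```

With a real model, the same helper could be called on model.generation_config before and after align_special_tokens to see whether the value changed form.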
