transformers - 💡(How to fix) Fix Whisper processor.batch_decode() function ignoring skip_special_tokens params [5 comments, 3 participants]

huggingface/transformers#44811 (fetched 2026-04-08 00:57:14)


System Info

  • transformers version: 4.57.6
  • Platform: Linux-5.15.167.4-microsoft-standard-WSL2-x86_64-with-glibc2.35
  • Python version: 3.10.13
  • Huggingface_hub version: 0.36.2
  • Safetensors version: 0.7.0
  • Accelerate version: 1.13.0
  • Accelerate config: not found
  • DeepSpeed version: not installed
  • PyTorch version (accelerator?): 2.10.0+cu130 (CUDA)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?: No
  • Using GPU in script?: Yes
  • GPU type: NVIDIA GeForce RTX 3080 Ti Laptop GPU

Who can help?

@ArthurZucker

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

When I run the code provided here, the skip_special_tokens parameter appears to be ignored.

from transformers import WhisperProcessor, WhisperForConditionalGeneration
from datasets import load_dataset

# load model and processor
processor = WhisperProcessor.from_pretrained("openai/whisper-small")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
model.config.forced_decoder_ids = None

# load dummy dataset and read audio files
ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
sample = ds[0]["audio"]
input_features = processor(sample["array"], sampling_rate=sample["sampling_rate"], return_tensors="pt").input_features 

# generate token ids
predicted_ids = model.generate(input_features)
# decode token ids to text
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=False)
print("[Skip special tokens=False] ", transcription)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print("[Skip special tokens=True] ", transcription)

The output is the following:

[Skip special tokens=False] [' Mr. Quilter is the apostle of the middle classes, and we are glad to welcome his gospel.']
[Skip special tokens=True] [' Mr. Quilter is the apostle of the middle classes, and we are glad to welcome his gospel.']

If, however, a dict output is requested by running the following code:

# generate token ids
predicted_ids = model.generate(input_features, return_dict_in_generate=True)
# decode token ids to text
transcription = processor.batch_decode(predicted_ids.sequences, skip_special_tokens=False)
print("[Skip special tokens=False] ", transcription)
transcription = processor.batch_decode(predicted_ids.sequences, skip_special_tokens=True)
print("[Skip special tokens=True] ", transcription)

The output is (correctly) the following:

[Skip special tokens=False] ['<|startoftranscript|><|en|><|transcribe|><|notimestamps|> Mr. Quilter is the apostle of the middle classes, and we are glad to welcome his gospel.<|endoftext|>']
[Skip special tokens=True] [' Mr. Quilter is the apostle of the middle classes, and we are glad to welcome his gospel.']
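One way to narrow this down is to check whether the ids returned by each generate() call contain any special-token ids at all. If they do not, skip_special_tokens has nothing to act on and both decodes will look the same. The following is a minimal sketch of that check, using a hypothetical helper name (find_special_ids) and toy ids that are not real Whisper vocabulary entries:

```python
from typing import Iterable, Sequence


def find_special_ids(
    sequences: Iterable[Sequence[int]], special_ids: Iterable[int]
) -> list[list[int]]:
    """Return, for each sequence, the special-token ids it contains, in order."""
    special = set(special_ids)
    return [[tok for tok in seq if tok in special] for seq in sequences]


# Toy ids (hypothetical, not real Whisper vocabulary entries).
SPECIAL = {1000, 1001, 1002}
plain_ids = [[5, 6, 7]]                    # text ids only
full_ids = [[1000, 1001, 5, 6, 7, 1002]]   # same text wrapped in special ids

print(find_special_ids(plain_ids, SPECIAL))  # [[]]
print(find_special_ids(full_ids, SPECIAL))   # [[1000, 1001, 1002]]
```

With the real objects, the inputs would presumably be predicted_ids.tolist() (or predicted_ids.sequences.tolist()) and processor.tokenizer.all_special_ids. This assumes the Whisper prompt tokens are registered as special tokens in the tokenizer. An empty result for the plain generate() call would suggest the difference lies in the returned ids rather than in batch_decode.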

Expected behavior

The processor.batch_decode() function should also output special tokens if skip_special_tokens=False is passed.

extent analysis

Fix Plan

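The reproduction above points to a workaround rather than a change in batch_decode itself: call generate with return_dict_in_generate=True and decode the .sequences field of the result. In that configuration batch_decode honours skip_special_tokens, as the second output in the Reproduction section shows.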

Here are the steps:

  • Modify the generate method call to include return_dict_in_generate=True.
  • Access the generated sequences from the dictionary output using predicted_ids.sequences.

Example code:

# generate token ids
predicted_ids = model.generate(input_features, return_dict_in_generate=True)
# decode token ids to text
transcription = processor.batch_decode(predicted_ids.sequences, skip_special_tokens=False)
print("[Skip special tokens=False] ", transcription)
transcription = processor.batch_decode(predicted_ids.sequences, skip_special_tokens=True)
print("[Skip special tokens=True] ", transcription)

Verification

To verify the workaround, run the modified code and compare the two outputs. With skip_special_tokens=False the transcription should include tokens such as <|startoftranscript|> and <|endoftext|>; with skip_special_tokens=True it should contain only the text.
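The comparison can also be automated. The sketch below uses a hypothetical helper (special_tokens_respected) that only inspects the decoded strings, so it needs no model. The sample strings are copied from the outputs reported in the Reproduction section.

```python
def special_tokens_respected(
    kept: list[str], skipped: list[str], marker: str = "<|startoftranscript|>"
) -> bool:
    """True if `marker` is in every string decoded with skip_special_tokens=False
    and in none of the strings decoded with skip_special_tokens=True."""
    return all(marker in t for t in kept) and not any(marker in t for t in skipped)


# Strings taken from the two outputs reported in the issue.
text = " Mr. Quilter is the apostle of the middle classes, and we are glad to welcome his gospel."
wrapped = "<|startoftranscript|><|en|><|transcribe|><|notimestamps|>" + text + "<|endoftext|>"

print(special_tokens_respected([wrapped], [text]))  # True: dict-output path
print(special_tokens_respected([text], [text]))     # False: plain generate() path
```

In practice the two arguments would be the lists returned by batch_decode with skip_special_tokens=False and skip_special_tokens=True respectively.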

Extra Tips

  • Pass return_dict_in_generate=True whenever the decoded text needs to include special tokens; the token ids are then available as predicted_ids.sequences.
  • If you call generate without return_dict_in_generate=True, it returns the token ids directly as a tensor. In the reproduction above, decoding that tensor gave the same text for both values of skip_special_tokens, so use the dict output when you need the special tokens in the decoded string.
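If the same decoding code has to accept both return styles, a small accessor can hide the difference. This is a sketch with a hypothetical helper name (get_sequences) and stand-in objects instead of real generate() outputs. It only unifies how the ids are read and does not change which tokens they contain.

```python
from types import SimpleNamespace
from typing import Any


def get_sequences(generate_output: Any) -> Any:
    """Return the token ids whether the output is dict-style or a bare tensor."""
    # Dict-style outputs expose the ids as `.sequences`; a bare tensor has no
    # such attribute, so it is returned unchanged.
    return getattr(generate_output, "sequences", generate_output)


# Stand-ins for the two return types (hypothetical, no model needed).
dict_style = SimpleNamespace(sequences=[[1, 2, 3]])
bare = [[1, 2, 3]]

print(get_sequences(dict_style))  # [[1, 2, 3]]
print(get_sequences(bare))        # [[1, 2, 3]]
```

The result can then be passed to processor.batch_decode in either case.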
