transformers - ✅(Solved) Fix [BUG] Wav2Vec2 wav2vec2-lv-60-espeak-cv-ft: save_pretrained and tokenization fail [2 pull requests, 1 participants]

Q: Expected behavior

→ `phonemize()` should return the appropriate phoneme string.
→ Tokenization should complete successfully and print the `input_ids` length.

transformers · 2026-04-02 19:58:10


GitHub stats

huggingface/transformers#45198 • Fetched 2026-04-08 02:33:19

Author

harshaljanjani

Participants

harshaljanjani

Timeline (top)

cross-referenced ×1, labeled ×1

Error Message

from transformers import Wav2Vec2PhonemeCTCTokenizer

tokenizer = Wav2Vec2PhonemeCTCTokenizer.from_pretrained("facebook/wav2vec2-lv-60-espeak-cv-ft")
try:
    phonemes = tokenizer.phonemize("Hello how are you", phonemizer_lang="en-us")
    print(f"phonemes: {phonemes}")
except Exception as e:
    print(e)

Fix Action

Fix proposed (the PR was still open and unmerged when this page was fetched)

Proposed fix: fix(models): Resolve regressions in Wav2Vec2PhonemeCTCTokenizer (wav2vec2-lv-60-espeak-cv-ft) (https://github.com/huggingface/transformers/pull/45199)

PR fix notes

PR #45199: fix(models): Resolve regressions in Wav2Vec2PhonemeCTCTokenizer (wav2vec2-lv-60-espeak-cv-ft)

Repository: huggingface/transformers
Author: harshaljanjani
State: open | merged: False
Link: https://github.com/huggingface/transformers/pull/45199

Description (problem / solution / changelog)

What does this PR do?

The following Wav2Vec2PhonemeCTC use cases were identified and fixed in this PR:

→ 05c0e1d ("rm slow tokenizers") added self.backend = kwargs.pop("backend", None). Wav2Vec2PhonemeCTCTokenizer already used self.backend for its phonemizer EspeakBackend object set in init_backend. Regardless of call order, one clobbers the other; either the base class overwrites the phonemizer object with None (breaking phonemize()), or the phonemizer object overwrites the base class's serializable value (breaking save_pretrained with "EspeakBackend is not JSON serializable"). Renamed to self._phonemizer_backend so both attributes coexist. Followed the same naming convention used for _word_delimiter_token and _phone_delimiter_token in the same file.

→ Same commit consolidated tokenization_utils.py into tokenization_python.py. In the old code, _encode_plus had return_offsets_mapping as a named param and raised NotImplementedError before reaching tokenize(). After the refactor, return_offsets_mapping is no longer a named param in _encode_plus, so it flows through **kwargs → tokenize() → prepare_for_tokenization(), which had a fixed signature. Added **kwargs to match the base class contract at tokenization_python.py#L836-L838. No other models are affected; Wav2Vec2PhonemeCTCTokenizer is the only override of prepare_for_tokenization that was missing **kwargs :)

→ For more details on reproducing the bug and the output screenshots, please visit the linked issue!
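The attribute clash described in the first point can be reproduced without transformers or espeak installed. The sketch below is only an illustration: BaseTokenizer, ClashingTokenizer, FixedTokenizer, FakeEspeakBackend, to_json and describe_failure are hypothetical stand-ins, not the library's classes, and the string value "espeak" passed as backend is an assumption chosen to mirror the 'str' error reported in the issue.

```python
import json


class FakeEspeakBackend:
    """Hypothetical stand-in for the phonemizer's EspeakBackend object."""

    def phonemize(self, texts):
        return [t.lower() for t in texts]


class BaseTokenizer:
    """Hypothetical base class that reserves `self.backend` for a serializable value."""

    def __init__(self, **kwargs):
        self.backend = kwargs.pop("backend", None)

    def to_json(self):
        # Stand-in for save_pretrained writing init values to a JSON config.
        return json.dumps({"backend": self.backend})


class ClashingTokenizer(BaseTokenizer):
    """Reuses `self.backend` for the phonemizer object, as before the fix."""

    def __init__(self, init_backend_first, **kwargs):
        if init_backend_first:
            self.backend = FakeEspeakBackend()
            super().__init__(**kwargs)  # base class overwrites the phonemizer object
        else:
            super().__init__(**kwargs)
            self.backend = FakeEspeakBackend()  # overwrites the serializable value

    def phonemize(self, text):
        return self.backend.phonemize([text])[0]


class FixedTokenizer(BaseTokenizer):
    """Keeps the phonemizer under its own name so both attributes coexist."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self._phonemizer_backend = FakeEspeakBackend()

    def phonemize(self, text):
        return self._phonemizer_backend.phonemize([text])[0]


def describe_failure(fn):
    """Run fn and return 'ok' or a short description of the exception."""
    try:
        fn()
    except Exception as e:
        return f"{type(e).__name__}: {e}"
    return "ok"


# Order 1: the base class clobbers the phonemizer, so phonemize() breaks.
print(describe_failure(lambda: ClashingTokenizer(True, backend="espeak").phonemize("Hello")))
# Order 2: the phonemizer clobbers the base value, so serialization breaks.
print(describe_failure(lambda: ClashingTokenizer(False, backend="espeak").to_json()))
# With the rename, both operations succeed.
fixed = FixedTokenizer(backend="espeak")
print(fixed.phonemize("Hello"), fixed.to_json())
```

Whichever order the two assignments run in, one of the two operations fails, which is why the PR renames the attribute rather than reordering the calls.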

Fixes #45198

cc: @Rocketknight1 @itazap

CI coverage fixed by this PR (as suggested for inclusion in the PR):

CI run test coverage of this behavior:

→ models/wav2vec2_phoneme/test_tokenization_wav2vec2_phoneme.py::Wav2Vec2PhonemeCTCTokenizerTest:

test_batch_encode_dynamic_overflowing, test_batch_encode_plus_batch_sequence_length, test_batch_encode_plus_padding, test_call, test_case_insensitive, test_change_phonemizer_lang, test_chat_template, test_chat_template_batched, test_decode_with_del, test_empty_input_string, test_encode, test_encode_basic_padding, test_encode_decode, test_encode_decode_with_del, test_encode_decode_with_del_filter, test_encode_plus_with_padding_0, test_encode_plus_with_padding_1, test_encode_with_del, test_mask_output, test_maximum_encoding_length_pair_input, test_maximum_encoding_length_single_input, test_number_of_added_tokens, test_offsets, test_offsets_batch, test_padding_to_multiple_of, test_phonemize, test_phonemize_with_word_del, test_prepare_seq2seq_batch, test_pretokenized_inputs, test_right_and_left_truncation, test_save_and_load_tokenizer, test_special_tokens_mask, test_special_tokens_mask_input_pairs, test_token_type_ids, test_tokenizer_add_new_tokens

The remaining post-fix failures, to my knowledge, are not caused by Wav2Vec2PhonemeCTCTokenizer but originate from the tokenization_python.py base class changes (phone delimiter token no longer inserted between phoneme tokens during encoding, added/special token serialization changes, and TypeError in offset computation). These would need fixes in the base class, not in the model tokenizer :)

Repro output after the fixes (feel free to cross-check):

Code Agent Policy

I confirm that this is not a pure code agent PR.

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you fix any necessary existing tests?

Changed files

src/transformers/models/wav2vec2_phoneme/tokenization_wav2vec2_phoneme.py (modified, +3/-2)

Code Example

from transformers import Wav2Vec2PhonemeCTCTokenizer

tokenizer = Wav2Vec2PhonemeCTCTokenizer.from_pretrained("facebook/wav2vec2-lv-60-espeak-cv-ft")
try:
    phonemes = tokenizer.phonemize("Hello how are you", phonemizer_lang="en-us")
    print(f"phonemes: {phonemes}")
except Exception as e:
    print(e)

---

from transformers import Wav2Vec2PhonemeCTCTokenizer

tokenizer = Wav2Vec2PhonemeCTCTokenizer.from_pretrained("facebook/wav2vec2-lv-60-espeak-cv-ft")
try:
    output = tokenizer("Hello how are you", return_token_type_ids=True)
    print(f"input_ids length: {len(output['input_ids'])}")
except Exception as e:
    print(e)


System Info

transformers version: 5.5.0.dev0
Platform: Linux-5.15.167.4-microsoft-standard-WSL2-x86_64-with-glibc2.39
Python version: 3.12.3
huggingface_hub version: 1.8.0
safetensors version: 0.7.0
accelerate version: 1.13.0
Accelerate config: not found
DeepSpeed version: not installed
PyTorch version (accelerator?): 2.11.0+cu130 (CUDA)
GPU type: NVIDIA GeForce RTX 4060 Laptop GPU

Information

The official example scripts
My own modified scripts

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)

Reproduction

→ phonemize() crashes:

from transformers import Wav2Vec2PhonemeCTCTokenizer

tokenizer = Wav2Vec2PhonemeCTCTokenizer.from_pretrained("facebook/wav2vec2-lv-60-espeak-cv-ft")
try:
    phonemes = tokenizer.phonemize("Hello how are you", phonemizer_lang="en-us")
    print(f"phonemes: {phonemes}")
except Exception as e:
    print(e)

→ Tokenization crashes:

from transformers import Wav2Vec2PhonemeCTCTokenizer

tokenizer = Wav2Vec2PhonemeCTCTokenizer.from_pretrained("facebook/wav2vec2-lv-60-espeak-cv-ft")
try:
    output = tokenizer("Hello how are you", return_token_type_ids=True)
    print(f"input_ids length: {len(output['input_ids'])}")
except Exception as e:
    print(e)

→ Loading "facebook/wav2vec2-lv-60-espeak-cv-ft" and tokenizing with return_token_type_ids=True crashes; got: Wav2Vec2PhonemeCTCTokenizer.prepare_for_tokenization() got an unexpected keyword argument 'return_offsets_mapping'.

→ Loading "facebook/wav2vec2-lv-60-espeak-cv-ft" and calling phonemize() crashes; got: 'str' object has no attribute 'phonemize'.
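The return_offsets_mapping error quoted above is what Python raises when a base class forwards extra keyword arguments to an override whose signature does not accept them. The following is a minimal sketch of that mechanism, assuming simplified hypothetical classes (BaseTokenizer, StrictOverride, TolerantOverride); the parameter names are illustrative and are not copied from the transformers source.

```python
class BaseTokenizer:
    """Hypothetical base class that forwards unrecognized kwargs down the pipeline."""

    def __call__(self, text, **kwargs):
        return self.tokenize(text, **kwargs)

    def tokenize(self, text, **kwargs):
        text, _unused = self.prepare_for_tokenization(text, **kwargs)
        return text.split()

    def prepare_for_tokenization(self, text, is_split_into_words=False, **kwargs):
        # Base contract: accept anything, hand back the leftover kwargs.
        return text, kwargs


class StrictOverride(BaseTokenizer):
    # Fixed signature: any extra keyword argument raises TypeError.
    def prepare_for_tokenization(self, text, is_split_into_words=False, phonemizer_lang=None):
        return text.lower(), {}


class TolerantOverride(BaseTokenizer):
    # Same override with **kwargs added to match the base class contract.
    def prepare_for_tokenization(self, text, is_split_into_words=False, phonemizer_lang=None, **kwargs):
        return text.lower(), kwargs


try:
    StrictOverride()("Hello how are you", return_offsets_mapping=False)
except TypeError as e:
    print(f"strict override failed: {e}")

print(TolerantOverride()("Hello how are you", return_offsets_mapping=False))
```

The tolerant version ignores the flag instead of rejecting it, which matches the one-line change the PR describes for prepare_for_tokenization.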

Current Repro Output:

Expected behavior

→ phonemize() should return the appropriate phoneme string.
→ Tokenization should complete successfully and print the input_ids length.

extent analysis

TL;DR

Both crashes are regressions introduced by commit 05c0e1d ("rm slow tokenizers"), not a dependency version mismatch. The base class and Wav2Vec2PhonemeCTCTokenizer both write to self.backend, and the tokenizer's prepare_for_tokenization() override does not accept the extra keyword arguments that the refactored base class now forwards.

Guidance

Match the error message to the cause: 'str' object has no attribute 'phonemize' points to the self.backend clash, and "got an unexpected keyword argument 'return_offsets_mapping'" points to the fixed prepare_for_tokenization() signature.
Check tokenization_wav2vec2_phoneme.py in your installed transformers build: with the change from PR #45199 applied, the phonemizer object is stored in self._phonemizer_backend and prepare_for_tokenization() accepts **kwargs.
The PR was open and unmerged when this page was fetched, so a released version may not yet include the change.
Some tests in test_tokenization_wav2vec2_phoneme.py may still fail after the fix; the PR author attributes these to the tokenization_python.py base class rather than to this tokenizer.

Example

The two snippets under Code Example reproduce the failures. With the PR's change applied, the first should print the phoneme string and the second should print the input_ids length.

Notes

The issue was reported against transformers 5.5.0.dev0. Nothing in the report ties the crashes to the huggingface_hub, PyTorch, or other dependency versions listed under System Info.

Recommendation

Apply the change from PR #45199 (src/transformers/models/wav2vec2_phoneme/tokenization_wav2vec2_phoneme.py, +3/-2) or wait for it to be merged. A build that predates commit 05c0e1d should not show these two regressions.
