transformers - ✅(Solved) Fix [BUG] Wav2Vec2 wav2vec2-lv-60-espeak-cv-ft: save_pretrained and tokenization fail [2 pull requests, 1 participants]

GitHub stats
huggingface/transformers#45198 (fetched 2026-04-08 02:33:19)
Comments: 0 · Participants: 1 · Timeline events: 2 (cross-referenced ×1, labeled ×1) · Reactions: 0

Error Message

from transformers import Wav2Vec2PhonemeCTCTokenizer

tokenizer = Wav2Vec2PhonemeCTCTokenizer.from_pretrained("facebook/wav2vec2-lv-60-espeak-cv-ft")
try:
    phonemes = tokenizer.phonemize("Hello how are you", phonemizer_lang="en-us")
    print(f"phonemes: {phonemes}")
except Exception as e:
    print(e)

Fix Action

Fixed

PR fix notes

PR #45199: fix(models): Resolve regressions in Wav2Vec2PhonemeCTCTokenizer (wav2vec2-lv-60-espeak-cv-ft)

Description (problem / solution / changelog)

What does this PR do?

The following Wav2Vec2PhonemeCTC use cases were identified and fixed in this PR:

  • self.backend name clash: 05c0e1d ("rm slow tokenizers") added self.backend = kwargs.pop("backend", None). Wav2Vec2PhonemeCTCTokenizer already used self.backend for its phonemizer EspeakBackend object, set in init_backend. Regardless of call order, one assignment clobbers the other: either the base class overwrites the phonemizer object with None (breaking phonemize()), or the phonemizer object overwrites the base class's serializable value (breaking save_pretrained with "EspeakBackend is not JSON serializable"). Renamed the phonemizer attribute to self._phonemizer_backend so both attributes coexist, following the naming convention already used for _word_delimiter_token and _phone_delimiter_token in the same file.
  • Missing **kwargs in prepare_for_tokenization(): the same commit consolidated tokenization_utils.py into tokenization_python.py. In the old code, _encode_plus had return_offsets_mapping as a named param and raised NotImplementedError before reaching tokenize(). After the refactor, return_offsets_mapping is no longer a named param in _encode_plus, so it flows through **kwargs → tokenize() → prepare_for_tokenization(), which had a fixed signature in this tokenizer. Added **kwargs to match the base class contract at tokenization_python.py#L836-L838. No other models are affected; Wav2Vec2PhonemeCTCTokenizer is the only override of prepare_for_tokenization that was missing **kwargs :)

For more details on reproducing the bug and the output screenshots, please visit the linked issue!
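The attribute clash described above can be reproduced without transformers or phonemizer installed. The following is a minimal sketch under stated assumptions: `FakeEspeakBackend`, `BaseTokenizer`, `config_json`, and the three subclasses are hypothetical stand-ins, not the library's classes, and passing the string "espeak" as `backend` is an assumption chosen only to mirror the reported `'str' object has no attribute 'phonemize'` message.

```python
import json


class FakeEspeakBackend:
    """Hypothetical stand-in for a phonemizer backend object (not JSON serializable)."""

    def phonemize(self, texts):
        return [t.lower() for t in texts]


class BaseTokenizer:
    """Hypothetical base class that keeps a serializable `backend` value."""

    def __init__(self, **kwargs):
        self.backend = kwargs.pop("backend", None)

    def config_json(self):
        # Mimics a save step that serializes init attributes to JSON.
        return json.dumps({"backend": self.backend})


class ClashAfterInit(BaseTokenizer):
    """Subclass assigns last: the phonemizer object replaces the serializable value."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.backend = FakeEspeakBackend()


class ClashBeforeInit(BaseTokenizer):
    """Base class assigns last: the phonemizer object is replaced by a plain value."""

    def __init__(self, **kwargs):
        self.backend = FakeEspeakBackend()
        super().__init__(**kwargs)

    def phonemize(self, text):
        return self.backend.phonemize([text])[0]


class FixedTokenizer(BaseTokenizer):
    """Phonemizer lives under a private name, so both attributes coexist."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self._phonemizer_backend = FakeEspeakBackend()

    def phonemize(self, text):
        return self._phonemizer_backend.phonemize([text])[0]


def try_serialize(tokenizer):
    """Return the JSON string, or the error text if serialization fails."""
    try:
        return tokenizer.config_json()
    except TypeError as err:
        return f"TypeError: {err}"
```

With these stand-ins, `try_serialize(ClashAfterInit())` returns a TypeError message, `ClashBeforeInit(backend="espeak").phonemize("Hello")` raises AttributeError, and `FixedTokenizer(backend="espeak")` both serializes and phonemizes.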

Fixes #45198

cc: @Rocketknight1 @itazap

CI coverage fixed by this PR (as suggested for inclusion in the PR):

CI run test coverage of this behavior:

models/wav2vec2_phoneme/test_tokenization_wav2vec2_phoneme.py::Wav2Vec2PhonemeCTCTokenizerTest:

test_batch_encode_dynamic_overflowing, test_batch_encode_plus_batch_sequence_length, test_batch_encode_plus_padding, test_call, test_case_insensitive, test_change_phonemizer_lang, test_chat_template, test_chat_template_batched, test_decode_with_del, test_empty_input_string, test_encode, test_encode_basic_padding, test_encode_decode, test_encode_decode_with_del, test_encode_decode_with_del_filter, test_encode_plus_with_padding_0, test_encode_plus_with_padding_1, test_encode_with_del, test_mask_output, test_maximum_encoding_length_pair_input, test_maximum_encoding_length_single_input, test_number_of_added_tokens, test_offsets, test_offsets_batch, test_padding_to_multiple_of, test_phonemize, test_phonemize_with_word_del, test_prepare_seq2seq_batch, test_pretokenized_inputs, test_right_and_left_truncation, test_save_and_load_tokenizer, test_special_tokens_mask, test_special_tokens_mask_input_pairs, test_token_type_ids, test_tokenizer_add_new_tokens

The remaining post-fix failures are, to my knowledge, not caused by Wav2Vec2PhonemeCTCTokenizer but originate from the tokenization_python.py base class changes (phone delimiter token no longer inserted between phoneme tokens during encoding, added/special token serialization changes, and a TypeError in offset computation). These would need fixes in the base class, not in the model tokenizer :)

Repro output after the fixes (feel free to cross-check):

<img width="600" height="500" alt="2" src="https://github.com/user-attachments/assets/c210b1e4-28f9-40c5-819c-a53fccc480ae" />

Code Agent Policy

  • I confirm that this is not a pure code agent PR.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you fix any necessary existing tests?

Changed files

  • src/transformers/models/wav2vec2_phoneme/tokenization_wav2vec2_phoneme.py (modified, +3/-2)


System Info

  • transformers version: 5.5.0.dev0
  • Platform: Linux-5.15.167.4-microsoft-standard-WSL2-x86_64-with-glibc2.39
  • Python version: 3.12.3
  • huggingface_hub version: 1.8.0
  • safetensors version: 0.7.0
  • accelerate version: 1.13.0
  • Accelerate config: not found
  • DeepSpeed version: not installed
  • PyTorch version (accelerator?): 2.11.0+cu130 (CUDA)
  • GPU type: NVIDIA GeForce RTX 4060 Laptop GPU

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

phonemize() crashes:

from transformers import Wav2Vec2PhonemeCTCTokenizer

tokenizer = Wav2Vec2PhonemeCTCTokenizer.from_pretrained("facebook/wav2vec2-lv-60-espeak-cv-ft")
try:
    phonemes = tokenizer.phonemize("Hello how are you", phonemizer_lang="en-us")
    print(f"phonemes: {phonemes}")
except Exception as e:
    print(e)

Tokenization crashes:

from transformers import Wav2Vec2PhonemeCTCTokenizer

tokenizer = Wav2Vec2PhonemeCTCTokenizer.from_pretrained("facebook/wav2vec2-lv-60-espeak-cv-ft")
try:
    output = tokenizer("Hello how are you", return_token_type_ids=True)
    print(f"input_ids length: {len(output['input_ids'])}")
except Exception as e:
    print(e)

  • Loading "facebook/wav2vec2-lv-60-espeak-cv-ft" and tokenizing with return_token_type_ids=True crashes; got: Wav2Vec2PhonemeCTCTokenizer.prepare_for_tokenization() got an unexpected keyword argument 'return_offsets_mapping'.
  • Loading "facebook/wav2vec2-lv-60-espeak-cv-ft" and calling phonemize() crashes; got: 'str' object has no attribute 'phonemize'.
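The "unexpected keyword argument" message is the ordinary Python TypeError raised when a caller forwards keyword arguments to an override that does not accept them. The sketch below illustrates that mechanism only; the class names and the simplified `prepare_for_tokenization` signatures are hypothetical and are not copied from transformers.

```python
class BaseTokenizer:
    """Hypothetical base class that forwards unrecognized kwargs down the call chain."""

    def __call__(self, text, **kwargs):
        return self.tokenize(text, **kwargs)

    def tokenize(self, text, **kwargs):
        # Extra kwargs (e.g. return_offsets_mapping) reach the hook unchanged.
        text, _unused = self.prepare_for_tokenization(text, **kwargs)
        return text.split()

    def prepare_for_tokenization(self, text, is_split_into_words=False, **kwargs):
        return text, kwargs


class FixedSignatureTokenizer(BaseTokenizer):
    """Override with a closed signature: any forwarded extra kwarg raises TypeError."""

    def prepare_for_tokenization(self, text, is_split_into_words=False, phonemizer_lang=None):
        return text, {}


class KwargsTokenizer(BaseTokenizer):
    """Override that accepts **kwargs, matching the base class contract."""

    def prepare_for_tokenization(
        self, text, is_split_into_words=False, phonemizer_lang=None, **kwargs
    ):
        return text, kwargs
```

Calling `FixedSignatureTokenizer()("Hello how are you", return_offsets_mapping=False)` raises the TypeError, while the same call on `KwargsTokenizer()` returns the split tokens.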

Current Repro Output:

<img width="700" height="500" alt="Image" src="https://github.com/user-attachments/assets/cc6ba562-aa42-49df-98f1-10dc1c3c25f6" />

Expected behavior

  • phonemize() should return the appropriate phoneme string.
  • Tokenization should complete successfully and print the input_ids length.

extent analysis

TL;DR

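Both crashes are regressions introduced by commit 05c0e1d ("rm slow tokenizers"), not a dependency version mismatch. A name clash on self.backend breaks phonemize() and save_pretrained, and a prepare_for_tokenization() override without **kwargs breaks tokenization. PR #45199 addresses both in Wav2Vec2PhonemeCTCTokenizer.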

Guidance

  • Match the error text to the cause: 'str' object has no attribute 'phonemize' points to the self.backend clash; "unexpected keyword argument 'return_offsets_mapping'" points to the prepare_for_tokenization() signature.
  • Check whether the installed transformers build contains the changes from PR #45199; the issue was reported against 5.5.0.dev0.
  • After applying the fix, re-run both reproduction scripts and the relevant tests in models/wav2vec2_phoneme/test_tokenization_wav2vec2_phoneme.py, for example test_phonemize and test_save_and_load_tokenizer.
  • Treat the remaining post-fix test failures separately: per the PR notes, they originate in the tokenization_python.py base class, not in the model tokenizer.

Example

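The two reproduction scripts in the issue trigger both failures. The fix itself is a small change (+3/-2) to src/transformers/models/wav2vec2_phoneme/tokenization_wav2vec2_phoneme.py: rename the phonemizer attribute to self._phonemizer_backend and add **kwargs to prepare_for_tokenization().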
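To check a given installation against both repro cases at once, a small harness along these lines could be used. It is a sketch only: `run_repro_checks` is a hypothetical helper, and it accepts any object that exposes `phonemize(text, phonemizer_lang=...)` and is callable like the tokenizer in the issue.

```python
def run_repro_checks(tokenizer, text="Hello how are you", lang="en-us"):
    """Run both repro cases and return {case_name: (ok, detail)}.

    `detail` is the case's result on success, or "ErrorType: message" on failure.
    """
    cases = {
        # Case 1 from the issue: direct phonemization.
        "phonemize": lambda: tokenizer.phonemize(text, phonemizer_lang=lang),
        # Case 2 from the issue: tokenization with return_token_type_ids=True.
        "tokenize": lambda: len(
            tokenizer(text, return_token_type_ids=True)["input_ids"]
        ),
    }
    results = {}
    for name, case in cases.items():
        try:
            results[name] = (True, case())
        except Exception as err:
            results[name] = (False, f"{type(err).__name__}: {err}")
    return results


# Usage with the real tokenizer (assumes transformers, phonemizer, and espeak
# are installed and the checkpoint can be downloaded):
#
#   from transformers import Wav2Vec2PhonemeCTCTokenizer
#   tok = Wav2Vec2PhonemeCTCTokenizer.from_pretrained(
#       "facebook/wav2vec2-lv-60-espeak-cv-ft"
#   )
#   for name, (ok, detail) in run_repro_checks(tok).items():
#       print(name, "OK" if ok else "FAILED", detail)
```

On an affected build, both entries would be expected to report a failure carrying the error text quoted in the issue; on a fixed build, both should report success.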

Notes

The issue was reported against the development version 5.5.0.dev0. According to the PR, no other models are affected, because Wav2Vec2PhonemeCTCTokenizer was the only override of prepare_for_tokenization missing **kwargs. Some test failures remain after the fix; the PR attributes them to base class changes in tokenization_python.py.

Recommendation

Apply the fix: use a transformers build that contains the changes from PR #45199, or apply its patch to tokenization_wav2vec2_phoneme.py locally. Switching to a different release without that change is not guaranteed to help, since the cause is a code regression rather than a dependency conflict.
