transformers - ✅(Solved) Fix `except import_protobuf_decode_error()` hides real tokenizer errors when protobuf isn't installed [3 pull requests, 4 comments, 3 participants]

Official PRs (…)
ON THIS PAGE

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
huggingface/transformers#45459Fetched 2026-04-17 08:22:45
View on GitHub
Comments
4
Participants
3
Timeline
28
Reactions
1
Author
Timeline (top)
mentioned ×8subscribed ×8commented ×4referenced ×4

Error Message

from transformers.tokenization_utils_base import import_protobuf_decode_error

try: raise ValueError("real error") except import_protobuf_decode_error(): pass except ValueError as e: print("caught real:", e)

ImportError: ... requires the protobuf library ...

Root Cause

protobuf isn't in install_requires (it only comes in through the sentencepiece extra), so a plain pip install transformers triggers this for any exception the tokenizer constructor raises. The RuntimeError and OSError handlers below never get a chance either, because the except evaluator itself raising terminates handler dispatch.

Fix Action

Fix / Workaround

protobuf isn't in install_requires (it only comes in through the sentencepiece extra), so a plain pip install transformers triggers this for any exception the tokenizer constructor raises. The RuntimeError and OSError handlers below never get a chance either, because the except evaluator itself raising terminates handler dispatch.

PR fix notes

PR #45460: fix(tokenization): re-raise ImportError to allow RuntimeError/OSError fallback (#45459)

Description (problem / solution / changelog)

Summary

When protobuf is not installed, is a function call used as an expression. Because it is evaluated lazily when the try block raises, the resulting from the function itself bypasses the RuntimeError and OSError handlers below it, silently swallowing the original exception.

Changes

  • Replaced with
  • This allows the original exception (RuntimeError or OSError) to propagate to the appropriate fallback handler below
  • When protobuf is not installed, the user now gets the correct error message instead of an opaque protobuf ImportError

Testing

Fixes #45459

Changed files

  • src/transformers/tokenization_utils_base.py (modified, +5/-5)

PR #45466: fix: return empty tuple when protobuf not available

Description (problem / solution / changelog)

Fixes #45459 - Previously import_protobuf_decode_error() raised ImportError when protobuf wasn't installed even for other exceptions, hiding the real error. Now returns empty tuple () so the actual exception propagates.

Changed files

  • src/transformers/tokenization_utils_base.py (modified, +3/-2)

PR #45486: fix: return empty tuple from import_protobuf_decode_error when protobuf is unavailable

Description (problem / solution / changelog)

Fixes #45459

What does this PR do?

import_protobuf_decode_error() raised ImportError when protobuf was not installed. Python evaluates the except-class expression lazily, so that ImportError replaced whatever the tokenizer constructor actually raised. The RuntimeError and OSError handlers below never ran either.

Changed the helper to return () when protobuf is unavailable so the except clause matches nothing and the real exception propagates. DecodeError catching when protobuf is installed is unchanged.

Not a duplicate of #45460 (closed; drops DecodeError catching) or #45466 (opened without issue-author approval).

Tests run:

$ python -m pytest tests/tokenization/test_tokenization_utils.py -v
19 passed, 1 skipped in 36.34s

Two new tests:

  • test_import_protobuf_decode_error_without_protobuf
  • test_import_protobuf_decode_error_does_not_mask_exceptions

Code Agent Policy

  • I confirm that this is not a pure code agent PR.

AI assistance (Claude) was used during investigation and patch preparation. All changes were reviewed and tested by me.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case. (#45459, approved by @itazap)
  • Did you write any new necessary tests?

Who can review?

@itazap @ArthurZucker

Changed files

  • src/transformers/tokenization_utils_base.py (modified, +1/-3)
  • tests/tokenization/test_tokenization_utils.py (modified, +21/-0)

Code Example

from transformers.tokenization_utils_base import import_protobuf_decode_error

try:
    raise ValueError("real error")
except import_protobuf_decode_error():
    pass
except ValueError as e:
    print("caught real:", e)
# ImportError: ... requires the protobuf library ...

---

def import_protobuf_decode_error(error_message=""):
    if is_protobuf_available():
        from google.protobuf.message import DecodeError

        return DecodeError
    return ()
RAW_BUFFERClick to expand / collapse

System Info

transformers 5.5.4 (latest release) and 5.6.0.dev0 (main).

PreTrainedTokenizerBase._from_pretrained has except import_protobuf_decode_error(): at src/transformers/tokenization_utils_base.py:1919 (line 1933 on main). The helper raises ImportError when protobuf isn't installed. The except-class expression is evaluated lazily when the try block raises, so that ImportError gets raised instead of the original exception.

protobuf isn't in install_requires (it only comes in through the sentencepiece extra), so a plain pip install transformers triggers this for any exception the tokenizer constructor raises. The RuntimeError and OSError handlers below never get a chance either, because the except evaluator itself raising terminates handler dispatch.

Who can help?

@ArthurZucker @itazap

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

With protobuf not installed (pip install transformers without any extras):

from transformers.tokenization_utils_base import import_protobuf_decode_error

try:
    raise ValueError("real error")
except import_protobuf_decode_error():
    pass
except ValueError as e:
    print("caught real:", e)
# ImportError: ... requires the protobuf library ...

Suggested fix

Return () when protobuf is missing so the except matches nothing:

def import_protobuf_decode_error(error_message=""):
    if is_protobuf_available():
        from google.protobuf.message import DecodeError

        return DecodeError
    return ()

DecodeError can't be raised without protobuf imported, so () is safe here. The function isn't exported and has only the single call site.

Expected behavior

Without protobuf installed, exceptions from the tokenizer constructor should surface to the caller instead of being replaced by ImportError: ...requires the protobuf library....

extent analysis

TL;DR

Modify the import_protobuf_decode_error function to return an empty tuple when protobuf is missing, allowing the original exception to surface.

Guidance

  • Verify that the issue is indeed caused by the import_protobuf_decode_error function raising an ImportError when protobuf is not installed.
  • Apply the suggested fix by changing the import_protobuf_decode_error function to return () when protobuf is missing, as shown in the provided code snippet.
  • Test the modified function with the provided reproduction code to ensure that the original exception is caught and printed as expected.
  • Consider adding protobuf to the install_requires list or documenting the requirement for the sentencepiece extra to avoid similar issues in the future.

Example

The suggested fix can be applied by modifying the import_protobuf_decode_error function as follows:

def import_protobuf_decode_error(error_message=""):
    if is_protobuf_available():
        from google.protobuf.message import DecodeError
        return DecodeError
    return ()

Notes

This fix assumes that the is_protobuf_available function correctly checks for the presence of the protobuf library. If this function is not reliable, additional modifications may be necessary.

Recommendation

Apply the workaround by modifying the import_protobuf_decode_error function as suggested, as this allows the original exception to surface and provides a more informative error message.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

FAQ

Expected behavior

Without protobuf installed, exceptions from the tokenizer constructor should surface to the caller instead of being replaced by ImportError: ...requires the protobuf library....

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

transformers - ✅(Solved) Fix `except import_protobuf_decode_error()` hides real tokenizer errors when protobuf isn't installed [3 pull requests, 4 comments, 3 participants]