transformers - 💡(How to fix) Fix transformers serve crashes with AttributeError: 'Gemma4Processor' object has no attribute '_tokenizer' [4 comments, 2 participants]

huggingface/transformers#45406, fetched 2026-04-15 06:19:41

Error Message

File ".../transformers/cli/serving/utils.py", line 565, in generate_streaming
    streamer = DirectStreamer(processor._tokenizer, loop, queue, skip_special_tokens=True)
                              ^^^^^^^^^^^^^^^^^^^^
AttributeError: 'Gemma4Processor' object has no attribute '_tokenizer'. Did you mean: 'tokenizer'?

Code Example

# Before
streamer = DirectStreamer(processor._tokenizer, loop, queue, skip_special_tokens=True)

# After
streamer = DirectStreamer(processor.tokenizer._tokenizer, loop, queue, skip_special_tokens=True)

---

# Before
streamer = CBStreamer(self._cb, request_id, processor._tokenizer, loop, text_queue)

# After
streamer = CBStreamer(self._cb, request_id, processor.tokenizer._tokenizer, loop, text_queue)

System Info

  • transformers version: 5.5.3
  • Python version: 3.12.3
  • PyTorch version: 2.11.0+cu130
  • Platform: Linux (Ubuntu 24.04)

Who can help?

@ArthurZucker ? idk. I will open a PR once I find time. I "fixed" it locally for now.

Reproduction

pip install transformers[serving]
transformers serve TrevorJS/gemma-4-31B-it-uncensored

Then send any chat completion request:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "TrevorJS/gemma-4-31B-it-uncensored", "messages": [{"role": "user", "content": "Hello"}]}'
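For anyone reproducing this from a script instead of curl, here is a minimal sketch of the same request using only the Python standard library. The helper names build_chat_request and send_chat_request are made up for illustration, and the base URL assumes the server from the steps above is listening on localhost:8000.

```python
import json
import urllib.request

# Assumption: `transformers serve` is listening on this address, as in the curl call.
BASE_URL = "http://localhost:8000"


def build_chat_request(model: str, content: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {"model": model, "messages": [{"role": "user", "content": content}]}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request: urllib.request.Request) -> str:
    """Send the request and return the raw response body (needs a running server)."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling send_chat_request(build_chat_request("TrevorJS/gemma-4-31B-it-uncensored", "Hello")) against an affected install should trigger the traceback shown in this issue on the server side.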

Full Traceback

File ".../transformers/cli/serving/utils.py", line 565, in generate_streaming
    streamer = DirectStreamer(processor._tokenizer, loop, queue, skip_special_tokens=True)
                              ^^^^^^^^^^^^^^^^^^^^
AttributeError: 'Gemma4Processor' object has no attribute '_tokenizer'. Did you mean: 'tokenizer'?

The same error also occurs on line 664 for the continuous batching code path:

streamer = CBStreamer(self._cb, request_id, processor._tokenizer, loop, text_queue)

Expected behavior

Chat completions should work without error. DirectStreamer and CBStreamer require the raw Rust tokenizers.Tokenizer object, which is available at processor.tokenizer._tokenizer for Gemma4Processor.

Actual Behavior

Every request crashes with AttributeError immediately after the model loads.

Fix

Two lines in src/transformers/cli/serving/utils.py need updating.

Line 565 (standard streaming path):

# Before
streamer = DirectStreamer(processor._tokenizer, loop, queue, skip_special_tokens=True)

# After
streamer = DirectStreamer(processor.tokenizer._tokenizer, loop, queue, skip_special_tokens=True)

Line 664 (continuous batching path):

# Before
streamer = CBStreamer(self._cb, request_id, processor._tokenizer, loop, text_queue)

# After
streamer = CBStreamer(self._cb, request_id, processor.tokenizer._tokenizer, loop, text_queue)

Alternatively, a more robust fix would be to make both call sites handle both cases:

rust_tokenizer = getattr(processor, "_tokenizer", None) or processor.tokenizer._tokenizer

This would be defensive against other processor types that may also use the public .tokenizer attribute.

Notes

  • Affects all Gemma 4 models (google/gemma-4-*), since Gemma4Processor uses the public .tokenizer attribute rather than ._tokenizer.
  • Gemma 4 support was added in v5.5.0 (#45192), and transformers serve was not tested against it.
  • No existing GitHub issues or public reports were found for this specific error as of April 2026. This appears to be a gap in integration testing between the new serving CLI and Gemma4's processor design.

extent analysis

TL;DR

Update transformers/cli/serving/utils.py to access the tokenizer attribute correctly for Gemma4Processor by changing processor._tokenizer to processor.tokenizer._tokenizer.

Guidance

  • Verify the issue by checking if the error occurs when using other models that do not use the Gemma4Processor, to confirm it's specific to Gemma 4 models.
  • Apply the suggested fix by updating lines 565 and 664 in transformers/cli/serving/utils.py as described in the issue.
  • Consider implementing a more robust fix that handles both cases, using getattr to access the tokenizer attribute.
  • Test the fix with different Gemma 4 models to ensure the issue is resolved across all affected models.

Example

# Before
streamer = DirectStreamer(processor._tokenizer, loop, queue, skip_special_tokens=True)

# After
streamer = DirectStreamer(processor.tokenizer._tokenizer, loop, queue, skip_special_tokens=True)

Notes

The fix assumes that Gemma4Processor exposes its tokenizer through the public tokenizer attribute, so that processor.tokenizer._tokenizer resolves to the Rust tokenizer. Hard-coding that path may break any processor that exposes only _tokenizer directly, which is why the getattr-based fallback is the safer variant.

Recommendation

Apply the workaround by updating the utils.py file, as the issue appears to be a gap in integration testing between the new serving CLI and Gemma4's processor design, and no fixed version is available.
