transformers - 💡(How to fix) Fix transformers serve crashes with AttributeError: 'Gemma4Processor' object has no attribute '_tokenizer' [4 comments, 2 participants]

transformers · 2026-04-13 13:57:05

GitHub stats

huggingface/transformers#45406 • Fetched 2026-04-15 06:19:41

Author

asdat3

Participants

asdat3

zucchini-nlp

Timeline (top)

commented ×4, closed ×1, labeled ×1, mentioned ×1

Error Message

File ".../transformers/cli/serving/utils.py", line 565, in generate_streaming
    streamer = DirectStreamer(processor._tokenizer, loop, queue, skip_special_tokens=True)
                              ^^^^^^^^^^^^^^^^^^^^
AttributeError: 'Gemma4Processor' object has no attribute '_tokenizer'. Did you mean: 'tokenizer'?

System Info

transformers version: 5.5.3
Python version: 3.12.3
PyTorch version: 2.11.0+cu130
Platform: Linux (Ubuntu 24.04)

Who can help?

@ArthurZucker? Not sure. I will open a PR once I find time; I have "fixed" it locally for now.

Information

The official example scripts
My own modified scripts

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)

Reproduction

pip install transformers[serving]
transformers serve TrevorJS/gemma-4-31B-it-uncensored

Then send any chat completion request:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "TrevorJS/gemma-4-31B-it-uncensored", "messages": [{"role": "user", "content": "Hello"}]}'
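For anyone reproducing this from a script rather than a shell, the request above can be built with the Python standard library. This is a minimal sketch: `build_chat_request` is a hypothetical helper name, and the URL, model name and payload are copied from the curl command. The send step is left commented out because it needs a running `transformers serve` instance.

```python
import json
import urllib.request


def build_chat_request(model, prompt, base_url="http://localhost:8000"):
    """Build (but do not send) the chat completion request from the reproduction.

    Hypothetical helper for illustration; it is not part of transformers.
    """
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request("TrevorJS/gemma-4-31B-it-uncensored", "Hello")

# Sending requires the server from the step above to be running:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

With the unpatched server described in this issue, the send step is where the failure would show up on the client side, while the traceback appears in the server log.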

Full Traceback

File ".../transformers/cli/serving/utils.py", line 565, in generate_streaming
    streamer = DirectStreamer(processor._tokenizer, loop, queue, skip_special_tokens=True)
                              ^^^^^^^^^^^^^^^^^^^^
AttributeError: 'Gemma4Processor' object has no attribute '_tokenizer'. Did you mean: 'tokenizer'?

The same error also occurs on line 664 for the continuous batching code path: streamer = CBStreamer(self._cb, request_id, processor._tokenizer, loop, text_queue)

Expected behavior

Expected Behavior: Chat completions should work without error. DirectStreamer and CBStreamer require the raw Rust tokenizers.Tokenizer object, which for Gemma4Processor is available at processor.tokenizer._tokenizer.

Actual Behavior: Every request crashes with AttributeError immediately after the model loads.

Fix: Two lines in src/transformers/cli/serving/utils.py need updating.

Line 565 (standard streaming path):

# Before
streamer = DirectStreamer(processor._tokenizer, loop, queue, skip_special_tokens=True)

# After
streamer = DirectStreamer(processor.tokenizer._tokenizer, loop, queue, skip_special_tokens=True)

Line 664 (continuous batching path):

# Before
streamer = CBStreamer(self._cb, request_id, processor._tokenizer, loop, text_queue)

# After
streamer = CBStreamer(self._cb, request_id, processor.tokenizer._tokenizer, loop, text_queue)

Alternatively, a more robust fix would be to make both call sites handle both cases:

rust_tokenizer = getattr(processor, "_tokenizer", None) or processor.tokenizer._tokenizer

This would be defensive against other processor types that may also use the public .tokenizer attribute.
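The getattr fallback could also be factored into one helper shared by both call sites. The sketch below is an illustration only: `resolve_rust_tokenizer` is a hypothetical name, not an existing transformers function, and the `SimpleNamespace` objects are stand-ins for real processors so the snippet runs without loading a model.

```python
from types import SimpleNamespace


def resolve_rust_tokenizer(processor):
    """Return the raw Rust tokenizer from either attribute layout.

    Hypothetical helper for illustration; it is not part of transformers.
    """
    # Layout A: the object exposes the backend directly as `_tokenizer`.
    direct = getattr(processor, "_tokenizer", None)
    if direct is not None:
        return direct
    # Layout B (reported for Gemma4Processor): go through the public `.tokenizer`.
    wrapped = getattr(processor, "tokenizer", None)
    backend = getattr(wrapped, "_tokenizer", None)
    if backend is None:
        raise AttributeError(
            f"{type(processor).__name__} exposes no Rust tokenizer via "
            "'_tokenizer' or 'tokenizer._tokenizer'"
        )
    return backend


# Stand-in objects for illustration only; no real model or processor is loaded.
rust_backend = object()
direct_style = SimpleNamespace(_tokenizer=rust_backend)
gemma4_style = SimpleNamespace(tokenizer=SimpleNamespace(_tokenizer=rust_backend))

assert resolve_rust_tokenizer(direct_style) is rust_backend
assert resolve_rust_tokenizer(gemma4_style) is rust_backend
```

Compared with the one-line `or` form, this version tests `is not None` rather than truthiness and raises a message naming both paths when neither exists. Whether that extra strictness is wanted in utils.py is a judgment call for the maintainers.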

Notes

Affects all Gemma 4 models (google/gemma-4-*), since Gemma4Processor uses the public .tokenizer attribute rather than ._tokenizer.
Gemma 4 support was added in v5.5.0 (#45192), and transformers serve was not tested against it.
No existing GitHub issues or public reports were found for this specific error as of April 2026; this appears to be a gap in integration testing between the new serving CLI and Gemma4's processor design.

extent analysis

TL;DR

Update transformers/cli/serving/utils.py to access the tokenizer attribute correctly for Gemma4Processor by changing processor._tokenizer to processor.tokenizer._tokenizer.

Guidance

Verify the issue by checking if the error occurs when using other models that do not use the Gemma4Processor, to confirm it's specific to Gemma 4 models.
Apply the suggested fix by updating lines 565 and 664 in transformers/cli/serving/utils.py as described in the issue.
Consider implementing a more robust fix that handles both cases, using getattr to access the tokenizer attribute.
Test the fix with different Gemma 4 models to ensure the issue is resolved across all affected models.
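For the verification steps above, a small diagnostic can report which attribute path reaches the Rust tokenizer on a given object before and after patching. This is a sketch under assumptions: `tokenizer_access_path` is a hypothetical name, and the candidates are stand-ins rather than real processors.

```python
from types import SimpleNamespace


def tokenizer_access_path(obj):
    """Report which attribute path reaches the Rust tokenizer, or None if neither does.

    Hypothetical diagnostic for illustration; it is not part of transformers.
    """
    if getattr(obj, "_tokenizer", None) is not None:
        return "_tokenizer"
    if getattr(getattr(obj, "tokenizer", None), "_tokenizer", None) is not None:
        return "tokenizer._tokenizer"
    return None


# Stand-ins for the two layouts discussed in the issue, plus an object with neither.
candidates = {
    "direct-style": SimpleNamespace(_tokenizer=object()),
    "gemma4-style": SimpleNamespace(tokenizer=SimpleNamespace(_tokenizer=object())),
    "neither": SimpleNamespace(),
}
report = {name: tokenizer_access_path(obj) for name, obj in candidates.items()}
print(report)
```

Running the same function on the object that the serving code actually passes to DirectStreamer or CBStreamer would show whether a given model needs the patched path.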

Notes

The fix assumes that Gemma4Processor exposes the Rust tokenizer only through its public tokenizer attribute. Hard-coding processor.tokenizer._tokenizer at both call sites could break any processor or tokenizer object that exposes _tokenizer directly, which is the case the getattr fallback is meant to cover.

Recommendation

Apply the workaround by patching utils.py locally. The issue appears to be a gap in integration testing between the new serving CLI and Gemma4's processor design, and the report does not name a release that contains a fix.
