transformers - ✅(Solved) Fix Qwen3.5-35B crashes with transformers chat [1 pull request, 1 comment, 2 participants]

GitHub stats

huggingface/transformers#45362 · Fetched 2026-04-11 06:12:14
Comments: 1 · Participants: 2 · Timeline events: 4 · Reactions: 0
Timeline (top): commented ×1, cross-referenced ×1, labeled ×1, referenced ×1

Error Message

[snip]
File "/mnt/venvs/rocm6.4/lib64/python3.12/site-packages/transformers/cli/serving/chat_completion.py", line 174, in _streaming
    queue, streamer = gen_manager.generate_streaming(model, processor, inputs, gen_config, request_id=request_id)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/venvs/rocm6.4/lib64/python3.12/site-packages/transformers/cli/serving/utils.py", line 565, in generate_streaming
    streamer = DirectStreamer(processor._tokenizer, loop, queue, skip_special_tokens=True)
                              ^^^^^^^^^^^^^^^^^^^^
AttributeError: 'Qwen3VLProcessor' object has no attribute '_tokenizer'. Did you mean: 'tokenizer'?

Fix Action

Fixed

PR fix notes

PR #45368: fix(serving): resolve rust tokenizer from ProcessorMixin in streaming generation

Description (problem / solution / changelog)

Summary

Fixes #45362 — transformers chat crashes with AttributeError: 'Qwen3VLProcessor' object has no attribute '_tokenizer' when streaming responses from Qwen models.

Root cause: GenerateManager.generate_streaming() and CBGenerateManager.generate_streaming() access processor._tokenizer to get the Rust tokenizer backend. This works for PreTrainedTokenizerFast (which stores the Rust backend at ._tokenizer), but ProcessorMixin subclasses like Qwen3VLProcessor expose the fast tokenizer at the public .tokenizer attribute instead.

Fix: Use getattr(processor, "tokenizer", processor)._tokenizer to first resolve the fast tokenizer (which is processor.tokenizer for ProcessorMixin, or processor itself for PreTrainedTokenizerFast), then access ._tokenizer for the Rust backend.

Two locations updated:

  • GenerateManager.generate_streaming() (line 565)
  • CBGenerateManager.generate_streaming() (line 664)
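
The lookup described under "Fix" can be tried out without loading a model. The following is a minimal sketch using stand-in classes: StandInRustBackend, StandInFastTokenizer, StandInProcessor and resolve_rust_backend are hypothetical names made up for illustration, not transformers classes. Only the getattr expression is taken from the PR description.

```python
class StandInRustBackend:
    """Hypothetical stand-in for the Rust tokenizer backend object."""


class StandInFastTokenizer:
    """Mimics PreTrainedTokenizerFast: the Rust backend lives at ._tokenizer."""

    def __init__(self):
        self._tokenizer = StandInRustBackend()


class StandInProcessor:
    """Mimics a ProcessorMixin subclass: the fast tokenizer lives at .tokenizer."""

    def __init__(self):
        self.tokenizer = StandInFastTokenizer()


def resolve_rust_backend(processor):
    # Expression quoted in the PR description: fall back to the object itself
    # when it has no .tokenizer attribute, then read the Rust backend.
    return getattr(processor, "tokenizer", processor)._tokenizer


fast = StandInFastTokenizer()
proc = StandInProcessor()

# Old lookup: works for the fast tokenizer, fails for the processor.
try:
    proc._tokenizer
    old_lookup_failed = False
except AttributeError:
    old_lookup_failed = True

print("old lookup fails on processor:", old_lookup_failed)
print("new lookup, fast tokenizer:", type(resolve_rust_backend(fast)).__name__)
print("new lookup, processor:", type(resolve_rust_backend(proc)).__name__)
```

Both calls reach the same kind of backend object, which is why a single expression covers text-only models and processor-based models alike.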

Coordination

Test plan

  • Verify transformers chat Qwen/Qwen3.5-35B-A3B no longer crashes on first prompt
  • Verify streaming works correctly with non-processor models (e.g. text-only models)
  • ruff check src/transformers/cli/serving/utils.py passes

🤖 Generated with Claude Code

Changed files

  • src/transformers/cli/serving/utils.py (modified, +6/-2)


System Info

When using "transformers chat" with Qwen3.5-35B, the server fails with an AttributeError the moment a prompt is sent:

[snip]
File "/mnt/venvs/rocm6.4/lib64/python3.12/site-packages/transformers/cli/serving/chat_completion.py", line 174, in _streaming
    queue, streamer = gen_manager.generate_streaming(model, processor, inputs, gen_config, request_id=request_id)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/venvs/rocm6.4/lib64/python3.12/site-packages/transformers/cli/serving/utils.py", line 565, in generate_streaming
    streamer = DirectStreamer(processor._tokenizer, loop, queue, skip_special_tokens=True)
                              ^^^^^^^^^^^^^^^^^^^^
AttributeError: 'Qwen3VLProcessor' object has no attribute '_tokenizer'. Did you mean: 'tokenizer'?

I'm running transformers 5.5.0 from pip along with the rocm6.4 version of pytorch (2.8.0+rocm6.4.4.gitc1404424). If you want a full pip version list let me know.
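
For anyone collecting the same details for a report, a short sketch like the one below prints the installed versions of the relevant packages. The helper name installed_versions and the package list are assumptions chosen for illustration; importlib.metadata is part of the Python standard library.

```python
from importlib import metadata


def installed_versions(packages):
    """Return a {package: version} mapping, marking packages that are missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


# Package list is an assumption; extend it as needed for a bug report.
for name, version in installed_versions(["transformers", "torch", "tokenizers"]).items():
    print(f"{name}: {version}")
```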

I've tried a couple of other random text models with transformers chat without issue; so far only Qwen appears to be affected.

Qwen's readme mentions transformers chat as a possible way to use the model, so I assume this used to work, although they were using a 4.57 dev version when the readme was written.

Reproduction

transformers serve

transformers chat Qwen/Qwen3.5-35B-A3B

Once the model has loaded, type any text at the prompt and press Enter; the server then raises the error.

Expected behavior

Model response streamed back.

extent analysis

TL;DR

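The crash comes from the transformers serving code, not from the model: generate_streaming() reads processor._tokenizer, which does not exist on ProcessorMixin subclasses such as Qwen3VLProcessor. The most likely fix is to run a transformers build that includes PR #45368.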

Guidance

  • Check whether the installed transformers/cli/serving/utils.py still passes processor._tokenizer to DirectStreamer; if it does, the build predates PR #45368.
  • Move to a transformers build that includes PR #45368, which changes the lookup to getattr(processor, "tokenizer", processor)._tokenizer in both GenerateManager.generate_streaming() and CBGenerateManager.generate_streaming().
  • Treat a downgrade to the 4.57 dev version mentioned in the model's readme as untested; the reporter only assumed that it used to work.
  • Note that the reporter ran other text models with transformers chat on 5.5.0 without issue, which is consistent with the failure being limited to models loaded through a processor.

Example

The change described in PR #45368 is a one-expression replacement: processor._tokenizer becomes getattr(processor, "tokenizer", processor)._tokenizer at both call sites in src/transformers/cli/serving/utils.py.

Notes

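The root cause is identified in PR #45368: PreTrainedTokenizerFast stores the Rust backend at ._tokenizer, whereas ProcessorMixin subclasses expose the fast tokenizer at .tokenizer, so the direct attribute access fails for Qwen3VLProcessor. This is a bug in the transformers serving code rather than in the model's implementation.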

Recommendation

Apply the fix: use a transformers build that contains the change from PR #45368. Downgrading to the 4.57 dev version is a fallback only, and the issue does not confirm that it works.
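
To see whether an installed copy still contains the lookup shown in the traceback, a rough heuristic is to search the serving module's source for the literal text processor._tokenizer. The sketch below assumes the file sits at cli/serving/utils.py inside the installed package (as in the traceback) and that the patched code uses the getattr expression quoted in the PR description; the merged code may be worded differently, so treat the result as a hint only. The names has_old_lookup and find_serving_utils are made up for illustration.

```python
import importlib.util
from pathlib import Path
from typing import Optional

# Literal text from the traceback; the patched expression quoted in the PR
# ("...processor)._tokenizer") does not contain this substring.
OLD_LOOKUP = "processor._tokenizer"


def has_old_lookup(source_text: str) -> bool:
    """Heuristic: True if the source still reads ._tokenizer directly off processor."""
    return OLD_LOOKUP in source_text


def find_serving_utils() -> Optional[Path]:
    """Locate cli/serving/utils.py in the installed package without importing it."""
    spec = importlib.util.find_spec("transformers")
    if spec is None or not spec.submodule_search_locations:
        return None
    package_dir = Path(next(iter(spec.submodule_search_locations)))
    candidate = package_dir / "cli" / "serving" / "utils.py"
    return candidate if candidate.is_file() else None


path = find_serving_utils()
if path is None:
    print("transformers serving utils.py not found in this environment")
else:
    source = path.read_text(encoding="utf-8")
    print(path, "-> old lookup present:", has_old_lookup(source))
```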
