transformers - ✅(Solved) Fix Qwen3.5-35B crashes with transformers chat [1 pull request, 1 comment, 2 participants]

transformers • 2026-04-10 15:39:40


GitHub stats

huggingface/transformers#45362 • Fetched 2026-04-11 06:12:14


Author

Sector14

Participants

Sector14

sharziki

Timeline (top)

commented ×1, cross-referenced ×1, labeled ×1, referenced ×1

Error Message

[snip]
  File "/mnt/venvs/rocm6.4/lib64/python3.12/site-packages/transformers/cli/serving/chat_completion.py", line 174, in _streaming
    queue, streamer = gen_manager.generate_streaming(model, processor, inputs, gen_config, request_id=request_id)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/venvs/rocm6.4/lib64/python3.12/site-packages/transformers/cli/serving/utils.py", line 565, in generate_streaming
    streamer = DirectStreamer(processor._tokenizer, loop, queue, skip_special_tokens=True)
                              ^^^^^^^^^^^^^^^^^^^^
AttributeError: 'Qwen3VLProcessor' object has no attribute '_tokenizer'. Did you mean: 'tokenizer'?

Fix Action

Fix proposed

Proposed in PR: fix(serving): resolve rust tokenizer from ProcessorMixin in streaming generation (https://github.com/huggingface/transformers/pull/45368). The PR was open and unmerged when this page was fetched.

PR fix notes

PR #45368: fix(serving): resolve rust tokenizer from ProcessorMixin in streaming generation

Repository: huggingface/transformers
Author: sharziki
State: open | merged: False
Link: https://github.com/huggingface/transformers/pull/45368

Description (problem / solution / changelog)

Summary

Fixes #45362 — transformers chat crashes with AttributeError: 'Qwen3VLProcessor' object has no attribute '_tokenizer' when streaming responses from Qwen models.

Root cause: GenerateManager.generate_streaming() and CBGenerateManager.generate_streaming() access processor._tokenizer to get the Rust tokenizer backend. This works for PreTrainedTokenizerFast (which stores the Rust backend at ._tokenizer), but ProcessorMixin subclasses like Qwen3VLProcessor expose the fast tokenizer at the public .tokenizer attribute instead.

Fix: Use getattr(processor, "tokenizer", processor)._tokenizer to first resolve the fast tokenizer (which is processor.tokenizer for ProcessorMixin, or processor itself for PreTrainedTokenizerFast), then access ._tokenizer for the Rust backend.

Two locations updated:

GenerateManager.generate_streaming() (line 565)
CBGenerateManager.generate_streaming() (line 664)
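
The resolution step described in the fix can be illustrated without installing transformers. The sketch below is not the code from the PR: RustBackend, FastTokenizerLike, ProcessorLike and resolve_rust_tokenizer are hypothetical stand-ins that only mimic the attribute layout described above, and the one expression taken from the PR description is the getattr call.

```python
# Hypothetical stand-ins that mimic only the attribute layout described in the PR.
# They are NOT the real transformers / tokenizers classes.


class RustBackend:
    """Plays the role of the Rust tokenizer object handed to the streamer."""

    def __init__(self, name):
        self.name = name


class FastTokenizerLike:
    """Like PreTrainedTokenizerFast: the Rust backend lives at ._tokenizer."""

    def __init__(self, name):
        self._tokenizer = RustBackend(name)


class ProcessorLike:
    """Like a ProcessorMixin subclass: fast tokenizer at .tokenizer, no ._tokenizer."""

    def __init__(self, name):
        self.tokenizer = FastTokenizerLike(name)


def resolve_rust_tokenizer(processor):
    """Return the Rust backend for either kind of object.

    getattr falls back to the object itself when it has no .tokenizer
    attribute, so a plain fast tokenizer is handled as well.
    """
    return getattr(processor, "tokenizer", processor)._tokenizer


if __name__ == "__main__":
    print(resolve_rust_tokenizer(ProcessorLike("vl-processor")).name)
    print(resolve_rust_tokenizer(FastTokenizerLike("text-only")).name)
```

The fallback argument to getattr keeps the call site to a single expression, so both generate_streaming() methods can share the same pattern without an isinstance check.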

Coordination

Issue discussion: https://github.com/huggingface/transformers/issues/45362#issuecomment-4227898143
No existing open PRs for this issue.
AI assistance (Claude Code) was used. All changes reviewed and validated by the submitting human.

Test plan

Verify transformers chat Qwen/Qwen3.5-35B-A3B no longer crashes on first prompt
Verify streaming works correctly with non-processor models (e.g. text-only models)
ruff check src/transformers/cli/serving/utils.py passes
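
The first two items of the test plan need a model download and a running server. A cheaper regression check of the attribute resolution alone might look like the sketch below. It is an assumption about how such a test could be written, not a test from the PR; FastTokenizerLike, ProcessorLike and resolve_rust_tokenizer are hypothetical names that reproduce only the attribute layout involved.

```python
import unittest


class FastTokenizerLike:
    """Hypothetical stand-in: Rust backend stored at ._tokenizer."""

    def __init__(self):
        self._tokenizer = object()


class ProcessorLike:
    """Hypothetical stand-in: fast tokenizer at .tokenizer, no ._tokenizer."""

    def __init__(self):
        self.tokenizer = FastTokenizerLike()


def resolve_rust_tokenizer(processor):
    # Expression quoted in the PR description.
    return getattr(processor, "tokenizer", processor)._tokenizer


class ResolveRustTokenizerTest(unittest.TestCase):
    def test_old_access_fails_on_processor(self):
        # Mirrors the reported crash: a processor-like object has no ._tokenizer.
        with self.assertRaises(AttributeError):
            ProcessorLike()._tokenizer

    def test_processor_path(self):
        proc = ProcessorLike()
        self.assertIs(resolve_rust_tokenizer(proc), proc.tokenizer._tokenizer)

    def test_plain_tokenizer_path(self):
        tok = FastTokenizerLike()
        self.assertIs(resolve_rust_tokenizer(tok), tok._tokenizer)
```

Saved to a file, this can be run with python -m unittest. It covers only the lookup logic, so the end-to-end checks in the test plan are still needed.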

🤖 Generated with Claude Code

Changed files

src/transformers/cli/serving/utils.py (modified, +6/-2)



System Info

When using "transformers chat" with Qwen3.5-35B, the server fails with an AttributeError the moment a prompt is sent:

[snip]
File "/mnt/venvs/rocm6.4/lib64/python3.12/site-packages/transformers/cli/serving/chat_completion.py", line 174, in _streaming
    queue, streamer = gen_manager.generate_streaming(model, processor, inputs, gen_config, request_id=request_id)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/venvs/rocm6.4/lib64/python3.12/site-packages/transformers/cli/serving/utils.py", line 565, in generate_streaming
    streamer = DirectStreamer(processor._tokenizer, loop, queue, skip_special_tokens=True)
                              ^^^^^^^^^^^^^^^^^^^^
AttributeError: 'Qwen3VLProcessor' object has no attribute '_tokenizer'. Did you mean: 'tokenizer'?

I'm running transformers 5.5.0 from pip along with the rocm6.4 version of pytorch (2.8.0+rocm6.4.4.gitc1404424). If you want a full pip version list let me know.

I've tried a couple of other random text models with transformers chat without issue; so far Qwen is the only one that fails.

Qwen's readme mentions transformers chat as a possible way to use the model, so I'm assuming this used to work, although they were using a 4.57 dev version when the readme was written.


Reproduction

transformers serve

transformers chat Qwen/Qwen3.5-35B-A3B

Once the model has loaded, type any text at the prompt and press Enter; the server then raises the AttributeError.

Expected behavior

Model response streamed back.

extent analysis

TL;DR

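The crash comes from the transformers serving code rather than from a version mismatch with the model: generate_streaming() reads processor._tokenizer, an attribute that Qwen3VLProcessor does not have. The fix proposed in PR #45368 first resolves the fast tokenizer with getattr(processor, "tokenizer", processor) and then reads ._tokenizer from it.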

Guidance

Confirm that you are hitting the same failure: the traceback should end in AttributeError: 'Qwen3VLProcessor' object has no attribute '_tokenizer', raised from transformers/cli/serving/utils.py.
Check the status of PR #45368 before upgrading; it was open and unmerged when this page was fetched, and transformers 5.5.0 is the version on which the crash was reported.
Treat downgrading as unverified: the issue author only assumes that transformers chat worked with the 4.57 dev version mentioned in Qwen's readme.
Note that the issue author ran a couple of other text models with transformers chat without problems, which is consistent with the crash being limited to models loaded through a ProcessorMixin subclass.

Example

The change described in the PR replaces processor._tokenizer with getattr(processor, "tokenizer", processor)._tokenizer in GenerateManager.generate_streaming() and CBGenerateManager.generate_streaming().

Notes

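According to the PR description, the root cause lies in transformers itself rather than in the model. The serving code reads processor._tokenizer, which works for PreTrainedTokenizerFast (it stores the Rust backend at ._tokenizer) but not for ProcessorMixin subclasses such as Qwen3VLProcessor, which expose the fast tokenizer at the public .tokenizer attribute.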

Recommendation

Apply the change from PR #45368, either by waiting for it to be merged and released or by patching src/transformers/cli/serving/utils.py locally. Downgrading to the 4.57 dev version mentioned in Qwen's readme is an untested alternative.
