transformers - ✅(Solved) Fix Transformers Qwen3.5 had a bug when set output_hidden_states=True [1 pull requests, 4 comments, 3 participants]

GitHub stats
huggingface/transformers#44849 · Fetched 2026-04-08 01:01:49
Comments: 4 · Participants: 3 · Timeline: 9 · Reactions: 0
Timeline (top): commented ×4, subscribed ×2, cross-referenced ×1, labeled ×1

Fix Action

Fixed

PR fix notes

PR #44922: fix: pop output_* flags from kwargs in capture_outputs to prevent submodule leakage

Description (problem / solution / changelog)

What does this PR do?

Fixes #44849.

When output_hidden_states=True (or output_attentions=True) is passed to model.generate(), the @capture_outputs decorator reads the flag value but leaves it in **kwargs. These flags then propagate through **kwargs chains deep into sub-models — specifically, into vision encoder blocks and attention functions that don't expect them.

For Qwen3.5 (and the Qwen VL family), this causes garbled generation when output_hidden_states=True is set: the flag reaches Qwen3_5VisionBlock.attn via Qwen3_5Model.get_image_features(**kwargs) → self.visual(**kwargs) → blk(**kwargs) → self.attn(**kwargs), corrupting intermediate attention tensors and causing the model to generate repetitive image-pad tokens instead of meaningful text.

Root cause

In capture_outputs (in output_capturing.py), the decorator uses kwargs.get(...) to read the output flags — but it does not remove them from kwargs. The underlying func(self, *args, **kwargs) call therefore still sees output_hidden_states=True, which then leaks into every submodule called with **kwargs.
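The leak can be reproduced in miniature without transformers. The sketch below is a toy under stated assumptions: capture_outputs_leaky, vision_block, vision_attention and forward are hypothetical names invented for illustration, not the library's code. It only shows why reading a flag with kwargs.get leaves it visible to everything called with **kwargs afterwards.

```python
import functools


def capture_outputs_leaky(func):
    """Toy decorator (hypothetical): reads the flag with .get(), so it stays in kwargs."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Reading the value does not remove the key from kwargs.
        wrapper.requested = kwargs.get("output_hidden_states", False)
        return func(*args, **kwargs)

    wrapper.requested = False
    return wrapper


def vision_attention(x, **kwargs):
    """Hypothetical leaf function that was never meant to see output_* flags."""
    return {"value": x, "unexpected_kwargs": sorted(kwargs)}


def vision_block(x, **kwargs):
    # Forwards everything it received, as in a **kwargs chain.
    return vision_attention(x, **kwargs)


@capture_outputs_leaky
def forward(x, **kwargs):
    return vision_block(x, **kwargs)


result = forward(1, output_hidden_states=True)
print(result["unexpected_kwargs"])  # ['output_hidden_states']
```

In this toy the leaf merely reports the stray keyword. In the real model, per the PR description, the stray flag changes what the attention code does.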

Fix

After reading the values for all capturable flags, immediately pop them from kwargs:

for k in capturable_flags:
    kwargs.pop(f"output_{k}", None)
if "cross_attentions" in capturable_flags or "mask_decoder_attentions" in capturable_flags:
    kwargs.pop("output_attentions", None)

Since @capture_outputs already captures the requested outputs through forward hooks, the underlying forward function (and all modules it calls) does not need to receive these flags. This pop has no effect on output correctness but prevents any downstream damage.

The fix applies to all models using @capture_outputs, not just Qwen3.5.
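The effect of the pop can be sketched with a toy decorator. This is a simplified illustration, not the code in output_capturing.py: capture_outputs_fixed and forward are hypothetical names, and the way the flags are read here (a plain dict of booleans) is an assumption made to keep the example short. Only the pop logic mirrors the snippet quoted in the PR.

```python
import functools


def capture_outputs_fixed(capturable_flags):
    """Toy decorator factory (hypothetical): reads each output_* flag, then pops it."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Read the requested flags first ...
            requested = {k: kwargs.get(f"output_{k}", False) for k in capturable_flags}
            # ... then remove them so the wrapped function never sees them.
            for k in capturable_flags:
                kwargs.pop(f"output_{k}", None)
            if "cross_attentions" in capturable_flags or "mask_decoder_attentions" in capturable_flags:
                kwargs.pop("output_attentions", None)
            return {"requested": requested, "result": func(*args, **kwargs)}

        return wrapper

    return decorator


@capture_outputs_fixed(["hidden_states", "attentions"])
def forward(x, **kwargs):
    # Returns the keyword names that would be forwarded to submodules.
    return sorted(kwargs)


out = forward(1, output_hidden_states=True, use_cache=False)
print(out["result"])     # ['use_cache']
print(out["requested"])  # {'hidden_states': True, 'attentions': False}
```

Unrelated keyword arguments such as use_cache still pass through, so only the output_* flags are filtered.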

Changed files

  • src/transformers/utils/output_capturing.py (modified, +10/-0)


System Info

Version: 5.2.0

With Qwen3.5, the following call:

outputs = model_wrapper.generate(**inputs, output_hidden_states=True)

outputs something like this:

><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|vision_end|>请详细描述这张图片的内容。<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n这张!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!']

Without the output_hidden_states parameter, the output is normal:

on_end|>请详细描述这张图片的内容。<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n这张图片是一张包含表格的截图,内容是一份关于不同模型在多个任务上表现的数据对比表。\n\n**整体布局:**\n图片的主体是一个表格,表格的标题为“Model”,列出了多个模型名称,以及它们在不同任务上的平均长度(Avg. Length)、任务1、任务2、任务3、任务4和任务5的得分百分比。表格下方还有一行说明文字。\n\n**表格内容详情:**\n表格共有8行数据,对应8个不同的模型。\n\n- **第一行:**\n  - **Model:** `PI0*`\n  - **Avg. Length:** 2.954\n  - **Task 1:** 84.8%\n  - **Task 2:** 70.4%\n  - **Task 3:** 55.9%\n  - **Task 4:** 46.6%\n  - **Task 5:** 37.7%\n\n- **第二行:**\n  - **Model:** `PI0.5*`\n  - **Avg. Length:** 3.885\n  - **Task 1:** 92.5%\n  - **Task 2:** 84.0%\n  - **Task 3:** 76.6%\n  - **Task 4:** 71.0%\n  - **Task 5:** 64.4%\n\n- **第三行:**\n  - **Model:** `qwenpi (qwen2.5-vl-3B-instruct-action)`\n  - **Avg. Length:** 3.5']

Who can help?

w

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

f

Expected behavior

rg

extent analysis

Fix Plan

The garbled output is not a padding or length problem, so stripping special tokens or capping max_length does not address it. As the PR notes above explain, output_hidden_states=True leaks through **kwargs into the vision encoder and corrupts generation itself. To fix this:

  • Upgrade to a transformers build that includes PR #44922, which pops the output_* flags inside capture_outputs.
  • Until then, call generate without output_hidden_states=True. The report shows normal output in that case.

Workaround on an affected version:

outputs = model_wrapper.generate(**inputs)  # omit output_hidden_states until the fix is installed

Avoid passing truncation=True to generate. Truncation is a tokenizer-side option for inputs rather than a generation setting, and it would not repair the corrupted output in any case.

Verification

To verify the fix, rerun the original call after upgrading and compare it with a run that omits the flag. The run with output_hidden_states=True should produce a coherent description, like the run without the flag, rather than a long run of repeated "!" characters.

outputs = model_wrapper.generate(**inputs, output_hidden_states=True)
print(outputs)

Extra Tips

According to the PR, the same leak applies to output_attentions=True and to every model that uses @capture_outputs, not just Qwen3.5. If you see similar garbled generations elsewhere, check whether an output_* flag is being passed to generate.
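One rough way to spot the kind of runaway repetition shown in the report is to measure the longest run of a single repeated character in the decoded text. The helper below is a heuristic sketch, not part of transformers: longest_char_run, looks_degenerate and the max_run threshold of 20 are all assumptions chosen for illustration.

```python
import re


def longest_char_run(text):
    """Length of the longest run of one repeated character in text."""
    longest = 0
    for match in re.finditer(r"(.)\1*", text, flags=re.DOTALL):
        longest = max(longest, len(match.group(0)))
    return longest


def looks_degenerate(text, max_run=20):
    # max_run is an arbitrary threshold chosen for this sketch.
    return longest_char_run(text) > max_run


broken = "这张" + "!" * 300
normal = "这张图片是一张包含表格的截图"
print(looks_degenerate(broken), looks_degenerate(normal))  # True False
```

A check like this only flags one symptom. Comparing against a run without the flag remains the more direct test.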
