transformers - ✅(Solved) Fix `integrations/flash_attention.py` crashes with `AttributeError` on `s_aux=None` for sink-less models [4 pull requests, 1 comment, 2 participants]

Q: Expected behavior

`flash_attention_forward` unconditionally calls `s_aux.to(query.dtype)`, even though `s_aux: torch.Tensor | None = None` is optional and defaults to `None`. Models that do not have attention sinks (e.g. Gemma 4) never pass `s_aux=` from their attention `forward`, so the keyword argument stays `None` and training/inference crashes.

Offending line: https://github.com/huggingface/transformers/blob/v5.6.0/src/transformers/integrations/flash_attention.py#L84

```none
File ".../transformers/integrations/flash_attention.py", line 84, in flash_attention_forward
    s_aux=s_aux.to(query.dtype),  # FA only accepts half precision
          ^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'to'
```

This is the same bug that https://github.com/huggingface/transformers/pull/40434 fixed for `flash_paged.py` by adding a guard so `s_aux` is only forwarded when set.

transformers • 2026-04-23 00:37:54

GitHub stats

huggingface/transformers#45588•Fetched 2026-04-23 07:22:56

Author

jamesbraza

Participants

jamesbraza

truffle-dev

Timeline (top)

cross-referenced ×4, mentioned ×3, subscribed ×3, commented ×1

Fix Action

Fixed

Fixed by PR: Fix AttributeError on s_aux=None in flash_attention_forward (https://github.com/huggingface/transformers/pull/45589)
Fixed by PR: fix #45588: guard s_aux against None in flash_attention_forward (https://github.com/huggingface/transformers/pull/45590)

PR fix notes

PR #45589: Fix `AttributeError` on `s_aux=None` in `flash_attention_forward`

Repository: huggingface/transformers
Author: jamesbraza
State: open | merged: False
Link: https://github.com/huggingface/transformers/pull/45589

Description (problem / solution / changelog)

Fixes https://github.com/huggingface/transformers/issues/45588

@ArthurZucker @yonigozlan @molbap

Changed files

src/transformers/integrations/flash_attention.py (modified, +5/-1)

PR #45590: fix #45588: guard s_aux against None in flash_attention_forward

Repository: huggingface/transformers
Author: ghost
State: closed | merged: False
Link: https://github.com/huggingface/transformers/pull/45590

Description (problem / solution / changelog)

Fix for #45588

Bug

flash_attention_forward unconditionally calls s_aux.to(query.dtype), but s_aux defaults to None and sink-less models (e.g. Gemma 4) never pass s_aux, causing:

AttributeError: 'NoneType' object has no attribute 'to'

Fix

Add a guard to only convert s_aux when it is not None:

# Before
s_aux=s_aux.to(query.dtype)

# After
s_aux=s_aux.to(query.dtype) if s_aux is not None else None

This pattern was already used in flash_paged.py (see PR #40434).

Testing

Python syntax check passed
Code follows existing pattern in codebase

Notes

Huggingface transformers v5.6.0
Affects sink-less models using flash_attention_2

Automated high-quality fix

Changed files

src/transformers/integrations/flash_attention.py (modified, +1/-1)

PR #2813: limit transformer version, until they fixed issues/45588

Repository: ModelCloud/GPTQModel
Author: CSY-ModelCloud
State: open | merged: False
Link: https://github.com/ModelCloud/GPTQModel/pull/2813

Description (problem / solution / changelog)

huggingface/transformers/issues/45588

Changed files

requirements.txt (modified, +1/-1)

PR #123: limit transformer version, until they fixed issues/45588

Repository: ModelCloud/Evalution
Author: CSY-ModelCloud
State: open | merged: False
Link: https://github.com/ModelCloud/Evalution/pull/123

Description (problem / solution / changelog)

huggingface/transformers/issues/45588

Changed files

pyproject.toml (modified, +1/-1)
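
The two downstream PRs above cap the transformers version in their dependency files. A runtime check is another option. The sketch below is hypothetical: `KNOWN_AFFECTED`, `is_known_affected`, and `installed_transformers_version` are illustrative names, and only 5.6.0 is listed because it is the only version this report names. The issue does not say which other releases are affected or which release will carry the fix.

```python
from importlib import metadata

# Assumption: only 5.6.0 is confirmed by this issue; extend the set if
# other releases turn out to be affected.
KNOWN_AFFECTED = {"5.6.0"}


def is_known_affected(version: str) -> bool:
    """Return True if the version string matches a release named in the issue."""
    return version.strip() in KNOWN_AFFECTED


def installed_transformers_version():
    """Return the installed transformers version, or None if it is not installed."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


installed = installed_transformers_version()
if installed is not None and is_known_affected(installed):
    print(f"transformers {installed}: flash_attention_2 may crash on sink-less models")
```

A check like this only warns; whether to fall back to another attention implementation or abort is left to the calling project.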

Code Example

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-4-E4B-it"  # Any sink-less model
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
).cuda()

inputs = tok("Hello", return_tensors="pt").to("cuda")
model(**inputs)  # AttributeError: 'NoneType' object has no attribute 'to'

---

File ".../transformers/integrations/flash_attention.py", line 84, in flash_attention_forward
    s_aux=s_aux.to(query.dtype),  # FA only accepts half precision
          ^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'to'

Raw issue body

System Info

transformers version: 5.6.0
Platform: Linux-6.8.0-1043-nvidia-x86_64-with-glibc2.35
Python version: 3.12.13
Huggingface_hub version: 1.11.0
Safetensors version: 0.7.0
Accelerate version: 1.13.0
Accelerate config: not found
DeepSpeed version: not installed
PyTorch version (accelerator?): 2.10.0+cu129 (CUDA)
Using distributed or parallel set-up in script?: no
Using GPU in script?: yes
GPU type: NVIDIA H100 80GB HBM3

Who can help?

@ArthurZucker @yonigozlan @molbap

Reproduction

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-4-E4B-it"  # Any sink-less model
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
).cuda()

inputs = tok("Hello", return_tensors="pt").to("cuda")
model(**inputs)  # AttributeError: 'NoneType' object has no attribute 'to'

Expected behavior

flash_attention_forward unconditionally calls s_aux.to(query.dtype), even though s_aux: torch.Tensor | None = None is optional and defaults to None. Models that do not have attention sinks (e.g. Gemma 4) never pass s_aux= from their attention forward, so the keyword argument stays None and training/inference crashes.

Offending line: https://github.com/huggingface/transformers/blob/v5.6.0/src/transformers/integrations/flash_attention.py#L84

File ".../transformers/integrations/flash_attention.py", line 84, in flash_attention_forward
    s_aux=s_aux.to(query.dtype),  # FA only accepts half precision
          ^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'to'

This is the same bug that https://github.com/huggingface/transformers/pull/40434 fixed for flash_paged.py by adding a guard so s_aux is only forwarded when set.
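
The "only forwarded when set" idea can be sketched with a keyword-argument dictionary. This is an illustrative pattern, not the code from PR #40434; `build_attention_kwargs` and its `cast` parameter are hypothetical names introduced here.

```python
def build_attention_kwargs(s_aux=None, cast=lambda t: t, **kwargs):
    """Hypothetical helper: include s_aux in the kwargs only when it is set."""
    out = dict(kwargs)
    if s_aux is not None:
        # Convert and forward the sink values only when the model supplied them.
        out["s_aux"] = cast(s_aux)
    return out


# Sink-less model: s_aux never appears in the forwarded kwargs.
sinkless = build_attention_kwargs(dropout=0.0)

# Model with sinks: s_aux is converted (here with tuple() as a stand-in) and forwarded.
with_sink = build_attention_kwargs(s_aux=[0.5, 0.25], cast=tuple, dropout=0.0)
```

Omitting the key entirely, rather than passing `None`, means the downstream callee never has to handle a `None` sink argument.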

extent analysis

TL;DR

The most likely fix is to guard the call: check that `s_aux` is not `None` before calling `.to()` on it in `flash_attention_forward`.

Guidance

The error occurs because `s_aux` is `None` and the code calls `.to()` on it; `None` has no such method.
To fix this, add a conditional check so that `.to()` is only called when `s_aux` is not `None`.
Apply the fix to `flash_attention_forward` in `src/transformers/integrations/flash_attention.py`.
A similar fix was already applied to `flash_paged.py` in pull request #40434 and can serve as a reference.

Example

if s_aux is not None:
    s_aux = s_aux.to(query.dtype)
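
The failure and the guard can also be exercised without a GPU or `torch`. In this minimal sketch, `FakeTensor`, `forward_unguarded`, and `forward_guarded` are hypothetical stand-ins, not the actual transformers code. Only the two `s_aux=` expressions mirror the before and after lines quoted in the description of PR #45590.

```python
class FakeTensor:
    """Hypothetical stand-in for torch.Tensor; it only models .to(dtype)."""

    def __init__(self, dtype):
        self.dtype = dtype

    def to(self, dtype):
        return FakeTensor(dtype)


def forward_unguarded(query, s_aux=None):
    # Mirrors the "before" line: raises AttributeError when s_aux is None.
    return {"s_aux": s_aux.to(query.dtype)}


def forward_guarded(query, s_aux=None):
    # Mirrors the "after" line: convert only when a sink tensor was passed.
    return {"s_aux": s_aux.to(query.dtype) if s_aux is not None else None}


query = FakeTensor("bfloat16")

# Sink-less call against the unguarded version reproduces the crash.
try:
    forward_unguarded(query)
    unguarded_error = None
except AttributeError as exc:
    unguarded_error = str(exc)

# The guarded version handles both the sink-less and the sink case.
guarded_none = forward_guarded(query)
guarded_sink = forward_guarded(query, s_aux=FakeTensor("float32"))
```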

Notes

This fix assumes that `s_aux` can be left as `None` when no sink tensor is passed, which is the case for models without attention sinks.
The fix belongs in the transformers library itself, in `flash_attention.py`.

Recommendation

Apply the guard: add a conditional check so that `.to()` is only called when `s_aux` is not `None`, as in the example above. The issue is confined to `flash_attention_forward`, so a simple guard clause is enough.
