pytorch - 💡(How to fix) Fix FlexAttention: extend AuxRequest to return min_scores and key indices for row-wise max/min [1 participants]

GitHub stats
pytorch/pytorch#176837 · Fetched 2026-04-08 00:24:12
Comments: 0 · Participants: 1 · Timeline events: 20 · Reactions: 10
Timeline (top): mentioned ×8, subscribed ×8, labeled ×4

🚀 The feature, motivation and pitch

FlexAttention is a great API for prototyping new attention variants, especially for researchers who are not familiar with CUDA or Triton. Because of that, I’d like to extend FlexAttention’s AuxOutput API so users can optionally request more row-wise information from the attention score reduction. Since flex_attention is still a prototype feature in PyTorch, this seems like a good time to add this functionality.

Currently, AuxRequest supports returning lse and max_scores. I’d like to also make it possible to request:

  • min_scores
  • key index for max scores (e.g. max_indices)
  • key index for min scores (e.g. min_indices)

There are many use cases in sparse attention and, more generally, in KV cache management where access to the row-wise maximum and minimum QK dot products and their key positions would be useful.

Concretely, I imagine the API looking something like this:

class AuxRequest(NamedTuple):
    lse: bool = False
    max_scores: bool = False
    min_scores: bool = False
    max_indices: bool = False
    min_indices: bool = False

class AuxOutput(NamedTuple):
    lse: Tensor | None = None
    max_scores: Tensor | None = None
    min_scores: Tensor | None = None
    max_indices: Tensor | None = None
    min_indices: Tensor | None = None
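
To make the requested quantities concrete, here is a minimal framework-free sketch of the row-wise reduction for a single query row. The names `RowStats` and `row_stats` are hypothetical, the sketch ignores score_mod and block_mask, and resolving ties to the lowest key index is an assumption, since the issue does not specify tie-breaking.

```python
from typing import NamedTuple, Sequence


class RowStats(NamedTuple):  # hypothetical container, not part of PyTorch
    max_score: float
    min_score: float
    max_index: int
    min_index: int


def row_stats(
    query: Sequence[float],
    keys: Sequence[Sequence[float]],
    scale: float = 1.0,
) -> RowStats:
    """Reduce one query row's scaled QK dot products to max/min and their key positions."""
    if not keys:
        raise ValueError("need at least one key")
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    # Assumption: ties resolve to the lowest key index (max/min return the first match).
    max_index = max(range(len(scores)), key=scores.__getitem__)
    min_index = min(range(len(scores)), key=scores.__getitem__)
    return RowStats(scores[max_index], scores[min_index], max_index, min_index)


# One query against three keys: the dot products are 0.5, 3.0 and -2.0.
stats = row_stats([1.0, 0.0], [[0.5, 2.0], [3.0, -1.0], [-2.0, 4.0]])
```

In the proposed API, the per-row values would presumably be stacked into the max_scores, min_scores, max_indices and min_indices tensors, one entry per query position.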

Alternatives

No response

Additional context

No response

cc @chauhang @penguinwu @Chillee @drisspg @yanboliang @BoyuanFeng @liangel-02 @howardzhang-cv

extent analysis

Fix Plan

Update AuxRequest and AuxOutput Classes

To add the new functionality, we need to update the AuxRequest and AuxOutput classes to include the new fields.

from typing import NamedTuple, Optional

from torch import Tensor

class AuxRequest(NamedTuple):
    lse: bool = False
    max_scores: bool = False
    min_scores: bool = False
    max_indices: bool = False
    min_indices: bool = False

class AuxOutput(NamedTuple):
    lse: Optional[Tensor] = None
    max_scores: Optional[Tensor] = None
    min_scores: Optional[Tensor] = None
    max_indices: Optional[Tensor] = None
    min_indices: Optional[Tensor] = None
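
Because the new fields are appended with defaults, existing call sites that build a request should keep working. The sketch below checks this on a local stand-in for AuxRequest (a copy of the proposed fields, not the class shipped in PyTorch).

```python
from typing import NamedTuple


class AuxRequest(NamedTuple):  # local stand-in mirroring the proposed fields
    lse: bool = False
    max_scores: bool = False
    min_scores: bool = False
    max_indices: bool = False
    min_indices: bool = False


# Call sites written against the two-field version still construct the same request.
old_style = AuxRequest(lse=True, max_scores=True)
positional = AuxRequest(True, True)

# Names of the outputs this request asks for; the new flags default to False.
requested = [name for name, flag in old_style._asdict().items() if flag]
```

One caveat on the output side: code that unpacks AuxOutput positionally, such as `lse, max_scores = aux`, would raise a ValueError once the tuple has five fields, so attribute access is the safer pattern.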

Update FlexAttention API

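The kernels must compute the new reductions, and flex_attention must return them through AuxOutput, filling only the fields the caller requested. The real change belongs in the FlexAttention kernels; the helper below is a hypothetical sketch of the packing step only.

def pack_aux_output(request, lse, max_scores, min_scores, max_indices, min_indices):
    # Hypothetical helper, not part of PyTorch: fields that were not requested stay None.
    return AuxOutput(
        lse=lse if request.lse else None,
        max_scores=max_scores if request.max_scores else None,
        min_scores=min_scores if request.min_scores else None,
        max_indices=max_indices if request.max_indices else None,
        min_indices=min_indices if request.min_indices else None,
    )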

Update Usage Example

We need to update the usage example to include the new fields.

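aux_request = AuxRequest(
    lse=True,
    max_scores=True,
    min_scores=True,   # proposed field, not in released PyTorch
    max_indices=True,  # proposed field
    min_indices=True,  # proposed field
)

# flex_attention takes the request through its return_aux keyword and
# returns the attention output together with the AuxOutput.
out, aux_output = flex_attention(query, key, value, return_aux=aux_request)
print(aux_output)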

Verification

To verify the change, check that AuxOutput exposes the new fields and that each requested field is populated.

print(aux_output.lse)
print(aux_output.max_scores)
print(aux_output.min_scores)
print(aux_output.max_indices)
print(aux_output.min_indices)
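
Printing the fields only shows that they exist. A slightly stronger check, sketched below, compares the request against the output field by field. It assumes that unrequested fields come back as None, which matches the defaults in the proposed AuxOutput but is not confirmed by the issue. `mismatched_fields` is a hypothetical helper, and the classes are local stand-ins with plain lists in place of tensors.

```python
from typing import Any, NamedTuple, Optional


class AuxRequest(NamedTuple):  # local stand-in mirroring the proposed fields
    lse: bool = False
    max_scores: bool = False
    min_scores: bool = False
    max_indices: bool = False
    min_indices: bool = False


class AuxOutput(NamedTuple):  # Any stands in for Tensor to avoid a torch dependency
    lse: Optional[Any] = None
    max_scores: Optional[Any] = None
    min_scores: Optional[Any] = None
    max_indices: Optional[Any] = None
    min_indices: Optional[Any] = None


def mismatched_fields(request: AuxRequest, output: AuxOutput) -> list:
    """Return the field names where 'requested' and 'populated' disagree."""
    return [
        name
        for name in request._fields
        if getattr(request, name) != (getattr(output, name) is not None)
    ]


request = AuxRequest(lse=True, min_scores=True, min_indices=True)
good = AuxOutput(lse=[0.1], min_scores=[-2.0], min_indices=[2])
bad = AuxOutput(lse=[0.1], max_scores=[3.0])  # extra max_scores, two fields missing
```

An empty result from `mismatched_fields(request, good)` means the output honors the request; for `bad` it lists the fields that were missing or unexpectedly filled.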

Extra Tips

  • Make sure to update the
