pytorch - 💡(How to fix) Fix ONNX SDPA export unconditionally inserts IsNaN/Where after Softmax for bool masks, hurting fusion and performance [5 comments, 2 participants]

pytorch, 2026-03-19 20:13:06

pytorch/pytorch#177892•Fetched 2026-04-08 01:03:04

Author

happyme531

Participants

happyme531

justinchuby

The TorchScript ONNX exporter currently inserts an unconditional Where(IsNaN(attn_weight), 0, attn_weight) immediately after Softmax when exporting torch.nn.functional.scaled_dot_product_attention with a boolean attn_mask.

This was introduced by commit c859ba7114b1fcb49527e090745fa17091d1f8d5 (Make onnx export SDPA match aten behavior, PR #159973).

Current code path:

torch/onnx/_internal/torchscript_exporter/symbolic_opset14.py

Relevant logic:

attn_weight = g.op("Softmax", mul_qk_add, axis_i=-1)
attn_weight = g.op("Where", g.op("IsNaN", attn_weight), const_zero, attn_weight)

Problem

This handles a correctness corner case where a boolean mask can create a fully-masked row, which then makes ONNX Softmax produce NaN due to 0/0.
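The corner case can be reproduced without any exporter involved. The following is a minimal, dependency-free sketch (the helper names `masked_softmax` and `sanitize_nan` are illustrative, not exporter code). It uses the usual max-subtracted softmax, where a fully masked row yields NaN because `(-inf) - (-inf)` is NaN, and then applies the same fix-up as `Where(IsNaN(w), 0, w)`.

```python
import math

NEG_INF = float("-inf")


def masked_softmax(scores, keep):
    """Softmax over one attention row; positions with keep=False become -inf."""
    masked = [s if k else NEG_INF for s, k in zip(scores, keep)]
    row_max = max(masked)
    # For a fully masked row, row_max is -inf and (-inf) - (-inf) is NaN,
    # so every exponent (and therefore every weight) ends up NaN.
    exps = [math.exp(m - row_max) for m in masked]
    total = sum(exps)
    return [e / total for e in exps]


def sanitize_nan(weights):
    """Plain-Python mirror of Where(IsNaN(w), 0, w)."""
    return [0.0 if math.isnan(w) else w for w in weights]


scores = [0.5, 1.0, -0.25]
partly_masked = masked_softmax(scores, [True, True, False])
fully_masked = masked_softmax(scores, [False, False, False])
```

For the partly masked row the sanitizing step is a no-op, which is the common case the issue is concerned about: the extra ops are paid for even though they change nothing.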

However, this fix is currently applied unconditionally for the bool-mask export path, even in the common case where fully-masked rows never occur.

In practice this has real downstream cost:

It breaks backend fusion patterns that expect Softmax to feed the next op directly.
It adds extra graph ops and memory traffic in a hot path.
It hurts deployment performance on inference compilers / NPUs.

For my use case this is particularly painful in RKNN / RKNPU2 deployment pipelines, where graph cleanliness matters a lot for kernel selection and fusion.

Request

Could this behavior be made less pessimistic?

Some possible options:

Add an exporter option to disable this NaN-sanitizing pattern.
Only emit it when the exporter can prove the fully-masked-row risk exists.
Provide a compatibility mode for exact bool-mask behavior, while keeping the default graph more optimization-friendly.
Route this case through a representation that backends can more easily optimize away.
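As an illustration of the second option, the check below shows what "proving the risk exists" could look like when the mask is a known constant. This is only a sketch under that assumption: `has_fully_masked_row` and `causal_mask` are hypothetical helpers, masks are plain nested lists, and `True` means the position takes part in attention, as with boolean masks in scaled_dot_product_attention. A mask computed at runtime could not be checked this way.

```python
def has_fully_masked_row(mask):
    """True if any query row has no position left to attend to."""
    return any(not any(row) for row in mask)


def causal_mask(n):
    """Lower-triangular mask: query i may attend to keys 0..i."""
    return [[j <= i for j in range(n)] for i in range(n)]


# A causal mask always keeps the diagonal, so no row is fully masked.
causal_needs_guard = has_fully_masked_row(causal_mask(4))

# A padding-style mask whose second row is entirely False does need the guard.
padded_needs_guard = has_fully_masked_row([[True, False], [False, False]])
```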

Additional context

I work on RKNPU2 deployment, so this kind of post-Softmax pattern has a noticeable impact on fusion and runtime performance for deployed models.

Hugging Face profile: https://huggingface.co/happyme531

cc @jerryzh168 @justinchuby @titaiwangms @drisspg @liangel-02 @howardzhang-cv

extent analysis

Fix Plan

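One way to address the issue is to add an exporter option that disables the NaN-sanitizing pattern. No such option exists in torch.onnx.export today; the name disable_nan_sanitization is a proposal used for illustration. The steps would be:

Add a new keyword option disable_nan_sanitization to the torch.onnx.export function, defaulting to False so that current behavior is preserved:

def export(model, args, f, opset_version, ... , disable_nan_sanitization=False):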

Modify the symbolic function in symbolic_opset14.py so that Softmax is always emitted and only the Where op depends on the option:

attn_weight = g.op("Softmax", mul_qk_add, axis_i=-1)
if not disable_nan_sanitization:
    # Replace NaN weights from fully-masked rows with zeros.
    attn_weight = g.op("Where", g.op("IsNaN", attn_weight), const_zero, attn_weight)

Pass the option at the export call site. torch.nn.functional.scaled_dot_product_attention does not call the exporter itself, so its signature does not need to change:

# disable_nan_sanitization is the proposed option, not an existing argument.
torch.onnx.export(model, args, f, opset_version, ... , disable_nan_sanitization=True)
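The intended control flow of the symbolic function can be sketched without PyTorch. In the block below, `RecordingGraph` is a stand-in for the exporter's graph context that only records which ops were requested, and `emit_attn_weight` and `disable_nan_sanitization` are hypothetical names. The point it illustrates is that Softmax is always emitted and only the IsNaN/Where pair is optional.

```python
class RecordingGraph:
    """Stand-in for the exporter's graph context; records emitted op types."""

    def __init__(self):
        self.ops = []

    def op(self, op_type, *inputs, **attrs):
        self.ops.append(op_type)
        # Return a unique placeholder name for the op's output.
        return f"{op_type}_{len(self.ops)}"


def emit_attn_weight(g, mul_qk_add, const_zero, disable_nan_sanitization=False):
    # Softmax is always part of the graph.
    attn_weight = g.op("Softmax", mul_qk_add, axis_i=-1)
    if not disable_nan_sanitization:
        # Optional guard against NaN weights from fully-masked rows.
        attn_weight = g.op("Where", g.op("IsNaN", attn_weight), const_zero, attn_weight)
    return attn_weight


default_graph = RecordingGraph()
emit_attn_weight(default_graph, "scores", "zero")

lean_graph = RecordingGraph()
lean_output = emit_attn_weight(lean_graph, "scores", "zero", disable_nan_sanitization=True)
```

With the guard disabled, the Softmax output is returned directly, which is the shape that fusion patterns expecting Softmax to feed the next op rely on.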

Verification

To verify that the fix worked, you can:

Export a model with the disable_nan_sanitization option set to True and check that the Where op is not inserted.
Measure the performance of the exported model on your target hardware (e.g. RKNPU2) and verify that it is improved.
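The first check can be automated by scanning the exported graph for an IsNaN node that consumes a Softmax output. The sketch below works on any sequence of objects exposing `op_type`, `input`, and `output`, which are the fields ONNX graph nodes carry; the helper name and the example node list are made up for illustration, and `SimpleNamespace` objects stand in for real nodes so the snippet runs without the onnx package.

```python
from types import SimpleNamespace


def softmax_outputs_with_nan_guard(nodes):
    """Return the names of Softmax outputs that are fed into an IsNaN node."""
    softmax_outputs = {
        out for node in nodes if node.op_type == "Softmax" for out in node.output
    }
    return sorted(
        name
        for node in nodes
        if node.op_type == "IsNaN"
        for name in node.input
        if name in softmax_outputs
    )


# With the onnx package installed, the nodes would come from the exported file
# (the path is a placeholder):
#   import onnx
#   nodes = list(onnx.load("model.onnx").graph.node)
nodes = [
    SimpleNamespace(op_type="Softmax", input=["scores"], output=["attn"]),
    SimpleNamespace(op_type="IsNaN", input=["attn"], output=["attn_isnan"]),
    SimpleNamespace(op_type="Where", input=["attn_isnan", "zero", "attn"], output=["attn_clean"]),
    SimpleNamespace(op_type="MatMul", input=["attn_clean", "value"], output=["out"]),
]
guarded = softmax_outputs_with_nan_guard(nodes)
```

An empty result means no Softmax in the graph is followed by the NaN guard.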

Extra Tips

When using the disable_nan_sanitization option, make sure to test your model thoroughly to ensure that it does not produce NaN values in the Softmax output.
Consider adding additional logging or debugging statements to help identify potential issues with NaN values.
