pytorch: ONNX SDPA export unconditionally inserts IsNaN/Where after Softmax for bool masks, hurting fusion and performance

pytorch/pytorch#177892 (fetched 2026-04-08 01:03:04): 5 comments, 2 participants

The TorchScript ONNX exporter currently inserts an unconditional Where(IsNaN(attn_weight), 0, attn_weight) immediately after Softmax when exporting torch.nn.functional.scaled_dot_product_attention with a boolean attn_mask.

This was introduced by commit c859ba7114b1fcb49527e090745fa17091d1f8d5 (Make onnx export SDPA match aten behavior, PR #159973).

Current code path:

  • torch/onnx/_internal/torchscript_exporter/symbolic_opset14.py

Relevant logic:

attn_weight = g.op("Softmax", mul_qk_add, axis_i=-1)
attn_weight = g.op("Where", g.op("IsNaN", attn_weight), const_zero, attn_weight)

Problem

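This handles a correctness corner case. The bool mask is turned into an additive mask with -inf at masked positions. A query row in which every key is masked therefore reaches Softmax as all -inf, and ONNX Softmax produces NaN for that row (0/0). Replacing those NaNs with 0 matches what aten returns for such rows.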
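To make the corner case concrete, here is a minimal pure-Python sketch that mirrors the exported pattern for one attention row. It is not exporter code, and the helper names are illustrative. The bool mask becomes an additive 0/-inf mask, Softmax is computed in the usual max-subtracted form, and Where(IsNaN(w), 0, w) is applied afterwards.

```python
import math

NEG_INF = float("-inf")


def bool_mask_to_additive(mask):
    """Mirror the bool-mask export path: True (attend) -> 0.0, False (masked) -> -inf."""
    return [0.0 if keep else NEG_INF for keep in mask]


def softmax(row):
    """Max-subtracted softmax over one row, as Softmax kernels typically compute it."""
    m = max(row)
    # For an all -inf row, m is -inf and -inf - (-inf) is NaN; the naive
    # exp(x) / sum(exp(x)) form would hit 0/0 instead. Either way: NaN.
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def sanitize(weights):
    """Pure-Python equivalent of Where(IsNaN(w), 0, w)."""
    return [0.0 if math.isnan(w) else w for w in weights]


scores = [0.5, 1.0, -0.25]  # one query row of scaled Q @ K^T

partial = softmax([s + a for s, a in zip(scores, bool_mask_to_additive([True, False, True]))])
full = softmax([s + a for s, a in zip(scores, bool_mask_to_additive([False, False, False]))])

print(partial)         # ordinary weights; the masked key gets 0.0
print(full)            # [nan, nan, nan]
print(sanitize(full))  # [0.0, 0.0, 0.0]
```

Note that sanitize(partial) == partial. Whenever no row is fully masked, the extra IsNaN/Where pair changes nothing and is pure overhead, which is the common case the issue describes.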

However, this fix is currently applied unconditionally for the bool-mask export path, even in the common case where fully-masked rows never occur.

In practice this has real downstream cost:

  • It breaks backend fusion patterns that expect Softmax to feed the next op directly.
  • It adds extra graph ops and memory traffic in a hot path.
  • It hurts deployment performance on inference compilers / NPUs.

For my use case this is particularly painful in RKNN / RKNPU2 deployment pipelines, where graph cleanliness matters a lot for kernel selection and fusion.

Request

Could this behavior be made less pessimistic?

Some possible options:

  • Add an exporter option to disable this NaN-sanitizing pattern.
  • Only emit it when the exporter can prove the fully-masked-row risk exists.
  • Provide a compatibility mode for exact bool-mask behavior, while keeping the default graph more optimization-friendly.
  • Route this case through a representation that backends can more easily optimize away.
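For the second option, the decision is only possible when the mask is a constant known at export time. A runtime padding mask cannot be proven safe statically. The sketch below makes three assumptions:

- It follows SDPA's bool-mask convention, where True means the key takes part in attention.
- The helper names are hypothetical.
- A real exporter would inspect a constant graph value rather than a Python list.

```python
def causal_mask(n_query, n_key):
    """Top-left aligned lower-triangular mask: query q may attend to keys 0..q (True = attend)."""
    return [[k <= q for k in range(n_key)] for q in range(n_query)]


def has_fully_masked_row(mask):
    """True if some query row attends to nothing, i.e. Softmax would see only -inf."""
    return any(not any(row) for row in mask)


def needs_nan_guard(mask):
    """Hypothetical export-time decision: emit IsNaN/Where only when safety cannot be proven."""
    if mask is None:  # mask only known at runtime: assume the risk exists
        return True
    return has_fully_masked_row(mask)


padding = [[True, True, False], [False, False, False]]  # second query is fully masked

print(needs_nan_guard(causal_mask(4, 4)))  # False: every query sees at least key 0
print(needs_nan_guard(padding))            # True
```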

Additional context

I work on RKNPU2 deployment, so this kind of post-Softmax pattern has a noticeable impact on fusion and runtime performance for deployed models.

Hugging Face profile: https://huggingface.co/happyme531

cc @jerryzh168 @justinchuby @titaiwangms @drisspg @liangel-02 @howardzhang-cv


Fix Plan

To address the issue, one option is to add an exporter flag that disables the NaN-sanitizing pattern. The name disable_nan_sanitization is hypothetical; torch.onnx.export does not currently accept such an argument. The steps would be:

  • Add a keyword argument such as disable_nan_sanitization to torch.onnx.export (other parameters elided):
def export(model, args, f, ..., disable_nan_sanitization=False):
  • In symbolic_opset14.py, keep emitting Softmax unconditionally and guard only the IsNaN/Where pair with the flag:
attn_weight = g.op("Softmax", mul_qk_add, axis_i=-1)
if not disable_nan_sanitization:
    attn_weight = g.op("Where", g.op("IsNaN", attn_weight), const_zero, attn_weight)
  • Pass the flag from torch.onnx.export down to the symbolic function through exporter-side state (the exact mechanism is left open here). Do not route it through torch.nn.functional.scaled_dot_product_attention. That is a runtime op, so it should neither take an export-only flag nor call torch.onnx.export itself.
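The intended control flow emits Softmax in every case and guards only the IsNaN/Where pair. It can be checked without PyTorch by running the branch against a tiny stand-in for the TorchScript graph context. RecordingGraph and emit_attention_softmax are hypothetical names for this sketch. The real g.op returns graph values rather than strings.

```python
class RecordingGraph:
    """Stand-in for the exporter's graph context that only records op types."""

    def __init__(self):
        self.ops = []

    def op(self, op_type, *inputs, **attrs):
        self.ops.append(op_type)
        # Return a placeholder name so outputs can be chained like real values.
        return f"{op_type}_{len(self.ops)}"


def emit_attention_softmax(g, mul_qk_add, const_zero, disable_nan_sanitization=False):
    # Softmax is emitted in every case; only the NaN guard depends on the flag.
    attn_weight = g.op("Softmax", mul_qk_add, axis_i=-1)
    if not disable_nan_sanitization:
        attn_weight = g.op("Where", g.op("IsNaN", attn_weight), const_zero, attn_weight)
    return attn_weight


default_graph = RecordingGraph()
emit_attention_softmax(default_graph, "mul_qk_add", "const_zero")
print(default_graph.ops)  # ['Softmax', 'IsNaN', 'Where']

fast_graph = RecordingGraph()
emit_attention_softmax(fast_graph, "mul_qk_add", "const_zero", disable_nan_sanitization=True)
print(fast_graph.ops)     # ['Softmax']
```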

Verification

To verify that the fix worked, you can:

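  • Export a model with disable_nan_sanitization=True and check that the Softmax output is no longer consumed by an IsNaN/Where pair. Checking only for the absence of Where nodes is not sufficient, because the bool-mask path may also use Where to convert the mask into an additive -inf mask.
  • Measure the exported model's latency on the target hardware (e.g. RKNPU2) and compare it with the default, sanitized export.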
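A sketch of the graph check is shown below. It looks for a Softmax output that feeds an IsNaN node, rather than for any Where at all. The node lists are hand-written toy graphs, not real exporter output, and assume that the mask conversion also uses Where. With the onnx package, the (op_type, inputs, outputs) tuples can be built from NodeProto fields as shown in the docstring.

```python
def softmax_feeds_nan_guard(nodes):
    """Return True if any Softmax output is consumed by an IsNaN node.

    `nodes` is a list of (op_type, inputs, outputs) tuples in graph order.
    With the onnx package installed, such a list can be built from a saved model:
        [(n.op_type, list(n.input), list(n.output)) for n in onnx.load(path).graph.node]
    """
    softmax_outputs = {out for op, _, outs in nodes if op == "Softmax" for out in outs}
    return any(
        op == "IsNaN" and any(name in softmax_outputs for name in ins)
        for op, ins, _ in nodes
    )


# Hand-written toy graphs for the bool-mask path (illustrative tensor names).
mask_to_additive = ("Where", ["mask", "zero", "neg_inf"], ["add_mask"])
sanitized = [
    mask_to_additive,
    ("Add", ["qk", "add_mask"], ["scores"]),
    ("Softmax", ["scores"], ["w"]),
    ("IsNaN", ["w"], ["w_nan"]),
    ("Where", ["w_nan", "zero", "w"], ["w_clean"]),
    ("MatMul", ["w_clean", "v"], ["out"]),
]
unsanitized = [
    mask_to_additive,
    ("Add", ["qk", "add_mask"], ["scores"]),
    ("Softmax", ["scores"], ["w"]),
    ("MatMul", ["w", "v"], ["out"]),
]

print(softmax_feeds_nan_guard(sanitized))    # True
print(softmax_feeds_nan_guard(unsanitized))  # False, even though a Where node remains
```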

Extra Tips

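  • Only disable the sanitization when no query row can be fully masked. A plain causal mask is one example, since every query attends at least to the first key. Otherwise the affected rows of the Softmax output will be NaN, so test the model on representative inputs before deploying.
  • During validation, check model outputs for NaN values (e.g. with numpy.isnan) so that a fully masked row is caught early.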
