pytorch - ✅(Solved) Fix viable/strict has been blocked for 5+ days [2 pull requests, 1 comments, 1 participants]

pytorch · 2026-03-25 17:25:39


GitHub stats

pytorch/pytorch#178396 • Fetched 2026-04-08 01:30:36


Author

georgehong

Participants

georgehong

Assignees

georgehong

Timeline (top)

subscribed ×13, labeled ×6, mentioned ×3, added_to_project_v2 ×1

Error Message

Error looks like

Root Cause

Breakages to TestHOPCUDA.

Fix Action

Mitigation

In progress, continuing to revert incoming changes.

PR fix notes

PR #177922: [inductor] Add inline_asm_elementwise higher-order operator

Repository: pytorch/pytorch
Author: eellison
State: closed | merged: False
Link: https://github.com/pytorch/pytorch/pull/177922

Description (problem / solution / changelog)

Stack from ghstack (oldest at bottom):

-> #177922

Adds a higher-order operator for inline PTX assembly that works in both eager and compiled modes:

Eager: JIT compiles CUDA kernels via Jiterator with inline asm
Compiled: Lowers to tl.inline_asm_elementwise in Triton via Inductor

This enables using PTX instructions not exposed through standard PyTorch ops (e.g., cvt.rn.satfinite.e2m1x2.f32 for NVFP4 quantization) while maintaining bitwise equivalence between eager and compiled execution.

Example usage:

import torch
from torch._higher_order_ops import inline_asm_elementwise

result = inline_asm_elementwise(
    x, y,
    asm_str="add.f32 $0, $1, $2;",
    constraints="=f,f,f",
    dtype=torch.float32,
)

Notes:

Jiterator has limited support for the following:

multiple outputs
different output dtype than input
pack != 1

In these cases, today, we error in Jiterator and succeed in Inductor.

We inherit the output striding behavior from Jiterator in Inductor/compilation (which follows eager pointwise ops).

Inductor details:

Block size is not guaranteed to be a multiple of pack. In particular, at the end of a persistent reduction, it's possible that xblock == 1. For this case, I've added a Triton helper to pad the Triton tensor to a multiple of pack, and then split to the actual output afterwards. I suspect this is unlikely to occur, but it's better to handle it anyway.
Inductor computes bf16/fp16 in fp32 inside the kernel. For asm targeting these dtypes, we cast prior to invoking the asm. (Even with emulate precision casts, we still compute in fp32 and just add extra casts.)

Otherwise it works more or less as the existing lowering.
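The pad-then-split idea in the first Inductor note can be illustrated outside Triton. The following is a minimal pure-Python sketch, not the helper added in the PR: the function name apply_packed, the fill value, and the assumption that the packed operation returns exactly pack results per group are all hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Sequence


def apply_packed(
    values: Sequence[float],
    pack: int,
    op: Callable[[List[float]], List[float]],
    fill: float = 0.0,
) -> List[float]:
    """Apply `op` to groups of exactly `pack` elements, padding the tail.

    Assumes `op` returns `pack` results for every group of `pack` inputs.
    """
    if pack < 1:
        raise ValueError("pack must be >= 1")
    n = len(values)
    # Round the length up to the next multiple of `pack`.
    padded_len = -(-n // pack) * pack
    padded = list(values) + [fill] * (padded_len - n)

    out: List[float] = []
    for start in range(0, padded_len, pack):
        # Every group is full-sized because of the padding above.
        out.extend(op(padded[start:start + pack]))
    # Drop the results that were produced by padding lanes.
    return out[:n]


doubled = apply_packed([1.0, 2.0, 3.0], 2, lambda g: [v * 2 for v in g])
single = apply_packed([5.0], 2, lambda g: [v + 1 for v in g])
```

The second call mirrors the xblock == 1 case: a single element is padded up to one full group of two, the operation runs on the full group, and only the first result is kept.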

reland of https://github.com/pytorch/pytorch/pull/175814

Written with Claude Code.

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @jataylo @Lucaskabela @mlazos

Changed files

test/higher_order_ops/test_inline_asm_elementwise.py (added, +585/-0)
torch/_dynamo/variables/higher_order_ops.py (modified, +26/-0)
torch/_higher_order_ops/__init__.py (modified, +2/-0)
torch/_higher_order_ops/inline_asm_elementwise.py (added, +299/-0)
torch/_inductor/codegen/common.py (modified, +1/-0)
torch/_inductor/codegen/triton.py (modified, +49/-4)
torch/_inductor/dtype_propagation.py (modified, +7/-1)
torch/_inductor/lowering.py (modified, +37/-0)
torch/_inductor/ops_handler.py (modified, +1/-0)
torch/_inductor/runtime/triton_helpers.py (modified, +24/-0)
torch/testing/_internal/hop_db.py (modified, +28/-0)


Current Status

Under investigation. The core breakage from over the weekend has been reverted, and additional test breakages on main are being reverted.

Error looks like

viable/strict is still on https://github.com/pytorch/pytorch/commit/958d381444ebcad946b965a08545106898420f00.

Incident timeline (all times pacific)

Core breakage began with https://github.com/pytorch/pytorch/pull/177922, which was reverted Mar 24 evening.

User impact

viable/strict has lagged behind main, so the branch does not contain any commits from the last 5 days.

Root cause

Breakages to TestHOPCUDA.

Mitigation

In progress, continuing to revert incoming changes.

Prevention/followups

(will update when resolved)

cc @seemethere @malfet @pytorch/pytorch-dev-infra @mruberry

extent analysis

Fix Plan

The fix involves reverting recent changes that broke TestHOPCUDA and updating the viable/strict branch to include recent commits.

Steps to Fix

Revert the changes made in https://github.com/pytorch/pytorch/pull/177922, which introduced the core breakage.
Update the viable/strict branch to include commits more recent than 5 days ago.
Example code to revert changes:

git revert <commit-hash>
git push origin <branch-name>

Replace <commit-hash> with the hash of the commit that introduced the breakage and <branch-name> with the name of the branch being updated.

Verification

Verify that the fix worked by checking the following:

The viable/strict branch has been updated with recent commits.
TestHOPCUDA is passing without errors.
The core breakage has been resolved and the code is functioning as expected.
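The first check above can be sketched as a small script. This is only an illustration under stated assumptions: the function names and the 5-day threshold (taken from the issue title) are hypothetical, and the branch tip's SHA and commit time are passed in rather than fetched.

```python
from datetime import datetime, timezone

# Commit that viable/strict was stuck on, as quoted in the issue text.
STUCK_SHA = "958d381444ebcad946b965a08545106898420f00"


def lag_days(tip_commit_time: datetime, now: datetime) -> float:
    """Return how many days old the branch tip is."""
    return (now - tip_commit_time).total_seconds() / 86400.0


def is_unblocked(
    tip_sha: str,
    tip_commit_time: datetime,
    now: datetime,
    max_lag_days: float = 5.0,  # hypothetical threshold, from the issue title
) -> bool:
    """True if the tip moved past the stuck commit and is recent enough."""
    moved = tip_sha != STUCK_SHA
    return moved and lag_days(tip_commit_time, now) < max_lag_days


now = datetime(2026, 3, 26, tzinfo=timezone.utc)
still_stuck = is_unblocked(STUCK_SHA, datetime(2026, 3, 20, tzinfo=timezone.utc), now)
recovered = is_unblocked("0" * 40, datetime(2026, 3, 25, tzinfo=timezone.utc), now)
```

In practice the two inputs could be read with git log -1 --format="%H %ct" origin/viable/strict, assuming the remote is named origin; the dates and the all-zero SHA in the usage lines are placeholders, not data from the incident.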

Extra Tips

Regularly review and test changes before merging them into the main branch to prevent similar breakages.
Consider implementing automated testing and continuous integration to catch errors early and prevent them from reaching production.
