pytorch - 💡(How to fix) Fix [inductor] define_user_defined_triton_kernel fails to escape backslashes → SyntaxError in generated code


Error Message

Unpatched torch produces a traceback ending roughly like:

    File "/tmp/torchinductor_<user>/.../async_compile_<hash>.py", line NN
        PTX: tl.constexpr = 'mov.u32 %r0, 1;
                            ^
    SyntaxError: unterminated string literal (detected at line NN)

Depending on the Python version, the message may instead read "EOL while scanning string literal". The line number and file name vary by torch version, but a SyntaxError on a string literal containing an unescaped newline is the consistent signal. The error originates in the second generated file (the per-kernel async-compile output), not the first AOT wrapper module.

Root Cause

Root cause: missing backslash-escape pass before the existing quote-escape pass in wrapper.py. Doubling backslashes first makes the triple-quoted literal emit \n (two chars) instead of a real newline, so the second-stage file then contains a well-formed single-line string literal.
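The order of the two passes matters because the quote-escape pass itself introduces backslashes. The following is a minimal sketch of that point, assuming a hypothetical helper named escape_for_triple_quoted (it is not a PyTorch function) and using ast.literal_eval as a stand-in for Python parsing the generated module:

```python
import ast


def escape_for_triple_quoted(src: str) -> str:
    """Escape src so it survives being embedded in a '''...''' literal."""
    src = src.replace("\\", "\\\\")        # 1) double every backslash first
    src = src.replace("'''", "\\'\\'\\'")  # 2) then break up triple quotes
    return src


def wrong_order(src: str) -> str:
    """Same two passes reversed: backslashes added for quotes get doubled too."""
    src = src.replace("'''", "\\'\\'\\'")
    src = src.replace("\\", "\\\\")
    return src


def round_trip(escaped: str) -> str:
    """Parse the literal the way Python would when loading the wrapper."""
    return ast.literal_eval("'''" + escaped + "'''")


# Hypothetical source text with both a backslash escape and a triple quote.
sample = "a = 'x\\ty'\nb = '''doc'''\n"

print(round_trip(escape_for_triple_quoted(sample)) == sample)  # True
print(round_trip(wrong_order(sample)) == sample)               # False
```

With the backslash pass first, the text read back from the literal is identical to the input. With the passes reversed, the literal still parses, but stray backslashes end up in front of the quotes.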

Fix Action

Fix / Workaround

In torch/_inductor/codegen/wrapper.py, add one line before the existing quote-escape replace so that every backslash in the embedded source is doubled:

    kernel_src = kernel_src.replace("\\", "\\\\")         # NEW
    kernel_src = kernel_src.replace("'''", "\\'\\'\\'")   # existing
    compile_wrapper.splice(kernel_src)

With backslashes doubled, the triple-quoted string in the AOT module emits \n (two chars) instead of a real newline, and the second .py file contains the single-line string literal exactly as written in the original kernel source.

Code Example

kernel_src = kernel_src.replace("'''", "\\'\\'\\'")  # only quotes escaped
compile_wrapper.splice(kernel_src)                    # raw source spliced in

---

PTX: tl.constexpr = 'mov.u32 %r0, 1;
                       add.u32 %r1, %r0, 1;'

---

"""Repro for a SyntaxError in
torch._inductor.codegen.wrapper.define_user_defined_triton_kernel.

THE BUG
=======
define_user_defined_triton_kernel() embeds a user-defined Triton kernel's
source text inside a triple-quoted string in the generated AOT-inductor
module. Before embedding it only escapes single-quotes but NOT
backslashes:

    kernel_src = kernel_src.replace("'''", "\\'\\'\\'")  # existing
    compile_wrapper.splice(kernel_src)                    # embeds raw text

If the kernel's source text contains a backslash escape sequence (for
example a tl.constexpr default value of 'mov.u32 ...;\n add...'), the
backslash survives into the triple-quoted string in the generated AOT
module. When Python parses that AOT module the triple-quoted string
treats '\n' as a real newline.

Then async_compile.triton() writes the now-multi-line src_code verbatim
to a second .py file. The newline lands INSIDE what was supposed to be
a single-quoted string literal in that file:

    PTX: tl.constexpr = 'mov.u32 %r0, 1;
                          add.u32 %r1, %r0, 1;'      # <-- SyntaxError

Python raises:
    SyntaxError: unterminated string literal (or EOL while scanning
    string literal)

…depending on version. The error originates in the SECOND generated
file (the per-kernel async-compile output), not the first AOT wrapper
module.

THE FIX
=======
Add one line BEFORE the existing replace, so every backslash in the
embedded source is doubled:

    kernel_src = kernel_src.replace("\\\\", "\\\\\\\\")   # NEW
    kernel_src = kernel_src.replace("'''", "\\'\\'\\'")   # existing
    compile_wrapper.splice(kernel_src)

With backslashes doubled, the triple-quoted string in the AOT module
emits '\\n' (two chars) instead of a real newline; the second .py file
then contains a single-line string literal exactly as written in the
original kernel source. No SyntaxError.

HOW TO REPRODUCE
================
Prereqs:
  - A torch install that includes torch._inductor (>= 2.4) — built from
    source if you want to test a local fix.
  - Triton (pulled in transitively by torch).
  - A CUDA device of any architecture. The SyntaxError surfaces during
    inductor codegen, BEFORE kernel launch, but torch.compile + triton
    won't reach codegen without CUDA available.

Run:

    python repro_patch_o.py

Unpatched torch produces a traceback ending roughly like:

    File "/tmp/torchinductor_<user>/.../async_compile_<hash>.py", line NN
        PTX: tl.constexpr = 'mov.u32 %r0, 1;
                            ^
    SyntaxError: unterminated string literal (detected at line NN)

The line number and file name vary by torch version but the SyntaxError
on a string literal containing an unescaped newline is the consistent
signal.

After applying the backslash-escape fix to
torch/_inductor/codegen/wrapper.py (see THE FIX above), the same
command runs cleanly and prints:

    OK: kernel JIT'd cleanly, y[:4] = [1.0, 1.0, 1.0, 1.0]
"""
import torch
import triton
import triton.language as tl


# The trigger: a triton.jit kernel whose source contains a backslash
# escape sequence inside a tl.constexpr default value. Inductor reads
# this source verbatim (via inspect.getsource) and embeds it in the
# generated AOT wrapper module. Without the backslash-escape fix, the
# backslash isn't doubled, so '\n' becomes a real newline by the time
# the second-stage .py file is parsed.
@triton.jit
def k(x_ptr, out_ptr, n: tl.constexpr,
      PTX: tl.constexpr = 'mov.u32 %r0, 1;\n add.u32 %r1, %r0, 1;'):
    pid = tl.program_id(0)
    offs = pid * n + tl.arange(0, n)
    x = tl.load(x_ptr + offs, mask=offs < n)
    # PTX is intentionally unused at runtime — its only job is to put a
    # backslash escape into the kernel's source text so inductor's
    # codegen has something to mishandle.
    tl.store(out_ptr + offs, x + 1.0, mask=offs < n)


@torch.compile(dynamic=False, mode="reduce-overhead")
def fn(x):
    out = torch.empty_like(x)
    # Calling a user-defined triton.jit kernel inside a torch.compile
    # function routes through inductor's define_user_defined_triton_kernel
    # path — that's where the bug lives.
    k[(1,)](x, out, n=128)
    return out


if __name__ == "__main__":
    if not torch.cuda.is_available():
        raise SystemExit("This repro needs a CUDA device (any arch). "
                         "torch.compile + triton.jit will not codegen "
                         "without one.")
    x = torch.zeros(128, device="cuda", dtype=torch.float32)
    y = fn(x)
    # If we get here, the kernel compiled and ran -> the fix is applied
    # (or the bug has been fixed upstream by the time you're reading this).
    print("OK: kernel JIT'd cleanly, y[:4] =", y[:4].tolist())

🐛 Describe the bug

torch._inductor.codegen.wrapper.define_user_defined_triton_kernel embeds a user-defined triton.jit kernel's source text (obtained via inspect.getsource) inside a triple-quoted Python string in the generated AOT wrapper module. Before embedding, it escapes triple single-quotes (''') but does not escape backslashes:

  kernel_src = kernel_src.replace("'''", "\\'\\'\\'")  # only quotes escaped
  compile_wrapper.splice(kernel_src)                    # raw source spliced in

If the kernel source contains any backslash escape sequence — e.g. a tl.constexpr default like 'foo\n bar' — the backslash survives into the triple-quoted literal in the AOT module. When Python parses that AOT module, the triple-quoted string converts \n into a real newline.

That now-multi-line source is then handed to async_compile.triton(...) (in torch/_inductor/async_compile.py), which writes it verbatim into a second generated .py file under /tmp/torchinductor_<user>/.../. The newline lands inside what should be a single-line string literal in that second file:

  PTX: tl.constexpr = 'mov.u32 %r0, 1;
                       add.u32 %r1, %r0, 1;'

Python rejects the second file with:

SyntaxError: unterminated string literal (detected at line N)

The error surfaces as torch._inductor.exc.InductorError from _reload_python_module. It fires during inductor codegen, before any kernel launch, so the kernel never executes.
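The two-stage failure can be reproduced without torch, Triton, or a GPU. The sketch below is a simplified stand-in for the flow described above, not inductor's actual codegen: make_wrapper, second_stage_source, and compiles are hypothetical names, and the wrapper module is reduced to a single assignment.

```python
# Kernel-like source text containing a backslash escape (raw, so \n is two chars).
KERNEL_SRC = r"""def k(PTX='mov.u32 %r0, 1;\n add.u32 %r1, %r0, 1;'):
    return PTX
"""


def make_wrapper(kernel_src: str, escape_backslashes: bool) -> str:
    """Build first-stage module text embedding kernel_src in a '''...''' literal."""
    if escape_backslashes:
        kernel_src = kernel_src.replace("\\", "\\\\")   # the proposed extra pass
    kernel_src = kernel_src.replace("'''", "\\'\\'\\'")  # the existing pass
    return "src_code = '''\n" + kernel_src + "'''\n"


def second_stage_source(wrapper_text: str) -> str:
    """Execute the first-stage module and return the string it passes onward."""
    namespace: dict = {}
    exec(wrapper_text, namespace)
    return namespace["src_code"]


def compiles(src: str) -> bool:
    """Report whether the second-stage text is valid Python."""
    try:
        compile(src, "<second-stage>", "exec")
        return True
    except SyntaxError:
        return False


unpatched = second_stage_source(make_wrapper(KERNEL_SRC, escape_backslashes=False))
patched = second_stage_source(make_wrapper(KERNEL_SRC, escape_backslashes=True))

print(compiles(unpatched))  # False: the string literal is split by a real newline
print(compiles(patched))    # True: the literal is back on one line
```

Only the unpatched path turns the two-character \n into a real newline, which is what breaks the string literal when the second-stage text is parsed.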

Impact: any user-defined triton.jit kernel whose source text contains a backslash escape (most commonly inline PTX strings, regex patterns in comments, or \n/\t in constexpr string defaults) cannot be compiled under torch.compile.
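To check whether a particular kernel is exposed, one option is to scan its source text for backslashes before compiling. The helper below is a hypothetical convenience, not part of torch or triton. It takes the source as a plain string, so it runs without either installed.

```python
def backslash_lines(kernel_src: str) -> list[tuple[int, str]]:
    """Return (line_number, line) for each source line containing a backslash."""
    return [
        (lineno, line)
        for lineno, line in enumerate(kernel_src.splitlines(), start=1)
        if "\\" in line
    ]


# Hypothetical kernel source text; raw so the backslash is kept as written.
example_src = r"""def k(x_ptr, out_ptr,
      PTX='mov.u32 %r0, 1;\n add.u32 %r1, %r0, 1;'):
    pass
"""

for lineno, line in backslash_lines(example_src):
    print(f"line {lineno}: {line.strip()}")
```

Any reported line is a candidate for the mishandling described in this issue, whether the backslash sits in a string default, a comment, or a line continuation.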

Root cause: missing backslash-escape pass before the existing quote-escape pass in wrapper.py. Doubling backslashes first makes the triple-quoted literal emit \n (two chars) instead of a real newline, so the second-stage file then contains a well-formed single-line string literal.

Repro script: pytorch/repro_inductor_user_triton_kernel_backslash_syntaxerror.py (full listing under Code Example above).

Versions

curl -sL https://raw.githubusercontent.com/pytorch/pytorch/main/torch/utils/collect_env.py | python

 Collecting environment information...
 PyTorch version: 2.5.1+cu124
 Is debug build: False
 CUDA used to build PyTorch: 12.4
 ROCM used to build PyTorch: N/A

 OS: Ubuntu 24.04.4 LTS (x86_64)
 GCC version: (Ubuntu 13.3.0-6ubuntu2~24.04.1) 13.3.0
 Clang version: 18.1.3 (1ubuntu1)
 CMake version: version 3.31.4
 Libc version: glibc-2.39

 Python version: 3.12.3 (main, Mar 23 2026, 19:04:32) [GCC 13.3.0] (64-bit runtime)
 Python platform: Linux-6.8.0-101-generic-x86_64-with-glibc2.39
 Is CUDA available: True
 CUDA runtime version: 12.6.85
 CUDA_MODULE_LOADING set to: LAZY
 GPU models and configuration: GPU 0: NVIDIA GeForce RTX 4090
 Nvidia driver version: 580.126.09
 cuDNN version: Probably one of the following:
 /usr/lib/x86_64-linux-gnu/libcudnn.so.8.9.7
 /usr/lib/x86_64-linux-gnu/libcudnn.so.9.6.0
 /usr/lib/x86_64-linux-gnu/libcudnn_adv.so.9.6.0
 /usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.9.7
 /usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.9.7
 /usr/lib/x86_64-linux-gnu/libcudnn_cnn.so.9.6.0
 /usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.9.7
 /usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.9.7
 /usr/lib/x86_64-linux-gnu/libcudnn_engines_precompiled.so.9.6.0
 /usr/lib/x86_64-linux-gnu/libcudnn_engines_runtime_compiled.so.9.6.0
 /usr/lib/x86_64-linux-gnu/libcudnn_graph.so.9.6.0
 /usr/lib/x86_64-linux-gnu/libcudnn_heuristic.so.9.6.0
 /usr/lib/x86_64-linux-gnu/libcudnn_ops.so.9.6.0
 /usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.9.7
 /usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.9.7
 Is XPU available: False
 HIP runtime version: N/A
 MIOpen runtime version: N/A
 Is XNNPACK available: True
 Caching allocator config: N/A

 CPU:
 Architecture:                            x86_64
 CPU op-mode(s):                          32-bit, 64-bit
 Address sizes:                           46 bits physical, 48 bits virtual
 Byte Order:                              Little Endian
 CPU(s):                                  32
 On-line CPU(s) list:                     0-31
 Vendor ID:                               GenuineIntel
 Model name:                              Intel(R) Core(TM) i9-14900K
 CPU family:                              6
 Model:                                   183
 Thread(s) per core:                      2
 Core(s) per socket:                      24
 Socket(s):                               1
 Stepping:                                1
 CPU(s) scaling MHz:                      92%
 CPU max MHz:                             6000.0000
 CPU min MHz:                             800.0000
 BogoMIPS:                                6374.40
 Flags:                                   fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb intel_pt sha_ni xsaveopt xsavec xgetbv1 xsaves split_lock_detect user_shstk avx_vnni dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp hwp_pkg_req hfi vnmi umip pku ospke waitpkg gfni vaes vpclmulqdq tme rdpid movdiri movdir64b fsrm md_clear serialize pconfig arch_lbr ibt flush_l1d arch_capabilities ibpb_exit_to_user
 Virtualization:                          VT-x
 L1d cache:                               896 KiB (24 instances)
 L1i cache:                               1.3 MiB (24 instances)
 L2 cache:                                32 MiB (12 instances)
 L3 cache:                                36 MiB (1 instance)
 NUMA node(s):                            1
 NUMA node0 CPU(s):                       0-31
 Vulnerability Gather data sampling:      Not affected
 Vulnerability Indirect target selection: Not affected
 Vulnerability Itlb multihit:             Not affected
 Vulnerability L1tf:                      Not affected
 Vulnerability Mds:                       Not affected
 Vulnerability Meltdown:                  Not affected
 Vulnerability Mmio stale data:           Not affected
 Vulnerability Reg file data sampling:    Mitigation; Clear Register File
 Vulnerability Retbleed:                  Not affected
 Vulnerability Spec rstack overflow:      Not affected
 Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
 Vulnerability Spectre v1:                Mitigation; usercopy/swapgs barriers and __user pointer sanitization
 Vulnerability Spectre v2:                Mitigation; Enhanced / Automatic IBRS; IBPB conditional; PBRSB-eIBRS SW sequence; BHI BHI_DIS_S
 Vulnerability Srbds:                     Not affected
 Vulnerability Tsa:                       Not affected
 Vulnerability Tsx async abort:           Not affected
 Vulnerability Vmscape:                   Mitigation; IBPB before exit to userspace

 Versions of relevant libraries:
 [pip3] flashinfer-python==0.2.3+cu124torch2.5
 [pip3] numpy==1.26.4
 [pip3] nvidia-cublas-cu11==11.11.3.6
 [pip3] nvidia-cublas-cu12==12.4.5.8
 [pip3] nvidia-cuda-cupti-cu11==11.8.87
 [pip3] nvidia-cuda-cupti-cu12==12.4.127
 [pip3] nvidia-cuda-nvrtc-cu11==11.8.89
 [pip3] nvidia-cuda-nvrtc-cu12==12.4.127
 [pip3] nvidia-cuda-runtime-cu11==11.8.89
 [pip3] nvidia-cuda-runtime-cu12==12.4.127
 [pip3] nvidia-cudnn-cu11==9.1.0.70
 [pip3] nvidia-cudnn-cu12==9.1.0.70
 [pip3] nvidia-cufft-cu11==10.9.0.58
 [pip3] nvidia-cufft-cu12==11.2.1.3
 [pip3] nvidia-curand-cu11==10.3.0.86
 [pip3] nvidia-curand-cu12==10.3.5.147
 [pip3] nvidia-cusolver-cu11==11.4.1.48
 [pip3] nvidia-cusolver-cu12==11.6.1.9
 [pip3] nvidia-cusparse-cu11==11.7.5.86
 [pip3] nvidia-cusparse-cu12==12.3.1.170
 [pip3] nvidia-nccl-cu11==2.21.5
 [pip3] nvidia-nccl-cu12==2.21.5
 [pip3] nvidia-nvjitlink-cu12==12.4.127
 [pip3] nvidia-nvtx-cu11==11.8.86
 [pip3] nvidia-nvtx-cu12==12.4.127
 [pip3] torch==2.5.1
 [pip3] torchao==0.8.0+git2f97b095
 [pip3] torchaudio==2.5.1+cu118
 [pip3] torchtune==0.5.0.dev20241218+cpu
 [pip3] torchvision==0.20.1
 [pip3] triton==3.1.0
 [pip3] types-flake8-2020==1.8
 [pip3] types-flake8-bugbear==23.9.16
 [pip3] types-flake8-builtins==2.2
 [pip3] types-flake8-docstrings==1.7
 [pip3] types-flake8-plugin-utils==1.3
 [pip3] types-flake8-rst-docstrings==0.3
 [pip3] types-flake8-simplify==0.21
 [pip3] types-flake8-typing-imports==1.15
 [pip3] types-mypy-extensions==1.0
 [conda] Could not collect
