pytorch: Dynamo + `triton.heuristics` + `triton.autotune` with `prune_configs_by` -> `AssertionError: Can't construct an AttrSource without a valid base source` [4 comments, 2 participants]

GitHub stats
pytorch/pytorch#178179 · Fetched 2026-04-08 01:21:18
Comments: 4 · Participants: 2 · Timeline: 125 · Reactions: 0
Timeline (top): subscribed ×53, mentioned ×52, labeled ×8, commented ×4

Error Message

AssertionError: Can't construct an AttrSource without a valid base source

The assertion fires in __post_init__ in torch/_dynamo/source.py while Dynamo traces the Triton kernel launch. The full reproducer script is listed under Code Example.

Root Cause

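When a Triton kernel uses BOTH @triton.heuristics AND @triton.autotune with prune_configs_by, torch.compile throws AssertionError: Can't construct an AttrSource without a valid base source. The failure is in Dynamo and reproduces with the eager backend. I had Claude Code reconstruct a minimal reproducer and attempt to root cause:

Dynamo's wrap_user_defined_obj (torch/_dynamo/variables/functions.py) uses variable.kernel_source to construct an AttrSource. When the kernel is wrapped by @triton.heuristics, kernel_source is None because Dynamo loses source tracking through the heuristics wrapper. Without heuristics, kernel_source is properly set to GlobalSource(...).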
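
The traceback ends at `AttrSource(variable.kernel_source, f"{name}")`, followed by `assert self.base, ...`. The failure mode can be sketched in pure Python. The classes and the `wrap_prune_fn` helper below are hypothetical stand-ins that only mirror the shape of those two lines; they are not PyTorch's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GlobalSourceStandIn:
    """Hypothetical stand-in for a source that names a module-level global."""

    global_name: str


@dataclass(frozen=True)
class AttrSourceStandIn:
    """Hypothetical stand-in that repeats the assertion seen in the traceback."""

    base: Optional[GlobalSourceStandIn]
    member: str

    def __post_init__(self) -> None:
        # Same message as the assertion reported from torch/_dynamo/source.py.
        assert self.base, "Can't construct an AttrSource without a valid base source"


def wrap_prune_fn(kernel_source: Optional[GlobalSourceStandIn]) -> AttrSourceStandIn:
    # Mirrors the shape of AttrSource(variable.kernel_source, f"{name}").
    return AttrSourceStandIn(kernel_source, "early_config_prune")


# A kernel whose source is tracked (as reported for the no-heuristics case).
ok = wrap_prune_fn(GlobalSourceStandIn("kernel_prune_only"))

# A kernel whose source was lost (as reported for the heuristics case).
try:
    wrap_prune_fn(None)
except AssertionError as exc:
    error_message = str(exc)
```

In this sketch the only difference between the passing and failing call is whether a base source is available, which matches the description of `kernel_source` being `None` in the failing case.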

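Fix / Workaround

In the reproducer, Case 1 (autotune with prune_configs_by, no heuristics) and Case 2 (autotune with heuristics, no prune_configs_by) both pass; only the combination in Case 3 crashes. On affected versions, dropping either @triton.heuristics or prune_configs_by from the kernel should therefore avoid the assertion.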

Code Example

import sys
import traceback

import torch
import triton
import triton.language as tl


def print_versions():
    print(f"Python:  {sys.version.split()[0]}")
    print(f"PyTorch: {torch.__version__}")
    print(f"Triton:  {triton.__version__}")
    if torch.cuda.is_available():
        print(f"GPU:     {torch.cuda.get_device_name(0)}")
    print()


def noop_prune(configs, named_args, **kwargs):
    return configs


# Case 1: autotune + prune (no heuristics) → WORKS
@triton.autotune(
    configs=(
        triton.Config({"BLOCK": 128}, num_warps=4),
        triton.Config({"BLOCK": 256}, num_warps=4),
    ),
    key=["N"],
    prune_configs_by={"early_config_prune": noop_prune},
)
@triton.jit
def kernel_prune_only(x_ptr, out_ptr, N, BLOCK: tl.constexpr):
    pid = tl.program_id(0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offs < N
    tl.store(out_ptr + offs, tl.load(x_ptr + offs, mask=mask) * 2, mask=mask)


# Case 2: autotune + heuristics (no prune) → WORKS
@triton.autotune(
    configs=(
        triton.Config({"BLOCK": 128}, num_warps=4),
        triton.Config({"BLOCK": 256}, num_warps=4),
    ),
    key=["N"],
)
@triton.heuristics({"EVEN": lambda args: args["N"] % 128 == 0})
@triton.jit
def kernel_heuristics_only(x_ptr, out_ptr, N, BLOCK: tl.constexpr, EVEN: tl.constexpr):
    pid = tl.program_id(0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offs < N
    tl.store(out_ptr + offs, tl.load(x_ptr + offs, mask=mask) * 2, mask=mask)


# Case 3: autotune + heuristics + prune → CRASHES
@triton.autotune(
    configs=(
        triton.Config({"BLOCK": 128}, num_warps=4),
        triton.Config({"BLOCK": 256}, num_warps=4),
    ),
    key=["N"],
    prune_configs_by={"early_config_prune": noop_prune},
)
@triton.heuristics({"EVEN": lambda args: args["N"] % 128 == 0})
@triton.jit
def kernel_heuristics_and_prune(x_ptr, out_ptr, N, BLOCK: tl.constexpr, EVEN: tl.constexpr):
    pid = tl.program_id(0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offs < N
    tl.store(out_ptr + offs, tl.load(x_ptr + offs, mask=mask) * 2, mask=mask)


def run_test(name, kernel, x, expect_fail=False):
    def fn(x):
        out = torch.empty_like(x)
        kernel[(triton.cdiv(x.numel(), 128),)](x, out, x.numel())
        return out

    compiled = torch.compile(fn, fullgraph=True, backend="eager")
    torch._dynamo.reset()
    print(f"  {name}: ", end="", flush=True)
    try:
        result = compiled(x)
        if torch.allclose(result, x * 2):
            print("UNEXPECTED PASS (bug may be fixed!)" if expect_fail else "PASS")
        else:
            print("WRONG RESULT")
    except AssertionError as e:
        if "AttrSource" in str(e) or "valid base source" in str(e):
            tag = "EXPECTED FAIL" if expect_fail else "UNEXPECTED FAIL"
            print(f"{tag}: {e}")
        else:
            print(f"FAIL: AssertionError: {e}")
            traceback.print_exc()
    except Exception as e:
        print(f"FAIL: {type(e).__name__}: {e}")
        traceback.print_exc()


def main():
    print("=" * 72)
    print("Dynamo bug: @triton.heuristics + prune_configs_by")
    print("=" * 72)
    print()
    print_versions()

    if not torch.cuda.is_available():
        print("ERROR: No GPU available.")
        sys.exit(1)

    x = torch.randn(1024, device="cuda")

    print("Test 1: autotune + prune_configs_by (no heuristics)")
    run_test("prune_only", kernel_prune_only, x)
    print()
    print("Test 2: autotune + heuristics (no prune_configs_by)")
    run_test("heuristics_only", kernel_heuristics_only, x)
    print()
    print("Test 3: autotune + heuristics + prune_configs_by")
    print("  This is the bug.")
    run_test("heuristics_and_prune", kernel_heuristics_and_prune, x, expect_fail=True)
    print()


if __name__ == "__main__":
    main()

---

Traceback (most recent call last):
  File "dynamo_heuristics_prune_bug.py", line 151, in <module>
    main()
  File "dynamo_heuristics_prune_bug.py", line 146, in main
    run_test("heuristics_and_prune", kernel_heuristics_and_prune, x, expect_fail=True)
  File "dynamo_heuristics_prune_bug.py", line 108, in run_test
    result = compiled(x)
             ^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 953, in compile_wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 2202, in __call__
    result = self._torchdynamo_orig_backend(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 707, in __call__
    result = _compile(
             ^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1752, in _compile
    guarded_code, tracer_output = compile_inner(code, one_graph, hooks)
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/torch/_utils_internal.py", line 97, in wrapper_function
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1433, in compile_inner
    return _compile_inner(code, one_graph, hooks)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1467, in _compile_inner
    dynamo_output = compile_frame(
                    ^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1341, in compile_frame
    bytecode, tracer_output = transform_code_object(code, transform)
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/torch/_dynamo/bytecode_transformation.py", line 1600, in transform_code_object
    tracer_output = transformations(instructions, code_options)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1313, in transform
    tracer_output = trace_frame(
                    ^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 328, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 838, in trace_frame
    run_tracer()
  File ".venv/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 819, in run_tracer
    tracer.run()
  File ".venv/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 1654, in run
    while self.step():
          ^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 1334, in step
    self.dispatch_table[inst.opcode](self, inst)
  File ".venv/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 866, in wrapper
    return inner_fn(self, inst)
           ^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 3807, in CALL
    self._call(inst)
  File ".venv/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 3798, in _call
    self.call_function(fn, args, kwargs)
  File ".venv/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 1240, in call_function
    self.push(fn.call_function(self, args, kwargs))  # type: ignore[arg-type]
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/torch/_dynamo/variables/functions.py", line 2792, in call_function
    return dynamo_triton_hopifier_singleton.call_triton_kernel(  # type: ignore[return-value]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/torch/_higher_order_ops/triton_kernel_wrap.py", line 1739, in call_triton_kernel
    return self.call_triton_kernel(new_var, args, kwargs, tx)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/torch/_higher_order_ops/triton_kernel_wrap.py", line 1839, in call_triton_kernel
    wrapped_early_configs_prune = self.wrap_user_defined_obj(
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/torch/_dynamo/variables/functions.py", line 2652, in wrap_user_defined_obj
    tx, AttrSource(variable.kernel_source, f"{name}")
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<string>", line 5, in __init__
  File ".venv/lib/python3.12/site-packages/torch/_dynamo/source.py", line 288, in __post_init__
    assert self.base, "Can't construct an AttrSource without a valid base source"
           ^^^^^^^^^
AssertionError: Can't construct an AttrSource without a valid base source

from user code:
   File "dynamo_heuristics_prune_bug.py", line 101, in fn
    kernel[(triton.cdiv(x.numel(), 128),)](x, out, x.numel())

Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"

---

Collecting environment information...
PyTorch version: 2.10.0+rocm7.1
Is debug build: False
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: 7.1.25424

OS: Ubuntu 24.04.3 LTS (x86_64)
GCC version: (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0
Clang version: Could not collect
CMake version: version 3.28.3
Libc version: glibc-2.39

Python version: 3.12.3 (main, Jan  8 2026, 11:30:50) [GCC 13.3.0] (64-bit runtime)
Python platform: Linux-6.8.0-85-generic-x86_64-with-glibc2.39
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to:
GPU models and configuration: AMD Instinct MI300X VF (gfx942:sramecc+:xnack-)
Nvidia driver version: Could not collect
cuDNN version: Could not collect
Is XPU available: False
HIP runtime version: 7.1.25424
MIOpen runtime version: 3.5.1
Is XNNPACK available: True
Caching allocator config: N/A

CPU:
Architecture:                         x86_64
CPU op-mode(s):                       32-bit, 64-bit
Address sizes:                        46 bits physical, 57 bits virtual
Byte Order:                           Little Endian
CPU(s):                               160
On-line CPU(s) list:                  0-159
Vendor ID:                            GenuineIntel
Model name:                           INTEL(R) XEON(R) PLATINUM 8568Y+
CPU family:                           6
Model:                                207
Thread(s) per core:                   1
Core(s) per socket:                   80
Socket(s):                            2
Stepping:                             2
BogoMIPS:                             4600.00
Flags:                                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq dtes64 vmx ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves avx_vnni avx512_bf16 wbnoinvd arat vnmi avx512vbmi umip pku ospke waitpkg avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq la57 rdpid bus_lock_detect cldemote movdiri movdir64b fsrm md_clear serialize tsxldtrk avx512_fp16 arch_capabilities
Virtualization:                       VT-x
Hypervisor vendor:                    KVM
Virtualization type:                  full
L1d cache:                            5 MiB (160 instances)
L1i cache:                            5 MiB (160 instances)
L2 cache:                             640 MiB (160 instances)
NUMA node(s):                         2
NUMA node0 CPU(s):                    0-79
NUMA node1 CPU(s):                    80-159
Vulnerability Gather data sampling:   Not affected
Vulnerability Itlb multihit:          Not affected
Vulnerability L1tf:                   Not affected
Vulnerability Mds:                    Not affected
Vulnerability Meltdown:               Not affected
Vulnerability Mmio stale data:        Unknown: No mitigations
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed:               Not affected
Vulnerability Spec rstack overflow:   Not affected
Vulnerability Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:             Mitigation; Enhanced / Automatic IBRS; IBPB conditional; RSB filling; PBRSB-eIBRS SW sequence; BHI SW loop, KVM SW loop
Vulnerability Srbds:                  Not affected
Vulnerability Tsx async abort:        Mitigation; TSX disabled

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] pytorch-lightning==2.5.0
[pip3] torch==2.10.0+rocm7.1
[pip3] torchmetrics==1.8.2
[pip3] torchvision==0.25.0+rocm7.1
[pip3] triton==3.6.0
[conda] Could not collect
RAW_BUFFER

🐛 Describe the bug

When a Triton kernel uses BOTH @triton.heuristics AND @triton.autotune with prune_configs_by, torch.compile throws AssertionError: Can't construct an AttrSource without a valid base source. The failure is in Dynamo and reproduces with the eager backend. I had Claude Code reconstruct a minimal reproducer and attempt to root cause:

Root cause: Dynamo's wrap_user_defined_obj (torch/_dynamo/variables/functions.py) uses variable.kernel_source to construct an AttrSource. When the kernel is wrapped by @triton.heuristics, kernel_source is None because Dynamo loses source tracking through the heuristics wrapper. Without heuristics, kernel_source is properly set to GlobalSource(...).
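
To make the decorator layering concrete, one can walk the wrappers from the outermost object inward. The sketch below assumes each decorator object keeps the object it wraps on a `.fn` attribute; that is an assumption about Triton's wrapper classes, not something stated in this issue. The `*StandIn` classes and `wrapper_chain` are hypothetical, so the snippet runs without Triton or a GPU.

```python
from typing import Any, List


class _WrapperStandIn:
    """Hypothetical base: stores the wrapped object on `.fn` (assumed convention)."""

    def __init__(self, fn: Any) -> None:
        self.fn = fn


class JITFunctionStandIn(_WrapperStandIn):
    """Stand-in for the object produced by @triton.jit."""


class HeuristicsStandIn(_WrapperStandIn):
    """Stand-in for the object produced by @triton.heuristics."""


class AutotunerStandIn(_WrapperStandIn):
    """Stand-in for the object produced by @triton.autotune."""


def wrapper_chain(kernel: Any) -> List[str]:
    """Return the type names seen while following `.fn` from the outside in."""
    names: List[str] = []
    seen = set()
    obj = kernel
    # Stop at the first object without `.fn`, or if a cycle is detected.
    while obj is not None and id(obj) not in seen:
        seen.add(id(obj))
        names.append(type(obj).__name__)
        obj = getattr(obj, "fn", None)
    return names


def raw_kernel() -> None:
    """Placeholder for the undecorated Python kernel body."""


# Same decorator order as Case 3: autotune outermost, then heuristics, then jit.
case3 = AutotunerStandIn(HeuristicsStandIn(JITFunctionStandIn(raw_kernel)))
chain = wrapper_chain(case3)
```

Only the outermost object is bound to the module-level name; the inner wrappers are reachable only through it. If the `.fn` assumption holds for the installed Triton version, passing a real decorated kernel such as kernel_heuristics_and_prune to `wrapper_chain` would list its layers the same way.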

This is distinct from #177600 as it doesn't require repeated calls.

Error logs

<details><summary>Error logs</summary>
Traceback (most recent call last):
  File "dynamo_heuristics_prune_bug.py", line 151, in <module>
    main()
  File "dynamo_heuristics_prune_bug.py", line 146, in main
    run_test("heuristics_and_prune", kernel_heuristics_and_prune, x, expect_fail=True)
  File "dynamo_heuristics_prune_bug.py", line 108, in run_test
    result = compiled(x)
             ^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 953, in compile_wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 2202, in __call__
    result = self._torchdynamo_orig_backend(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 707, in __call__
    result = _compile(
             ^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1752, in _compile
    guarded_code, tracer_output = compile_inner(code, one_graph, hooks)
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/torch/_utils_internal.py", line 97, in wrapper_function
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1433, in compile_inner
    return _compile_inner(code, one_graph, hooks)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1467, in _compile_inner
    dynamo_output = compile_frame(
                    ^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1341, in compile_frame
    bytecode, tracer_output = transform_code_object(code, transform)
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/torch/_dynamo/bytecode_transformation.py", line 1600, in transform_code_object
    tracer_output = transformations(instructions, code_options)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1313, in transform
    tracer_output = trace_frame(
                    ^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 328, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 838, in trace_frame
    run_tracer()
  File ".venv/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 819, in run_tracer
    tracer.run()
  File ".venv/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 1654, in run
    while self.step():
          ^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 1334, in step
    self.dispatch_table[inst.opcode](self, inst)
  File ".venv/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 866, in wrapper
    return inner_fn(self, inst)
           ^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 3807, in CALL
    self._call(inst)
  File ".venv/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 3798, in _call
    self.call_function(fn, args, kwargs)
  File ".venv/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 1240, in call_function
    self.push(fn.call_function(self, args, kwargs))  # type: ignore[arg-type]
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/torch/_dynamo/variables/functions.py", line 2792, in call_function
    return dynamo_triton_hopifier_singleton.call_triton_kernel(  # type: ignore[return-value]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/torch/_higher_order_ops/triton_kernel_wrap.py", line 1739, in call_triton_kernel
    return self.call_triton_kernel(new_var, args, kwargs, tx)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/torch/_higher_order_ops/triton_kernel_wrap.py", line 1839, in call_triton_kernel
    wrapped_early_configs_prune = self.wrap_user_defined_obj(
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/torch/_dynamo/variables/functions.py", line 2652, in wrap_user_defined_obj
    tx, AttrSource(variable.kernel_source, f"{name}")
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<string>", line 5, in __init__
  File ".venv/lib/python3.12/site-packages/torch/_dynamo/source.py", line 288, in __post_init__
    assert self.base, "Can't construct an AttrSource without a valid base source"
           ^^^^^^^^^
AssertionError: Can't construct an AttrSource without a valid base source

from user code:
   File "dynamo_heuristics_prune_bug.py", line 101, in fn
    kernel[(triton.cdiv(x.numel(), 128),)](x, out, x.numel())

Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
</details>

Versions

<details><summary>env</summary>
Collecting environment information...
PyTorch version: 2.10.0+rocm7.1
Is debug build: False
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: 7.1.25424

OS: Ubuntu 24.04.3 LTS (x86_64)
GCC version: (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0
Clang version: Could not collect
CMake version: version 3.28.3
Libc version: glibc-2.39

Python version: 3.12.3 (main, Jan  8 2026, 11:30:50) [GCC 13.3.0] (64-bit runtime)
Python platform: Linux-6.8.0-85-generic-x86_64-with-glibc2.39
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to:
GPU models and configuration: AMD Instinct MI300X VF (gfx942:sramecc+:xnack-)
Nvidia driver version: Could not collect
cuDNN version: Could not collect
Is XPU available: False
HIP runtime version: 7.1.25424
MIOpen runtime version: 3.5.1
Is XNNPACK available: True
Caching allocator config: N/A

CPU:
Architecture:                         x86_64
CPU op-mode(s):                       32-bit, 64-bit
Address sizes:                        46 bits physical, 57 bits virtual
Byte Order:                           Little Endian
CPU(s):                               160
On-line CPU(s) list:                  0-159
Vendor ID:                            GenuineIntel
Model name:                           INTEL(R) XEON(R) PLATINUM 8568Y+
CPU family:                           6
Model:                                207
Thread(s) per core:                   1
Core(s) per socket:                   80
Socket(s):                            2
Stepping:                             2
BogoMIPS:                             4600.00
Flags:                                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq dtes64 vmx ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves avx_vnni avx512_bf16 wbnoinvd arat vnmi avx512vbmi umip pku ospke waitpkg avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq la57 rdpid bus_lock_detect cldemote movdiri movdir64b fsrm md_clear serialize tsxldtrk avx512_fp16 arch_capabilities
Virtualization:                       VT-x
Hypervisor vendor:                    KVM
Virtualization type:                  full
L1d cache:                            5 MiB (160 instances)
L1i cache:                            5 MiB (160 instances)
L2 cache:                             640 MiB (160 instances)
NUMA node(s):                         2
NUMA node0 CPU(s):                    0-79
NUMA node1 CPU(s):                    80-159
Vulnerability Gather data sampling:   Not affected
Vulnerability Itlb multihit:          Not affected
Vulnerability L1tf:                   Not affected
Vulnerability Mds:                    Not affected
Vulnerability Meltdown:               Not affected
Vulnerability Mmio stale data:        Unknown: No mitigations
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed:               Not affected
Vulnerability Spec rstack overflow:   Not affected
Vulnerability Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:             Mitigation; Enhanced / Automatic IBRS; IBPB conditional; RSB filling; PBRSB-eIBRS SW sequence; BHI SW loop, KVM SW loop
Vulnerability Srbds:                  Not affected
Vulnerability Tsx async abort:        Mitigation; TSX disabled

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] pytorch-lightning==2.5.0
[pip3] torch==2.10.0+rocm7.1
[pip3] torchmetrics==1.8.2
[pip3] torchvision==0.25.0+rocm7.1
[pip3] triton==3.6.0
[conda] Could not collect
</details>

cc @chauhang @penguinwu @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @kadeng @amjames @Lucaskabela @jataylo @oulgen @aakhundov @davidberard98

extent analysis

Fix Plan

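The traceback ends in AttrSource.__post_init__ (torch/_dynamo/source.py), which asserts that its base source is set. The base passed in is variable.kernel_source, supplied by wrap_user_defined_obj in torch/_dynamo/variables/functions.py. The likely cause is therefore that variable.kernel_source is None (or otherwise falsy) when heuristics and prune_configs_by are combined, and wrap_user_defined_obj needs to handle that case.

Here are the steps:

  • In wrap_user_defined_obj, check whether variable.kernel_source is None before constructing an AttrSource.
  • If it is None, either build the wrapped object without a source or raise an error that names the kernel and the attribute being wrapped.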

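Example code (a sketch; the signature and the enclosing VariableTracker.build call are assumptions, because the traceback shows only the argument line):

def wrap_user_defined_obj(self, user_obj, tx, variable, name):
    # Assumed signature; check the real one in torch/_dynamo/variables/functions.py.
    if variable.kernel_source is None:
        # No base source to attach the attribute to, so build without a source.
        return VariableTracker.build(tx, user_obj)
    # Normal path: attach the attribute source to the kernel's own source.
    source = AttrSource(variable.kernel_source, f"{name}")
    return VariableTracker.build(tx, user_obj, source)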
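
The failure mode can be reproduced in isolation without PyTorch. The following is a minimal sketch using stand-in classes: FakeSource, FakeAttrSource, and attr_source_or_none are hypothetical names for illustration, not the real torch._dynamo.source types. Only the assertion message is copied from the traceback, and the attribute name "early_config_prune" is taken from the reproduction's prune_configs_by dictionary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FakeSource:
    """Stand-in for a Dynamo base source (hypothetical, for illustration)."""
    name: str


@dataclass(frozen=True)
class FakeAttrSource:
    """Stand-in that copies the assertion seen in the traceback."""
    base: Optional[FakeSource]
    member: str

    def __post_init__(self) -> None:
        assert self.base, "Can't construct an AttrSource without a valid base source"


def attr_source_or_none(base: Optional[FakeSource], member: str) -> Optional[FakeAttrSource]:
    """Build the attribute source only when a base source exists."""
    if base is None:
        return None  # the caller must fall back to a sourceless path
    return FakeAttrSource(base, member)


# Unguarded construction with a missing base trips the assertion.
try:
    FakeAttrSource(None, "early_config_prune")
    unguarded_failed = False
except AssertionError:
    unguarded_failed = True

# Guarded construction returns None instead of raising.
guarded_missing = attr_source_or_none(None, "early_config_prune")
guarded_present = attr_source_or_none(FakeSource("kernel"), "early_config_prune")
```

The guard moves the decision about a missing base to the caller, which is the same shape as checking variable.kernel_source before building an AttrSource.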

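Alternatively, fix the caller. The traceback shows call_triton_kernel in torch/_higher_order_ops/triton_kernel_wrap.py re-entering itself with new_var (line 1739) before the failure at line 1839, which suggests that new_var is created for the unwrapped kernel without a kernel_source. Propagating the original variable's kernel_source when new_var is built may address the cause rather than the symptom. Note that kernel_source is tracked on Dynamo's kernel variable, not set by the @triton.heuristics decorator, so changing the decorator itself is unlikely to help.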

Verification

To verify the fix, rerun the reproduction script and confirm that Test 3 (autotune + heuristics + prune_configs_by) no longer raises the AssertionError, and that Tests 1 and 2 still pass. Consider adding the combined case as a regression test so the three decorator combinations stay covered.
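
A small helper can make the check explicit by separating this specific assertion from unrelated failures. This is a sketch under stated assumptions: classify_run and ATTR_SOURCE_MSG are hypothetical names, and the helper is independent of the run_test function in the reproduction script. It matches on the message text rather than the exception type, since the error may be re-raised in a different form.

```python
from typing import Callable

# Message copied from the traceback in this issue.
ATTR_SOURCE_MSG = "Can't construct an AttrSource without a valid base source"


def classify_run(fn: Callable[[], object]) -> str:
    """Call fn once and return 'ok', 'attrsource_bug', or 'other_error'."""
    try:
        fn()
    except Exception as exc:
        # Match on the message so a wrapped or re-raised error is still recognized.
        return "attrsource_bug" if ATTR_SOURCE_MSG in str(exc) else "other_error"
    return "ok"
```

With the reproduction script, this could be called as classify_run(lambda: compiled(x)), where compiled and x are the compiled function and input tensor from that script. A result of "ok" for all three kernels would indicate the fix holds.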

Extra Tips

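  • Until a fix lands, the reproduction suggests a workaround: each combination on its own (autotune + prune_configs_by, or autotune + heuristics) compiles under Dynamo, so dropping one of the two avoids the assertion.
  • If wrap_user_defined_obj raises instead of falling back, include the kernel name and the attribute being wrapped in the message, so the failure is easier to diagnose than the bare AttrSource assertion.
  • When collecting logs for a bug report, set TORCHDYNAMO_VERBOSE=1 and TORCH_LOGS="+dynamo", as the error output recommends.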
