pytorch - 💡(How to fix) Fix [torch.compile] InductorError: both a fallback and a decomp for `aten.index_add.default` with bfloat16 [2 comments, 2 participants]

GitHub stats
pytorch/pytorch#179418 · Fetched 2026-04-08 02:51:51
Comments: 2 · Participants: 2 · Timeline events: 126 · Reactions: 0
Timeline (top): mentioned ×54, subscribed ×54, labeled ×7, referenced ×4

Error Message

torch._inductor.exc.InductorError: AssertionError: both a fallback and a decomp for same op: aten.index_add.default

Full Traceback

Traceback (most recent call last):
  File "/storage/home/huyvvo/fairvit_balanced_patches/repro_index_add_inductor_bug.py", line 39, in <module>
    model(x)  # crashes here
    │     └ tensor([[[ 0.1855,  0.7305,  0.9844,  ...,  0.4453, -0.3379, -0.1377],
         [-0.3633,  0.2051, -0.6250,  ..., -2.2812,  0.50...
    └ Sequential(
  (0): Block(
    (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True, bias=True)
    (proj): Linear(in_fea...
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1778, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           │                │       └ {}
           │                └ (tensor([[[ 0.1855,  0.7305,  0.9844,  ...,  0.4453, -0.3379, -0.1377],
         [-0.3633,  0.2051, -0.6250,  ..., -2.2812,  0.5...
           └ Sequential(
  (0): Block(
    (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True, bias=True)
    (proj): Linear(in_fea...
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1789, in _call_impl
    return forward_call(*args, **kwargs)
           │             │       └ {}
           │             └ (tensor([[[ 0.1855,  0.7305,  0.9844,  ...,  0.4453, -0.3379, -0.1377],
         [-0.3633,  0.2051, -0.6250,  ..., -2.2812,  0.5...
           └ <bound method Sequential.forward of Sequential(
  (0): Block(
    (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True, ...
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/nn/modules/container.py", line 253, in forward
    input = module(input)
    │       │      └ tensor([[[ 0.1855,  0.7305,  0.9844,  ...,  0.4453, -0.3379, -0.1377],
         [-0.3633,  0.2051, -0.6250,  ..., -2.2812,  0.50...
    │       └ Block(
  (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True, bias=True)
  (proj): Linear(in_features=256, out_features...
    └ tensor([[[ 0.1855,  0.7305,  0.9844,  ...,  0.4453, -0.3379, -0.1377],
         [-0.3633,  0.2051, -0.6250,  ..., -2.2812,  0.50...
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
    return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
           │                         │       └ {}
           │                         └ (tensor([[[ 0.1855,  0.7305,  0.9844,  ...,  0.4453, -0.3379, -0.1377],
         [-0.3633,  0.2051, -0.6250,  ..., -2.2812,  0.5...
           └ Block(
  (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True, bias=True)
  (proj): Linear(in_features=256, out_features...
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1050, in compile_wrapper
    raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1036, in compile_wrapper
    return fn(*args, **kwargs)
           │   │       └ {}
           │   └ (tensor([[[ 0.1855,  0.7305,  0.9844,  ...,  0.4453, -0.3379, -0.1377],
         [-0.3633,  0.2051, -0.6250,  ..., -2.2812,  0.5...
           └ <bound method Module._call_impl of Block(
  (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True, bias=True)
  (proj): L...
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1789, in _call_impl
    return forward_call(*args, **kwargs)
           │             │       └ {}
           │             └ (tensor([[[ 0.1855,  0.7305,  0.9844,  ...,  0.4453, -0.3379, -0.1377],
         [-0.3633,  0.2051, -0.6250,  ..., -2.2812,  0.5...
           └ <bound method Block.forward of Block(
  (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True, bias=True)
  (proj): Linea...
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 2432, in __call__
    result = self._torchdynamo_orig_backend(
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 2150, in __call__
    result = self._inner_convert(
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 735, in __call__
    result = _compile(
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1919, in _compile
    guarded_code, tracer_output = compile_inner(code, one_graph, hooks)
    │             │               │             │     │          └ Hooks(guard_export_fn=None, guard_fail_fn=None, guard_filter_fn=None)
    │             │               │             │     └ False
    │             │               │             └ <code object forward at 0x6227fd74f340, file "/storage/home/huyvvo/fairvit_balanced_patches/repro_index_add_inductor_bug.py", li...
    │             │               └ <function _compile.<locals>.compile_inner at 0x7c32b56fbec0>
    │             └ <torch._dynamo.output_graph.DynamoTracerOutput object at 0x7c32b5708500>
    └ None
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_utils_internal.py", line 96, in wrapper_function
    return function(*args, **kwargs)
           │         │       └ {}
           │         └ (<code object forward at 0x6227fd74f340, file "/storage/home/huyvvo/fairvit_balanced_patches/repro_index_add_inductor_bug.py", l...
           └ <function _compile.<locals>.compile_inner at 0x7c32b56fbe20>
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1535, in compile_inner
    result = _compile_inner(code, one_graph, hooks)
             │              │     │          └ Hooks(guard_export_fn=None, guard_fail_fn=None, guard_filter_fn=None)
             │              │     └ False
             │              └ <code object forward at 0x6227fd74f340, file "/storage/home/huyvvo/fairvit_balanced_patches/repro_index_add_inductor_bug.py", li...
             └ <function _compile.<locals>._compile_inner at 0x7c32b56fba60>
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1594, in _compile_inner
    dynamo_output = compile_frame(
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1442, in compile_frame
    bytecode, tracer_output = transform_code_object(code, transform)
                              │                     │     └ <function compile_frame.<locals>.transform at 0x7c32b570f560>
                              │                     └ <code object forward at 0x6227fd74f340, file "/storage/home/huyvvo/fairvit_balanced_patches/repro_index_add_inductor_bug.py", li...
                              └ <function transform_code_object at 0x7c32b86fd6c0>
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/bytecode_transformation.py", line 1626, in transform_code_object
    tracer_output = transformations(instructions, code_options)
                    │               │             └ {'co_argcount': 2, 'co_posonlyargcount': 0, 'co_kwonlyargcount': 0, 'co_nlocals': 6, 'co_stacksize': 7, 'co_flags': 3, 'co_code'...
                    │               └ [Instruction(opcode=151, opname='RESUME', arg=0, argval=0, offset=0, starts_line=22, is_jump_target=False, positions=Positions(l...
                    └ <function compile_frame.<locals>.transform at 0x7c32b570f560>
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1414, in transform
    tracer_output = trace_frame(
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 342, in _fn
    return fn(*args, **kwargs)
           │   │       └ {'export': False, 'export_constraints': None, 'frame_state': {'_id': 0}, 'distributed_state': None, 'package': None}
           │   └ (<code object forward at 0x6227fd74f340, file "/storage/home/huyvvo/fairvit_balanced_patches/repro_index_add_inductor_bug.py", l...
           └ <function trace_frame at 0x7c32b7808680>
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 875, in trace_frame
    run_tracer()
    └ <function trace_frame.<locals>.run_tracer at 0x7c32b56fa480>
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 856, in run_tracer
    tracer.run()
    └ <torch._dynamo.symbolic_convert.InstructionTranslator object at 0x7c32b570a2a0>
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 1705, in run
    while self.step():
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 1372, in step
    self.dispatch_table[inst.opcode](self, inst)
    │                   │            │     └ Instruction(opcode=83, opname='RETURN_VALUE', arg=None, argval=None, offset=324, starts_line=27, is_jump_target=False, positions...
    │                   │            └ <torch._dynamo.symbolic_convert.InstructionTranslator object at 0x7c32b570a2a0>
    │                   └ Instruction(opcode=83, opname='RETURN_VALUE', arg=None, argval=None, offset=324, starts_line=27, is_jump_target=False, positions...
    └ <torch._dynamo.symbolic_convert.InstructionTranslator object at 0x7c32b570a2a0>
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 5035, in RETURN_VALUE
    self._return(inst)
    │            └ Instruction(opcode=83, opname='RETURN_VALUE', arg=None, argval=None, offset=324, starts_line=27, is_jump_target=False, positions...
    └ <torch._dynamo.symbolic_convert.InstructionTranslator object at 0x7c32b570a2a0>
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 5017, in _return
    all_stack_locals_metadata = self.output.compile_subgraph(
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2053, in compile_subgraph
    self.compile_and_call_fx_graph(tx, pass2.graph_output_vars(), root)
    │                              │   │                          └ FakeRootModule(...)
    │                              │   └ <torch._dynamo.codegen.PyCodegen object at 0x7c32b5510b00>
    │                              └ <torch._dynamo.symbolic_convert.InstructionTranslator object at 0x7c32b570a2a0>
    └ OutputGraph(local_scope={'self': Block(
  (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True, bias=True)
  (proj): Lin...
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2700, in compile_and_call_fx_graph
    compiled_fn = self.call_user_compiler(gm, self.example_inputs())
                  │                       │   └ OutputGraph(local_scope={'self': Block(
  (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True, bias=True)
  (proj): Lin...
                  │                       └ GraphModule()
                  └ OutputGraph(local_scope={'self': Block(
  (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True, bias=True)
  (proj): Lin...
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2867, in call_user_compiler
    return self._call_user_compiler(gm, example_inputs)
           │                        │   └ [tensor([[[ 0.1855,  0.7305,  0.9844,  ...,  0.4453, -0.3379, -0.1377],
         [-0.3633,  0.2051, -0.6250,  ..., -2.2812,  0.5...
           │                        └ GraphModule()
           └ OutputGraph(local_scope={'self': Block(
  (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True, bias=True)
  (proj): Lin...
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2925, in _call_user_compiler
    compiled_fn = compiler_fn(gm, example_inputs)
                  │           │   └ [tensor([[[ 0.1855,  0.7305,  0.9844,  ...,  0.4453, -0.3379, -0.1377],
         [-0.3633,  0.2051, -0.6250,  ..., -2.2812,  0.5...
                  │           └ GraphModule()
                  └ <torch._dynamo.repro.after_dynamo.WrapBackendDebug object at 0x7c32b7ec4260>
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__
    compiled_gm = compiler_fn(gm, example_inputs)
                  │           │   └ [tensor([[[ 0.1855,  0.7305,  0.9844,  ...,  0.4453, -0.3379, -0.1377],
         [-0.3633,  0.2051, -0.6250,  ..., -2.2812,  0.5...
                  │           └ GraphModule()
                  └ functools.partial(<torch._TorchCompileInductorWrapper object at 0x7c34f74431d0>)
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/__init__.py", line 2477, in __call__
    return compile_fx(model_, inputs_, config_patches=all_patches)
           │          │       │                       └ {}
           │          │       └ [tensor([[[ 0.1855,  0.7305,  0.9844,  ...,  0.4453, -0.3379, -0.1377],
         [-0.3633,  0.2051, -0.6250,  ..., -2.2812,  0.5...
           │          └ GraphModule()
           └ <function compile_fx at 0x7c32b56f80e0>
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2682, in compile_fx
    return _maybe_wrap_and_compile_fx_main(
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2760, in _maybe_wrap_and_compile_fx_main
    return _compile_fx_main(
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2972, in _compile_fx_main
    raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2957, in _compile_fx_main
    return aot_autograd(
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 123, in __call__
    cg = aot_module_simplified(gm, example_inputs, **self.kwargs)
         │                     │   │                 └ <torch._dynamo.backends.common.AotAutograd object at 0x7c32b546dbe0>
         │                     │   └ [tensor([[[ 0.1855,  0.7305,  0.9844,  ...,  0.4453, -0.3379, -0.1377],
         [-0.3633,  0.2051, -0.6250,  ..., -2.2812,  0.5...
         │                     └ GraphModule()
         └ <function aot_module_simplified at 0x7c32b67244a0>
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1147, in aot_module_simplified
    compiled_fn, _ = aot_stage2_compile(
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 385, in aot_stage2_compile
    return aot_stage2_autograd(aot_state, aot_graph_capture)
           │                   │          └ AOTGraphCapture(wrappers=[AOTDedupeWrapper(keep_arg_mask=[], add_dupe_map=[], old_input_metadata=[], needs_post_compile=False), ...
           │                   └ AOTState(needs_autograd=True, flat_args=[FakeTensor(..., device='cuda:0', size=(16, 197, 256), dtype=torch.bfloat16), Parameter(...
           └ <function aot_stage2_autograd at 0x7c32b7020540>
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 2219, in aot_stage2_autograd
    fwd_output_strides, compiled_fw_func = _aot_stage2b_fw_compile(
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 2008, in _aot_stage2b_fw_compile
    return _aot_stage2b_compile_forward_or_inference(
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 2536, in _aot_stage2b_compile_forward_or_inference
    compiled_fw_func = compiler(fw_module, adjusted_flat_args)
                       │        │          └ [FakeTensor(..., device='cuda:0', size=(16, 197, 256), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(256,), dtyp...
                       │        └ GraphModule()
                       └ <torch._functorch._aot_autograd.schemas.SerializableAOTDispatchCompiler object at 0x7c32b546f9b0>
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1416, in __call__
    output_code = self.compiler_fn(gm, example_inputs)
                  │                │   └ [FakeTensor(..., device='cuda:0', size=(16, 197, 256), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(256,), dtyp...
                  │                └ GraphModule()
                  └ <torch._functorch._aot_autograd.schemas.SerializableAOTDispatchCompiler object at 0x7c32b546f9b0>
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2820, in fw_compiler_base
    return compile_fx_forward(
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2461, in compile_fx_forward
    result = inner_compile(
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 826, in compile_fx_inner
    return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/repro/after_aot.py", line 309, in debug_wrapper
    inner_compiled_fn = compiler_fn(gm, example_inputs)
                        │           │   └ [FakeTensor(..., device='cuda:0', size=(16, 197, 256), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(256,), dtyp...
                        │           └ GraphModule()
                        └ functools.partial(<function _compile_fx_inner at 0x7c32b56f2de0>, get_decomp_fn=<function select_decomp_table at 0x7c32b5df3ce0>...
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1053, in _compile_fx_inner
    raise InductorError(e, currentframe()).with_traceback(
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1037, in _compile_fx_inner
    mb_compiled_graph = fx_codegen_and_compile(
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1802, in fx_codegen_and_compile
    return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
           │                          │   │               │                └ {'get_decomp_fn': <function select_decomp_table at 0x7c32b5df3ce0>, 'static_input_idxs': [1, 2, 3, 4, 5], 'cudagraphs': BoxedBoo...
           │                          │   │               └ [0]
           │                          │   └ [FakeTensor(..., device='cuda:0', size=(16, 197, 256), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(256,), dtyp...
           │                          └ GraphModule()
           └ <torch._inductor.compile_fx._InProcessFxCompile object at 0x7c32aeaea720>
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1489, in codegen_and_compile
    graph.run(*example_inputs)
    │          └ [FakeTensor(..., device='cuda:0', size=(16, 197, 256), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(256,), dtyp...
    └ <torch._inductor.graph.GraphLowering object at 0x7c32aeaea150>
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_inductor/graph.py", line 1051, in run
    return super().run(*args)
                        └ (FakeTensor(..., device='cuda:0', size=(16, 197, 256), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(256,), dtyp...
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/fx/interpreter.py", line 200, in run
    self.env[node] = self.run_node(node)
    │        │       │             └ index_add
    │        │       └ <torch._inductor.graph.GraphLowering object at 0x7c32aeaea150>
    │        └ index_add
    └ <torch._inductor.graph.GraphLowering object at 0x7c32aeaea150>
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_inductor/graph.py", line 1896, in run_node
    result = super().run_node(n)
                              └ index_add
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/fx/interpreter.py", line 297, in run_node
    return getattr(self, n.op)(n.target, args, kwargs)
                   │     │     │         │     └ {}
                   │     │     │         └ (TensorBox(StorageBox(
  InputBuffer(name='primals_1', layout=FixedLayout('cuda:0', torch.bfloat16, size=[16, 197, 256], stride=...
                   │     │     └ index_add
                   │     └ index_add
                   └ <torch._inductor.graph.GraphLowering object at 0x7c32aeaea150>
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_inductor/graph.py", line 1336, in call_function
    make_fallback(target, warn=False, get_decomp_fn=self.get_decomp_fn)
    │             │                                 └ <torch._inductor.graph.GraphLowering object at 0x7c32aeaea150>
    │             └ <OpOverload(op='aten.index_add', overload='default')>
    └ <function make_fallback at 0x7c32b5dfe200>
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_inductor/lowering.py", line 2511, in make_fallback
    assert op not in check_decomps or override_decomp, (
torch._inductor.exc.InductorError: AssertionError: both a fallback and a decomp for same op: aten.index_add.default

Environment

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 57 bits virtual
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 0-127
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) Platinum 8462Y+
CPU family: 6
Model: 143
Thread(s) per core: 2
Core(s) per socket: 32
Socket(s): 2
Stepping: 8
CPU max MHz: 4100.0000
CPU min MHz: 800.0000
BogoMIPS: 5600.00
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cat_l2 cdp_l3 cdp_l2 ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb intel_pt avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local split_lock_detect user_shstk avx_vnni avx512_bf16 wbnoinvd dtherm ida arat pln pts hfi vnmi avx512vbmi umip pku ospke waitpkg avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme avx512_vpopcntdq la57 rdpid bus_lock_detect cldemote movdiri movdir64b enqcmd fsrm md_clear serialize tsxldtrk pconfig arch_lbr ibt amx_bf16 avx512_fp16 amx_tile amx_int8 flush_l1d arch_capabilities
Virtualization: VT-x
L1d cache: 3 MiB (64 instances)
L1i cache: 2 MiB (64 instances)
L2 cache: 128 MiB (64 instances)
L3 cache: 120 MiB (2 instances)
NUMA node(s): 2
NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78,80,82,84,86,88,90,92,94,96,98,100,102,104,106,108,110,112,114,116,118,120,122,124,126
NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55,57,59,61,63,65,67,69,71,73,75,77,79,81,83,85,87,89,91,93,95,97,99,101,103,105,107,109,111,113,115,117,119,121,123,125,127
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced / Automatic IBRS; IBPB conditional; RSB filling; PBRSB-eIBRS SW sequence; BHI BHI_DIS_S
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected

Code Example

torch._inductor.exc.InductorError: AssertionError: both a fallback and a decomp for same op: aten.index_add.default

---

assert op not in check_decomps or override_decomp, (
    f"both a fallback and a decomp for same op: {op}"
)
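
The assertion above guards an invariant: an op should not be routed to a fallback while a decomposition is also registered for it, unless the caller explicitly overrides. A toy model of that check may help when reading the traceback. All names here (`DECOMPS`, `FALLBACKS`, `register_fallback`, `try_register`) are hypothetical and are not Inductor's actual data structures:

```python
# Toy model only: hypothetical names, not Inductor internals.
DECOMPS = {"aten.index_add.default"}  # ops that have a decomposition registered
FALLBACKS = set()                     # ops routed to the eager (fallback) kernel


def register_fallback(op, override_decomp=False):
    # Same shape as the failing assertion: a fallback is refused when a
    # decomposition exists for the op and no override was requested.
    assert op not in DECOMPS or override_decomp, (
        f"both a fallback and a decomp for same op: {op}"
    )
    FALLBACKS.add(op)


def try_register(op, **kwargs):
    # Return "ok" on success, or the assertion message on failure.
    try:
        register_fallback(op, **kwargs)
        return "ok"
    except AssertionError as exc:
        return str(exc)


print(try_register("aten.index_add.default"))
print(try_register("aten.index_add.default", override_decomp=True))
```

In this sketch the first call reproduces the message from the report and the second succeeds because of the override. How the real lowering code decides between the two paths for bfloat16 inputs is not shown in this excerpt.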

---

python repro.py

---

import torch
import torch.nn as nn


class Block(nn.Module):
    def __init__(self, dim=256, drop=0.1):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.proj = nn.Linear(dim, dim)
        self.gamma = nn.Parameter(1e-5 * torch.ones(dim))
        self.drop = drop

    def forward(self, x):
        B = x.shape[0]
        k = max(int(B * (1 - self.drop)), 1)
        idx = torch.randperm(B, device=x.device)[:k]
        res = self.proj(self.norm(x[idx])) * self.gamma
        return torch.index_add(x, 0, idx, res)


device = "cuda"
model = nn.Sequential(*[Block() for _ in range(4)])
DTYPE = torch.bfloat16
model = model.to(device, dtype=DTYPE).train()

for m in model:
    m.compile()

x = torch.randn(16, 197, 256, device=device, dtype=DTYPE)
model(x)  # crashes here

print("Success")

---

Traceback (most recent call last):
  File "/storage/home/huyvvo/fairvit_balanced_patches/repro_index_add_inductor_bug.py", line 39, in <module>
    model(x)  # crashes here
    │     └ tensor([[[ 0.1855,  0.7305,  0.9844,  ...,  0.4453, -0.3379, -0.1377],
         [-0.3633,  0.2051, -0.6250,  ..., -2.2812,  0.50...
    └ Sequential(
  (0): Block(
    (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True, bias=True)
    (proj): Linear(in_fea...
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1778, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           │                │       └ {}
           │                └ (tensor([[[ 0.1855,  0.7305,  0.9844,  ...,  0.4453, -0.3379, -0.1377],
         [-0.3633,  0.2051, -0.6250,  ..., -2.2812,  0.5...
           └ Sequential(
  (0): Block(
    (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True, bias=True)
    (proj): Linear(in_fea...
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1789, in _call_impl
    return forward_call(*args, **kwargs)
           │             │       └ {}
           │             └ (tensor([[[ 0.1855,  0.7305,  0.9844,  ...,  0.4453, -0.3379, -0.1377],
         [-0.3633,  0.2051, -0.6250,  ..., -2.2812,  0.5...
           └ <bound method Sequential.forward of Sequential(
  (0): Block(
    (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True, ...
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/nn/modules/container.py", line 253, in forward
    input = module(input)
    │       │      └ tensor([[[ 0.1855,  0.7305,  0.9844,  ...,  0.4453, -0.3379, -0.1377],
         [-0.3633,  0.2051, -0.6250,  ..., -2.2812,  0.50...
    │       └ Block(
  (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True, bias=True)
  (proj): Linear(in_features=256, out_features...
    └ tensor([[[ 0.1855,  0.7305,  0.9844,  ...,  0.4453, -0.3379, -0.1377],
         [-0.3633,  0.2051, -0.6250,  ..., -2.2812,  0.50...
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
    return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
           │                         │       └ {}
 (tensor([[[ 0.1855,  0.7305,  0.9844,  ...,  0.4453, -0.3379, -0.1377],
         [-0.3633,  0.2051, -0.6250,  ..., -2.2812,  0.5...
Block(
  (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True, bias=True)
  (proj): Linear(in_features=256, out_features...
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1050, in compile_wrapper
    raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1036, in compile_wrapper
    return fn(*args, **kwargs)
           │   │       └ {}
 (tensor([[[ 0.1855,  0.7305,  0.9844,  ...,  0.4453, -0.3379, -0.1377],
         [-0.3633,  0.2051, -0.6250,  ..., -2.2812,  0.5...
<bound method Module._call_impl of Block(
  (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True, bias=True)
  (proj): L...
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1789, in _call_impl
    return forward_call(*args, **kwargs)
           │             │       └ {}
 (tensor([[[ 0.1855,  0.7305,  0.9844,  ...,  0.4453, -0.3379, -0.1377],
         [-0.3633,  0.2051, -0.6250,  ..., -2.2812,  0.5...
<bound method Block.forward of Block(
  (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True, bias=True)
  (proj): Linea...
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 2432, in __call__
    result = self._torchdynamo_orig_backend(
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 2150, in __call__
    result = self._inner_convert(
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 735, in __call__
    result = _compile(
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1919, in _compile
    guarded_code, tracer_output = compile_inner(code, one_graph, hooks)
    │             │               │             │     │          └ Hooks(guard_export_fn=None, guard_fail_fn=None, guard_filter_fn=None)
    │             │               │             │     └ False
    │             │               │             └ <code object forward at 0x6227fd74f340, file "/storage/home/huyvvo/fairvit_balanced_patches/repro_index_add_inductor_bug.py", li...
    │             │               └ <function _compile.<locals>.compile_inner at 0x7c32b56fbec0>
    │             └ <torch._dynamo.output_graph.DynamoTracerOutput object at 0x7c32b5708500>
None
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_utils_internal.py", line 96, in wrapper_function
    return function(*args, **kwargs)
           │         │       └ {}
 (<code object forward at 0x6227fd74f340, file "/storage/home/huyvvo/fairvit_balanced_patches/repro_index_add_inductor_bug.py", l...
<function _compile.<locals>.compile_inner at 0x7c32b56fbe20>
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1535, in compile_inner
    result = _compile_inner(code, one_graph, hooks)
             │              │     │          └ Hooks(guard_export_fn=None, guard_fail_fn=None, guard_filter_fn=None)
             │              │     └ False
             │              └ <code object forward at 0x6227fd74f340, file "/storage/home/huyvvo/fairvit_balanced_patches/repro_index_add_inductor_bug.py", li...
<function _compile.<locals>._compile_inner at 0x7c32b56fba60>
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1594, in _compile_inner
    dynamo_output = compile_frame(
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1442, in compile_frame
    bytecode, tracer_output = transform_code_object(code, transform)
                              │                     │     └ <function compile_frame.<locals>.transform at 0x7c32b570f560>
                              │                     └ <code object forward at 0x6227fd74f340, file "/storage/home/huyvvo/fairvit_balanced_patches/repro_index_add_inductor_bug.py", li...
<function transform_code_object at 0x7c32b86fd6c0>
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/bytecode_transformation.py", line 1626, in transform_code_object
    tracer_output = transformations(instructions, code_options)
                    │               │             └ {'co_argcount': 2, 'co_posonlyargcount': 0, 'co_kwonlyargcount': 0, 'co_nlocals': 6, 'co_stacksize': 7, 'co_flags': 3, 'co_code'...
                    │               └ [Instruction(opcode=151, opname='RESUME', arg=0, argval=0, offset=0, starts_line=22, is_jump_target=False, positions=Positions(l...
<function compile_frame.<locals>.transform at 0x7c32b570f560>
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1414, in transform
    tracer_output = trace_frame(
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 342, in _fn
    return fn(*args, **kwargs)
           │   │       └ {'export': False, 'export_constraints': None, 'frame_state': {'_id': 0}, 'distributed_state': None, 'package': None}
 (<code object forward at 0x6227fd74f340, file "/storage/home/huyvvo/fairvit_balanced_patches/repro_index_add_inductor_bug.py", l...
<function trace_frame at 0x7c32b7808680>
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 875, in trace_frame
    run_tracer()
<function trace_frame.<locals>.run_tracer at 0x7c32b56fa480>
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 856, in run_tracer
    tracer.run()
<torch._dynamo.symbolic_convert.InstructionTranslator object at 0x7c32b570a2a0>
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 1705, in run
    while self.step():
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 1372, in step
    self.dispatch_table[inst.opcode](self, inst)
    │                   │            │     └ Instruction(opcode=83, opname='RETURN_VALUE', arg=None, argval=None, offset=324, starts_line=27, is_jump_target=False, positions...
    │                   │            └ <torch._dynamo.symbolic_convert.InstructionTranslator object at 0x7c32b570a2a0>
    │                   └ Instruction(opcode=83, opname='RETURN_VALUE', arg=None, argval=None, offset=324, starts_line=27, is_jump_target=False, positions...
<torch._dynamo.symbolic_convert.InstructionTranslator object at 0x7c32b570a2a0>
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 5035, in RETURN_VALUE
    self._return(inst)
    │            └ Instruction(opcode=83, opname='RETURN_VALUE', arg=None, argval=None, offset=324, starts_line=27, is_jump_target=False, positions...
<torch._dynamo.symbolic_convert.InstructionTranslator object at 0x7c32b570a2a0>
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 5017, in _return
    all_stack_locals_metadata = self.output.compile_subgraph(
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2053, in compile_subgraph
    self.compile_and_call_fx_graph(tx, pass2.graph_output_vars(), root)
    │                              │   │                          └ FakeRootModule(...)
    │                              │   └ <torch._dynamo.codegen.PyCodegen object at 0x7c32b5510b00>
    │                              └ <torch._dynamo.symbolic_convert.InstructionTranslator object at 0x7c32b570a2a0>
OutputGraph(local_scope={'self': Block(
  (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True, bias=True)
  (proj): Lin...
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2700, in compile_and_call_fx_graph
    compiled_fn = self.call_user_compiler(gm, self.example_inputs())
                  │                       │   └ OutputGraph(local_scope={'self': Block(
  (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True, bias=True)
  (proj): Lin...
                  │                       └ GraphModule()
OutputGraph(local_scope={'self': Block(
  (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True, bias=True)
  (proj): Lin...
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2867, in call_user_compiler
    return self._call_user_compiler(gm, example_inputs)
           │                        │   └ [tensor([[[ 0.1855,  0.7305,  0.9844,  ...,  0.4453, -0.3379, -0.1377],
         [-0.3633,  0.2051, -0.6250,  ..., -2.2812,  0.5...
           │                        └ GraphModule()
OutputGraph(local_scope={'self': Block(
  (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True, bias=True)
  (proj): Lin...
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2925, in _call_user_compiler
    compiled_fn = compiler_fn(gm, example_inputs)
                  │           │   └ [tensor([[[ 0.1855,  0.7305,  0.9844,  ...,  0.4453, -0.3379, -0.1377],
         [-0.3633,  0.2051, -0.6250,  ..., -2.2812,  0.5...
                  │           └ GraphModule()
<torch._dynamo.repro.after_dynamo.WrapBackendDebug object at 0x7c32b7ec4260>
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__
    compiled_gm = compiler_fn(gm, example_inputs)
                  │           │   └ [tensor([[[ 0.1855,  0.7305,  0.9844,  ...,  0.4453, -0.3379, -0.1377],
         [-0.3633,  0.2051, -0.6250,  ..., -2.2812,  0.5...
                  │           └ GraphModule()
                  └ functools.partial(<torch._TorchCompileInductorWrapper object at 0x7c34f74431d0>)
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/__init__.py", line 2477, in __call__
    return compile_fx(model_, inputs_, config_patches=all_patches)
           │          │       │                       └ {}
           │          │       └ [tensor([[[ 0.1855,  0.7305,  0.9844,  ...,  0.4453, -0.3379, -0.1377],
         [-0.3633,  0.2051, -0.6250,  ..., -2.2812,  0.5...
           │          └ GraphModule()
<function compile_fx at 0x7c32b56f80e0>
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2682, in compile_fx
    return _maybe_wrap_and_compile_fx_main(
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2760, in _maybe_wrap_and_compile_fx_main
    return _compile_fx_main(
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2972, in _compile_fx_main
    raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2957, in _compile_fx_main
    return aot_autograd(
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 123, in __call__
    cg = aot_module_simplified(gm, example_inputs, **self.kwargs)
         │                     │   │                 └ <torch._dynamo.backends.common.AotAutograd object at 0x7c32b546dbe0>
         │                     │   └ [tensor([[[ 0.1855,  0.7305,  0.9844,  ...,  0.4453, -0.3379, -0.1377],
         [-0.3633,  0.2051, -0.6250,  ..., -2.2812,  0.5...
         │                     └ GraphModule()
<function aot_module_simplified at 0x7c32b67244a0>
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1147, in aot_module_simplified
    compiled_fn, _ = aot_stage2_compile(
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 385, in aot_stage2_compile
    return aot_stage2_autograd(aot_state, aot_graph_capture)
           │                   │          └ AOTGraphCapture(wrappers=[AOTDedupeWrapper(keep_arg_mask=[], add_dupe_map=[], old_input_metadata=[], needs_post_compile=False), ...
           │                   └ AOTState(needs_autograd=True, flat_args=[FakeTensor(..., device='cuda:0', size=(16, 197, 256), dtype=torch.bfloat16), Parameter(...
<function aot_stage2_autograd at 0x7c32b7020540>
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 2219, in aot_stage2_autograd
    fwd_output_strides, compiled_fw_func = _aot_stage2b_fw_compile(
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 2008, in _aot_stage2b_fw_compile
    return _aot_stage2b_compile_forward_or_inference(
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 2536, in _aot_stage2b_compile_forward_or_inference
    compiled_fw_func = compiler(fw_module, adjusted_flat_args)
                       │        │          └ [FakeTensor(..., device='cuda:0', size=(16, 197, 256), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(256,), dtyp...
                       │        └ GraphModule()
<torch._functorch._aot_autograd.schemas.SerializableAOTDispatchCompiler object at 0x7c32b546f9b0>
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1416, in __call__
    output_code = self.compiler_fn(gm, example_inputs)
                  │                │   └ [FakeTensor(..., device='cuda:0', size=(16, 197, 256), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(256,), dtyp...
                  │                └ GraphModule()
<torch._functorch._aot_autograd.schemas.SerializableAOTDispatchCompiler object at 0x7c32b546f9b0>
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2820, in fw_compiler_base
    return compile_fx_forward(
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2461, in compile_fx_forward
    result = inner_compile(
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 826, in compile_fx_inner
    return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/repro/after_aot.py", line 309, in debug_wrapper
    inner_compiled_fn = compiler_fn(gm, example_inputs)
                        │           │   └ [FakeTensor(..., device='cuda:0', size=(16, 197, 256), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(256,), dtyp...
                        │           └ GraphModule()
                        └ functools.partial(<function _compile_fx_inner at 0x7c32b56f2de0>, get_decomp_fn=<function select_decomp_table at 0x7c32b5df3ce0>...
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1053, in _compile_fx_inner
    raise InductorError(e, currentframe()).with_traceback(
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1037, in _compile_fx_inner
    mb_compiled_graph = fx_codegen_and_compile(
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1802, in fx_codegen_and_compile
    return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
           │                          │   │               │                └ {'get_decomp_fn': <function select_decomp_table at 0x7c32b5df3ce0>, 'static_input_idxs': [1, 2, 3, 4, 5], 'cudagraphs': BoxedBoo...
           │                          │   │               └ [0]
           │                          │   └ [FakeTensor(..., device='cuda:0', size=(16, 197, 256), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(256,), dtyp...
           │                          └ GraphModule()
<torch._inductor.compile_fx._InProcessFxCompile object at 0x7c32aeaea720>
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1489, in codegen_and_compile
    graph.run(*example_inputs)
    │          └ [FakeTensor(..., device='cuda:0', size=(16, 197, 256), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(256,), dtyp...
<torch._inductor.graph.GraphLowering object at 0x7c32aeaea150>
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_inductor/graph.py", line 1051, in run
    return super().run(*args)
                         (FakeTensor(..., device='cuda:0', size=(16, 197, 256), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(256,), dtyp...
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/fx/interpreter.py", line 200, in run
    self.env[node] = self.run_node(node)
    │        │       │             └ index_add
    │        │       └ <torch._inductor.graph.GraphLowering object at 0x7c32aeaea150>
    │        └ index_add
<torch._inductor.graph.GraphLowering object at 0x7c32aeaea150>
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_inductor/graph.py", line 1896, in run_node
    result = super().run_node(n)
                              └ index_add
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/fx/interpreter.py", line 297, in run_node
    return getattr(self, n.op)(n.target, args, kwargs)
                   │     │     │         │     └ {}
                   │     │     │          (TensorBox(StorageBox(
  InputBuffer(name='primals_1', layout=FixedLayout('cuda:0', torch.bfloat16, size=[16, 197, 256], stride=...
                   │     │     └ index_add
                   │     └ index_add
<torch._inductor.graph.GraphLowering object at 0x7c32aeaea150>
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_inductor/graph.py", line 1336, in call_function
    make_fallback(target, warn=False, get_decomp_fn=self.get_decomp_fn)
    │             │                                 └ <torch._inductor.graph.GraphLowering object at 0x7c32aeaea150>
    │             └ <OpOverload(op='aten.index_add', overload='default')>
<function make_fallback at 0x7c32b5dfe200>
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_inductor/lowering.py", line 2511, in make_fallback
    assert op not in check_decomps or override_decomp, (
torch._inductor.exc.InductorError: AssertionError: both a fallback and a decomp for same op: aten.index_add.default

The bug

torch.compile crashes with an InductorError when a compiled bfloat16 module calls the functional torch.index_add. Inductor finds both a fallback handler and a decomposition registered for aten.index_add.default:

torch._inductor.exc.InductorError: AssertionError: both a fallback and a decomp for same op: aten.index_add.default

The failing assertion is in make_fallback, in torch/_inductor/lowering.py:

assert op not in check_decomps or override_decomp, (
    f"both a fallback and a decomp for same op: {op}"
)
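To make the failure condition concrete, here is a toy, pure-Python stand-in for the check quoted above. It is only a sketch: `make_fallback_sketch`, the string op names, and the `decomps` set are hypothetical illustrations, not the real Inductor function or its data structures. It shows that the assertion fires exactly when an op is present in the decomposition table and no override is requested.

```python
def make_fallback_sketch(op, check_decomps, override_decomp=False):
    """Toy stand-in for the quoted check; NOT the real Inductor make_fallback."""
    # Same condition as the quoted assertion: registering a fallback for an op
    # that also has a decomposition is rejected unless explicitly overridden.
    assert op not in check_decomps or override_decomp, (
        f"both a fallback and a decomp for same op: {op}"
    )
    return f"fallback({op})"


# Hypothetical decomposition table containing the op from this issue.
decomps = {"aten.index_add.default"}

# An op without a decomposition passes the check.
print(make_fallback_sketch("aten.some_other_op.default", decomps))

# An op with a decomposition trips the assertion.
try:
    make_fallback_sketch("aten.index_add.default", decomps)
except AssertionError as e:
    message = str(e)
    print(message)
```

In the traceback, the call is `make_fallback(target, warn=False, get_decomp_fn=self.get_decomp_fn)`, so no override is passed and the check fails for aten.index_add.default.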

Env

  • PyTorch: nightly 20260404+cu128 (pip install --pre torch)
  • Python: 3.12
  • CUDA: bundled with nightly
  • OS: Linux 6.5.13 (x86_64)
  • GPU: NVIDIA H100

To reproduce

On a single GPU, run:

python repro.py

with the following repro.py:

import torch
import torch.nn as nn


class Block(nn.Module):
    def __init__(self, dim=256, drop=0.1):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.proj = nn.Linear(dim, dim)
        self.gamma = nn.Parameter(1e-5 * torch.ones(dim))
        self.drop = drop

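    def forward(self, x):
        B = x.shape[0]
        # Keep a random subset of k samples from the batch.
        k = max(int(B * (1 - self.drop)), 1)
        idx = torch.randperm(B, device=x.device)[:k]
        # Residual branch, computed only for the kept samples.
        res = self.proj(self.norm(x[idx])) * self.gamma
        # Functional index_add: the op that Inductor fails to lower.
        return torch.index_add(x, 0, idx, res)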


device = "cuda"
model = nn.Sequential(*[Block() for _ in range(4)])
DTYPE = torch.bfloat16
model = model.to(device, dtype=DTYPE).train()

# Compile each Block separately (per-module compile).
for m in model:
    m.compile()

x = torch.randn(16, 197, 256, device=device, dtype=DTYPE)
model(x)  # crashes here

print("Success")

Required conditions (verified by ablation)

  1. torch.index_add (functional)
  2. torch.compile (per-module blk.compile())
  3. bfloat16 dtype

Not required (verified): FSDP/distributed, the alpha parameter, automatic_dynamic_shapes = False, and the backward pass. The crash does not occur with float32.

Full traceback (from real-world ViT-L training)

Traceback (most recent call last):
  File "/storage/home/huyvvo/fairvit_balanced_patches/repro_index_add_inductor_bug.py", line 39, in <module>
    model(x)  # crashes here
    │     └ tensor([[[ 0.1855,  0.7305,  0.9844,  ...,  0.4453, -0.3379, -0.1377],
         [-0.3633,  0.2051, -0.6250,  ..., -2.2812,  0.50...
    └ Sequential(
  (0): Block(
    (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True, bias=True)
    (proj): Linear(in_fea...
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1778, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           │                │       └ {}
           │                └ (tensor([[[ 0.1855,  0.7305,  0.9844,  ...,  0.4453, -0.3379, -0.1377],
         [-0.3633,  0.2051, -0.6250,  ..., -2.2812,  0.5...
           └ Sequential(
  (0): Block(
    (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True, bias=True)
    (proj): Linear(in_fea...
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1789, in _call_impl
    return forward_call(*args, **kwargs)
           │             │       └ {}
           │             └ (tensor([[[ 0.1855,  0.7305,  0.9844,  ...,  0.4453, -0.3379, -0.1377],
         [-0.3633,  0.2051, -0.6250,  ..., -2.2812,  0.5...
           └ <bound method Sequential.forward of Sequential(
  (0): Block(
    (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True, ...
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/nn/modules/container.py", line 253, in forward
    input = module(input)
    │       │      └ tensor([[[ 0.1855,  0.7305,  0.9844,  ...,  0.4453, -0.3379, -0.1377],
         [-0.3633,  0.2051, -0.6250,  ..., -2.2812,  0.50...
    │       └ Block(
  (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True, bias=True)
  (proj): Linear(in_features=256, out_features...
    └ tensor([[[ 0.1855,  0.7305,  0.9844,  ...,  0.4453, -0.3379, -0.1377],
         [-0.3633,  0.2051, -0.6250,  ..., -2.2812,  0.50...
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
    return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
           │                         │       └ {}
           │                         └ (tensor([[[ 0.1855,  0.7305,  0.9844,  ...,  0.4453, -0.3379, -0.1377],
         [-0.3633,  0.2051, -0.6250,  ..., -2.2812,  0.5...
           └ Block(
  (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True, bias=True)
  (proj): Linear(in_features=256, out_features...
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1050, in compile_wrapper
    raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1036, in compile_wrapper
    return fn(*args, **kwargs)
           │   │       └ {}
           │   └ (tensor([[[ 0.1855,  0.7305,  0.9844,  ...,  0.4453, -0.3379, -0.1377],
         [-0.3633,  0.2051, -0.6250,  ..., -2.2812,  0.5...
           └ <bound method Module._call_impl of Block(
  (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True, bias=True)
  (proj): L...
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1789, in _call_impl
    return forward_call(*args, **kwargs)
           │             │       └ {}
           │             └ (tensor([[[ 0.1855,  0.7305,  0.9844,  ...,  0.4453, -0.3379, -0.1377],
         [-0.3633,  0.2051, -0.6250,  ..., -2.2812,  0.5...
           └ <bound method Block.forward of Block(
  (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True, bias=True)
  (proj): Linea...
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 2432, in __call__
    result = self._torchdynamo_orig_backend(
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 2150, in __call__
    result = self._inner_convert(
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 735, in __call__
    result = _compile(
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1919, in _compile
    guarded_code, tracer_output = compile_inner(code, one_graph, hooks)
    │             │               │             │     │          └ Hooks(guard_export_fn=None, guard_fail_fn=None, guard_filter_fn=None)
    │             │               │             │     └ False
    │             │               │             └ <code object forward at 0x6227fd74f340, file "/storage/home/huyvvo/fairvit_balanced_patches/repro_index_add_inductor_bug.py", li...
    │             │               └ <function _compile.<locals>.compile_inner at 0x7c32b56fbec0>
    │             └ <torch._dynamo.output_graph.DynamoTracerOutput object at 0x7c32b5708500>
    └ None
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_utils_internal.py", line 96, in wrapper_function
    return function(*args, **kwargs)
           │         │       └ {}
           │         └ (<code object forward at 0x6227fd74f340, file "/storage/home/huyvvo/fairvit_balanced_patches/repro_index_add_inductor_bug.py", l...
           └ <function _compile.<locals>.compile_inner at 0x7c32b56fbe20>
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1535, in compile_inner
    result = _compile_inner(code, one_graph, hooks)
             │              │     │          └ Hooks(guard_export_fn=None, guard_fail_fn=None, guard_filter_fn=None)
             │              │     └ False
             │              └ <code object forward at 0x6227fd74f340, file "/storage/home/huyvvo/fairvit_balanced_patches/repro_index_add_inductor_bug.py", li...
             └ <function _compile.<locals>._compile_inner at 0x7c32b56fba60>
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1594, in _compile_inner
    dynamo_output = compile_frame(
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1442, in compile_frame
    bytecode, tracer_output = transform_code_object(code, transform)
                              │                     │     └ <function compile_frame.<locals>.transform at 0x7c32b570f560>
                              │                     └ <code object forward at 0x6227fd74f340, file "/storage/home/huyvvo/fairvit_balanced_patches/repro_index_add_inductor_bug.py", li...
                              └ <function transform_code_object at 0x7c32b86fd6c0>
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/bytecode_transformation.py", line 1626, in transform_code_object
    tracer_output = transformations(instructions, code_options)
                    │               │             └ {'co_argcount': 2, 'co_posonlyargcount': 0, 'co_kwonlyargcount': 0, 'co_nlocals': 6, 'co_stacksize': 7, 'co_flags': 3, 'co_code'...
                    │               └ [Instruction(opcode=151, opname='RESUME', arg=0, argval=0, offset=0, starts_line=22, is_jump_target=False, positions=Positions(l...
                    └ <function compile_frame.<locals>.transform at 0x7c32b570f560>
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1414, in transform
    tracer_output = trace_frame(
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 342, in _fn
    return fn(*args, **kwargs)
           │   │       └ {'export': False, 'export_constraints': None, 'frame_state': {'_id': 0}, 'distributed_state': None, 'package': None}
           │   └ (<code object forward at 0x6227fd74f340, file "/storage/home/huyvvo/fairvit_balanced_patches/repro_index_add_inductor_bug.py", l...
           └ <function trace_frame at 0x7c32b7808680>
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 875, in trace_frame
    run_tracer()
    └ <function trace_frame.<locals>.run_tracer at 0x7c32b56fa480>
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 856, in run_tracer
    tracer.run()
    └ <torch._dynamo.symbolic_convert.InstructionTranslator object at 0x7c32b570a2a0>
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 1705, in run
    while self.step():
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 1372, in step
    self.dispatch_table[inst.opcode](self, inst)
    │                   │            │     └ Instruction(opcode=83, opname='RETURN_VALUE', arg=None, argval=None, offset=324, starts_line=27, is_jump_target=False, positions...
    │                   │            └ <torch._dynamo.symbolic_convert.InstructionTranslator object at 0x7c32b570a2a0>
    │                   └ Instruction(opcode=83, opname='RETURN_VALUE', arg=None, argval=None, offset=324, starts_line=27, is_jump_target=False, positions...
    └ <torch._dynamo.symbolic_convert.InstructionTranslator object at 0x7c32b570a2a0>
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 5035, in RETURN_VALUE
    self._return(inst)
    │            └ Instruction(opcode=83, opname='RETURN_VALUE', arg=None, argval=None, offset=324, starts_line=27, is_jump_target=False, positions...
    └ <torch._dynamo.symbolic_convert.InstructionTranslator object at 0x7c32b570a2a0>
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 5017, in _return
    all_stack_locals_metadata = self.output.compile_subgraph(
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2053, in compile_subgraph
    self.compile_and_call_fx_graph(tx, pass2.graph_output_vars(), root)
    │                              │   │                          └ FakeRootModule(...)
    │                              │   └ <torch._dynamo.codegen.PyCodegen object at 0x7c32b5510b00>
    │                              └ <torch._dynamo.symbolic_convert.InstructionTranslator object at 0x7c32b570a2a0>
    └ OutputGraph(local_scope={'self': Block(
  (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True, bias=True)
  (proj): Lin...
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2700, in compile_and_call_fx_graph
    compiled_fn = self.call_user_compiler(gm, self.example_inputs())
                  │                       │   └ OutputGraph(local_scope={'self': Block(
  (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True, bias=True)
  (proj): Lin...
                  │                       └ GraphModule()
                  └ OutputGraph(local_scope={'self': Block(
  (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True, bias=True)
  (proj): Lin...
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2867, in call_user_compiler
    return self._call_user_compiler(gm, example_inputs)
           │                        │   └ [tensor([[[ 0.1855,  0.7305,  0.9844,  ...,  0.4453, -0.3379, -0.1377],
         [-0.3633,  0.2051, -0.6250,  ..., -2.2812,  0.5...
           │                        └ GraphModule()
           └ OutputGraph(local_scope={'self': Block(
  (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True, bias=True)
  (proj): Lin...
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2925, in _call_user_compiler
    compiled_fn = compiler_fn(gm, example_inputs)
                  │           │   └ [tensor([[[ 0.1855,  0.7305,  0.9844,  ...,  0.4453, -0.3379, -0.1377],
         [-0.3633,  0.2051, -0.6250,  ..., -2.2812,  0.5...
                  │           └ GraphModule()
                  └ <torch._dynamo.repro.after_dynamo.WrapBackendDebug object at 0x7c32b7ec4260>
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__
    compiled_gm = compiler_fn(gm, example_inputs)
                  │           │   └ [tensor([[[ 0.1855,  0.7305,  0.9844,  ...,  0.4453, -0.3379, -0.1377],
         [-0.3633,  0.2051, -0.6250,  ..., -2.2812,  0.5...
                  │           └ GraphModule()
                  └ functools.partial(<torch._TorchCompileInductorWrapper object at 0x7c34f74431d0>)
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/__init__.py", line 2477, in __call__
    return compile_fx(model_, inputs_, config_patches=all_patches)
           │          │       │                       └ {}
           │          │       └ [tensor([[[ 0.1855,  0.7305,  0.9844,  ...,  0.4453, -0.3379, -0.1377],
         [-0.3633,  0.2051, -0.6250,  ..., -2.2812,  0.5...
           │          └ GraphModule()
           └ <function compile_fx at 0x7c32b56f80e0>
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2682, in compile_fx
    return _maybe_wrap_and_compile_fx_main(
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2760, in _maybe_wrap_and_compile_fx_main
    return _compile_fx_main(
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2972, in _compile_fx_main
    raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2957, in _compile_fx_main
    return aot_autograd(
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 123, in __call__
    cg = aot_module_simplified(gm, example_inputs, **self.kwargs)
         │                     │   │                 └ <torch._dynamo.backends.common.AotAutograd object at 0x7c32b546dbe0>
         │                     │   └ [tensor([[[ 0.1855,  0.7305,  0.9844,  ...,  0.4453, -0.3379, -0.1377],
         [-0.3633,  0.2051, -0.6250,  ..., -2.2812,  0.5...
         │                     └ GraphModule()
         └ <function aot_module_simplified at 0x7c32b67244a0>
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1147, in aot_module_simplified
    compiled_fn, _ = aot_stage2_compile(
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 385, in aot_stage2_compile
    return aot_stage2_autograd(aot_state, aot_graph_capture)
           │                   │          └ AOTGraphCapture(wrappers=[AOTDedupeWrapper(keep_arg_mask=[], add_dupe_map=[], old_input_metadata=[], needs_post_compile=False), ...
           │                   └ AOTState(needs_autograd=True, flat_args=[FakeTensor(..., device='cuda:0', size=(16, 197, 256), dtype=torch.bfloat16), Parameter(...
           └ <function aot_stage2_autograd at 0x7c32b7020540>
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 2219, in aot_stage2_autograd
    fwd_output_strides, compiled_fw_func = _aot_stage2b_fw_compile(
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 2008, in _aot_stage2b_fw_compile
    return _aot_stage2b_compile_forward_or_inference(
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 2536, in _aot_stage2b_compile_forward_or_inference
    compiled_fw_func = compiler(fw_module, adjusted_flat_args)
                       │        │          └ [FakeTensor(..., device='cuda:0', size=(16, 197, 256), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(256,), dtyp...
                       │        └ GraphModule()
                       └ <torch._functorch._aot_autograd.schemas.SerializableAOTDispatchCompiler object at 0x7c32b546f9b0>
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1416, in __call__
    output_code = self.compiler_fn(gm, example_inputs)
                  │                │   └ [FakeTensor(..., device='cuda:0', size=(16, 197, 256), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(256,), dtyp...
                  │                └ GraphModule()
                  └ <torch._functorch._aot_autograd.schemas.SerializableAOTDispatchCompiler object at 0x7c32b546f9b0>
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2820, in fw_compiler_base
    return compile_fx_forward(
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2461, in compile_fx_forward
    result = inner_compile(
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 826, in compile_fx_inner
    return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_dynamo/repro/after_aot.py", line 309, in debug_wrapper
    inner_compiled_fn = compiler_fn(gm, example_inputs)
                        │           │   └ [FakeTensor(..., device='cuda:0', size=(16, 197, 256), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(256,), dtyp...
                        │           └ GraphModule()
                        └ functools.partial(<function _compile_fx_inner at 0x7c32b56f2de0>, get_decomp_fn=<function select_decomp_table at 0x7c32b5df3ce0>...
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1053, in _compile_fx_inner
    raise InductorError(e, currentframe()).with_traceback(
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1037, in _compile_fx_inner
    mb_compiled_graph = fx_codegen_and_compile(
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1802, in fx_codegen_and_compile
    return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
           │                          │   │               │                └ {'get_decomp_fn': <function select_decomp_table at 0x7c32b5df3ce0>, 'static_input_idxs': [1, 2, 3, 4, 5], 'cudagraphs': BoxedBoo...
           │                          │   │               └ [0]
           │                          │   └ [FakeTensor(..., device='cuda:0', size=(16, 197, 256), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(256,), dtyp...
           │                          └ GraphModule()
           └ <torch._inductor.compile_fx._InProcessFxCompile object at 0x7c32aeaea720>
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1489, in codegen_and_compile
    graph.run(*example_inputs)
    │          └ [FakeTensor(..., device='cuda:0', size=(16, 197, 256), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(256,), dtyp...
    └ <torch._inductor.graph.GraphLowering object at 0x7c32aeaea150>
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_inductor/graph.py", line 1051, in run
    return super().run(*args)
                        └ (FakeTensor(..., device='cuda:0', size=(16, 197, 256), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(256,), dtyp...
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/fx/interpreter.py", line 200, in run
    self.env[node] = self.run_node(node)
    │        │       │             └ index_add
    │        │       └ <torch._inductor.graph.GraphLowering object at 0x7c32aeaea150>
    │        └ index_add
    └ <torch._inductor.graph.GraphLowering object at 0x7c32aeaea150>
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_inductor/graph.py", line 1896, in run_node
    result = super().run_node(n)
                              └ index_add
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/fx/interpreter.py", line 297, in run_node
    return getattr(self, n.op)(n.target, args, kwargs)
                   │     │     │         │     └ {}
                   │     │     │         └ (TensorBox(StorageBox(
  InputBuffer(name='primals_1', layout=FixedLayout('cuda:0', torch.bfloat16, size=[16, 197, 256], stride=...
                   │     │     └ index_add
                   │     └ index_add
                   └ <torch._inductor.graph.GraphLowering object at 0x7c32aeaea150>
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_inductor/graph.py", line 1336, in call_function
    make_fallback(target, warn=False, get_decomp_fn=self.get_decomp_fn)
    │             │                                 └ <torch._inductor.graph.GraphLowering object at 0x7c32aeaea150>
    │             └ <OpOverload(op='aten.index_add', overload='default')>
    └ <function make_fallback at 0x7c32b5dfe200>
  File "/storage/home/huyvvo/.local/share/mamba/envs/fairvit-py312-ptnightly-xformers-20260404/lib/python3.12/site-packages/torch/_inductor/lowering.py", line 2511, in make_fallback
    assert op not in check_decomps or override_decomp, (
torch._inductor.exc.InductorError: AssertionError: both a fallback and a decomp for same op: aten.index_add.default

Workarounds

I am currently setting torch._dynamo.config.suppress_errors = True to avoid the error.
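
A minimal sketch of that workaround follows. Only the `suppress_errors` flag comes from the report; the function `add_rows` is a hypothetical stand-in for the failing module. With the flag set, Dynamo logs a compile failure and runs the affected frame in eager mode, so this hides the error rather than fixing it.

```python
import torch
import torch._dynamo

# Flag named in the report: on a backend compile error, log it and fall back
# to eager execution for that frame instead of raising.
torch._dynamo.config.suppress_errors = True


def add_rows(x, index, src):
    # Hypothetical stand-in for the failing code path: functional index_add.
    return torch.index_add(x, 0, index, src)


# torch.compile is lazy: compilation (and any fallback) happens on first call.
compiled_add_rows = torch.compile(add_rows)
```

Because the flag is global, it also silences unrelated compile failures elsewhere in the process, so it is best treated as a temporary measure.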

Expected behavior

torch.compile should handle aten.index_add.default with bfloat16 tensors without conflict — either use the decomposition or the fallback, not both.

Versions

PyTorch version: 2.12.0.dev20260404+cu128
Is debug build: False
CUDA used to build PyTorch: 12.8
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.4 LTS (x86_64)
GCC version: (conda-forge gcc 12.4.0-2) 12.4.0
Clang version: Could not collect
CMake version: version 3.22.1
Libc version: glibc-2.35

Python version: 3.12.12 | packaged by conda-forge | (main, Oct 13 2025, 14:34:15) [GCC 14.3.0] (64-bit runtime)
Python platform: Linux-6.8.12-680-6063-coreweave-amd64-f81899c8-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 12.4.131
CUDA_MODULE_LOADING set to:
GPU models and configuration:
GPU 0: NVIDIA H100 80GB HBM3
GPU 1: NVIDIA H100 80GB HBM3

Nvidia driver version: 580.95.05
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.9.8.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv.so.9.8.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn.so.9.8.0
/usr/lib/x86_64-linux-gnu/libcudnn_engines_precompiled.so.9.8.0
/usr/lib/x86_64-linux-gnu/libcudnn_engines_runtime_compiled.so.9.8.0
/usr/lib/x86_64-linux-gnu/libcudnn_graph.so.9.8.0
/usr/lib/x86_64-linux-gnu/libcudnn_heuristic.so.9.8.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops.so.9.8.0
Is XPU available: False
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Caching allocator config: N/A

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 57 bits virtual
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 0-127
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) Platinum 8462Y+
CPU family: 6
Model: 143
Thread(s) per core: 2
Core(s) per socket: 32
Socket(s): 2
Stepping: 8
CPU max MHz: 4100.0000
CPU min MHz: 800.0000
BogoMIPS: 5600.00
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cat_l2 cdp_l3 cdp_l2 ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb intel_pt avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local split_lock_detect user_shstk avx_vnni avx512_bf16 wbnoinvd dtherm ida arat pln pts hfi vnmi avx512vbmi umip pku ospke waitpkg avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme avx512_vpopcntdq la57 rdpid bus_lock_detect cldemote movdiri movdir64b enqcmd fsrm md_clear serialize tsxldtrk pconfig arch_lbr ibt amx_bf16 avx512_fp16 amx_tile amx_int8 flush_l1d arch_capabilities
Virtualization: VT-x
L1d cache: 3 MiB (64 instances)
L1i cache: 2 MiB (64 instances)
L2 cache: 128 MiB (64 instances)
L3 cache: 120 MiB (2 instances)
NUMA node(s): 2
NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78,80,82,84,86,88,90,92,94,96,98,100,102,104,106,108,110,112,114,116,118,120,122,124,126
NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55,57,59,61,63,65,67,69,71,73,75,77,79,81,83,85,87,89,91,93,95,97,99,101,103,105,107,109,111,113,115,117,119,121,123,125,127
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced / Automatic IBRS; IBPB conditional; RSB filling; PBRSB-eIBRS SW sequence; BHI BHI_DIS_S
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected

Versions of relevant libraries:
[pip3] lovely-numpy==0.2.20
[pip3] mypy==1.20.0
[pip3] mypy_extensions==1.1.0
[pip3] numpy==2.2.6
[pip3] nvidia-cublas-cu12==12.8.4.1
[pip3] nvidia-cuda-cupti-cu12==12.8.90
[pip3] nvidia-cuda-nvrtc-cu12==12.8.93
[pip3] nvidia-cuda-runtime-cu12==12.8.90
[pip3] nvidia-cudnn-cu12==9.20.0.48
[pip3] nvidia-cufft-cu12==11.3.3.83
[pip3] nvidia-curand-cu12==10.3.9.90
[pip3] nvidia-cusolver-cu12==11.7.3.90
[pip3] nvidia-cusparse-cu12==12.5.8.93
[pip3] nvidia-cusparselt-cu12==0.7.1
[pip3] nvidia-nccl-cu12==2.29.7
[pip3] nvidia-nvjitlink-cu12==12.8.93
[pip3] nvidia-nvtx-cu12==12.8.90
[pip3] nvtx==0.2.15
[pip3] open_clip_torch==3.3.0
[pip3] optree==0.19.0
[pip3] pytorch-triton==3.6.0+git8fedd49b
[pip3] tbb==2022.3.1
[pip3] tcmlib==1.4.1
[pip3] torch==2.12.0.dev20260404+cu128
[pip3] torchaudio==2.11.0.dev20260402+cu128
[pip3] torchcodec==0.12.0.dev20260404+cu128
[pip3] torchmetrics==1.9.0
[pip3] torchvision==0.27.0.dev20260404+cu128
[pip3] triton==3.7.0+git9c288bc5
[conda] cuda-cudart 12.4.127 he02047a_2 conda-forge
[conda] cuda-cudart-dev 12.4.127 he02047a_2 conda-forge
[conda] cuda-cudart-dev_linux-64 12.4.127 h85509e4_2 conda-forge
[conda] cuda-cudart-static 12.4.127 he02047a_2 conda-forge
[conda] cuda-cudart-static_linux-64 12.4.127 h85509e4_2 conda-forge
[conda] cuda-cudart_linux-64 12.4.127 h85509e4_2 conda-forge
[conda] cuda-cupti 12.4.127 he02047a_2 conda-forge
[conda] cuda-cupti-dev 12.4.127 he02047a_2 conda-forge
[conda] cuda-libraries 12.4.1 ha770c72_1 conda-forge
[conda] cuda-libraries-dev 12.4.1 ha770c72_1 conda-forge
[conda] cuda-nvrtc 12.4.127 he02047a_2 conda-forge
[conda] cuda-nvrtc-dev 12.4.127 he02047a_2 conda-forge
[conda] cuda-nvtx 12.4.127 he02047a_2 conda-forge
[conda] cuda-opencl 12.4.127 he02047a_1 conda-forge
[conda] cuda-opencl-dev 12.4.127 he02047a_1 conda-forge
[conda] cuda-runtime 12.4.1 ha804496_0 conda-forge
[conda] libcublas 12.4.5.8 he02047a_2 conda-forge
[conda] libcublas-dev 12.4.5.8 he02047a_2 conda-forge
[conda] libcufft 11.2.1.3 he02047a_2 conda-forge
[conda] libcufft-dev 11.2.1.3 he02047a_2 conda-forge
[conda] libcurand 10.3.5.147 he02047a_2 conda-forge
[conda] libcurand-dev 10.3.5.147 he02047a_2 conda-forge
[conda] libcusolver 11.6.1.9 he02047a_2 conda-forge
[conda] libcusolver-dev 11.6.1.9 he02047a_2 conda-forge
[conda] libcusparse 12.3.1.170 he02047a_2 conda-forge
[conda] libcusparse-dev 12.3.1.170 he02047a_2 conda-forge
[conda] libnvjitlink 12.4.127 he02047a_2 conda-forge
[conda] libnvjitlink-dev 12.4.127 he02047a_2 conda-forge
[conda] libopenvino-pytorch-frontend 2024.1.0 he02047a_7 conda-forge
[conda] lovely-numpy 0.2.20 pypi_0 pypi
[conda] numpy 2.2.6 pypi_0 pypi
[conda] nvidia-cublas-cu12 12.8.4.1 pypi_0 pypi
[conda] nvidia-cuda-cupti-cu12 12.8.90 pypi_0 pypi
[conda] nvidia-cuda-nvrtc-cu12 12.8.93 pypi_0 pypi
[conda] nvidia-cuda-runtime-cu12 12.8.90 pypi_0 pypi
[conda] nvidia-cudnn-cu12 9.20.0.48 pypi_0 pypi
[conda] nvidia-cufft-cu12 11.3.3.83 pypi_0 pypi
[conda] nvidia-curand-cu12 10.3.9.90 pypi_0 pypi
[conda] nvidia-cusolver-cu12 11.7.3.90 pypi_0 pypi
[conda] nvidia-cusparse-cu12 12.5.8.93 pypi_0 pypi
[conda] nvidia-cusparselt-cu12 0.7.1 pypi_0 pypi
[conda] nvidia-nccl-cu12 2.29.7 pypi_0 pypi
[conda] nvidia-nvjitlink-cu12 12.8.93 pypi_0 pypi
[conda] nvidia-nvtx-cu12 12.8.90 pypi_0 pypi
[conda] nvtx 0.2.15 pypi_0 pypi
[conda] open-clip-torch 3.3.0 pypi_0 pypi
[conda] optree 0.19.0 pypi_0 pypi
[conda] pytorch-triton 3.6.0+git8fedd49b pypi_0 pypi
[conda] tbb 2022.3.1 pypi_0 pypi
[conda] tcmlib 1.4.1 pypi_0 pypi
[conda] torch 2.12.0.dev20260404+cu128 pypi_0 pypi
[conda] torchaudio 2.11.0.dev20260402+cu128 pypi_0 pypi
[conda] torchcodec 0.12.0.dev20260404+cu128 pypi_0 pypi
[conda] torchmetrics 1.9.0 pypi_0 pypi
[conda] torchvision 0.27.0.dev20260404+cu128 pypi_0 pypi
[conda] triton 3.7.0+git9c288bc5 pypi_0 pypi

cc @chauhang @penguinwu @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @aakhundov @coconutruben @jataylo

extent analysis

TL;DR

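The assertion fires because Inductor tries to register a fallback for aten.index_add.default while a decomposition for the same op is also present. The only workaround given in the report is torch._dynamo.config.suppress_errors = True, which makes the failing frame fall back to eager; otherwise, try a newer PyTorch build in case the conflict has been resolved. Setting torch._dynamo.config.verify_correctness = False is unlikely to help, since that flag already defaults to False.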

Guidance

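  • The error is raised by an assertion in make_fallback (torch/_inductor/lowering.py): Inductor attempts to register a fallback for aten.index_add.default, but a decomposition for the same op is also present, and the two are treated as mutually exclusive.
  • Setting torch._dynamo.config.suppress_errors = True avoids the crash by falling back to eager for the frame that fails to compile. It is a blunt tool: it suppresses every compile error, not just this one, and the affected code no longer benefits from compilation.
  • torch._dynamo.config.verify_correctness already defaults to False, so setting it to False is unlikely to change anything; nothing in the report shows that it affects this error.
  • Updating PyTorch to a version where this issue is resolved may be the best solution.
  • The report reproduces the failure with bfloat16 CUDA tensors passing through the functional torch.index_add under torch.compile.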

Example

No code example is provided as the issue is related to a specific PyTorch version and configuration.
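
Although the failure itself depends on the PyTorch build, the shape of the assertion can be illustrated with a toy model. The sketch below is not Inductor's code: the registries `decompositions` and `fallbacks` and the helper `register_decomposition` are hypothetical, and only the asserted condition and its message are modeled on the last frame of the traceback.

```python
# Toy model only: these names are illustrative, not Inductor's real registries.
decompositions = {}   # op name -> decomposition function
fallbacks = set()     # op names routed to a fallback kernel


def register_decomposition(op, fn):
    decompositions[op] = fn


def make_fallback(op, override_decomp=False):
    # Same shape as the asserted condition shown in the traceback:
    #   assert op not in check_decomps or override_decomp
    assert op not in decompositions or override_decomp, (
        f"both a fallback and a decomp for same op: {op}"
    )
    fallbacks.add(op)


# The op already has a decomposition registered ...
register_decomposition("aten.index_add.default", lambda *args: None)

# ... so asking for a fallback as well trips the assertion.
try:
    make_fallback("aten.index_add.default")
    conflict = None
except AssertionError as exc:
    conflict = str(exc)

print(conflict)

# An explicit override models a deliberate choice to prefer the fallback.
make_fallback("aten.index_add.default", override_decomp=True)
```

In this model the conflict goes away only when one of the two registrations is dropped or the override is passed explicitly, which matches the expected behavior stated in the report: use either the decomposition or the fallback, not both.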

Notes

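  • The issue was reported against the nightly build 2.12.0.dev20260404+cu128. The report does not say whether other versions are affected, and it may be resolved in later builds.
  • The torch._dynamo.config.suppress_errors = True workaround falls back to eager for any frame that fails to compile, so it can mask unrelated errors and performance regressions; use it with caution.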

Recommendation

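Until a fixed PyTorch build is available, use the reporter's workaround, torch._dynamo.config.suppress_errors = True, and accept that the affected frame runs in eager mode. Do not rely on torch._dynamo.config.verify_correctness = False: it is already the default, and nothing in the report shows that it affects this error.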
