pytorch - Fix Inductor failure with `view_as_complex` in fused Multi-Head Attention subgraph

pytorch · 2026-03-10 03:31:39

pytorch/pytorch#176986 • Fetched 2026-04-08 00:23:18

Author

psiwho

torch.compile with the Inductor backend raises "RuntimeError: Tensor must have a stride divisible by 2 for all but last dimension" when it compiles a subgraph that applies Rotary Positional Embeddings (RoPE) to multi-head Q and K projections and then feeds both results into a matmul.

The error specifically occurs when two parallel complex-valued operation chains (representing Q and K rotations) are fused into a consumer matmul. Eager mode handles the same inputs correctly.
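As background, torch.view_as_complex reinterprets a real tensor whose last dimension has size 2 as a complex tensor, and it rejects inputs whose other strides are odd. The sketch below triggers the same error message in eager mode by building a deliberately odd stride with as_strided. The shapes are arbitrary and are not taken from the report, so this illustrates the check itself rather than the Inductor failure.

```python
import torch

# Contiguous (4, 3, 2) tensor: strides are (6, 2, 1), so every stride
# except the last is even and the complex view succeeds.
ok = torch.randn(4, 3, 2)
ok_complex = torch.view_as_complex(ok)

# Same logical shape, but the first dimension steps by 7 floats (odd).
base = torch.randn(28)
odd = base.as_strided(size=(4, 3, 2), stride=(7, 2, 1))

try:
    torch.view_as_complex(odd)
    error_message = None
except RuntimeError as exc:
    error_message = str(exc)

print(error_message)

# Copying into fresh contiguous memory restores even strides.
fixed_complex = torch.view_as_complex(odd.contiguous())
```

In eager mode the .contiguous() copy is what makes the last call legal, which is why the report expects the same call to protect view_as_complex under torch.compile.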

Error Message

backend='inductor' raised:
RuntimeError: Tensor must have a stride divisible by 2 for all but last dimension

🐛 Describe the bug

Summary

The error specifically occurs when two parallel complex-valued operation chains (representing Q and K rotations) are fused into a consumer matmul. Eager mode handles the same inputs correctly.

Minimal Reproduction Script

import torch
import torch.nn as nn

class InductorBugRepro(nn.Module):
    def __init__(self, d_model):
        super().__init__()
        self.w_q = nn.Parameter(torch.randn(d_model, d_model))
        self.w_k = nn.Parameter(torch.randn(d_model, d_model))

    def forward(self, x, rope):
        B, L, _ = x.shape
        H, D = 16, 64
        
        # 1. Linear projection + Transpose (Standard MHSA pattern)
        q = (x @ self.w_q.T).view(B, L, H, D).transpose(1, 2)
        k = (x @ self.w_k.T).view(B, L, H, D).transpose(1, 2)
        
        # 2. Parallel complex op chains (RoPE pattern)
        # Each chain uses .contiguous() before view_as_complex
        q_rot = torch.view_as_real(torch.view_as_complex(q.view(*q.shape[:-1], -1, 2).contiguous()) * rope)
        k_rot = torch.view_as_real(torch.view_as_complex(k.view(*k.shape[:-1], -1, 2).contiguous()) * rope)
        
        # 3. Matmul consumer forces fusion of the RoPE outputs
        return torch.matmul(q_rot.view(*q.shape), k_rot.view(*k.shape).transpose(-1, -2))

def run():
    device = "cuda"
    B, H, L, D = 1, 16, 1, 64
    d_model = H * D
    
    model = InductorBugRepro(d_model).to(device)
    compiled_model = torch.compile(model)
    
    x = torch.randn(B, L, d_model, device=device)
    rope = torch.randn(L, D // 2, dtype=torch.complex64, device=device)
    
    print(f"Running reproduction on {device}...")
    
    # Passes
    _ = model(x, rope)
    print("Eager mode: Success")
    
    # Fails
    try:
        _ = compiled_model(x, rope)
        print("Compiled mode: Success")
    except Exception as e:
        print(f"Compiled mode: FAILED\n\nError:\n{e}")

if __name__ == "__main__":
    run()

Output Log

Running reproduction on cuda...
Eager mode: Success
E0310 03:16:58.968000 463 torch/_subclasses/fake_tensor.py:2827] [1/0] failed while attempting to run meta for aten.view_as_complex.default
E0310 03:16:58.968000 463 torch/_subclasses/fake_tensor.py:2827] [1/0] Traceback (most recent call last):
E0310 03:16:58.968000 463 torch/_subclasses/fake_tensor.py:2827] [1/0]   File "/usr/local/lib/python3.12/dist-packages/torch/_subclasses/fake_tensor.py", line 2823, in _dispatch_impl
E0310 03:16:58.968000 463 torch/_subclasses/fake_tensor.py:2827] [1/0]     r = func(*args, **kwargs)
E0310 03:16:58.968000 463 torch/_subclasses/fake_tensor.py:2827] [1/0]         ^^^^^^^^^^^^^^^^^^^^^
E0310 03:16:58.968000 463 torch/_subclasses/fake_tensor.py:2827] [1/0]   File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 819, in __call__
E0310 03:16:58.968000 463 torch/_subclasses/fake_tensor.py:2827] [1/0]     return self._op(*args, **kwargs)
E0310 03:16:58.968000 463 torch/_subclasses/fake_tensor.py:2827] [1/0]            ^^^^^^^^^^^^^^^^^^^^^^^^^
E0310 03:16:58.968000 463 torch/_subclasses/fake_tensor.py:2827] [1/0] RuntimeError: Tensor must have a stride divisible by 2 for all but last dimension
Compiled mode: FAILED

Error:
backend='inductor' raised:
RuntimeError: Tensor must have a stride divisible by 2 for all but last dimension

Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"

/usr/local/lib/python3.12/dist-packages/torch/autograd/graph.py:865: UserWarning: Error detected in ViewAsRealBackward0. Traceback of forward call that caused the error:
  File "/tmp/ipykernel_463/4071400647.py", line 21, in forward
    k_rot = torch.view_as_real(torch.view_as_complex(k.view(*k.shape[:-1], -1, 2).contiguous()) * rope)
 (Triggered internally at /pytorch/torch/csrc/autograd/python_anomaly_mode.cpp:122.)
  return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass

Expected Behavior

The compiled model should execute correctly and yield results consistent with eager mode. Inductor should respect the .contiguous() call, which ensures that the stride requirements of view_as_complex are met, even when fusing across the transpose and matmul operations.

Actual Behavior

The compiler fails during the fake_tensor propagation or kernel lowering phase with: RuntimeError: Tensor must have a stride divisible by 2 for all but last dimension

This error suggests that, despite the explicit .contiguous() call, the layout Inductor assigns to the tensor feeding view_as_complex inside the fused kernel does not satisfy the operator's stride requirement: every stride except the last must be even, so that each real/imaginary pair is physically adjacent in memory.
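One detail that can be checked in eager mode: with the shapes from the reproduction script (L = 1), the tensor handed to .contiguous() already counts as contiguous, because dimensions of size 1 are ignored by the contiguity test. The call therefore returns the same tensor instead of a copy. The sketch below uses a random stand-in for the projection output. It is an eager-mode observation only, not a confirmed diagnosis of what Inductor does with these strides.

```python
import torch

B, H, L, D = 1, 16, 1, 64  # shapes from the reproduction script

# Stand-in for the projected activations (x @ w_q.T in the report).
proj = torch.randn(B, L, H * D)
q = proj.view(B, L, H, D).transpose(1, 2)   # (B, H, L, D)
pairs = q.view(*q.shape[:-1], -1, 2)        # (B, H, L, D // 2, 2)

already_contiguous = pairs.is_contiguous()
same_storage = pairs.contiguous().data_ptr() == pairs.data_ptr()
even_strides = all(s % 2 == 0 for s in pairs.stride()[:-1])

print(pairs.stride(), already_contiguous, same_storage, even_strides)
```

If the compiled graph assigns a different stride to one of the size-1 dimensions, the stride check could fail even though the data layout is unchanged. That remains a hypothesis to verify against the generated graph.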

Versions

Reproduced environment:

PyTorch Version: 2.10.0
Backend: Inductor
Device: CPU, CUDA, MPS

cc @ezyang @anjali411 @dylanbespalko @mruberry @nikitaved @amjames @chauhang @penguinwu @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @aakhundov @coconutruben @jataylo

extent analysis

Fix Plan

1. Update PyTorch to the latest version

Ensure you are running the latest version of PyTorch. The issue might be resolved in newer versions.

2. Disable Inductor backend for the specific operation

torch.compile does not provide a per-operation exclude option, and None is not a documented value for backend. To keep a specific operation out of the compiled graph, move it into a helper function and wrap that helper with torch.compiler.disable; Dynamo then breaks the graph around the helper and runs it eagerly. Here apply_rope is a hypothetical helper that holds the view_as_complex chain:

apply_rope = torch.compiler.disable(apply_rope)

compiled_model = torch.compile(model)

3. Use a different backend

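'cuda' and 'cpu' are devices rather than torch.compile backends. If the failure is specific to Inductor, try another registered backend such as aot_eager (traces through AOTAutograd but skips Inductor code generation) or eager (Dynamo capture only). Both give up Inductor's kernel fusion, so expect lower performance. If aot_eager fails in the same way, the problem lies before Inductor's code generation.

compiled_model = torch.compile(model, backend='aot_eager')

compiled_model = torch.compile(model, backend='eager')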
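Because torch.compile compiles lazily, a backend failure only surfaces on the first call. The sketch below tries a list of backend names in order and keeps the first one that runs the example inputs. The helper compile_with_fallback is a hypothetical name, and the default backend order is an assumption rather than a recommendation from the issue.

```python
import torch
import torch._dynamo


def compile_with_fallback(model, example_inputs,
                          backends=("inductor", "aot_eager", "eager")):
    """Return (compiled_model, backend_name) for the first backend that runs."""
    last_error = None
    for name in backends:
        torch._dynamo.reset()  # drop graphs cached by the previous attempt
        candidate = torch.compile(model, backend=name)
        try:
            candidate(*example_inputs)  # compilation happens on this first call
        except Exception as exc:
            last_error = exc
            continue
        return candidate, name
    raise RuntimeError("no backend in the list could run the model") from last_error
```

For the reproduction script this would be called as compile_with_fallback(model, (x, rope)); the returned name shows which backend ended up being used.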

4. Modify the code to avoid the issue

If none of the above works, modify the code so that the rotation never calls view_as_complex. The complex multiplication can be carried out with real arithmetic: for each (real, imaginary) pair (a, b) taken from q or k and each rope entry c + id, the rotated pair is (ac - bd, ad + bc).
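A minimal sketch of a rotation that avoids view_as_complex by using real arithmetic on the (real, imaginary) pairs is shown below, together with a check against the complex formulation from the reproduction script. The function names rope_real and rope_complex are hypothetical, and the test shapes are arbitrary.

```python
import torch


def rope_real(t, rope):
    """Rotate (real, imag) pairs of `t` by complex `rope` using real ops only."""
    pairs = t.reshape(*t.shape[:-1], -1, 2)
    re, im = pairs.unbind(-1)
    rope_re, rope_im = rope.real, rope.imag
    # (re + i*im) * (rope_re + i*rope_im), written out component-wise.
    out = torch.stack((re * rope_re - im * rope_im,
                       re * rope_im + im * rope_re), dim=-1)
    return out.flatten(-2)


def rope_complex(t, rope):
    """Reference: the view_as_complex formulation used in the report."""
    pairs = t.reshape(*t.shape[:-1], -1, 2).contiguous()
    return torch.view_as_real(torch.view_as_complex(pairs) * rope).flatten(-2)


torch.manual_seed(0)
q = torch.randn(1, 16, 4, 64)                      # (B, H, L, D)
rope = torch.randn(4, 32, dtype=torch.complex64)   # (L, D // 2)
torch.testing.assert_close(rope_real(q, rope), rope_complex(q, rope))
```

rope is still a complex input here. Passing its real and imaginary parts as two real tensors would remove complex dtypes from the compiled graph altogether, which may matter if the compiler's complex support is the limiting factor.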

Verification

To verify that the fix worked, run the reproduction script again and check if the error is resolved.

if __name__ == "__main__":
    run()

If the error is resolved, the script prints "Compiled mode: Success" instead of the "Compiled mode: FAILED" message.
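A successful run only shows that compilation no longer raises. To also confirm that the compiled output agrees with eager mode, a comparison along the following lines can be added. The helper check_matches_eager and the tolerances are assumptions chosen for illustration, not values from the issue.

```python
import torch


def check_matches_eager(model, compiled_model, *inputs, rtol=1e-4, atol=1e-4):
    """Raise AssertionError if compiled and eager outputs differ beyond tolerance."""
    expected = model(*inputs)
    actual = compiled_model(*inputs)
    torch.testing.assert_close(actual, expected, rtol=rtol, atol=atol)
    return True
```

In the reproduction script this would be called as check_matches_eager(model, compiled_model, x, rope) after the try block succeeds.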
