pytorch - 💡(How to fix) Fix linear attention / mamba2 OP [2 comments, 3 participants]

pytorch/pytorch#177021 · fetched 2026-04-08 00:22:43 · 2 comments · 3 participants

🚀 The feature, motivation and pitch

Linear attention / Mamba2 is becoming popular, and it requires a high-performance kernel for prefill, as described in https://pytorch.org/blog/accelerating-mamba2-with-kernel-fusion/. Are there any plans for PyTorch to add a new op for it? Thanks.

Alternatives

No response

Additional context

No response


Fix Plan

Add a New Kernel Fusion OP for Linear Attention

To meet the performance requirements of linear attention / Mamba2, we need to add a new fused-kernel op to PyTorch. Here's a step-by-step guide:

Step 1: Create a New Kernel Fusion OP

Create a new file, kernel_fusion_linear_attention.py, in PyTorch's torch/nn/modules directory:

import torch
from torch import nn

class LinearAttentionKernelFusion(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)

    def forward(self, input):
        # Placeholder: this only applies a linear projection; a fused
        # linear-attention kernel would replace this body.
        return self.linear(input)
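The placeholder forward does not compute attention. A minimal sketch of what a fused prefill kernel would compute is shown below. It assumes plain causal linear attention, o_t = sum over s <= t of (q_t · k_s) v_s, with no feature map, normalization, or Mamba2-style decay, and it omits the q/k/v projections. `ChunkedLinearAttention` and `chunk_size` are hypothetical names. The sequence is split into chunks. Each chunk combines a dense lower-triangular attention over its own tokens with a running d_k × d_v state that summarizes all earlier chunks.

```python
import torch
from torch import nn


class ChunkedLinearAttention(nn.Module):
    """Hypothetical reference module: causal linear attention, chunkwise-parallel form.

    Computes o_t = sum_{s <= t} (q_t . k_s) v_s (no feature map, no normalization).
    """

    def __init__(self, chunk_size: int = 64):
        super().__init__()
        if chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        self.chunk_size = chunk_size

    def forward(self, q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
        # q, k: (batch, seq_len, d_k); v: (batch, seq_len, d_v)
        batch, seq_len, d_k = q.shape
        d_v = v.shape[-1]
        # Running state S = sum_s k_s v_s^T over all previous chunks: (batch, d_k, d_v)
        state = q.new_zeros((batch, d_k, d_v))
        outputs = []
        for start in range(0, seq_len, self.chunk_size):
            end = min(start + self.chunk_size, seq_len)
            q_c, k_c, v_c = q[:, start:end], k[:, start:end], v[:, start:end]
            # Inter-chunk term: earlier chunks contribute through the state
            inter = q_c @ state
            # Intra-chunk term: causal (lower-triangular) scores within the chunk
            scores = torch.tril(q_c @ k_c.transpose(-1, -2))
            intra = scores @ v_c
            outputs.append(inter + intra)
            # Fold this chunk's keys/values into the state for later chunks
            state = state + k_c.transpose(-1, -2) @ v_c
        return torch.cat(outputs, dim=1)
```

Chunking replaces a token-by-token scan with a short loop over chunks plus dense matmuls inside each chunk. That is roughly the structure Mamba2-style prefill kernels are built around. Mamba2 additionally scales the state by a per-step decay, which this sketch leaves out.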

Step 2: Register the New OP

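Register (export) the new module in PyTorch's torch/nn/modules/__init__.py so that it is available as torch.nn.LinearAttentionKernelFusion:

from .kernel_fusion_linear_attention import LinearAttentionKernelFusion

# ...

# nn.Module subclasses are exported by name; `.apply` is the entry point of
# torch.autograd.Function subclasses, not of nn.Module classes.
__all__ = [
    # ... existing names ...
    "LinearAttentionKernelFusion",
]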

Step 3: Update the PyTorch Build System

Step 1 only adds Python code. A fused kernel would live in C++/CUDA source (a file named kernel_fusion_linear_attention.cpp is assumed here) that must be compiled and linked against libtorch. With CMake, for example:

find_package(Torch REQUIRED)
add_library(kernel_fusion_linear_attention SHARED kernel_fusion_linear_attention.cpp)
target_link_libraries(kernel_fusion_linear_attention "${TORCH_LIBRARIES}")

Step 4: Test the New OP

Test the new OP by running the following code:

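import torch
from torch.nn import LinearAttentionKernelFusion  # importable once Steps 1-2 are in place

# Exercise the new module rather than a plain nn.Linear
model = LinearAttentionKernelFusion(10, 10)
x = torch.randn(1, 10)
output = model(x)
print(output.shape)  # torch.Size([1, 10])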

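This should print torch.Size([1, 10]). A shape check only confirms that the module runs; it does not show that a fused kernel is numerically correct or faster.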

Verification

To verify that the fix worked, run the following code:
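One possible check compares a candidate implementation against a slow token-by-token recurrence in float64, on both outputs and input gradients. It is sketched under the assumption that the op computes plain causal linear attention (no feature map, normalization, or decay). The helpers `recurrent_reference`, `parallel_candidate`, and `check_against_reference` are hypothetical; a fused kernel would be passed in place of `parallel_candidate`.

```python
import torch


def recurrent_reference(q, k, v):
    # Token-by-token recurrence: S_t = S_{t-1} + k_t v_t^T, o_t = q_t^T S_t
    batch, seq_len, d_k = q.shape
    state = q.new_zeros((batch, d_k, v.shape[-1]))
    outs = []
    for t in range(seq_len):
        state = state + k[:, t, :, None] * v[:, t, None, :]   # outer product k_t v_t^T
        outs.append((q[:, t, None, :] @ state).squeeze(1))     # (batch, d_v)
    return torch.stack(outs, dim=1)


def parallel_candidate(q, k, v):
    # Quadratic masked form; stand-in for the implementation under test
    return torch.tril(q @ k.transpose(-1, -2)) @ v


def check_against_reference(candidate, reference, *inputs, atol=1e-6, rtol=1e-5):
    """Return True if outputs and input gradients of both implementations match."""
    leaves_a = [x.detach().clone().requires_grad_(True) for x in inputs]
    leaves_b = [x.detach().clone().requires_grad_(True) for x in inputs]
    out_a = candidate(*leaves_a)
    out_b = reference(*leaves_b)
    if not torch.allclose(out_a, out_b, atol=atol, rtol=rtol):
        return False
    # Backpropagate the same random upstream gradient through both versions
    grad_out = torch.randn_like(out_a)
    grads_a = torch.autograd.grad(out_a, leaves_a, grad_out)
    grads_b = torch.autograd.grad(out_b, leaves_b, grad_out)
    return all(torch.allclose(ga, gb, atol=atol, rtol=rtol) for ga, gb in zip(grads_a, grads_b))
```

Comparing in float64 keeps rounding error far below the tolerances. A mismatch therefore points to a logic error, such as a missing causal mask, rather than to precision loss.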
