pytorch - ✅(Solved) Fix [Tracking] RISC-V PyTorch enablement [2 pull requests, 1 participant]

GitHub stats
pytorch/pytorch#180975 · Fetched 2026-04-22 07:43:11
Comments: 0 · Participants: 1 · Timeline: 53 · Reactions: 3
Timeline (top): subscribed ×27, mentioned ×14, labeled ×10, added_to_project_v2 ×1

Fix Action

Fix / Workaround

  • Fix RISC-V architecture-specific bugs and upstream patches

  • Implement dispatch mechanism (select optimal kernel at runtime)

PR fix notes

PR #173663: [RISCV] disable cuda-bindings on riscv64 CI

Description (problem / solution / changelog)

cuda-bindings is currently gated on riscv64 because there is no supported CUDA toolchain or installable Python package available for this architecture.

Attempting to resolve cuda-bindings on riscv64 results in install-time failures, even when CUDA is explicitly disabled at build time.

This change disables cuda-bindings on riscv64 for now, and it can be re-enabled once official CUDA support or a working packaging path becomes available.

Changed files

  • .ci/docker/requirements-ci.txt (modified, +1/-1)

PR #178778: [CI] Restrict mkl installation to x86 systems only

Description (problem / solution / changelog)

mkl 2024.02 has no riscv64 wheel, so this change restricts mkl installation to x86 systems for now. Without the restriction, the failed install blocks CI on riscv64.

Changed files

  • .ci/docker/requirements-ci.txt (modified, +2/-2)
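Both PRs gate a dependency by CPU architecture in .ci/docker/requirements-ci.txt, but the diffs themselves are not shown on this page. The gating logic can be sketched in plain Python as follows. The tables X86_ONLY and SKIP_ON and the helper names are hypothetical and exist only for illustration; they are not taken from the PRs.

```python
import platform

# Hypothetical gate tables for illustration only. The real PRs edit
# .ci/docker/requirements-ci.txt, whose exact contents are not shown here.
X86_MACHINES = {"x86_64", "AMD64", "i386", "i686"}
X86_ONLY = {"mkl"}                         # PR #178778: x86 systems only
SKIP_ON = {"cuda-bindings": {"riscv64"}}   # PR #173663: no riscv64 package


def should_install(package, machine=None):
    """Return True if `package` should be installed on `machine`."""
    machine = machine or platform.machine()
    if package in X86_ONLY and machine not in X86_MACHINES:
        return False
    return machine not in SKIP_ON.get(package, set())


def filter_requirements(packages, machine=None):
    """Keep only the packages that are installable on `machine`."""
    return [p for p in packages if should_install(p, machine)]
```

In a requirements file the same effect is usually expressed declaratively with a PEP 508 environment marker on `platform_machine` rather than with code; whether the PRs use exactly that form is an assumption.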

Background

Following the prior RFC (https://github.com/pytorch/pytorch/issues/171659), we have concluded multiple rounds of discussions and initial experiments.

This effort has been jointly driven by the XuanTie Team (from Alibaba Damo) and the Ruyi Community (from the Institute of Software, Chinese Academy of Sciences), with active participation and valuable input from multiple community partners: @CodersAcademy006 @yuzibo @zhanghb97 @malfet @ezyang. Through these collective efforts, we have now arrived at a set of concrete and actionable goals for enabling PyTorch on the RISC-V ecosystem.

This issue serves as a central tracking item to monitor the concrete implementation tasks towards enabling full RISC-V support in PyTorch.

Tracking

Phase 1: CI Infrastructure and Validation

Goal: Establish a robust CI pipeline for RISC-V builds and testing

1.1 Build Environment & Wheel Construction

1.2 CI Pipeline Integration

1.3 Test Suite Validation & Bug Fixing

  • Build and maintain RISC-V test blocklist - https://github.com/RuyiAI-Stack/pytorch/pull/1

  • Fix RISC-V architecture-specific bugs and upstream patches 

  • Generate and maintain test_times.json for RISC-V CI sharding
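To illustrate why per-test timings matter for sharding, here is a minimal sketch of a greedy longest-first assignment. It assumes a flat mapping from test file name to seconds; the actual layout of test_times.json and the sharding code used by PyTorch CI are not described in this issue, and the function name and default_time fallback are hypothetical.

```python
import heapq


def shard_tests(tests, test_times, num_shards, default_time=60.0):
    """Assign tests to shards so that total runtimes stay roughly balanced.

    Tests missing from `test_times` fall back to `default_time` seconds.
    """
    durations = {t: test_times.get(t, default_time) for t in tests}
    # Min-heap of (accumulated_seconds, shard_index): the least loaded
    # shard is always popped first.
    heap = [(0.0, i) for i in range(num_shards)]
    heapq.heapify(heap)
    shards = [[] for _ in range(num_shards)]
    # Place the longest tests first; ties are broken by name for determinism.
    for name in sorted(durations, key=lambda t: (-durations[t], t)):
        total, idx = heapq.heappop(heap)
        shards[idx].append(name)
        heapq.heappush(heap, (total + durations[name], idx))
    return shards
```

Without measured timings every test falls back to the same default, so the split degrades to a simple count-based balance, which is one reason a RISC-V-specific timing file is worth maintaining.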

Phase 2: High-Performance Micro-Kernel Library (uKernel)

Goal: Deliver optimized RISC-V implementations for core ATen operators, analogous to KleidiAI for ARM

2.1 Library Infrastructure

  • Add uKernel library dependency and build system integration into PyTorch

  • Implement runtime CPU feature detection for RISC-V (VLEN/RLEN)

  • Implement dispatch mechanism (select optimal kernel at runtime)
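The detection and dispatch items above can be sketched together. The example below parses a RISC-V ISA string of the kind Linux reports in /proc/cpuinfo, derives a guaranteed minimum VLEN from the Zvl*b extensions, and picks the most specialized kernel whose requirements are met. The kernel names and the KERNELS table are hypothetical, RLEN is not modeled, and the parser assumes an ISA string without version suffixes.

```python
import re

# Hypothetical kernel table: (name, required extensions, minimum VLEN bits),
# ordered from most to least specialized.
KERNELS = [
    ("gemm_f32_rvv_vlen256", {"v"}, 256),
    ("gemm_f32_rvv", {"v"}, 128),
    ("gemm_f32_scalar", set(), 0),
]


def parse_isa(isa):
    """Split an ISA string such as 'rv64imafdcv_zvl256b' into extension names."""
    isa = isa.strip().lower()
    match = re.match(r"rv(32|64|128)", isa)
    if not match:
        raise ValueError(f"not a RISC-V ISA string: {isa!r}")
    single, *multi = isa[match.end():].split("_")
    exts = set(single)                    # single-letter extensions
    exts.update(e for e in multi if e)    # multi-letter extensions
    return exts


def min_vlen_bits(exts):
    """Return the guaranteed minimum VLEN in bits, or None without vectors."""
    sizes = [int(m.group(1)) for e in exts
             if (m := re.fullmatch(r"zvl(\d+)b", e))]
    if sizes:
        return max(sizes)
    # The V extension itself guarantees at least 128 bits.
    return 128 if "v" in exts else None


def select_kernel(isa):
    """Pick the first kernel whose extension and VLEN requirements are met."""
    exts = parse_isa(isa)
    vlen = min_vlen_bits(exts) or 0
    for name, required, need_vlen in KERNELS:
        if required <= exts and vlen >= need_vlen:
            return name
    raise RuntimeError("no kernel matches this CPU")
```

A real implementation would more likely query the hardware directly (for example by reading the vector length register at startup) than rely on the ISA string alone; the table-driven fallback order is the part this sketch is meant to show.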

2.2 Core Compute Kernels (GEMM / Matmul)

  • Implement FP32 GEMM kernel with RVV

  • Implement FP16 GEMM kernel with RVV

  • Implement BF16 GEMM kernel with RVV

  • Implement FP8 GEMM kernel with RVV

  • Implement INT8 GEMM kernel with RVV (quantized)

  • Implement INT4 GEMM kernel with RVV (quantized)

  • Implement RVFP4 GEMM kernel with RVV

  • Implement GEMM with RVM (Matrix Extension)

  • Implement batched GEMM

  • Implement GEMV (matrix-vector) optimized kernels

2.3 Convolution Kernels

  • Implement im2col + GEMM based Conv2d with RVV/RVM

  • Implement direct Conv2d kernel with RVV/RVM

  • Implement depthwise Conv2d with RVV/RVM

  • Implement Conv1d with RVV/RVM
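As a reference for the im2col + GEMM item, the following pure-Python sketch shows the data transformation that such a kernel performs: the input is unfolded into a matrix so that the convolution becomes a single matrix multiply. It assumes stride 1, no padding, and no batch dimension, and it is a correctness reference rather than an RVV/RVM kernel; all function names are illustrative.

```python
def im2col(x, kh, kw):
    """Unfold x[C][H][W] into a (C*kh*kw) x (out_h*out_w) matrix."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    out_h, out_w = h - kh + 1, w - kw + 1
    cols = []
    for ci in range(c):
        for i in range(kh):
            for j in range(kw):
                # One row per (channel, kernel row, kernel column) offset.
                cols.append([x[ci][oy + i][ox + j]
                             for oy in range(out_h) for ox in range(out_w)])
    return cols, out_h, out_w


def matmul(a, b):
    """Naive matrix multiply; the GEMM kernel would replace this step."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def conv2d_im2col(x, weight):
    """Conv2d for x[C][H][W] and weight[O][C][KH][KW], stride 1, no padding."""
    kh, kw = len(weight[0][0]), len(weight[0][0][0])
    cols, out_h, out_w = im2col(x, kh, kw)
    # Flatten each filter in the same (channel, row, column) order as im2col.
    w_mat = [[v for ch in filt for row in ch for v in row] for filt in weight]
    flat = matmul(w_mat, cols)
    # Reshape each output row back to out_h x out_w.
    return [[row[y * out_w:(y + 1) * out_w] for y in range(out_h)]
            for row in flat]
```

The trade-off is memory: im2col duplicates overlapping input windows, which is why the list above also tracks a direct Conv2d kernel.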

2.4 Element-wise & Activation Kernels

  • Implement vectorized ReLU / ReLU6 / LeakyReLU

  • Implement vectorized GELU / SiLU / Swish

  • Implement vectorized Sigmoid / Tanh

  • Implement vectorized element-wise Add / Mul / Sub / Div

  • Implement vectorized Softmax

  • Implement vectorized type cast kernels

2.5 Normalization & Reduction Kernels

  • Implement LayerNorm

  • Implement RMSNorm

  • Implement BatchNorm

  • Implement vectorized reduction ops (sum, mean, max, min)

2.6 Attention & Transformer-Specific Kernels

  • Implement Scaled Dot-Product Attention (SDPA)

  • Implement RoPE (Rotary Position Embedding)

2.7 Pooling & Data Rearrangement

  • Implement MaxPool2d / AvgPool2d

  • Implement AdaptiveAvgPool2d

  • Implement optimized Transpose / Permute

  • Implement optimized Embedding lookup

Phase 3: torch.compile Backend Extension

Goal: Enable high-performance graph compilation for RISC-V platforms

3.1 RISC-V Ratified Extensions: Native torch.compile Support

  • Evaluate and validate existing torch.compile backends on RVV hardware. 

  • Extend PyTorch CPU vector ISA abstractions to incorporate RVV semantics.

  • Add optimization support for the variable-length vector semantics of RVV.

  • Incrementally expand operator and model coverage with RVV-targeted optimizations.
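The variable-length semantics mentioned above differ from fixed-width SIMD in that the same loop must work for any hardware vector length. The sketch below simulates the usual strip-mining pattern in Python: each iteration asks for as many elements as remain and processes however many the hardware grants. The vl computation is a simplification of the real vsetvl rules, and the function names are illustrative.

```python
def vlmax(vlen_bits, sew_bits=32, lmul=1):
    """Maximum elements per vector operation: (VLEN / SEW) * LMUL."""
    return (vlen_bits // sew_bits) * lmul


def relu_strip_mined(data, vlen_bits):
    """Apply ReLU with a vector-length-agnostic loop; return (output, steps)."""
    out = []
    steps = 0
    i, n = 0, len(data)
    max_vl = vlmax(vlen_bits)
    while i < n:
        # Simplified vsetvl: grant min(VLMAX, remaining elements).
        vl = min(max_vl, n - i)
        chunk = data[i:i + vl]                   # stands in for a vector load
        out.extend(max(v, 0.0) for v in chunk)   # stands in for a vector max
        i += vl
        steps += 1
    return out, steps
```

The loop needs no scalar tail: the last iteration simply runs with a shorter vl, and the same code takes fewer steps on hardware with a larger VLEN. A code generator built around a fixed vector width has to be adapted to express this.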

3.2 RISC-V Draft / Custom Extensions: torch.compile Lowering via RuyiAI Buddy Compiler

  • Enable buddy-mlir as a torch.compile backend for draft and custom RISC-V extensions.

  • Support MLIR-based compilation for the RISC-V Vector extension.

  • Support MLIR-based compilation for RISC-V Matrix extensions, including AME, IME, and VME.

  • Support MLIR-based compilation for custom RISC-V extensions, such as the UCB Gemmini accelerator.

3.3 Validation and Benchmarking

  • Verify compilation and execution correctness of torch.compile() across representative operators, models, and workloads on RISC-V.

  • Benchmark compiled execution against eager execution on RISC-V platforms, including end-to-end latency, throughput, compile time, and memory usage.

  • Profile and compare performance across torch.compile backend paths and RISC-V targets.
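A minimal timing harness for the eager-versus-compiled comparison might look like the sketch below. It uses only the standard library and takes plain callables, so it makes no assumption about how the compiled function was produced; the function names and the returned dictionary layout are hypothetical, and memory usage is not measured.

```python
import statistics
import time


def benchmark(fn, *args, warmup=3, repeats=10):
    """Time `fn(*args)` after a warmup and return summary statistics."""
    for _ in range(warmup):
        fn(*args)
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        samples.append(time.perf_counter() - start)
    return {
        "median_s": statistics.median(samples),
        "min_s": min(samples),
        "runs": len(samples),
    }


def compare(eager_fn, compiled_fn, *args, warmup=3, repeats=10):
    """Compare steady-state latency and report the first compiled call."""
    # The first call of a compiled function typically includes compile time.
    start = time.perf_counter()
    compiled_fn(*args)
    first_call_s = time.perf_counter() - start
    eager = benchmark(eager_fn, *args, warmup=warmup, repeats=repeats)
    compiled = benchmark(compiled_fn, *args, warmup=warmup, repeats=repeats)
    median = compiled["median_s"]
    speedup = eager["median_s"] / median if median > 0 else float("inf")
    return {
        "first_call_s": first_call_s,
        "eager": eager,
        "compiled": compiled,
        "speedup": speedup,
    }
```

With PyTorch installed, one would pass a model as eager_fn and the result of torch.compile(model) as compiled_fn, along with representative inputs. Reporting the first call separately keeps compile time from distorting the steady-state comparison.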

Phase 4: Triton / TileLang RISC-V Support and PyTorch Integration

Goal: Enable Triton / TileLang for RISC-V accelerators and establish a well-defined PyTorch integration path.

  • Develop Triton-RISCV and TileLang-RISCV

  • Define the Triton / TileLang integration interface for RISC-V (inductor template backend or external kernel provider).

  • Clarify the respective roles of Triton and TileLang for the PyTorch RISC-V backend.

  • Specify code generation output formats for RISC-V backends (binaries or IRs).

  • Develop autotuning mechanisms for PyTorch integration over Triton / TileLang-based RISC-V backends.

  • Expose language-level profiling and debugging capabilities through standard PyTorch interfaces.

  • Validate correctness and evaluate performance at both operator and model levels.

Call for Participation

If you have any suggestions on the tracking items, or would like to get involved in the development of specific features, you are very welcome to join and co-build with us!

cc @malfet @seemethere @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @jerryzh168 @aditew01 @pytorch/pytorch-dev-infra

extent analysis

TL;DR

To enable PyTorch on the RISC-V ecosystem, focus on establishing a robust CI pipeline, implementing optimized RISC-V operators, and extending torch.compile backend support.

Guidance

  1. Establish CI Infrastructure: Set up RISC-V cross-compilation and native compilation environments, and integrate Jenkins CI and GitHub Actions CI builds for RISC-V PyTorch.
  2. Implement Optimized Operators: Focus on implementing optimized RISC-V implementations for core ATen operators, including GEMM, convolution, and element-wise operations.
  3. Extend torch.compile Backend: Evaluate and validate existing torch.compile backends on RVV hardware, and extend PyTorch CPU vector ISA abstractions to incorporate RVV semantics.
  4. Validate and Benchmark: Verify compilation and execution correctness of torch.compile() across representative operators, models, and workloads on RISC-V, and benchmark compiled execution against eager execution.

Example

The issue itself provides no code snippet; it focuses on high-level implementation tasks and goals.

Notes

The provided issue lacks specific technical details and code snippets, making it challenging to provide a detailed solution. However, the guidance points above should help in establishing a robust CI pipeline, implementing optimized operators, and extending torch.compile backend support.

Recommendation

This is a tracking issue, so there is no single workaround to apply. Progress depends on completing the implementation tasks it outlines, starting with a robust CI pipeline and optimized RISC-V operators, which together enable PyTorch on the RISC-V ecosystem.
