Fix Action

Fix / Workaround

torch.nn.functional.pad with mode='replicate', 'reflect', or 'circular' raises NotImplementedError for certain (ndim, pad_size) combinations that are not documented as unsupported. The error surfaces only at runtime, from the kernel-dispatch table, rather than at argument validation. Users padding the last dimension of a 4D or 5D tensor, a natural and common operation, hit an opaque kernel error with no guidance in the API docstring.

None of this is in the torch.nn.functional.pad docstring. A user who wants to pad the last dimension of a 4D tensor writes pad=(0,1) (pad_size=2), the same pattern that works for 3D, and gets a kernel-level NotImplementedError with no prior hint. The constraint exists only in the dispatch table. Moving this check to argument validation with a clear ValueError, and documenting the valid combinations in the docstring, would prevent these unexpected failures.
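As a rough illustration of what such an up-front check could look like, here is a minimal sketch in plain Python. The names `SUPPORTED_NDIMS` and `check_non_constant_pad` are hypothetical and not part of PyTorch; the table is copied from the supported combinations listed in the NotImplementedError message quoted in this issue, and the wording of the ValueError is only a suggestion.

```python
# Hypothetical pre-check; not part of PyTorch. The table mirrors the supported
# combinations listed in the NotImplementedError message from this issue.
SUPPORTED_NDIMS = {
    2: (2, 3),  # pad_size 2 pads the last dimension
    4: (3, 4),  # pad_size 4 pads the last 2 dimensions
    6: (4, 5),  # pad_size 6 pads the last 3 dimensions
}


def check_non_constant_pad(ndim, pad, mode):
    """Raise ValueError early for (ndim, pad_size) pairs outside the table."""
    if mode == "constant":
        return  # the table only describes the non-constant modes
    pad_size = len(pad)
    if ndim in SUPPORTED_NDIMS.get(pad_size, ()):
        return
    # Collect the pad sizes that would be accepted for this rank.
    valid = sorted(size for size, ndims in SUPPORTED_NDIMS.items() if ndim in ndims)
    raise ValueError(
        f"mode={mode!r} with {ndim}D input requires pad_size in {valid}; "
        f"got pad_size={pad_size}"
    )
```

Calling `check_non_constant_pad(x.ndim, pad, mode)` before `torch.nn.functional.pad` would let the 3D reference case through and reject Findings A and B with a message that names the accepted pad sizes for that rank.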

🐛 Describe the bug

Minimal reproducer

import torch

# Finding A: 4D input + pad_size=2 (pad last dim)
x4 = torch.randn(10, 5, 7, 2)
print(f"[A] 4D + pad=(0,1) + mode='circular'")
try:
    out = torch.nn.functional.pad(x4, (0, 1), mode='circular')
    print(f"    ok: {tuple(out.shape)}")
except Exception as e:
    print(f"    {type(e).__name__}: {e}")

# Finding B: 5D input + pad_size=4
x5 = torch.randn(6, 2, 9, 10, 3)
print(f"[B] 5D + pad=(0,4,2,4) + mode='replicate'")
try:
    out = torch.nn.functional.pad(x5, (0, 4, 2, 4), mode='replicate')
    print(f"    ok: {tuple(out.shape)}")
except Exception as e:
    print(f"    {type(e).__name__}: {e}")

# Reference: 3D + pad_size=2 works (same pad_size, one rank lower)
x3 = torch.randn(5, 7, 4)
print(f"[ref] 3D + pad=(1,1) + mode='circular'")
out = torch.nn.functional.pad(x3, (1, 1), mode='circular')
print(f"    ok: {tuple(out.shape)}")

Observed output

[A] 4D + pad=(0,1) + mode='circular'
    NotImplementedError: Padding size 2 is not supported for 4D input tensor.
    Supported combinations for non-constant padding:
      - 2D or 3D input: padding size = 2 (pads last dimension)
      - 3D or 4D input: padding size = 4 (pads last 2 dimensions)
      - 4D or 5D input: padding size = 6 (pads last 3 dimensions)
[B] 5D + pad=(0,4,2,4) + mode='replicate'
    NotImplementedError: Padding size 4 is not supported for 5D input tensor.
    ...
[ref] 3D + pad=(1,1) + mode='circular'
    ok: (5, 7, 6)

Expected output

[A] 4D + pad=(0,1) + mode='circular'
    ValueError: mode='circular' with 4D input requires pad_size=4 (last 2 dims)
                or pad_size=6 (last 3 dims). pad_size=2 is only valid for 2D or 3D inputs.
[B] 5D + pad=(0,4,2,4) + mode='replicate'
    ValueError: mode='replicate' with 5D input requires pad_size=6. ...
[ref] 3D + pad=(1,1) + mode='circular'
    ok: (5, 7, 6)

Why this is a bug

The valid (ndim, pad_size) matrix for non-constant modes is:

pad_size →    2    4    6
ndim=2        ✓    ✗    ✗
ndim=3        ✓    ✓    ✗
ndim=4        ✗    ✓    ✓
ndim=5        ✗    ✗    ✓

Crash statistics: Finding A: 244 crashes (122 CPU + 122 CUDA); Finding B: 1388 crashes (694 CPU + 694 CUDA). Both findings came from a specialized input generator, 500–2000 inputs each.
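Until the check or the documentation changes, one possible workaround follows from the matrix above: merge the leading dimensions so the tensor has a rank the kernels accept for the given pad_size, pad, then restore the original leading dimensions. The sketch below is an assumption-laden illustration, not an official recipe; `flattened_shape` and `pad_trailing_dims` are hypothetical helper names, and the approach relies on padding acting independently on each slice along the leading dimensions.

```python
import math


def flattened_shape(shape, pad):
    """Merge leading dims so the result has at most len(pad) // 2 + 2 dims."""
    k, odd = divmod(len(pad), 2)  # k = number of trailing dims being padded
    if odd or not 1 <= k <= 3:
        raise ValueError("pad must have 2, 4, or 6 entries")
    if len(shape) < k + 1:
        raise ValueError(f"need at least {k + 1} dims to pad the last {k}")
    if len(shape) <= k + 2:
        return tuple(shape)  # already a rank listed as supported in the matrix
    # Collapse everything before the last k + 1 dims into one batch dim.
    return (math.prod(shape[:-(k + 1)]), *shape[-(k + 1):])


def pad_trailing_dims(x, pad, mode):
    """Pad the last len(pad) // 2 dims of a torch tensor x, whatever its rank."""
    # Imported here so flattened_shape stays usable without torch installed.
    import torch.nn.functional as F

    k = len(pad) // 2
    out = F.pad(x.reshape(flattened_shape(x.shape, pad)), pad, mode=mode)
    # Restore the original leading dims; only the last k dims changed size.
    return out.reshape(*x.shape[:-k], *out.shape[-k:])
```

With the tensors from the reproducer, `pad_trailing_dims(x4, (0, 1), 'circular')` reshapes to (50, 7, 2) before padding, and `pad_trailing_dims(x5, (0, 4, 2, 4), 'replicate')` reshapes to (12, 9, 10, 3); both are combinations marked as supported in the matrix. Note that `reshape` may copy when the input is not contiguous.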

Versions

PyTorch version: 2.13.0.dev20260512+cu130
Is debug build: False
CUDA used to build PyTorch: 13.0
ROCM used to build PyTorch: N/A

OS: Ubuntu 24.04.4 LTS (x86_64)
GCC version: (Ubuntu 13.3.0-6ubuntu2~24.04.1) 13.3.0
Clang version: 18.1.3 (1ubuntu1)
CMake version: version 3.28.3
Libc version: glibc-2.39

Python version: 3.12.3 (main, Mar 23 2026, 19:04:32) [GCC 13.3.0] (64-bit runtime)
Python platform: Linux-6.14.0-37-generic-x86_64-with-glibc2.39
Is CUDA available: True
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 5090
Nvidia driver version: 590.48.01

[pip3] numpy==2.4.4
[pip3] torch==2.13.0.dev20260512+cu130
[pip3] triton==3.7.0+git88b227e2

cc @svekars @sekyondaMeta @AlannaBurke @albanD @mruberry @jbschlosser @walterddr @mikaylagawarecki @malfet

pytorch - 💡(How to fix) Fix `torch.nn.functional.pad` raises `NotImplementedError` for valid `(ndim, pad_size)` combinations in non-constant modes — constraint not documented in the API