pytorch: `torch.clamp` / `torch.clamp_min` / `torch.clamp_max` return gradient 1.0 at boundary points where `F.relu6` and `F.hardtanh` return 0.0 for the same mathematical function

🐛 Describe the bug

torch.clamp, torch.clamp_min, and torch.clamp_max return gradient 1.0 when the input is exactly equal to a bound. F.relu6 and F.hardtanh compute the same mathematical function as torch.clamp (with the corresponding bounds) but return gradient 0.0 at those same points. Two implementations of the same function therefore disagree on the gradient.

Minimal reproducer

import torch
import torch.nn.functional as F

# F.relu6(x) = torch.clamp(x, 0, 6)  — identical functions
for x_val, label in [(0.0, "lower boundary"), (6.0, "upper boundary")]:
    x1 = torch.tensor(x_val, requires_grad=True)
    F.relu6(x1).backward()

    x2 = torch.tensor(x_val, requires_grad=True)
    torch.clamp(x2, 0.0, 6.0).backward()

    print(f"x={x_val} ({label}):  F.relu6={x1.grad.item()}  torch.clamp={x2.grad.item()}")

print()

# F.hardtanh(x, -1, 1) = torch.clamp(x, -1, 1)  — identical functions
for x_val, label in [(-1.0, "lower boundary"), (1.0, "upper boundary")]:
    x1 = torch.tensor(x_val, requires_grad=True)
    F.hardtanh(x1, -1.0, 1.0).backward()

    x2 = torch.tensor(x_val, requires_grad=True)
    torch.clamp(x2, -1.0, 1.0).backward()

    print(f"x={x_val} ({label}):  F.hardtanh={x1.grad.item()}  torch.clamp={x2.grad.item()}")

Observed output

x=0.0 (lower boundary):  F.relu6=0.0  torch.clamp=1.0
x=6.0 (upper boundary):  F.relu6=0.0  torch.clamp=1.0

x=-1.0 (lower boundary):  F.hardtanh=0.0  torch.clamp=1.0
x=1.0 (upper boundary):  F.hardtanh=0.0  torch.clamp=1.0

Expected output

x=0.0 (lower boundary):  F.relu6=0.0  torch.clamp=0.0
x=6.0 (upper boundary):  F.relu6=0.0  torch.clamp=0.0

x=-1.0 (lower boundary):  F.hardtanh=0.0  torch.clamp=0.0
x=1.0 (upper boundary):  F.hardtanh=0.0  torch.clamp=0.0

Why 0.0 is the correct answer

1. PyTorch's own autograd documentation states:

"If the function is convex (at least locally), use the sub-gradient of minimum norm."

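clamp(x, lo, hi) with lo < hi is piecewise linear, not globally convex: it is locally convex around x = lo, where it behaves like max(x, lo), and locally concave around x = hi, where it behaves like min(x, hi). At x = lo the subdifferential is the interval [0, 1], and its element of minimum norm is 0.0. At x = hi the same documentation prescribes the super-gradient of minimum norm for locally concave functions; the superdifferential is again [0, 1], so the result is again 0.0.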
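The argument above can be checked numerically without PyTorch. The sketch below is only an illustration: `clamp`, `one_sided_slopes`, and `min_norm_element` are hypothetical helper names, not PyTorch APIs. It estimates the left and right slopes at a point with finite differences, treats them as the endpoints of the sub- or superdifferential, and picks the element of smallest absolute value.

```python
def clamp(x, lo, hi):
    """Plain-Python clamp, used only for this illustration."""
    return max(lo, min(x, hi))


def one_sided_slopes(f, x, eps=2.0 ** -10):
    """Return (left slope, right slope) of f at x via finite differences.

    eps is a power of two so the differences are exact for the
    small values used here.
    """
    left = (f(x) - f(x - eps)) / eps
    right = (f(x + eps) - f(x)) / eps
    return left, right


def min_norm_element(left, right):
    """Element of smallest absolute value in the interval spanned by the two slopes."""
    a, b = sorted((left, right))
    if a <= 0.0 <= b:
        return 0.0
    return a if abs(a) < abs(b) else b


f = lambda x: clamp(x, 0.0, 6.0)
for point in (0.0, 3.0, 6.0):
    left, right = one_sided_slopes(f, point)
    print(point, (left, right), min_norm_element(left, right))
```

At both bounds the two slopes are 0 and 1, so the minimum-norm choice is 0.0; at an interior point such as 3.0 both slopes are 1 and the gradient is unambiguous.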

2. Internal consistency: F.relu6 and F.hardtanh already do the right thing.

F.relu6(x) = torch.clamp(x, 0, 6) and F.hardtanh(x, a, b) = torch.clamp(x, a, b) — these are not approximations, they are the same operation. Both return 0.0 at boundaries, consistent with the documented convention. torch.clamp returning 1.0 is therefore a bug in torch.clamp's backward, not a design choice.

3. This is consistent with the existing F.relu convention.

The gradient of F.relu at x = 0.0 is 0.0, which also follows the min-norm rule. torch.clamp_min(x, 0) and torch.clamp(x, min=0) compute the same function as F.relu but return 1.0 at x = 0, which is another inconsistency in the same family.
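Until the backward of torch.clamp changes, one possible user-side workaround is a custom autograd function that masks the gradient with strict inequalities. This is a minimal sketch under stated assumptions, not the fix proposed for PyTorch itself: `ClampZeroAtBoundary` and `clamp_zero_at_boundary` are hypothetical names, and the sketch only handles scalar Python bounds.

```python
import torch


class ClampZeroAtBoundary(torch.autograd.Function):
    """Clamp whose backward passes gradient only strictly inside (lo, hi)."""

    @staticmethod
    def forward(ctx, x, lo, hi):
        ctx.save_for_backward(x)
        ctx.lo, ctx.hi = lo, hi  # scalar bounds, stored as plain attributes
        return x.clamp(lo, hi)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        # Strict inequalities: the gradient is zero at x == lo and x == hi.
        interior = (x > ctx.lo) & (x < ctx.hi)
        # lo and hi are not tensors, so they receive no gradient.
        return grad_out * interior.to(grad_out.dtype), None, None


def clamp_zero_at_boundary(x, lo, hi):
    """Hypothetical convenience wrapper around the custom function."""
    return ClampZeroAtBoundary.apply(x, lo, hi)


x = torch.tensor([0.0, 3.0, 6.0, 7.0], requires_grad=True)
clamp_zero_at_boundary(x, 0.0, 6.0).sum().backward()
print(x.grad)
```

For scalar bounds, calling F.hardtanh(x, lo, hi) directly is a simpler alternative, since the outputs shown earlier indicate it already returns 0.0 at both bounds.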

Full picture of affected ops

import torch
import torch.nn.functional as F

ops = [
    ("F.relu(x)",                  lambda x: F.relu(x)),
    ("F.relu6(x) at x=0",         lambda x: F.relu6(x)),
    ("F.relu6(x) at x=6",         lambda x: F.relu6(x)),
    ("F.hardtanh(x,-1,1) at x=1", lambda x: F.hardtanh(x, -1.0, 1.0)),
    ("torch.clamp(x,0,6) at x=0", lambda x: torch.clamp(x, 0.0, 6.0)),
    ("torch.clamp(x,0,6) at x=6", lambda x: torch.clamp(x, 0.0, 6.0)),
    ("torch.clamp_min(x,0)",       lambda x: torch.clamp_min(x, 0.0)),
    ("torch.clamp_max(x,6)",       lambda x: torch.clamp_max(x, 6.0)),
]
xvals = [0.0, 0.0, 6.0, 1.0, 0.0, 6.0, 0.0, 6.0]

for (name, fn), xv in zip(ops, xvals):
    x = torch.tensor(xv, requires_grad=True)
    fn(x).backward()
    print(f"{name:<35} grad={x.grad.item()}")

Output (✓ = follows the min-norm convention, ✗ = does not; the marks are annotations, not printed by the script)

F.relu(x)                           grad=0.0   ✓
F.relu6(x) at x=0                   grad=0.0   ✓
F.relu6(x) at x=6                   grad=0.0   ✓
F.hardtanh(x,-1,1) at x=1           grad=0.0   ✓
torch.clamp(x,0,6) at x=0           grad=1.0   ✗
torch.clamp(x,0,6) at x=6           grad=1.0   ✗
torch.clamp_min(x,0)                grad=1.0   ✗
torch.clamp_max(x,6)                grad=1.0   ✗
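The same comparison can be run over a whole tensor at once, which makes it easier to turn into a regression check. The sketch below is illustrative only: `boundary_grads` and `clamp_vs_hardtanh` are hypothetical helper names. It reports the points where torch.clamp and F.hardtanh disagree rather than assuming a particular result, since the outcome depends on the PyTorch build.

```python
import torch
import torch.nn.functional as F


def boundary_grads(fn, points):
    """Gradient of sum(fn(x)) with respect to each entry of x."""
    x = torch.tensor(points, requires_grad=True)
    fn(x).sum().backward()
    return x.grad.tolist()


def clamp_vs_hardtanh(points, lo, hi):
    """Return (point, clamp_grad, hardtanh_grad) wherever the two disagree."""
    g_clamp = boundary_grads(lambda t: torch.clamp(t, lo, hi), points)
    g_hard = boundary_grads(lambda t: F.hardtanh(t, lo, hi), points)
    return [(p, gc, gh) for p, gc, gh in zip(points, g_clamp, g_hard) if gc != gh]


# Below, at, inside, at, and above the bounds of [-1, 1].
points = [-2.0, -1.0, 0.0, 1.0, 2.0]
for point, gc, gh in clamp_vs_hardtanh(points, -1.0, 1.0):
    print(f"x={point}: torch.clamp grad={gc}  F.hardtanh grad={gh}")
```

On a build with the behavior reported here, only the two boundary points are printed; once the gradients agree, the loop prints nothing.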

Versions

PyTorch version: 2.13.0.dev20260512+cu130
Is debug build: False
CUDA used to build PyTorch: 13.0
ROCM used to build PyTorch: N/A

OS: Ubuntu 24.04.4 LTS (x86_64)
GCC version: (Ubuntu 13.3.0-6ubuntu2~24.04.1) 13.3.0
Clang version: 18.1.3 (1ubuntu1)
CMake version: version 3.28.3
Libc version: glibc-2.39

Python version: 3.12.3 (main, Mar 23 2026, 19:04:32) [GCC 13.3.0] (64-bit runtime)
Python platform: Linux-6.14.0-37-generic-x86_64-with-glibc2.39
Is CUDA available: True
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 5090
Nvidia driver version: 590.48.01

[pip3] numpy==2.4.4
[pip3] torch==2.13.0.dev20260512+cu130
[pip3] triton==3.7.0+git88b227e2

cc @ezyang @albanD @gqchen @nikitaved @soulitzer @Varal7 @bobrenjc93
