pytorch - 💡(How to fix) Fix `torch.maximum` / `torch.clamp_min` gradients at tie point violate PyTorch's documented subgradient convention and are inconsistent with `F.relu`

pytorch · 2026-05-20 17:00:14

🐛 Describe the bug

torch.maximum(x, tensor(0)), torch.clamp_min(x, 0), and torch.clamp(x, min=0) all implement max(x, 0), but their gradients at x=0 contradict PyTorch's own autograd documentation and are mutually inconsistent.

Minimal reproducer

import torch
import torch.nn.functional as F

formulations = [
    ("F.relu(x)",                   lambda x: F.relu(x)),
    ("torch.maximum(x, tensor(0))", lambda x: torch.maximum(x, torch.tensor(0.0))),
    ("torch.clamp_min(x, 0)",       lambda x: torch.clamp_min(x, 0.0)),
    ("torch.clamp(x, min=0)",       lambda x: torch.clamp(x, min=0.0)),
]

for name, fn in formulations:
    x = torch.tensor(0.0, requires_grad=True)
    fn(x).backward()
    print(f"{name:<35} grad at x=0: {x.grad.item()}")

Observed output

F.relu(x)                           grad at x=0: 0.0
torch.maximum(x, tensor(0))         grad at x=0: 0.5
torch.clamp_min(x, 0)               grad at x=0: 1.0
torch.clamp(x, min=0)               grad at x=0: 1.0

Expected output

F.relu(x)                           grad at x=0: 0.0
torch.maximum(x, tensor(0))         grad at x=0: 0.0
torch.clamp_min(x, 0)               grad at x=0: 0.0
torch.clamp(x, min=0)               grad at x=0: 0.0

Why 0.0 is the correct answer

All four expressions compute the same mathematical function max(x, 0), yet they produce three different gradients at x=0 (0.0, 0.5, and 1.0). This matters in practice: users who refactor between these equivalent formulations silently get different gradients whenever an input is exactly zero, for example with zero-initialised weights or pre-activations that land exactly on zero.

PyTorch's own autograd documentation specifies:

"If the function is convex (at least locally), use the sub-gradient of minimum norm."

max(x, 0) is convex. Its subdifferential at x=0 is the interval [0, 1]. The element of minimum norm is 0.0.
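The argument above can be checked with a few lines of plain Python. This is only an illustrative sketch: `min_norm_subgradient` is a hypothetical helper, not a PyTorch function, and the `observed` values are copied from the observed output in this report. It shows that all three reported gradients are valid subgradients (they lie in [0, 1]), but only 0.0 is the one of minimum norm.

```python
def min_norm_subgradient(lo: float, hi: float) -> float:
    """Return the element of the closed interval [lo, hi] closest to zero.

    Hypothetical helper for illustration; not part of PyTorch.
    """
    if lo > hi:
        raise ValueError("empty interval")
    # Projecting 0 onto [lo, hi] yields the point with the smallest absolute value.
    return min(max(0.0, lo), hi)


# Subdifferential of max(x, 0) at x = 0 is the interval [0, 1].
SUBDIFF_LO, SUBDIFF_HI = 0.0, 1.0
expected = min_norm_subgradient(SUBDIFF_LO, SUBDIFF_HI)

# Gradients at x = 0 as reported in the observed output of this issue.
observed = {
    "F.relu(x)": 0.0,
    "torch.maximum(x, tensor(0))": 0.5,
    "torch.clamp_min(x, 0)": 1.0,
    "torch.clamp(x, min=0)": 1.0,
}

for name, grad in observed.items():
    is_subgradient = SUBDIFF_LO <= grad <= SUBDIFF_HI
    is_min_norm = grad == expected
    print(f"{name:<30} valid subgradient: {is_subgradient}  minimum norm: {is_min_norm}")
```

The same helper applied to the subdifferential of |x| at x=0, the interval [-1, 1], also returns 0.0, which matches the `torch.abs` behaviour listed in the table below.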

| Formulation | grad at x=0 | Consistent with docs? |
| --- | --- | --- |
| `F.relu(x)` | 0.0 | ✓ |
| `torch.abs(x)` | 0.0 | ✓ |
| `torch.maximum(x, tensor(0))` | 0.5 | ✗ |
| `torch.clamp_min(x, 0)` | 1.0 | ✗ |
| `torch.clamp(x, min=0)` | 1.0 | ✗ |

F.relu and torch.abs already follow the documented minimum-norm convention. torch.maximum, torch.clamp_min, and torch.clamp do not.
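For code that needs the minimum-norm gradient at zero regardless of how this issue is resolved, one possible workaround is to express max(x, 0) through `torch.where`, whose backward pass sends the gradient only to the selected branch. This is a sketch under that assumption, not a fix proposed in the report; `relu_min_norm` is a hypothetical name. The second half checks the `torch.abs` entry from the table.

```python
import torch


def relu_min_norm(x: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: max(x, 0) whose gradient is 0 where x == 0."""
    # The strict comparison routes x == 0 to the constant branch,
    # so no gradient flows back to x at the tie point.
    return torch.where(x > 0, x, torch.zeros_like(x))


x = torch.tensor([-1.0, 0.0, 2.0], requires_grad=True)
relu_min_norm(x).sum().backward()
grad_where = x.grad.clone()
print("torch.where formulation grads:", grad_where.tolist())

# torch.abs at x = 0, as listed in the table above.
y = torch.tensor(0.0, requires_grad=True)
torch.abs(y).backward()
grad_abs_at_zero = y.grad.item()
print("torch.abs grad at x=0:", grad_abs_at_zero)
```

`F.relu` already gives the same gradient at zero, so this is only relevant where `torch.maximum` or `torch.clamp` would otherwise be used with a zero bound.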

Versions

PyTorch version: 2.13.0.dev20260512+cu130
Is debug build: False
CUDA used to build PyTorch: 13.0
ROCM used to build PyTorch: N/A

OS: Ubuntu 24.04.4 LTS (x86_64)
GCC version: (Ubuntu 13.3.0-6ubuntu2~24.04.1) 13.3.0
Clang version: 18.1.3 (1ubuntu1)
CMake version: version 3.28.3
Libc version: glibc-2.39

Python version: 3.12.3 (main, Mar 23 2026, 19:04:32) [GCC 13.3.0] (64-bit runtime)
Python platform: Linux-6.14.0-37-generic-x86_64-with-glibc2.39
Is CUDA available: True
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 5090
Nvidia driver version: 590.48.01

[pip3] numpy==2.4.4
[pip3] torch==2.13.0.dev20260512+cu130
[pip3] triton==3.7.0+git88b227e2
