pytorch - 💡(How to fix) Fix CTCLoss backward is inconsistent with its forward derivative with respect to the documented log_probs input. [1 participants]

pytorch/pytorch#181534 (fetched 2026-04-27 05:28:46)

Code Example

import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(123)

T, N, C = 3, 1, 3  # time, batch, classes incl. blank
log_probs = F.log_softmax(torch.randn(T, N, C), dim=-1).double()
log_probs.requires_grad_(True)

targets = torch.tensor([[1]])
input_lens = torch.tensor([T])
target_lens = torch.tensor([1])

loss_fn = nn.CTCLoss(blank=0)

def f(x):
    return loss_fn(x, targets, input_lens, target_lens)

print(torch.autograd.gradcheck(f, (log_probs,), raise_exception=True))

🐛 Describe the bug

Environment:

  • PyTorch: 2.11.0+cu128
  • Python: 3.13.5

The minimal example above reproduces the issue: gradcheck fails with a clear Jacobian mismatch for the input log_probs.

The numerical Jacobian contains exact zeros for class 2 at every time step:

numerical: tensor([[-0.4727],
        [-0.5273],
        [ 0.0000],
        [-0.5624],
        [-0.4376],
        [ 0.0000],
        [-0.4784],
        [-0.5216],
        [ 0.0000]], dtype=torch.float64)

However, the analytical gradient returned by CTCLoss backward gives non-zero values at the same positions:

analytical: tensor([[-0.1430],
        [-0.1116],
        [ 0.2547],
        [-0.2236],
        [-0.3075],
        [ 0.5311],
        [-0.3088],
        [-0.3109],
        [ 0.6198]], dtype=torch.float64)

Since class 2 is neither the blank class nor the target label in this example, the true derivative with respect to log_probs[..., 2] should be zero. The numerical Jacobian confirms this, but the analytical backward returns non-zero gradients.
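The claim that the derivative with respect to the class 2 entries should be zero can be checked without PyTorch. The sketch below is a brute-force CTC loss in plain Python for the same shape (T=3, C=3, blank=0, target [1]). It enumerates every path, keeps those that collapse to the target, and differentiates by central finite differences. The probability values are made up for illustration and are not the ones produced by the seed in the issue; the helper names are hypothetical.

```python
import itertools
import math

BLANK = 0  # same blank index as nn.CTCLoss(blank=0) in the issue


def collapse(path):
    """CTC collapse: merge repeated symbols, then drop blanks."""
    out, prev = [], None
    for s in path:
        if s != prev and s != BLANK:
            out.append(s)
        prev = s
    return out


def ctc_nll(log_probs, target):
    """Negative log-likelihood by summing over all valid alignments."""
    T, C = len(log_probs), len(log_probs[0])
    total = 0.0
    for path in itertools.product(range(C), repeat=T):
        if collapse(path) == target:
            total += math.exp(sum(log_probs[t][s] for t, s in enumerate(path)))
    return -math.log(total)


def finite_diff_grad(log_probs, target, eps=1e-6):
    """Central differences, treating every log_probs entry as independent."""
    grad = []
    for t in range(len(log_probs)):
        row = []
        for c in range(len(log_probs[0])):
            plus = [r[:] for r in log_probs]
            minus = [r[:] for r in log_probs]
            plus[t][c] += eps
            minus[t][c] -= eps
            row.append((ctc_nll(plus, target) - ctc_nll(minus, target)) / (2 * eps))
        grad.append(row)
    return grad


# Illustrative probabilities (assumed values, one row per time step).
probs = [[0.3, 0.4, 0.3], [0.2, 0.5, 0.3], [0.6, 0.1, 0.3]]
log_probs = [[math.log(p) for p in row] for row in probs]

grad = finite_diff_grad(log_probs, [1])
for t, row in enumerate(grad):
    print(t, [round(g, 4) for g in row])
```

No valid alignment for target [1] ever passes through class 2, so the loss does not depend on those entries and their finite-difference derivative vanishes. The class 0 and class 1 entries at each time step sum to about -1, the same pattern as the numerical Jacobian in the report.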

This suggests that CTCLoss.backward() is inconsistent with the derivative of CTCLoss.forward() with respect to the documented log_probs input.
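One pattern can be read directly off the two printed Jacobians. Subtracting the numerical values from the analytical ones leaves, at each time step, three positive numbers that sum to roughly 1. That is what one would expect if the backward pass returned a gradient with respect to pre-softmax activations (with log_probs assumed to come from log_softmax) rather than with respect to log_probs itself. This is an inference from the reported numbers, not something the issue confirms. A short arithmetic check, using only the values quoted above:

```python
# Values copied from the gradcheck output in the issue, grouped per time step
# as (class 0, class 1, class 2).
numerical = [
    [-0.4727, -0.5273, 0.0000],
    [-0.5624, -0.4376, 0.0000],
    [-0.4784, -0.5216, 0.0000],
]
analytical = [
    [-0.1430, -0.1116, 0.2547],
    [-0.2236, -0.3075, 0.5311],
    [-0.3088, -0.3109, 0.6198],
]

# Per-entry difference between the two Jacobians.
diff = [[a - n for a, n in zip(a_row, n_row)]
        for a_row, n_row in zip(analytical, numerical)]

# Row sums of the difference; values near 1 look like softmax probabilities.
row_sums = [sum(row) for row in diff]

for t, (row, s) in enumerate(zip(diff, row_sums)):
    print(t, [round(d, 4) for d in row], round(s, 4))
```

If this reading is right, the difference at each position would equal exp(log_probs) at that position. The issue does not print log_probs, so that part remains unverified here.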

Versions

  • PyTorch: 2.11.0+cu128
  • Python: 3.13.5

cc @ezyang @albanD @gqchen @nikitaved @soulitzer @Varal7 @bobrenjc93 @mruberry @jbschlosser @walterddr @mikaylagawarecki

extent analysis

TL;DR

gradcheck shows that the gradient returned by CTCLoss backward disagrees with the derivative of CTCLoss.forward() with respect to log_probs. The likely next step is to audit the backward implementation against that derivative, in particular for classes that are neither the blank nor a target label.

Guidance

  • Review the CTCLoss implementation, focusing on the backward method, to identify potential inconsistencies with the forward method's derivative.
  • Verify that the log_probs input is correctly handled in the backward method, particularly for classes that are neither the blank class nor the target label.
  • Check the PyTorch documentation and issue tracker for any known issues or updates related to CTCLoss and its gradient calculation.
  • Consider testing the CTCLoss implementation with different inputs and scenarios to further isolate the issue.
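One way to follow the last suggestion is to run gradcheck twice on the same setup: once with log_probs as the differentiated input, as in the issue, and once with raw logits as the input and log_softmax applied inside the checked function. The sketch below is only a diagnostic under that assumption. The function names are hypothetical, and no expected result is claimed; the two booleans simply indicate whether the mismatch is specific to treating log_probs as a free input.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(123)

T, N, C = 3, 1, 3  # time, batch, classes incl. blank
targets = torch.tensor([[1]])
input_lens = torch.tensor([T])
target_lens = torch.tensor([1])
loss_fn = nn.CTCLoss(blank=0)


def loss_from_log_probs(x):
    # Same function as in the issue: x is treated as log_probs directly.
    return loss_fn(x, targets, input_lens, target_lens)


def loss_from_logits(x):
    # Variant: x is unnormalized, log_softmax is part of the checked graph.
    return loss_fn(F.log_softmax(x, dim=-1), targets, input_lens, target_lens)


logits = torch.randn(T, N, C, dtype=torch.double, requires_grad=True)
log_probs = F.log_softmax(logits.detach(), dim=-1).requires_grad_(True)

# raise_exception=False makes gradcheck return a bool instead of raising.
ok_log_probs = torch.autograd.gradcheck(
    loss_from_log_probs, (log_probs,), raise_exception=False
)
ok_logits = torch.autograd.gradcheck(
    loss_from_logits, (logits,), raise_exception=False
)

print("gradcheck w.r.t. log_probs:", ok_log_probs)
print("gradcheck w.r.t. logits:   ", ok_logits)
```

Varying T, C, the target sequence, and the device in this script would help narrow down whether the behavior depends on the input or on the backend.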

Notes

The provided example demonstrates an inconsistency between CTCLoss.backward() and the forward computation, but the report does not pinpoint where in the implementation it arises, so no definitive fix can be given here. The behavior was reported on PyTorch 2.11.0+cu128; whether other versions or use cases are affected is not stated.

Recommendation

Apply workaround: Until the CTCLoss implementation is verified or updated, consider using a custom or alternative loss function that correctly handles the log_probs input and its gradient calculation.
