pytorch - ✅(Solved) Fix `torch.compile` silently accepts `nn.Embedding` with float tensor indices where eager raises TypeError [1 pull requests, 3 comments, 3 participants]

pytorch · 2026-03-21 07:58:11

pytorch/pytorch#178042 • Fetched 2026-04-08 01:07:31

PR fix notes

PR #179754: [pt2] Add indices dtype check to embedding meta registration

Repository: pytorch/pytorch
Author: XAheli
State: closed | merged: False
Link: https://github.com/pytorch/pytorch/pull/179754

Description (problem / solution / changelog)

Fixes https://github.com/pytorch/pytorch/issues/178042

The aten.embedding meta function was missing the indices dtype check that exists in C++ (checkScalarTypes in Embedding.cpp). During compile, FakeTensor tracing passes the invalid op through without error, and then AOTAutograd's DCE removes the dead node — so the C++ check is never reached.
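To make the mechanism concrete, here is a toy, pure-Python model of the situation described above. It is not PyTorch's tracing or DCE code: `Node`, `trace`, `eliminate_dead_code`, `run`, and `embedding_kernel` are hypothetical names chosen for illustration. Validation lives only in the runtime kernel, tracing records ops without validating them, and a node whose result is never consumed is dropped before it runs, so its check never fires.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    op: str
    inputs: list = field(default_factory=list)  # names of producer nodes


def embedding_kernel(indices_dtype):
    # Stand-in for the runtime kernel: the dtype check lives only here.
    if indices_dtype not in ("int64", "int32"):
        raise TypeError(f"indices must be Long or Int, got {indices_dtype}")
    return "embedded"


def trace():
    # Stand-in for tracing: records ops but performs no dtype validation.
    return [
        Node("idx", "input"),
        Node("emb", "embedding", ["idx"]),  # result is never consumed
        Node("img", "input"),
        Node("out", "linear", ["img"]),
    ]


def eliminate_dead_code(graph, outputs):
    # Keep only nodes reachable backwards from the requested outputs.
    by_name = {n.name: n for n in graph}
    live, stack = set(), list(outputs)
    while stack:
        name = stack.pop()
        if name not in live:
            live.add(name)
            stack.extend(by_name[name].inputs)
    return [n for n in graph if n.name in live]


def run(graph, indices_dtype):
    # Execute the graph; the embedding check runs only if the node survived.
    for node in graph:
        if node.op == "embedding":
            embedding_kernel(indices_dtype)
    return "ok"


graph = trace()
pruned = eliminate_dead_code(graph, outputs=["out"])
print([n.name for n in pruned])  # the "emb" node has been removed
print(run(pruned, "float32"))    # no error: the check was never reached
```

Running the unpruned `graph` with `"float32"` raises `TypeError`, which corresponds to the eager behavior in this toy model.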

Added torch._check for indices dtype in the meta function so the error fires during tracing, before DCE runs.

Test: test_embedding_float_indices_error in test/nn/test_embedding.py — covers eager, aot_eager, inductor

Co-authored-with: Claude

cc @bdhirsh @penguinwu @bobrenjc93 @aorenste

Changed files

test/nn/test_embedding.py (modified, +32/-0)
torch/_meta_registrations.py (modified, +7/-0)

🐛 Describe the bug

torch.compile silently succeeds when an nn.Embedding layer receives a float32 tensor as indices, while eager mode correctly raises TypeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got ... FloatTensor instead.

This was discovered via a fuzzer-generated CNN+Embedding model (targeting the unary_3 / GELU conv pattern). The model instantiates both a Conv2d and an Embedding layer, and the test inputs mistakenly pass a float32 tensor to the embedding layer. Eager mode correctly rejects this type mismatch, but torch.compile silently processes the float tensor, presumably by implicit casting or by bypassing the type check in the compiled graph.

Minimal reproducer

import torch
import torch.nn as nn

class ModelWithEmbedding(nn.Module):
    def __init__(self):
        super().__init__()
        self.embedding = nn.Embedding(100, 32)
        self.fc = nn.Linear(32, 10)

    def forward(self, x):
        emb = self.embedding(x)  # expects LongTensor or IntTensor
        return self.fc(emb)

model = ModelWithEmbedding().cuda()
# float32 tensor — wrong dtype for Embedding
x = torch.randn(2, 8, device="cuda", dtype=torch.float32)

# Eager: TypeError
try:
    model(x)
    print("eager: OK")
except (TypeError, RuntimeError) as e:
    print(f"eager: ERROR — {e}")

# Compiled: silently succeeds
torch._dynamo.reset()
compiled_model = torch.compile(model, fullgraph=True)
try:
    out = compiled_model(x)
    print(f"compile: OK shape={out.shape}")
except Exception as e:
    print(f"compile: ERROR — {e}")

Full model-level reproducer (as found by fuzzer)

import torch
import torch.nn as nn

class ModelWithGELUConv(nn.Module):
    def __init__(self, in_channels=3, out_channels=16, embedding_vocab=100):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=1, padding=0)
        self.embedding = nn.Embedding(num_embeddings=embedding_vocab, embedding_dim=32)
        self.fc = nn.Linear(out_channels * 32 * 32, 10)
        self.embedding_vocab = embedding_vocab

    def forward(self, x, token_ids=None):
        conv_out = self.conv(x)
        # Manual GELU implementation targeting unary_3 pattern
        gelu_out = 0.5 * conv_out * (1 + torch.erf(conv_out * 0.7071067811865476))

        if token_ids is not None:
            embedded = self.embedding(token_ids)  # token_ids should be Long, but is Float!
            batch_size = gelu_out.size(0)
            gelu_flat = gelu_out.view(batch_size, -1)
            output = self.fc(gelu_flat)
            return output
        else:
            batch_size = gelu_out.size(0)
            gelu_flat = gelu_out.view(batch_size, -1)
            output = self.fc(gelu_flat)
            return output

model = ModelWithGELUConv().cuda()
x_image = torch.randn(2, 3, 32, 32, dtype=torch.float32, device="cuda")
test_tensor = torch.randn(1, 16, 32, 32, dtype=torch.float32, device="cuda")  # <-- wrong dtype!

# Eager: TypeError — expected Long or Int indices
try:
    model(x_image, test_tensor)
    print("eager: OK")
except (TypeError, RuntimeError) as e:
    print(f"eager: ERROR — {e}")

# Compiled: silently succeeds
torch._dynamo.reset()
compiled_model = torch.compile(model, fullgraph=True)
try:
    out = compiled_model(x_image, test_tensor)
    print(f"compile: OK shape={out.shape}")
except Exception as e:
    print(f"compile: ERROR — {e}")

Behavior summary

| Operation | Eager | `torch.compile` | Consistent? |
| --- | --- | --- | --- |
| `nn.Embedding(float32_tensor)` | `TypeError` | Succeeds | ❌ No |
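The comparison in the table can be automated with a small differential check. The sketch below is framework-agnostic and uses hypothetical helper names (`outcome`, `consistent`) plus two stand-in functions in place of the real models; in practice the two callables would be the eager model and its `torch.compile` counterpart.

```python
def outcome(fn, *args):
    """Run fn and summarise the result as ('ok', value) or ('error', type name)."""
    try:
        return ("ok", fn(*args))
    except Exception as e:
        return ("error", type(e).__name__)


def consistent(eager_fn, compiled_fn, *args):
    """True when both callables succeed, or both fail, on the same inputs."""
    return outcome(eager_fn, *args)[0] == outcome(compiled_fn, *args)[0]


# Stand-ins for illustration only: `strict` validates its input,
# `lenient` behaves like a graph whose check was removed.
def strict(x):
    if not isinstance(x, int):
        raise TypeError("expected int")
    return x * 2


def lenient(x):
    return 0


print(consistent(strict, strict, 1.5))   # both fail
print(consistent(strict, lenient, 1.5))  # one fails, one succeeds
```

Comparing only success versus failure is a deliberate simplification: the two modes may legitimately raise different exception types for the same invalid input.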

Ablation

This bug was discovered in E0 (baseline) trial-1, indicating it appears even without advanced prompt engineering.

Error logs

Eager mode (correct behavior):

TypeError: Expected tensor for argument #1 'indices' to have one of the following
scalar types: Long, Int; but got torch.cuda.FloatTensor instead (while checking
arguments for embedding)

torch.compile (incorrect — should raise the same error):

(no error — silently returns tensor)

Versions

PyTorch version: 2.12.0.dev20260315+cu126
OS: Ubuntu 22.04.5 LTS (x86_64)
Python version: 3.10.12
GPU: NVIDIA GeForce RTX 3060 Laptop GPU
CUDA: 12.6

cc @chauhang @penguinwu @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @kadeng @amjames @Lucaskabela @jataylo @bdhirsh @bobrenjc93 @aorenste @malfet

extent analysis

Fix Plan

The underlying defect is in the compiler stack, and PR #179754 above proposes the fix there: adding the missing indices dtype check to the aten.embedding meta function. Model code can also guard against the problem on its own by making sure the input to the nn.Embedding layer has an integer index dtype (torch.long or torch.int) before the layer is called.

Here are the steps for the model-side guard:

Check the dtype of the input tensor before passing it to the nn.Embedding layer.
If the dtype is not torch.long or torch.int, raise an explicit error so that eager and compiled runs both reject the input. Cast with the to method or the long method only when the values are known to be valid indices: casting arbitrary floats, such as the torch.randn input in the reproducer, truncates them and can produce negative indices.

Code Changes

import torch
import torch.nn as nn

class ModelWithEmbedding(nn.Module):
    def __init__(self):
        super().__init__()
        self.embedding = nn.Embedding(100, 32)
        self.fc = nn.Linear(32, 10)

    def forward(self, x):
        # Reject non-integer index dtypes explicitly instead of relying on
        # the check inside the embedding op.
        if x.dtype not in (torch.long, torch.int):
            raise TypeError(
                f"nn.Embedding expects Long or Int indices, got {x.dtype}"
            )

        emb = self.embedding(x)
        return self.fc(emb)

model = ModelWithEmbedding().cuda()
# float32 tensor: wrong dtype for Embedding
x = torch.randn(2, 8, device="cuda", dtype=torch.float32)

# Eager: the explicit check raises TypeError
try:
    model(x)
    print("eager: OK")
except (TypeError, RuntimeError) as e:
    print(f"eager: ERROR - {e}")

# Compiled: the check runs while forward is traced, so this call is expected
# to fail as well (the exception type may differ from eager)
torch._dynamo.reset()
compiled_model = torch.compile(model, fullgraph=True)
try:
    out = compiled_model(x)
    print(f"compile: OK shape={out.shape}")
except Exception as e:
    print(f"compile: ERROR - {e}")
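If float inputs do need to be converted rather than rejected, a conversion that validates the values first is safer than a bare `x.long()`. A minimal sketch, assuming a hypothetical helper `to_embedding_indices` that the caller runs on the inputs before invoking the model; it is not part of PyTorch.

```python
import torch


def to_embedding_indices(x: torch.Tensor, num_embeddings: int) -> torch.Tensor:
    """Return x as int64 indices, or raise if it cannot hold valid indices."""
    if x.dtype in (torch.long, torch.int):
        idx = x.long()
    elif x.is_floating_point():
        # Accept floats only when every value is a finite whole number.
        if not bool(torch.isfinite(x).all()) or not torch.equal(x, x.round()):
            raise TypeError(
                f"expected integer indices, got {x.dtype} with non-integral values"
            )
        idx = x.long()
    else:
        raise TypeError(f"expected Long or Int indices, got {x.dtype}")

    # Range check: indices must address an existing embedding row.
    if idx.numel() > 0 and (int(idx.min()) < 0 or int(idx.max()) >= num_embeddings):
        raise IndexError(f"indices must lie in [0, {num_embeddings})")
    return idx


emb = torch.nn.Embedding(100, 32)
ids = to_embedding_indices(torch.tensor([3.0, 7.0]), num_embeddings=100)
print(emb(ids).shape)
```

The value checks read tensor data, so they are meant to run in ordinary Python before the compiled call rather than inside a `fullgraph=True` region.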

Verification

To verify that the fix worked, you can check the following:

The tensor that reaches the nn.Embedding layer has dtype torch.long or torch.int.
Eager mode and torch.compile behave the same way when the model is given a float32 input.
The model compiles and runs successfully with valid integer indices.

Extra Tips

Always check the dtype of the input tensor before passing it to a layer that expects a specific dtype.
Use the to method or the long method to cast the input only when its values are known to be valid indices (non-negative integers below num_embeddings).
Test the model with different input dtypes, in both eager mode and under torch.compile, to ensure that the two behave the same.
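The last tip can be sketched as a small dtype sweep. The helper name `embedding_accepts` is hypothetical, and the sketch covers eager mode on CPU only; the same loop could be repeated with a `torch.compile`-wrapped module to compare the two modes.

```python
import torch
import torch.nn as nn


def embedding_accepts(dtype: torch.dtype) -> bool:
    """Report whether eager nn.Embedding accepts indices of the given dtype."""
    emb = nn.Embedding(10, 4)
    indices = torch.zeros(2, 3).to(dtype)  # all-zero indices are always in range
    try:
        emb(indices)
        return True
    except (TypeError, RuntimeError):
        return False


for dtype in (torch.long, torch.int, torch.float32):
    print(dtype, embedding_accepts(dtype))
```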
