pytorch: `torch.compile` silently accepts `nn.Parameter` as `fill_value` in `torch.full()` where eager raises TypeError [1 comment, 2 participants]

pytorch/pytorch#178046 (fetched 2026-04-08 01:07:25)

Error Message

import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.fill_value = nn.Parameter(torch.tensor(1.0))

    def forward(self, x):
        batch_size, seq_len = x.shape[:2]
        # nn.Parameter passed directly to torch.full() — should be .item()
        constant = torch.full(
            (batch_size, seq_len, seq_len),
            fill_value=self.fill_value,  # <-- Parameter, not scalar!
            dtype=x.dtype,
            device=x.device,
        )
        return torch.cumsum(constant, dim=-1)

model = Model().cuda()
x = torch.randn(2, 8, 32, device="cuda")

# Eager: TypeError
try:
    model(x)
    print("eager: OK")
except TypeError as e:
    print(f"eager: ERROR — {e}")

# Compiled: silently succeeds
torch._dynamo.reset()
compiled_model = torch.compile(model, fullgraph=True)
try:
    out = compiled_model(x)
    print(f"compile: OK shape={out.shape}")
except Exception as e:
    print(f"compile: ERROR — {e}")

Root Cause

torch.full() expects fill_value to be a Number (int, float, etc.). When an nn.Parameter is passed, eager mode's C++ dispatch correctly rejects the argument type. However, torch.compile / Dynamo traces through the graph and apparently unwraps the nn.Parameter to its scalar value during tracing, bypassing the type check.

While this may seem like a "convenient" behavior, it creates a semantic inconsistency between eager and compiled execution that can mask user errors. The correct fix for user code is self.fill_value.item(), but the compiler should ideally flag this or maintain consistency.

Fix / Workaround

In user code, pass a plain Python scalar: replace fill_value=self.fill_value with fill_value=self.fill_value.item() in the torch.full() call. This removes the eager TypeError. It does not change the compiler behavior described in the issue: torch.compile still accepts the nn.Parameter without raising.


🐛 Describe the bug

torch.compile silently succeeds when torch.full() receives an nn.Parameter object as fill_value, while eager mode correctly raises TypeError: full() received an invalid combination of arguments.

The fill_value parameter of torch.full() expects a scalar Number, not a tensor or nn.Parameter. Eager mode strictly enforces this type constraint. However, when using torch.compile, the compile stack apparently converts the nn.Parameter to its scalar value during graph tracing (the root cause analysis below attributes the tracing step to Dynamo).

This was independently discovered via two different fuzzer-generated models targeting the pointless_cumsum_replacement optimization pattern. Both models create a learnable nn.Parameter(torch.tensor(1.0)) and pass it directly to torch.full().

Minimal reproducer

import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.fill_value = nn.Parameter(torch.tensor(1.0))

    def forward(self, x):
        batch_size, seq_len = x.shape[:2]
        # nn.Parameter passed directly to torch.full() — should be .item()
        constant = torch.full(
            (batch_size, seq_len, seq_len),
            fill_value=self.fill_value,  # <-- Parameter, not scalar!
            dtype=x.dtype,
            device=x.device,
        )
        return torch.cumsum(constant, dim=-1)

model = Model().cuda()
x = torch.randn(2, 8, 32, device="cuda")

# Eager: TypeError
try:
    model(x)
    print("eager: OK")
except TypeError as e:
    print(f"eager: ERROR — {e}")

# Compiled: silently succeeds
torch._dynamo.reset()
compiled_model = torch.compile(model, fullgraph=True)
try:
    out = compiled_model(x)
    print(f"compile: OK shape={out.shape}")
except Exception as e:
    print(f"compile: ERROR — {e}")

Full model-level reproducer (pointless_cumsum_replacement-5)

import torch
import torch.nn as nn
import torch.nn.functional as F

class MonotonicSequenceModel(nn.Module):
    def __init__(self, input_dim=512, hidden_dim=1024, seq_len=256, num_heads=8):
        super().__init__()
        self.seq_len = seq_len
        self.hidden_dim = hidden_dim
        self.input_proj = nn.Linear(input_dim, hidden_dim)
        self.fill_value = nn.Parameter(torch.tensor(1.0))
        self.qkv_proj = nn.Linear(hidden_dim, hidden_dim * 3)
        self.out_proj = nn.Linear(hidden_dim, hidden_dim)
        self.ffn = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim * 4),
            nn.GELU(),
            nn.Linear(hidden_dim * 4, hidden_dim),
        )
        self.norm1 = nn.LayerNorm(hidden_dim)
        self.norm2 = nn.LayerNorm(hidden_dim)
        self.dropout = nn.Dropout(0.1)
        self.output_proj = nn.Linear(hidden_dim, input_dim)

    def forward(self, x):
        batch_size, seq_len, _ = x.shape
        x = self.input_proj(x)
        # BUG: self.fill_value is nn.Parameter, not a scalar
        constant_tensor = torch.full(
            (batch_size, seq_len, seq_len),
            fill_value=self.fill_value,
            dtype=x.dtype,
            device=x.device,
        )
        monotonic_seq = torch.cumsum(constant_tensor, dim=-1)
        attention_scale = constant_tensor.mean(dim=-1, keepdim=True) * 0.1
        rel_pos_bias = monotonic_seq.unsqueeze(1)
        qkv = self.qkv_proj(x).reshape(batch_size, seq_len, 3, self.hidden_dim)
        q, k, v = qkv.unbind(dim=2)
        head_dim = self.hidden_dim // 8
        q = q.view(batch_size, seq_len, 8, head_dim).transpose(1, 2)
        k = k.view(batch_size, seq_len, 8, head_dim).transpose(1, 2)
        v = v.view(batch_size, seq_len, 8, head_dim).transpose(1, 2)
        attn_scores = torch.matmul(q, k.transpose(-2, -1)) / head_dim ** 0.5
        attn_scores = attn_scores + rel_pos_bias * 0.01
        attn_weights = F.softmax(attn_scores, dim=-1)
        attn_output = torch.matmul(attn_weights, v)
        attn_output = attn_output.transpose(1, 2).reshape(batch_size, seq_len, self.hidden_dim)
        attn_output = self.out_proj(attn_output)
        x = self.norm1(x + self.dropout(attn_output * attention_scale.expand(-1, -1, self.hidden_dim)))
        ffn_gate = monotonic_seq.mean(dim=-1, keepdim=True).expand(-1, -1, self.hidden_dim)
        ffn_input = x * ffn_gate
        ffn_output = self.ffn(ffn_input)
        x = self.norm2(x + self.dropout(ffn_output))
        final_output = x + constant_tensor.mean(dim=-1, keepdim=True).expand(-1, -1, self.hidden_dim) * 0.01
        output = self.output_proj(final_output)
        return output, constant_tensor, monotonic_seq

model = MonotonicSequenceModel().cuda().eval()
x = torch.randn(2, 256, 512, device="cuda")

# Eager: TypeError
try:
    with torch.no_grad():
        model(x)
    print("eager: OK")
except TypeError as e:
    print(f"eager: ERROR — {e}")

# Compiled: silently succeeds
torch._dynamo.reset()
compiled_model = torch.compile(model, fullgraph=True)
try:
    with torch.no_grad():
        out = compiled_model(x)
    print(f"compile: OK")
except Exception as e:
    print(f"compile: ERROR — {e}")

Affected files

| File | Source | Pattern |
| --- | --- | --- |
| pointless_cumsum_replacement-5.py | E6 (full system), R5 | pointless_cumsum_replacement |
| pointless_cumsum_replacement-7.py | E6 (full system), R5 | pointless_cumsum_replacement |

Behavior summary

| Operation | Eager | torch.compile | Consistent? |
| --- | --- | --- | --- |
| torch.full(size, fill_value=nn.Parameter) | TypeError | Succeeds | No |

Root cause analysis

torch.full() expects fill_value to be a Number (int, float, etc.). When an nn.Parameter is passed, eager mode's C++ dispatch correctly rejects the argument type. However, torch.compile / Dynamo traces through the graph and apparently unwraps the nn.Parameter to its scalar value during tracing, bypassing the type check.

While this may seem like a "convenient" behavior, it creates a semantic inconsistency between eager and compiled execution that can mask user errors. The correct fix for user code is self.fill_value.item(), but the compiler should ideally flag this or maintain consistency.

Ablation

Both instances were discovered in E6 (full system) round-5, the most advanced experiment with feedback + self-repair. This pattern only emerged through the full pipeline, not in simpler configurations.

Error logs

Eager mode (correct behavior):

TypeError: full() received an invalid combination of arguments - got (tuple,
device=torch.device, dtype=torch.dtype, fill_value=Parameter), but expected one of:
 * (tuple of ints size, Number fill_value, *, ...)
 * (tuple of ints size, Number fill_value, *, Tensor out = None, ...)

torch.compile (incorrect — should raise the same error):

(no error — silently returns tensor)

Versions

PyTorch version: 2.12.0.dev20260315+cu126
OS: Ubuntu 22.04.5 LTS (x86_64)
Python version: 3.10.12
GPU: NVIDIA GeForce RTX 3060 Laptop GPU
CUDA: 12.6

cc @gchanan @mruberry @chauhang @penguinwu @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @kadeng @amjames @Lucaskabela @jataylo

extent analysis

Fix Plan

To fix the issue, you should use the .item() method to get the scalar value from the nn.Parameter object. Here are the steps:

  • Replace self.fill_value with self.fill_value.item() in the torch.full() function calls.
  • This will ensure that the fill_value argument is a scalar, as expected by torch.full().

Example code:

constant = torch.full(
    (batch_size, seq_len, seq_len),
    fill_value=self.fill_value.item(),  # <-- Use .item() to get the scalar value
    dtype=x.dtype,
    device=x.device,
)

And similarly for the other torch.full() call:

constant_tensor = torch.full(
    (batch_size, seq_len, seq_len),
    fill_value=self.fill_value.item(),  # <-- Use .item() to get the scalar value
    dtype=x.dtype,
    device=x.device,
)

Verification

To verify that the fix worked, you can run the model in both eager and compiled modes and check that the output is correct and consistent.

  • Run the model in eager mode and check that it raises no errors.
  • Run the model in compiled mode and check that it produces the same output as eager mode.
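The two checks above can be automated with a small parity harness. What follows is a minimal sketch and not part of the original report: `outcome` and `check_parity` are hypothetical helper names, and `strict_full`, `lenient_full`, and `FakeParameter` are stand-ins that imitate the reported eager and compiled behavior so the sketch runs without PyTorch installed.

```python
from typing import Any, Callable, Tuple


def outcome(fn: Callable[..., Any], *args: Any) -> Tuple[str, str]:
    """Run fn(*args) and classify it as ("ok", result type) or ("error", exception type)."""
    try:
        result = fn(*args)
    except Exception as exc:  # broad on purpose: only the kind of outcome is compared
        return ("error", type(exc).__name__)
    return ("ok", type(result).__name__)


def check_parity(eager_fn, compiled_fn, *args):
    """Return (consistent, eager_outcome, compiled_outcome) for the same inputs."""
    eager = outcome(eager_fn, *args)
    compiled = outcome(compiled_fn, *args)
    return eager == compiled, eager, compiled


class FakeParameter:
    """Hypothetical stand-in for nn.Parameter: wraps a value and exposes .item()."""

    def __init__(self, value):
        self._value = value

    def item(self):
        return self._value


def strict_full(fill_value):
    """Stand-in for the eager path: rejects anything that is not a plain number."""
    if not isinstance(fill_value, (int, float)):
        raise TypeError("fill_value must be a Number")
    return [fill_value] * 3


def lenient_full(fill_value):
    """Stand-in for the compiled path: unwraps objects that expose .item()."""
    if hasattr(fill_value, "item"):
        fill_value = fill_value.item()
    return [fill_value] * 3


consistent, eager, compiled = check_parity(strict_full, lenient_full, FakeParameter(1.0))
print(consistent, eager, compiled)
```

With PyTorch available, the same harness would presumably be called as `check_parity(model, torch.compile(model), x)`. When both sides succeed, `torch.testing.assert_close` can then compare the actual outputs.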

Extra Tips

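  • Use the .item() method to get the scalar value from an nn.Parameter object when passing it to a function that expects a scalar argument. Note that .item() returns a plain Python number, so no gradient flows back to the parameter through that value.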
  • Be aware of the differences in behavior between eager and compiled modes, and test your model in both modes to ensure consistency.
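One caveat with `.item()`: it returns a plain Python number, so no gradient reaches `self.fill_value`. If the parameter is meant to stay learnable (an assumption; the report does not say), one alternative is to build the constant with tensor operations instead of `torch.full()`. The sketch below broadcasts the parameter with `expand`. It runs on CPU in eager mode and is only an illustration, not the fix proposed in the issue.

```python
import torch
import torch.nn as nn


class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.fill_value = nn.Parameter(torch.tensor(1.0))

    def forward(self, x):
        batch_size, seq_len = x.shape[:2]
        # Broadcast the 0-dim parameter instead of calling torch.full(),
        # so the result stays connected to the autograd graph.
        constant = self.fill_value.to(dtype=x.dtype, device=x.device).expand(
            batch_size, seq_len, seq_len
        )
        return torch.cumsum(constant, dim=-1)


model = Model()
x = torch.randn(2, 8, 32)

out = model(x)
out.sum().backward()

print(out.shape)              # torch.Size([2, 8, 8])
print(model.fill_value.grad)  # gradient is populated, unlike with .item()
```

The trade-off is that the fill value is now a tensor input to the graph rather than a compile-time constant, which may matter for optimizations that pattern-match on `torch.full()`.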
