pytorch - (How to fix) `torch.compile` silently accepts `nn.Parameter` as `fill_value` in `torch.full()` where eager raises TypeError [1 comment, 2 participants]

pytorch · 2026-03-21 08:03:54

GitHub stats

pytorch/pytorch#178046 • Fetched 2026-04-08 01:07:25

Author: himi1008

Participants: cleonard530, himi1008

Timeline: 106 (top: mentioned ×47, subscribed ×47, labeled ×8, unlabeled ×2)

Error Message

import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.fill_value = nn.Parameter(torch.tensor(1.0))

    def forward(self, x):
        batch_size, seq_len = x.shape[:2]
        # nn.Parameter passed directly to torch.full(); should be .item()
        constant = torch.full(
            (batch_size, seq_len, seq_len),
            fill_value=self.fill_value,  # <-- Parameter, not scalar!
            dtype=x.dtype,
            device=x.device,
        )
        return torch.cumsum(constant, dim=-1)

model = Model().cuda()
x = torch.randn(2, 8, 32, device="cuda")

# Eager: TypeError
try:
    model(x)
    print("eager: OK")
except TypeError as e:
    print(f"eager: ERROR - {e}")

# Compiled: silently succeeds
torch._dynamo.reset()
compiled_model = torch.compile(model, fullgraph=True)
try:
    out = compiled_model(x)
    print(f"compile: OK shape={out.shape}")
except Exception as e:
    print(f"compile: ERROR - {e}")

Root Cause

torch.full() expects fill_value to be a Number (int, float, etc.). When an nn.Parameter is passed, eager mode's argument parsing rejects it with a TypeError. Under torch.compile, however, Dynamo traces the call and apparently unwraps the nn.Parameter to its scalar value during tracing, which bypasses the type check.

While this may seem like a "convenient" behavior, it creates a semantic inconsistency between eager and compiled execution that can mask user errors. The correct fix for user code is self.fill_value.item(), but the compiler should ideally flag this or maintain consistency.


🐛 Describe the bug

torch.compile silently succeeds when torch.full() receives an nn.Parameter object as fill_value, while eager mode correctly raises TypeError: full() received an invalid combination of arguments.

The fill_value parameter of torch.full() expects a scalar Number, not a tensor or nn.Parameter. Eager mode strictly enforces this type constraint. Under torch.compile, however, the nn.Parameter is apparently converted to its scalar value during graph tracing.

This was independently discovered via two different fuzzer-generated models targeting the pointless_cumsum_replacement optimization pattern. Both models create a learnable nn.Parameter(torch.tensor(1.0)) and pass it directly to torch.full().

Minimal reproducer

import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.fill_value = nn.Parameter(torch.tensor(1.0))

    def forward(self, x):
        batch_size, seq_len = x.shape[:2]
        # nn.Parameter passed directly to torch.full() — should be .item()
        constant = torch.full(
            (batch_size, seq_len, seq_len),
            fill_value=self.fill_value,  # <-- Parameter, not scalar!
            dtype=x.dtype,
            device=x.device,
        )
        return torch.cumsum(constant, dim=-1)

model = Model().cuda()
x = torch.randn(2, 8, 32, device="cuda")

# Eager: TypeError
try:
    model(x)
    print("eager: OK")
except TypeError as e:
    print(f"eager: ERROR — {e}")

# Compiled: silently succeeds
torch._dynamo.reset()
compiled_model = torch.compile(model, fullgraph=True)
try:
    out = compiled_model(x)
    print(f"compile: OK shape={out.shape}")
except Exception as e:
    print(f"compile: ERROR — {e}")

Full model-level reproducer (pointless_cumsum_replacement-5)

import torch
import torch.nn as nn
import torch.nn.functional as F

class MonotonicSequenceModel(nn.Module):
    def __init__(self, input_dim=512, hidden_dim=1024, seq_len=256, num_heads=8):
        super().__init__()
        self.seq_len = seq_len
        self.hidden_dim = hidden_dim
        self.input_proj = nn.Linear(input_dim, hidden_dim)
        self.fill_value = nn.Parameter(torch.tensor(1.0))
        self.qkv_proj = nn.Linear(hidden_dim, hidden_dim * 3)
        self.out_proj = nn.Linear(hidden_dim, hidden_dim)
        self.ffn = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim * 4),
            nn.GELU(),
            nn.Linear(hidden_dim * 4, hidden_dim),
        )
        self.norm1 = nn.LayerNorm(hidden_dim)
        self.norm2 = nn.LayerNorm(hidden_dim)
        self.dropout = nn.Dropout(0.1)
        self.output_proj = nn.Linear(hidden_dim, input_dim)

    def forward(self, x):
        batch_size, seq_len, _ = x.shape
        x = self.input_proj(x)
        # BUG: self.fill_value is nn.Parameter, not a scalar
        constant_tensor = torch.full(
            (batch_size, seq_len, seq_len),
            fill_value=self.fill_value,
            dtype=x.dtype,
            device=x.device,
        )
        monotonic_seq = torch.cumsum(constant_tensor, dim=-1)
        attention_scale = constant_tensor.mean(dim=-1, keepdim=True) * 0.1
        rel_pos_bias = monotonic_seq.unsqueeze(1)
        qkv = self.qkv_proj(x).reshape(batch_size, seq_len, 3, self.hidden_dim)
        q, k, v = qkv.unbind(dim=2)
        head_dim = self.hidden_dim // 8
        q = q.view(batch_size, seq_len, 8, head_dim).transpose(1, 2)
        k = k.view(batch_size, seq_len, 8, head_dim).transpose(1, 2)
        v = v.view(batch_size, seq_len, 8, head_dim).transpose(1, 2)
        attn_scores = torch.matmul(q, k.transpose(-2, -1)) / head_dim ** 0.5
        attn_scores = attn_scores + rel_pos_bias * 0.01
        attn_weights = F.softmax(attn_scores, dim=-1)
        attn_output = torch.matmul(attn_weights, v)
        attn_output = attn_output.transpose(1, 2).reshape(batch_size, seq_len, self.hidden_dim)
        attn_output = self.out_proj(attn_output)
        x = self.norm1(x + self.dropout(attn_output * attention_scale.expand(-1, -1, self.hidden_dim)))
        ffn_gate = monotonic_seq.mean(dim=-1, keepdim=True).expand(-1, -1, self.hidden_dim)
        ffn_input = x * ffn_gate
        ffn_output = self.ffn(ffn_input)
        x = self.norm2(x + self.dropout(ffn_output))
        final_output = x + constant_tensor.mean(dim=-1, keepdim=True).expand(-1, -1, self.hidden_dim) * 0.01
        output = self.output_proj(final_output)
        return output, constant_tensor, monotonic_seq

model = MonotonicSequenceModel().cuda().eval()
x = torch.randn(2, 256, 512, device="cuda")

# Eager: TypeError
try:
    with torch.no_grad():
        model(x)
    print("eager: OK")
except TypeError as e:
    print(f"eager: ERROR — {e}")

# Compiled: silently succeeds
torch._dynamo.reset()
compiled_model = torch.compile(model, fullgraph=True)
try:
    with torch.no_grad():
        out = compiled_model(x)
    print(f"compile: OK")
except Exception as e:
    print(f"compile: ERROR — {e}")

Affected files

File	Source	Pattern
`pointless_cumsum_replacement-5.py`	E6 (full system), R5	`pointless_cumsum_replacement`
`pointless_cumsum_replacement-7.py`	E6 (full system), R5	`pointless_cumsum_replacement`

Behavior summary

Operation	Eager	`torch.compile`	Consistent?
`torch.full(size, fill_value=nn.Parameter)`	`TypeError`	Succeeds	No

Root cause analysis

Ablation

Both instances were discovered in E6 (full system) round-5, the most advanced experiment with feedback + self-repair. This pattern only emerged through the full pipeline, not in simpler configurations.

Error logs

Eager mode (correct behavior):

TypeError: full() received an invalid combination of arguments - got (tuple,
device=torch.device, dtype=torch.dtype, fill_value=Parameter), but expected one of:
 * (tuple of ints size, Number fill_value, *, ...)
 * (tuple of ints size, Number fill_value, *, Tensor out = None, ...)

torch.compile (incorrect — should raise the same error):

(no error — silently returns tensor)

Versions

PyTorch version: 2.12.0.dev20260315+cu126
OS: Ubuntu 22.04.5 LTS (x86_64)
Python version: 3.10.12
GPU: NVIDIA GeForce RTX 3060 Laptop GPU
CUDA: 12.6

cc @gchanan @mruberry @chauhang @penguinwu @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @kadeng @amjames @Lucaskabela @jataylo

extent analysis

Fix Plan

To fix the user code, extract the scalar value from the nn.Parameter with .item() before passing it to torch.full():

1. Replace self.fill_value with self.fill_value.item() in each torch.full() call.
2. fill_value is then a plain scalar, which is what torch.full() expects.

Example code:

constant = torch.full(
    (batch_size, seq_len, seq_len),
    fill_value=self.fill_value.item(),  # <-- Use .item() to get the scalar value
    dtype=x.dtype,
    device=x.device,
)

And similarly for the other torch.full() call:

constant_tensor = torch.full(
    (batch_size, seq_len, seq_len),
    fill_value=self.fill_value.item(),  # <-- Use .item() to get the scalar value
    dtype=x.dtype,
    device=x.device,
)
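
As a minimal end-to-end sketch of the change above, the following restates the issue's minimal reproducer with `.item()` applied. The class name `FixedModel` is hypothetical, and the example runs on CPU in eager mode only, so it does not need a GPU or `torch.compile`.

```python
import torch
import torch.nn as nn


class FixedModel(nn.Module):  # hypothetical name, not taken from the issue
    def __init__(self):
        super().__init__()
        self.fill_value = nn.Parameter(torch.tensor(1.0))

    def forward(self, x):
        batch_size, seq_len = x.shape[:2]
        constant = torch.full(
            (batch_size, seq_len, seq_len),
            fill_value=self.fill_value.item(),  # Python float, not a Parameter
            dtype=x.dtype,
            device=x.device,
        )
        return torch.cumsum(constant, dim=-1)


model = FixedModel()
x = torch.randn(2, 8, 32)
out = model(x)
print(out.shape)  # torch.Size([2, 8, 8])
print(out[0, 0])  # running sum of ones along the last dimension: 1 through 8
```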

Verification

To verify the fix, run the model in both eager and compiled modes and check that the outputs agree:

1. Run the model in eager mode and confirm that it no longer raises TypeError.
2. Run the model in compiled mode and confirm that it produces the same output as eager mode.
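
The two checks above could be scripted roughly as follows. This is a sketch under assumptions: the helper name `compare_eager_and_compiled` and its `compile_fn` parameter are hypothetical, `TinyFixedModel` stands in for the corrected model, and whether `torch.compile` runs at all depends on the local toolchain.

```python
import torch
import torch.nn as nn


class TinyFixedModel(nn.Module):  # hypothetical stand-in for the corrected model
    def __init__(self):
        super().__init__()
        self.fill_value = nn.Parameter(torch.tensor(1.0))

    def forward(self, x):
        batch_size, seq_len = x.shape[:2]
        constant = torch.full(
            (batch_size, seq_len, seq_len),
            fill_value=self.fill_value.item(),
            dtype=x.dtype,
            device=x.device,
        )
        return torch.cumsum(constant, dim=-1)


def compare_eager_and_compiled(model, x, compile_fn=torch.compile):
    """Run the model eagerly and through compile_fn; raise if outputs differ."""
    with torch.no_grad():
        # Step 1: the eager run must complete without raising.
        eager_out = model(x)
        # Step 2: run the same input through the compiled wrapper.
        compiled_out = compile_fn(model)(x)
    # assert_close raises AssertionError on a shape, dtype, or value mismatch.
    torch.testing.assert_close(compiled_out, eager_out)
    return eager_out, compiled_out
```

Calling `compare_eager_and_compiled(TinyFixedModel(), torch.randn(2, 8, 32))` uses `torch.compile` with its default settings. Passing a different `compile_fn`, for example `lambda m: torch.compile(m, backend="aot_eager")`, is one way to try another backend.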

Extra Tips

Use .item() to extract the scalar value from an nn.Parameter before passing it to a function that expects a scalar argument.
Behavior can differ between eager and compiled modes, so test your model in both to ensure consistency.
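
One caveat worth illustrating: `.item()` returns a plain Python number, so the filled tensor is no longer connected to the parameter in the autograd graph. Depending on Dynamo configuration, `.item()` may also introduce a graph break under `torch.compile`. If the fill value is meant to stay learnable, one alternative is to broadcast the parameter with `expand()` instead of calling `torch.full()`. This is a sketch of that alternative, not something proposed in the issue, and the class name `BroadcastFillModel` is hypothetical.

```python
import torch
import torch.nn as nn


class BroadcastFillModel(nn.Module):  # hypothetical name
    def __init__(self):
        super().__init__()
        self.fill_value = nn.Parameter(torch.tensor(1.0))

    def forward(self, x):
        batch_size, seq_len = x.shape[:2]
        # Broadcast the 0-dim parameter to the target shape. The result stays
        # connected to self.fill_value in the autograd graph.
        constant = self.fill_value.to(dtype=x.dtype, device=x.device).expand(
            batch_size, seq_len, seq_len
        )
        return torch.cumsum(constant, dim=-1)


model = BroadcastFillModel()
x = torch.randn(2, 3, 32)
out = model(x)
out.sum().backward()
print(out.shape)              # torch.Size([2, 3, 3])
print(model.fill_value.grad)  # tensor(36.)
```

The gradient is 36 because each of the 6 rows of the output is `[v, 2v, 3v]`, which sums to `6v`, so the total is `36v`.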
