transformers - 💡(How to fix) Fix Transformers v5 fills non-persistent buffers with junk [2 comments, 3 participants]

transformers · 2026-03-09 03:43:59


GitHub stats

huggingface/transformers#44534 · Fetched 2026-04-08 00:27:51


Timeline: mentioned ×5, subscribed ×5, commented ×2, closed ×1


System Info

transformers version: 5.3.0
Platform: Linux-6.17.0-14-generic-x86_64-with-glibc2.39
Python version: 3.12.3
Huggingface_hub version: 1.6.0
Safetensors version: 0.7.0
Accelerate version: not installed
Accelerate config: not found
DeepSpeed version: not installed
PyTorch version (accelerator?): 2.10.0+cu128 (CUDA)
Using distributed or parallel set-up in script?: N/A
Using GPU in script?: N/A
GPU type: NVIDIA RTX PRO 6000 Blackwell Max-Q Workstation Edition

Who can help?

@ArthurZucker @CyrilVallez


Reproduction

I am recreating issue #43644 because I do not believe it was fairly closed.

Right now, in version 5 of Transformers, if you define a custom model with a non-persistent PyTorch buffer and then reload it with from_pretrained, the values in that buffer are replaced with junk.
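The underlying PyTorch behavior can be seen without Transformers: a buffer registered with persistent=False is left out of state_dict(), so a checkpoint has no value to restore it from. The sketch below uses a hypothetical TinyModel (a plain nn.Module, not a PreTrainedModel). It does not reproduce the v5 loader; it only shows that whatever value the buffer ends up with after loading comes from initialization, never from the checkpoint.

```python
import torch


class TinyModel(torch.nn.Module):
    """Hypothetical stand-in for MyModel that needs only PyTorch."""

    def __init__(self, temperature: float = 0.5) -> None:
        super().__init__()
        self.classifier = torch.nn.Linear(4, 1)
        # Non-persistent: the buffer lives on the module but is left out of state_dict().
        self.register_buffer("temperature", torch.tensor(temperature), persistent=False)


model = TinyModel(temperature=0.5)
state = model.state_dict()

# Only the Linear layer's parameters would end up in a checkpoint.
print(sorted(state.keys()))  # ['classifier.bias', 'classifier.weight']

# Loading that state dict never touches the buffer, so its value can only come
# from however the module was initialized.
fresh = TinyModel(temperature=1.0)
fresh.load_state_dict(state)  # strict loading succeeds: the buffer is not expected
print(fresh.temperature)  # tensor(1.)
```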

While I understand that this follows from how model loading now works, the behavior is nevertheless:

A major breaking change for any model that relies on non-persistent buffers.
Extremely unintuitive, diverging from reasonably expected behavior.
Dangerous, because there is no warning when it happens, so users do not realize why their models suddenly stop working (I spent multiple days debugging this before realizing the issue was with version 5 of Transformers).

To illustrate how unintuitive and damaging this behavior is, consider the following example:

import torch

from transformers import PreTrainedModel, PretrainedConfig


class MyModelConfig(PretrainedConfig):
    model_type = "my_model"
    
    def __init__(
        self,
        temperature: float = 1.0,
        **kwargs,
    ) -> None:
        super().__init__(**kwargs)
        self.temperature = temperature


class MyModel(PreTrainedModel):
    config_class = MyModelConfig
    
    def __init__(self, config: MyModelConfig) -> None:
        super().__init__(config)
        
        self.classifier = torch.nn.Linear(1024, 1)
        
        self.temperature: torch.Tensor
        self.register_buffer("temperature", torch.tensor(config.temperature), persistent=False)
        
        self.post_init()
    
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits = self.classifier(x)
        
        return logits / self.temperature

model = MyModel(MyModelConfig(temperature=0.5))

print(f"Original model temperature: {model.temperature}") # It will print 0.5 as expected

model.save_pretrained("my_model")
model = MyModel.from_pretrained("my_model")

print(f"Loaded model temperature: {model.temperature}") # It will print a random junk value like 1.983314177778084e-05 instead of the original 0.5 value

It is very common to store static values in a model as non-persistent buffers. When the model is reloaded, those values are silently overwritten. In the example above, every logit would end up scaled by a junk temperature instead of 0.5.
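Because nothing warns when this happens, it can help to audit a model for buffers that a checkpoint will not carry. The helper below, non_persistent_buffers (a hypothetical name, not a PyTorch or Transformers API), compares named_buffers() against the keys of state_dict(); every name it returns is a value that has to be re-created after loading.

```python
import torch


def non_persistent_buffers(module: torch.nn.Module) -> list[str]:
    """Return names of buffers that state_dict() (and hence a checkpoint) omits."""
    saved = set(module.state_dict().keys())
    return [name for name, _ in module.named_buffers() if name not in saved]


class Scaler(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.register_buffer("temperature", torch.tensor(0.5), persistent=False)
        self.register_buffer("running_scale", torch.tensor(2.0))  # persistent by default


class Wrapper(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.head = Scaler()


print(non_persistent_buffers(Wrapper()))  # ['head.temperature']
```

Running the helper on a freshly constructed model (before any loading) gives the list of buffers worth checking after from_pretrained.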

I would like to strongly reiterate my request that this issue be resolved.

Expected behavior

Non-persistent PyTorch buffers do not get filled in with random junk values when they are reloaded.


Fix Plan

Make Non-Persistent Buffers Persistent

To fix the issue, make the temperature buffer persistent by passing persistent=True to register_buffer (True is also the default). Its value is then saved in the checkpoint and restored by from_pretrained.

self.register_buffer("temperature", torch.tensor(config.temperature), persistent=True)
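The effect of persistent=True can be checked with plain PyTorch: the buffer now appears in state_dict(), so it survives a save/load round trip. This sketch uses torch.save and torch.load on an in-memory file as a stand-in for save_pretrained and from_pretrained (which write safetensors files instead), and TinyModel is a hypothetical stand-in for MyModel.

```python
import io

import torch


class TinyModel(torch.nn.Module):
    def __init__(self, temperature: float = 1.0) -> None:
        super().__init__()
        self.classifier = torch.nn.Linear(4, 1)
        # persistent=True puts the buffer into state_dict(), so checkpoints include it.
        self.register_buffer("temperature", torch.tensor(temperature), persistent=True)


original = TinyModel(temperature=0.5)

# Serialize the state dict to an in-memory file, as a checkpoint would.
checkpoint = io.BytesIO()
torch.save(original.state_dict(), checkpoint)
checkpoint.seek(0)

restored = TinyModel()  # constructed with the default temperature=1.0
restored.load_state_dict(torch.load(checkpoint))
print(restored.temperature)  # tensor(0.5000)
```

One trade-off of this choice: the loaded value now comes from the checkpoint, so editing temperature in config.json alone no longer changes the reloaded model.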

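Alternatively, because temperature is already stored in the config (which save_pretrained writes to config.json), you can skip the buffer entirely and keep the value as a plain Python float:

self.temperature = config.temperature

Note that assigning a tensor to a plain attribute (for example through self.__dict__) does not make it persistent: such a tensor is not a buffer, so it is excluded from state_dict() and is not moved by model.to(device).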

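Of the two, register_buffer(..., persistent=True) is the more idiomatic choice, since the value stays a tensor that moves with the model across devices and dtypes.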
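A minimal sketch of the config-driven alternative, assuming the value never needs to be a tensor. ToyConfig is a hypothetical dataclass standing in for MyModelConfig; in Transformers the real config is written to config.json by save_pretrained and passed back to __init__ by from_pretrained. Because the value is a Python float rather than a tensor, it does not depend on how the loader initializes tensors.

```python
from dataclasses import dataclass

import torch


@dataclass
class ToyConfig:
    """Hypothetical stand-in for MyModelConfig."""
    temperature: float = 1.0


class TinyModel(torch.nn.Module):
    def __init__(self, config: ToyConfig) -> None:
        super().__init__()
        self.classifier = torch.nn.Linear(4, 1)
        # A plain Python float: not a buffer, so it never appears in state_dict().
        self.temperature = config.temperature

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Dividing a tensor by a Python float works on any device and dtype.
        return self.classifier(x) / self.temperature


model = TinyModel(ToyConfig(temperature=0.5))
x = torch.ones(2, 4)
print(torch.allclose(model(x), model.classifier(x) / 0.5))  # True
print("temperature" in model.state_dict())  # False
```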

Update Model Loading

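Additionally, you can override from_pretrained to restore non-persistent buffers from a file saved next to the checkpoint. This only works when pretrained_model_name_or_path is a local directory; a Hub repository ID will not resolve to a local temperature.pt file.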

class MyModel(PreTrainedModel):
    ...

    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs):
        model = super().from_pretrained(pretrained_model_name_or_path, *model_args, **kwargs)
        if hasattr(model, "temperature"):
            model.temperature.copy_(torch.load(f"{pretrained_model_name_or_path}/temperature.pt"))
        return model

You will also need to save the buffer value when saving the model. Forward any extra keyword arguments so that the options accepted by save_pretrained keep working:

class MyModel(PreTrainedModel):
    ...

    def save_pretrained(self, save_directory, **kwargs):
        super().save_pretrained(save_directory, **kwargs)
        torch.save(self.temperature, f"{save_directory}/temperature.pt")
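If the buffer is derived from the config anyway, a separate file is not strictly needed: the value can be rebuilt from the config after loading. A minimal PyTorch-only sketch, where ToyConfig, TinyModel and reset_non_persistent_buffers are hypothetical names and the junk value is written by hand to simulate the symptom reported in the issue.

```python
from dataclasses import dataclass

import torch


@dataclass
class ToyConfig:
    """Hypothetical stand-in for MyModelConfig."""
    temperature: float = 1.0


class TinyModel(torch.nn.Module):
    def __init__(self, config: ToyConfig) -> None:
        super().__init__()
        self.config = config
        self.register_buffer("temperature", torch.tensor(config.temperature), persistent=False)

    @torch.no_grad()
    def reset_non_persistent_buffers(self) -> None:
        """Hypothetical hook: rebuild buffers that the checkpoint does not carry."""
        self.temperature.fill_(self.config.temperature)


model = TinyModel(ToyConfig(temperature=0.5))

# Simulate the reported symptom by writing a junk value into the buffer by hand.
model.temperature.fill_(1.983314177778084e-05)

model.reset_non_persistent_buffers()
print(model.temperature)  # tensor(0.5000)
```

In a PreTrainedModel subclass, the same fill_ call could run in a from_pretrained override right after super().from_pretrained(...) returns, reading model.config.temperature; this needs no extra file next to the checkpoint.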

Update Example Code

Here is the updated example code:

import torch

from transformers import PreTrainedModel, PretrainedConfig


class MyModelConfig(PretrainedConfig):
    model_type = "my_model"
    
    def __init__(
        self,
        temperature: float = 1.0,
        **kwargs,
    ) -> None:
        super().__init__(**kwargs)
        self.temperature = temperature
