transformers - 💡(How to fix) Fix Transformers v5 fills non-persistent buffers with junk [2 comments, 3 participants]

huggingface/transformers#44534 (fetched 2026-04-08 00:27:51)
System Info

  • transformers version: 5.3.0
  • Platform: Linux-6.17.0-14-generic-x86_64-with-glibc2.39
  • Python version: 3.12.3
  • Huggingface_hub version: 1.6.0
  • Safetensors version: 0.7.0
  • Accelerate version: not installed
  • Accelerate config: not found
  • DeepSpeed version: not installed
  • PyTorch version (accelerator?): 2.10.0+cu128 (CUDA)
  • Using distributed or parallel set-up in script?: N/A
  • Using GPU in script?: N/A
  • GPU type: NVIDIA RTX PRO 6000 Blackwell Max-Q Workstation Edition

Who can help?

@ArthurZucker @CyrilVallez

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

I am recreating issue #43644 because I do not believe it was fairly closed.

In version 5 of Transformers, if a custom model registers a non-persistent PyTorch buffer, loading that model again with from_pretrained replaces every value in that buffer with junk.

While I understand that the way model loading now works is such that this occurs, this behavior is nevertheless:

  1. a major breaking change to any model leveraging non-persistent buffers.
  2. extremely unintuitive, diverging from what users can reasonably expect.
  3. dangerous in that there is no warning that this occurs, causing users not to realize why their models suddenly aren't working (I ended up spending multiple days trying to debug this before realizing the issue was with version 5 of Transformers).

To illustrate how unintuitive and damaging this behavior is, see the example below:

import torch

from transformers import PreTrainedModel, PretrainedConfig


class MyModelConfig(PretrainedConfig):
    model_type = "my_model"
    
    def __init__(
        self,
        temperature: float = 1.0,
        **kwargs,
    ) -> None:
        super().__init__(**kwargs)
        self.temperature = temperature


class MyModel(PreTrainedModel):
    config_class = MyModelConfig
    
    def __init__(self, config: MyModelConfig) -> None:
        super().__init__(config)
        
        self.classifier = torch.nn.Linear(1024, 1)
        
        self.temperature: torch.Tensor
        self.register_buffer("temperature", torch.tensor(config.temperature), persistent=False)
        
        self.post_init()
    
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits = self.classifier(x)
        
        return logits / self.temperature

model = MyModel(MyModelConfig(temperature=0.5))

print(f"Original model temperature: {model.temperature}") # It will print 0.5 as expected

model.save_pretrained("my_model")
model = MyModel.from_pretrained("my_model")

print(f"Loaded model temperature: {model.temperature}") # It will print a random junk value like 1.983314177778084e-05 instead of the original 0.5 value

It is very common to define static variables in a model that are stored in a non-persistent buffer. When you go to load your model again, those variables will be overwritten entirely without you knowing that happened. Now, all of the logits in the example model would end up being scaled incorrectly.
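
The mechanism can be sketched in plain PyTorch, without Transformers. A non-persistent buffer is left out of state_dict(), so a checkpoint has nothing to restore it from and its value can only come from __init__. The second half of the sketch rests on an assumption that the issue does not confirm: that the loader builds the model on the meta device and later allocates real memory without re-running the initializer. Scaler is a hypothetical stand-in for MyModel.

```python
import torch


class Scaler(torch.nn.Module):
    """Hypothetical stand-in for MyModel, reduced to the buffer."""

    def __init__(self, temperature: float = 1.0) -> None:
        super().__init__()
        self.register_buffer("temperature", torch.tensor(temperature), persistent=False)


eager = Scaler(0.5)
# Non-persistent buffers are excluded from the state dict, so nothing is saved for them.
saved_keys = list(eager.state_dict().keys())
print(saved_keys)

# Assumed loading path: construct on the meta device (shape and dtype only, no data) ...
with torch.device("meta"):
    lazy = Scaler(0.5)
was_meta = lazy.temperature.is_meta

# ... then allocate real but uninitialized memory. The 0.5 set in __init__ is not carried over.
lazy.to_empty(device="cpu")
print(lazy.temperature)  # arbitrary value, differs between runs
```

If this is what happens during loading, the junk value is simply whatever was in the freshly allocated memory, which would explain why it changes from run to run.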

I would like to strongly reiterate my request that this issue be resolved.

Expected behavior

Non-persistent PyTorch buffers do not get filled in with random junk values when they are reloaded.


Fix Plan

Make Non-Persistent Buffers Persistent

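To work around the issue, you can make the temperature buffer persistent by changing the persistent argument of register_buffer to True. The value is then written to the checkpoint and read back on load. Checkpoints that were saved while the buffer was non-persistent do not contain it and need to be saved again.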

self.register_buffer("temperature", torch.tensor(config.temperature), persistent=True)
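
A minimal sketch of why this works, using plain PyTorch rather than save_pretrained and from_pretrained: a persistent buffer appears in state_dict(), so load_state_dict can overwrite whatever the constructor put there. Scaler is a hypothetical stand-in for MyModel.

```python
import torch


class Scaler(torch.nn.Module):
    """Hypothetical stand-in for MyModel, reduced to the buffer."""

    def __init__(self, temperature: float = 1.0) -> None:
        super().__init__()
        self.register_buffer("temperature", torch.tensor(temperature), persistent=True)


source = Scaler(0.5)
checkpoint = source.state_dict()  # now contains a "temperature" entry

restored = Scaler()  # starts at the default of 1.0
restored.load_state_dict(checkpoint)  # the saved 0.5 replaces the default
print(restored.temperature)
```

The trade-off is that the value now lives in the weights file as well as in the config, so the two can drift apart if the config is edited by hand.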

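Assigning a plain tensor to an attribute, or writing it into self.__dict__, does not register a buffer and does not make the value persistent, so it is not a substitute for the change above. If the value is a scalar taken straight from the config, an alternative is to skip the buffer and keep it as a plain Python number, which __init__ sets again every time the model is constructed:

self.temperature = float(config.temperature)

The first approach is the more idiomatic one when the value has to be a tensor that moves with the model across devices.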

Update Model Loading

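Alternatively, you can keep the buffer non-persistent and override from_pretrained so that the value is copied back from a side file after loading. This sketch assumes the model is loaded from a local directory; a Hub repository ID would need the file to be downloaded first.

import os

class MyModel(PreTrainedModel):
    ...

    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs):
        model = super().from_pretrained(pretrained_model_name_or_path, *model_args, **kwargs)
        # Only a local directory can contain the side file written by save_pretrained.
        buffer_path = os.path.join(str(pretrained_model_name_or_path), "temperature.pt")
        if os.path.isfile(buffer_path):
            saved = torch.load(buffer_path, map_location="cpu")
            with torch.no_grad():
                model.temperature.copy_(saved)
        return model

You will also need to save the buffer value when saving the model. Forward any extra keyword arguments so the override does not change how save_pretrained can be called.

class MyModel(PreTrainedModel):
    ...

    def save_pretrained(self, save_directory, **kwargs):
        super().save_pretrained(save_directory, **kwargs)
        # Store the buffer next to the regular checkpoint files.
        torch.save(self.temperature.detach().cpu(), os.path.join(str(save_directory), "temperature.pt"))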
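
A variant that avoids the side file is to re-derive the buffers from a freshly constructed model. The sketch below is written against plain torch.nn.Module and assumes that constructing the model directly from its config yields the correct buffer values, as the reproduction shows for MyModel(MyModelConfig(temperature=0.5)). The helper names non_persistent_buffer_names and restore_non_persistent_buffers are hypothetical and not part of Transformers or PyTorch.

```python
import torch


def non_persistent_buffer_names(module: torch.nn.Module) -> list[str]:
    """Buffers that exist on the module but are absent from its state dict."""
    persistent = set(module.state_dict().keys())
    return [name for name, _ in module.named_buffers() if name not in persistent]


def restore_non_persistent_buffers(loaded: torch.nn.Module, reference: torch.nn.Module) -> list[str]:
    """Copy non-persistent buffers from `reference` into `loaded`; return the names that changed."""
    reference_buffers = dict(reference.named_buffers())
    restored = []
    with torch.no_grad():
        for name in non_persistent_buffer_names(loaded):
            target = loaded.get_buffer(name)
            source = reference_buffers[name].to(device=target.device, dtype=target.dtype)
            if not torch.equal(target, source):  # NaN junk also compares unequal
                target.copy_(source)
                restored.append(name)
    return restored


class Scaler(torch.nn.Module):
    """Hypothetical stand-in for MyModel."""

    def __init__(self, temperature: float = 1.0) -> None:
        super().__init__()
        self.classifier = torch.nn.Linear(4, 1)
        self.register_buffer("temperature", torch.tensor(temperature), persistent=False)


reference = Scaler(0.5)            # built directly, so the buffer is correct
loaded = Scaler(0.5)
loaded.temperature.fill_(123.0)    # simulate the junk value seen after loading
changed = restore_non_persistent_buffers(loaded, reference)
print(changed, loaded.temperature)
```

Returning the list of changed names also gives a place to log a warning, which addresses the point that the overwrite currently happens silently.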

Update Example Code

Here is the updated example code:

import torch

from transformers import PreTrainedModel, PretrainedConfig


class MyModelConfig(PretrainedConfig):
    model_type = "my_model"
    
    def __init__(
        self,
        temperature: float = 1.0,
        **kwargs,
    ) -> None:
        super().__init__(**kwargs)
        self.temperature = temperature
