transformers - ✅(Solved) Fix In GPTNeoXConfig, rotary_pct silently reverts to default on reload [2 pull requests, 3 comments, 3 participants]

GitHub stats

huggingface/transformers#44913 (fetched 2026-04-08 01:12:32)
Comments: 3 · Participants: 3 · Timeline events: 18 · Reactions: 0
Timeline (top): mentioned ×5, subscribed ×5, commented ×3, cross-referenced ×2

Root Cause

https://github.com/huggingface/transformers/blob/3a3b59cb1a7c0238c8d1072e35d3879c5faff48e/src/transformers/models/gpt_neox/configuration_gpt_neox.py#L98

save_pretrained writes partial_rotary_factor inside rope_parameters but does not persist rotary_pct as a top-level key. On reload, rotary_pct is absent from kwargs, so this line unconditionally overwrites the correct value with 0.25.
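
The overwrite can be reproduced without the library. The sketch below is a stand-in, not the transformers code: `buggy_convert` is a hypothetical function that mimics only the single assignment described above, with a plain dict in place of the config object.

```python
def buggy_convert(rope_parameters, **kwargs):
    """Hypothetical stand-in for the unconditional assignment described above."""
    rope_parameters = dict(rope_parameters)  # copy so the caller's dict is untouched
    # pop() falls back to 0.25 whenever rotary_pct is missing, and the
    # assignment then replaces whatever was already stored under the key.
    rope_parameters["partial_rotary_factor"] = kwargs.pop("rotary_pct", 0.25)
    return rope_parameters


# Construction path: the caller passes rotary_pct explicitly.
created = buggy_convert({}, rotary_pct=1.0)
print(created["partial_rotary_factor"])  # 1.0

# Reload path: the value lives inside rope_parameters and rotary_pct is absent.
reloaded = buggy_convert(created)
print(reloaded["partial_rotary_factor"])  # 0.25
```

The second call is the reload case: the stored 1.0 is present in the dict, but the default from `pop` wins.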

Fix Action

rotary_pct = kwargs.pop("rotary_pct", None)
if rotary_pct is not None:
    self.rope_parameters["partial_rotary_factor"] = rotary_pct
else:
    self.rope_parameters.setdefault("partial_rotary_factor", 0.25)

Verified locally: after applying this change, the value survives the round-trip.

Models using the default rotary_pct=0.25 (gpt-neox-20b, Pythia, etc.) are unaffected since the overwrite produces the same value.

PR fix notes

PR #44917: fix(gpt-neox): preserve rotary_pct across save/load cycle

Description (problem / solution / changelog)

Summary

GPTNeoXConfig.convert_rope_params_to_dict unconditionally overwrote rope_parameters["partial_rotary_factor"] with the default 0.25 when rotary_pct was absent from kwargs.

On every from_pretrained call, rotary_pct is not present as a top-level key in config.json (it is stored inside rope_parameters), so the value was silently reset to 0.25.

Fix

Replace the unconditional assignment with a conditional + setdefault:

-self.rope_parameters["partial_rotary_factor"] = kwargs.pop("rotary_pct", 0.25)
+rotary_pct = kwargs.pop("rotary_pct", None)
+if rotary_pct is not None:
+    self.rope_parameters["partial_rotary_factor"] = rotary_pct
+else:
+    self.rope_parameters.setdefault("partial_rotary_factor", 0.25)

When rotary_pct is absent (the reload path), setdefault only fills in 0.25 if partial_rotary_factor is not already present, so an existing value is preserved.
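
The cases the patched branch has to handle can be checked in isolation. This is a minimal sketch with a hypothetical `fixed_convert` helper operating on a plain dict, assuming the same logic as the diff above; it is not the library method.

```python
def fixed_convert(rope_parameters, **kwargs):
    """Hypothetical helper mirroring the conditional + setdefault pattern."""
    rope_parameters = dict(rope_parameters)  # copy so the caller's dict is untouched
    rotary_pct = kwargs.pop("rotary_pct", None)
    if rotary_pct is not None:
        # Explicit argument: always takes precedence.
        rope_parameters["partial_rotary_factor"] = rotary_pct
    else:
        # No argument: keep a stored value, otherwise fall back to 0.25.
        rope_parameters.setdefault("partial_rotary_factor", 0.25)
    return rope_parameters


explicit = fixed_convert({}, rotary_pct=0.5)                # value passed at construction
reloaded = fixed_convert({"partial_rotary_factor": 0.5})    # value already stored, no kwarg
fresh = fixed_convert({})                                   # nothing stored, no kwarg
override = fixed_convert({"partial_rotary_factor": 0.5}, rotary_pct=1.0)

print(explicit["partial_rotary_factor"])  # 0.5
print(reloaded["partial_rotary_factor"])  # 0.5
print(fresh["partial_rotary_factor"])     # 0.25
print(override["partial_rotary_factor"])  # 1.0
```

Because the check is `is not None` rather than a truthiness test, an explicit `rotary_pct=0.0` would still be honoured in this sketch.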

Fixes #44913

Changed files

  • src/transformers/models/gpt_neox/configuration_gpt_neox.py (modified, +5/-1)

PR #44985: fix: preserve rotary_pct across save/load cycle in GPTNeoX configs

Description (problem / solution / changelog)

Summary

Fixes #44913

When creating a GPTNeoXConfig (or GPTNeoXJapaneseConfig) with a non-default rotary_pct, the value is lost after a save_pretrained / from_pretrained round-trip. This happens because convert_rope_params_to_dict unconditionally overwrites partial_rotary_factor with kwargs.pop("rotary_pct", <default>). On reload, rotary_pct is absent from kwargs (it was saved inside rope_parameters), so the default silently replaces the correct value.

The fix uses the same setdefault pattern recommended by @zucchini-nlp from modeling_rope_utils.py L646-648: only set partial_rotary_factor if rotary_pct is explicitly passed; otherwise, use setdefault to preserve any value already present in rope_parameters.

Both GPTNeoXConfig and GPTNeoXJapaneseConfig had the same bug and are fixed together.

Changes

  • src/transformers/models/gpt_neox/configuration_gpt_neox.py: use setdefault for partial_rotary_factor instead of unconditional assignment
  • src/transformers/models/gpt_neox_japanese/configuration_gpt_neox_japanese.py: same fix (default 1.0 instead of 0.25)

Test plan

  • Verified rotary_pct=0.5 survives save/load round-trip for both GPTNeoXConfig and GPTNeoXJapaneseConfig
  • Verified default rotary_pct (0.25 for GPTNeoX, 1.0 for Japanese) survives save/load round-trip
  • Ran pytest tests/models/gpt_neox/test_modeling_gpt_neox.py -k config -- all passed
  • Ran pytest tests/models/gpt_neox_japanese/test_modeling_gpt_neox_japanese.py -k config -- all passed
  • ruff check and ruff format --check pass on both files
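
The round-trip checks in the test plan can be sketched without installing transformers. Everything below is hypothetical: `ToyRopeConfig` and `ToyJapaneseRopeConfig` are stand-in classes that model only the rotary_pct handling and write a simplified config.json, with the 0.25 and 1.0 defaults taken from the PR description.

```python
import json
import tempfile
from pathlib import Path


class ToyRopeConfig:
    """Hypothetical stand-in config; models only the rotary_pct handling."""

    default_factor = 0.25  # GPTNeoX default according to the PR

    def __init__(self, rope_parameters=None, **kwargs):
        self.rope_parameters = dict(rope_parameters or {})
        rotary_pct = kwargs.pop("rotary_pct", None)
        if rotary_pct is not None:
            self.rope_parameters["partial_rotary_factor"] = rotary_pct
        else:
            self.rope_parameters.setdefault("partial_rotary_factor", self.default_factor)

    def save(self, directory):
        # As described in the issue, only rope_parameters is written;
        # rotary_pct is not stored as a top-level key.
        payload = {"rope_parameters": self.rope_parameters}
        Path(directory, "config.json").write_text(json.dumps(payload))

    @classmethod
    def load(cls, directory):
        return cls(**json.loads(Path(directory, "config.json").read_text()))


class ToyJapaneseRopeConfig(ToyRopeConfig):
    default_factor = 1.0  # Japanese variant default according to the PR


def round_trip(config_cls, **kwargs):
    """Save a config to a temporary directory, reload it, and return the factor."""
    with tempfile.TemporaryDirectory() as tmp:
        config_cls(**kwargs).save(tmp)
        return config_cls.load(tmp).rope_parameters["partial_rotary_factor"]


for config_cls in (ToyRopeConfig, ToyJapaneseRopeConfig):
    assert round_trip(config_cls, rotary_pct=0.5) == 0.5               # non-default survives
    assert round_trip(config_cls) == config_cls.default_factor          # default survives
```

A regression test against the real classes would follow the same shape, swapping the toy classes for GPTNeoXConfig and GPTNeoXJapaneseConfig and `save`/`load` for save_pretrained/from_pretrained.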

AI assistance was used in drafting; all changes and tests were reviewed and validated by the submitter.

Changed files

  • src/transformers/models/gpt_neox/configuration_gpt_neox.py (modified, +1/-1)
  • src/transformers/models/gpt_neox_japanese/configuration_gpt_neox_japanese.py (modified, +1/-1)


System Info

  • transformers version: 5.3.0
  • Platform: Linux-6.17.0-19-generic-x86_64-with-glibc2.39
  • Python version: 3.12.4
  • Huggingface_hub version: 1.7.2
  • Safetensors version: 0.4.5
  • Accelerate version: not installed
  • Accelerate config: not found
  • DeepSpeed version: not installed
  • PyTorch version (accelerator?): 2.5.1 (NA)
  • Using distributed or parallel set-up in script?: No

When creating a GPTNeoXConfig with a non-default rotary_pct, the value is lost after a save_pretrained / from_pretrained round-trip.

Who can help?

@ArthurZucker @Cyrilvallez

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

from transformers import GPTNeoXConfig

config = GPTNeoXConfig(rotary_pct=1.0)
print(config.rope_parameters["partial_rotary_factor"])  # 1.0
config.save_pretrained("/tmp/test")

config2 = GPTNeoXConfig.from_pretrained("/tmp/test")
print(config2.rope_parameters["partial_rotary_factor"])  # 0.25
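
To confirm that the value is lost at load time rather than at save time, the saved file can be inspected directly. The helper below is a hypothetical sketch; it assumes the config.json layout described in this issue, with partial_rotary_factor nested under rope_parameters.

```python
import json
from pathlib import Path


def stored_partial_rotary_factor(save_dir):
    """Return rope_parameters.partial_rotary_factor from config.json, or None if absent."""
    raw = json.loads(Path(save_dir, "config.json").read_text())
    # Layout assumed from the issue text: the factor is nested under rope_parameters.
    return (raw.get("rope_parameters") or {}).get("partial_rotary_factor")
```

After running the reproduction, calling `stored_partial_rotary_factor("/tmp/test")` shows what was written to disk. If the file holds 1.0 while the reloaded config reports 0.25, the file is correct and the reset happens during from_pretrained, which is what the issue describes.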

Expected behavior

The partial_rotary_factor value should be retained after the save_pretrained / from_pretrained round-trip.

extent analysis

Fix Plan

To stop a non-default rotary_pct from being lost when a GPTNeoXConfig is saved and reloaded, change how convert_rope_params_to_dict fills in partial_rotary_factor. According to the linked PRs, save_pretrained and from_pretrained do not need to change, and rotary_pct does not need to be written as a top-level key: the value is already stored inside rope_parameters and only has to survive the reload.

Step-by-Step Solution

  1. Locate the assignment: In src/transformers/models/gpt_neox/configuration_gpt_neox.py, find the line that sets rope_parameters["partial_rotary_factor"] to kwargs.pop("rotary_pct", 0.25).
  2. Make the assignment conditional: Pop rotary_pct with a default of None and assign it only when it was passed explicitly.
  3. Fall back with setdefault: When rotary_pct is absent, call setdefault so that a value already present in rope_parameters is kept and 0.25 is used only if nothing is stored.
  4. Repeat for the Japanese config: PR #44985 applies the same change to src/transformers/models/gpt_neox_japanese/configuration_gpt_neox_japanese.py, where the default is 1.0 instead of 0.25.

Example Code

# Replacement for the unconditional assignment in GPTNeoXConfig.convert_rope_params_to_dict
rotary_pct = kwargs.pop("rotary_pct", None)
if rotary_pct is not None:
    # An explicitly passed rotary_pct always wins.
    self.rope_parameters["partial_rotary_factor"] = rotary_pct
else:
    # Reload path: keep the stored value, fall back to 0.25 only if none exists.
    self.rope_parameters.setdefault("partial_rotary_factor", 0.25)

Verification

To verify the fix, rerun the reproduction from the issue against the patched library:

from transformers import GPTNeoXConfig

config = GPTNeoXConfig(rotary_pct=1.0)
print(config.rope_parameters["partial_rotary_factor"])  # 1.0
config.save_pretrained("/tmp/test")

config2 = GPTNeoXConfig.from_pretrained("/tmp/test")
print(config2.rope_parameters["partial_rotary_factor"])  # 1.0 with the fix applied, 0.25 without it

If the output is 1.0 for both config and config2, the fix is successful.

Extra Tips

  • Only configs with a non-default rotary_pct are affected. With the default of 0.25 (gpt-neox-20b, Pythia, etc.) the overwrite produces the same value.
  • GPTNeoXJapaneseConfig had the same bug with a default of 1.0, so check it as well if you use it.
  • Always verify the fix by rerunning the reproduction and confirming the value survives the round-trip.
