transformers - ✅ (Solved) Fix [Bug/Discussion] GLM-5 RoPE Implementation [1 pull request, 3 comments, 3 participants]

GitHub stats
huggingface/transformers#44485 (fetched 2026-04-08 00:28:10)
Comments: 3
Participants: 3
Timeline events: 9
Reactions: 0
Timeline (top): commented ×3, subscribed ×3, cross-referenced ×1, labeled ×1

Fix Action

Fixed

PR fix notes

PR #8085: [megatron] support GLM-5 megatron

Description (problem / solution / changelog)

  1. https://github.com/huggingface/transformers/issues/44360
  2. https://github.com/huggingface/transformers/issues/44261
  3. https://github.com/huggingface/transformers/issues/44485
  4. causal attention_mask in indexer

For precision alignment issues, please refer to these three issues.

Currently, the megatron-swift implementation uses qk_layernorm eps of 1e-5, adds relu in the indexer, and sets rope_interleave to true.

Environment Setup

pip install git+https://github.com/NVIDIA/Megatron-LM.git
pip install git+https://github.com/Dao-AILab/fast-hadamard-transform --no-build-isolation

Changed files

  • docs/source/Instruction/Supported-models-and-datasets.md (modified, +1/-1)
  • docs/source/Megatron-SWIFT/Command-line-parameters.md (modified, +5/-0)
  • docs/source/Megatron-SWIFT/Quick-start.md (modified, +1/-1)
  • docs/source_en/Instruction/Supported-models-and-datasets.md (modified, +1/-1)
  • docs/source_en/Megatron-SWIFT/Command-line-parameters.md (modified, +5/-0)
  • docs/source_en/Megatron-SWIFT/Quick-start.md (modified, +1/-1)
  • swift/megatron/arguments/megatron_args.py (modified, +4/-0)
  • swift/megatron/init.py (modified, +159/-0)
  • swift/megatron/model/gpt_bridge.py (modified, +29/-6)
  • swift/megatron/model/gpt_model.py (modified, +32/-4)
  • swift/megatron/model/gpts/__init__.py (modified, +1/-0)
  • swift/megatron/model/model_config.py (modified, +31/-3)
  • swift/megatron/model/register.py (modified, +17/-0)
  • swift/megatron/trainers/base.py (modified, +14/-0)

System Info

https://huggingface.co/zai-org/GLM-5/blob/main/config.json#L45

Hi! The GLM-5 config linked above sets rope_interleave to true.

However, the transformers implementation appears to follow the logic for rope_interleave = false. See:

https://github.com/huggingface/transformers/blob/d5e555a632682555332c3c8e938461efd49d52b9/src/transformers/models/glm_moe_dsa/modular_glm_moe_dsa.py#L46-L74

https://github.com/huggingface/transformers/blob/d5e555a632682555332c3c8e938461efd49d52b9/src/transformers/models/deepseek_v3/modular_deepseek_v3.py#L47-L82
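
To make the question concrete, here is a minimal sketch of the two RoPE pairing conventions that a flag like rope_interleave usually distinguishes. It is plain Python written for illustration, not the transformers or Megatron code; the function names, the base of 10000, and the per-vector (rather than batched tensor) layout are all assumptions.

```python
import math

def rope_interleaved(x, pos, base=10000.0):
    """Rotate adjacent pairs (x[0], x[1]), (x[2], x[3]), ... (illustrative helper)."""
    d = len(x)
    out = [0.0] * d
    for i in range(d // 2):
        theta = pos * base ** (-2 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s
        out[2 * i + 1] = a * s + b * c
    return out

def rope_half_split(x, pos, base=10000.0):
    """Rotate pairs (x[i], x[i + d/2]), the "rotate_half" style (illustrative helper)."""
    d = len(x)
    h = d // 2
    out = [0.0] * d
    for i in range(h):
        theta = pos * base ** (-2 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[i], x[i + h]
        out[i] = a * c - b * s
        out[i + h] = a * s + b * c
    return out

def deinterleave(x):
    """Reorder [x0, x1, x2, x3, ...] into [x0, x2, ..., x1, x3, ...]."""
    return x[0::2] + x[1::2]

x = [1.0, 2.0, 3.0, 4.0]
# The two conventions pair different channels, so on the same input they disagree...
print(rope_interleaved(x, pos=3) != rope_half_split(x, pos=3))
# ...but they agree once the channels are permuted consistently.
a = deinterleave(rope_interleaved(x, pos=3))
b = rope_half_split(deinterleave(x), pos=3)
print(all(math.isclose(u, v) for u, v in zip(a, b)))
```

The two layouts differ only by a fixed channel permutation, so a mismatch between the config flag and the code path does not raise an error. It silently changes which channels are rotated together, which is why it shows up as a precision alignment problem.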

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Expected behavior

extent analysis

Fix Plan

Fix Name

Update rope_interleave setting to match transformers implementation

Steps

  1. Update config.json: Change the rope_interleave setting to false in the config.json file.
{
  ...
  "rope_interleave": false,
  ...
}
  2. Update code: Make the code that applies RoPE honor rope_interleave. The issue does not include the affected code, so the Python class below is only a placeholder showing where such a flag could be read.
# GlmForSequenceClassification is a stand-in base class; note the spelling "Glm", not "GLM".
from transformers import GlmForSequenceClassification

class MyModel(GlmForSequenceClassification):
    def __init__(self, config):
        super().__init__(config)
        # Fall back to False if the config does not define rope_interleave.
        self.rope_interleave = getattr(config, "rope_interleave", False)

    def forward(self, input_ids=None, attention_mask=None, **kwargs):
        # Placeholder: the rotary embedding itself is applied inside the attention
        # layers, so the flag has to be consumed there (interleaved pairing when
        # True, half-split pairing when False), not in this top-level forward.
        return super().forward(input_ids=input_ids, attention_mask=attention_mask, **kwargs)
  3. Verify changes: Run the code with the updated config.json and confirm that the expected RoPE branch is taken.

Verification

  • Run the code with the updated config.json and check that the rope_interleave setting is false.
  • Verify that the code is running without errors and producing the expected output.
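
The first check can be scripted. The sketch below reads the flag straight from a local config.json using only the standard library. The helper name is hypothetical, and it assumes rope_interleave is a top-level key, as in the snippet in step 1.

```python
import json

def read_rope_interleave(config_path):
    """Return the rope_interleave value from a config.json, or None if the key is absent."""
    with open(config_path, "r", encoding="utf-8") as f:
        config = json.load(f)
    # Assumption: the flag is a top-level key in the file.
    return config.get("rope_interleave")
```

For example, read_rope_interleave("config.json") would return False after the edit in step 1, and None if the key is missing altogether.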

Extra Tips

  • Make sure to update the config.json file in all relevant locations, including the examples folder.
  • If you're using a custom task or dataset, update the code to use the correct logic for rope_interleave in that specific context.
