transformers - ✅(Solved) Fix [Bug/Discussion] GLM-5 RoPE Implementation [1 pull requests, 3 comments, 3 participants]

Jintao-Huang · 2026-03-06T06:04:18Z


GitHub stats

huggingface/transformers#44485 • Fetched 2026-04-08 00:28:10


Fix Action

Fixed

Fixed by PR: [megatron] support GLM-5 megatron (https://github.com/modelscope/ms-swift/pull/8085)

PR fix notes

PR #8085: [megatron] support GLM-5 megatron

Repository: modelscope/ms-swift
Author: Jintao-Huang
State: closed | merged: True
Link: https://github.com/modelscope/ms-swift/pull/8085

Description (problem / solution / changelog)

1. https://github.com/huggingface/transformers/issues/44360
2. https://github.com/huggingface/transformers/issues/44261
3. https://github.com/huggingface/transformers/issues/44485
4. causal attention_mask in indexer

For precision alignment issues, please refer to these three issues.

Currently, the megatron-swift implementation uses qk_layernorm eps of 1e-5, adds relu in the indexer, and sets rope_interleave to true.
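To illustrate why the eps value matters for precision alignment, here is a minimal sketch of a plain layer norm in pure Python. It is not the megatron-swift code; the function name `layer_norm`, the sample inputs, and the comparison value 1e-6 are assumptions made for illustration, and the real qk_layernorm may be a different norm variant. The point is only that eps dominates the denominator when the input variance is small, so two implementations with different eps values can disagree noticeably.

```python
import math

def layer_norm(x, eps):
    """Hypothetical helper: normalize a list of floats to zero mean, unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    # eps is added to the variance before the square root.
    return [(v - mean) / math.sqrt(var + eps) for v in x]

# Small-variance input: eps is comparable to the variance, so it changes the result.
small = [0.001, -0.001, 0.002, -0.002]
small_a = layer_norm(small, eps=1e-5)
small_b = layer_norm(small, eps=1e-6)

# Large-variance input: eps is negligible and both settings nearly agree.
large = [1.0, -1.0, 2.0, -2.0]
large_a = layer_norm(large, eps=1e-5)
large_b = layer_norm(large, eps=1e-6)

print(small_a[2], small_b[2])
print(large_a[2], large_b[2])
```

When comparing two implementations, checking the eps values first is a cheap way to rule out this source of mismatch.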

Environment Setup

pip install git+https://github.com/NVIDIA/Megatron-LM.git
pip install git+https://github.com/Dao-AILab/fast-hadamard-transform --no-build-isolation
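After running the two pip commands, a quick import check can confirm that the packages are visible to the interpreter. The sketch below uses only the standard library. The import names `megatron` and `fast_hadamard_transform` are assumptions about how these packages expose themselves, and `missing_modules` is a hypothetical helper name.

```python
import importlib.util

def missing_modules(names):
    """Return the top-level module names that cannot be found on this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Assumed top-level import names for the two packages installed above.
required = ["megatron", "fast_hadamard_transform"]
missing = missing_modules(required)
if missing:
    print("Not importable:", ", ".join(missing))
else:
    print("All required modules were found.")
```

This only checks that the modules can be located; it does not verify versions or that compiled extensions load correctly.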

Changed files

docs/source/Instruction/Supported-models-and-datasets.md (modified, +1/-1)
docs/source/Megatron-SWIFT/Command-line-parameters.md (modified, +5/-0)
docs/source/Megatron-SWIFT/Quick-start.md (modified, +1/-1)
docs/source_en/Instruction/Supported-models-and-datasets.md (modified, +1/-1)
docs/source_en/Megatron-SWIFT/Command-line-parameters.md (modified, +5/-0)
docs/source_en/Megatron-SWIFT/Quick-start.md (modified, +1/-1)
swift/megatron/arguments/megatron_args.py (modified, +4/-0)
swift/megatron/init.py (modified, +159/-0)
swift/megatron/model/gpt_bridge.py (modified, +29/-6)
swift/megatron/model/gpt_model.py (modified, +32/-4)
swift/megatron/model/gpts/__init__.py (modified, +1/-0)
swift/megatron/model/model_config.py (modified, +31/-3)
swift/megatron/model/register.py (modified, +17/-0)
swift/megatron/trainers/base.py (modified, +14/-0)


System Info

https://huggingface.co/zai-org/GLM-5/blob/main/config.json#L45

Hi! I see that the RoPE setting in the GLM-5 config is rope_interleave: true.

However, the transformers implementation appears to apply the non-interleaved logic, as if rope_interleave were false. See:

https://github.com/huggingface/transformers/blob/d5e555a632682555332c3c8e938461efd49d52b9/src/transformers/models/glm_moe_dsa/modular_glm_moe_dsa.py#L46-L74

https://github.com/huggingface/transformers/blob/d5e555a632682555332c3c8e938461efd49d52b9/src/transformers/models/deepseek_v3/modular_deepseek_v3.py#L47-L82
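For readers unfamiliar with the two layouts, the following pure-Python sketch shows how they differ. It is not taken from transformers or ms-swift; the function names, the base of 10000, and the sample vector are assumptions chosen for illustration. The interleaved variant rotates adjacent pairs (x[0], x[1]), (x[2], x[3]), and so on, while the half-split variant rotates pairs (x[i], x[i + d/2]).

```python
import math

def rope_interleaved(x, pos, base=10000.0):
    """Rotate adjacent pairs (x[0], x[1]), (x[2], x[3]), ... by position-dependent angles."""
    d = len(x)
    out = [0.0] * d
    for i in range(d // 2):
        theta = pos * base ** (-2 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s
        out[2 * i + 1] = a * s + b * c
    return out

def rope_half_split(x, pos, base=10000.0):
    """Rotate pairs (x[i], x[i + d/2]), the layout often called rotate_half."""
    d = len(x)
    half = d // 2
    out = [0.0] * d
    for i in range(half):
        theta = pos * base ** (-2 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[i], x[i + half]
        out[i] = a * c - b * s
        out[i + half] = a * s + b * c
    return out

def deinterleave(x):
    """Move even-indexed elements to the first half and odd-indexed ones to the second."""
    return x[0::2] + x[1::2]

x = [1.0, 2.0, 3.0, 4.0]
direct_interleaved = rope_interleaved(x, pos=3)
direct_half_split = rope_half_split(x, pos=3)
# Applying the half-split rotation after de-interleaving matches the interleaved result
# up to the same permutation of the output.
via_permutation = rope_half_split(deinterleave(x), pos=3)
print(direct_interleaved)
print(direct_half_split)
print(via_permutation)
```

Applied to the same input, the two variants give different outputs, which is why a checkpoint trained with one layout does not align numerically with code that assumes the other unless the query and key dimensions are permuted first.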

extent analysis

Fix Plan

Fix Name

Update rope_interleave setting to match transformers implementation

Steps

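Update config.json: If the goal is to make the configuration match what the transformers code currently does, set rope_interleave to false in your local copy of config.json. Note that the linked ms-swift PR takes the opposite approach: it keeps rope_interleave set to true and applies the interleaved logic.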

{
  ...
  "rope_interleave": false,
  ...
}

Update code: Alternatively, update the code so that it branches on rope_interleave. The issue does not include specific code, so the following Python is only an illustrative sketch and not the transformers implementation.

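import torch

def rotate_half(x):
    # Rotate the two contiguous halves of the last dimension: (x1, x2) -> (-x2, x1).
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rope(x, cos, sin, rope_interleave):
    # Hypothetical helper for illustration only.
    # x has shape (batch, heads, seq, dim); cos and sin must broadcast against it.
    if rope_interleave:
        # Interleaved layout: regroup pairs (x0, x1), (x2, x3), ... into two halves.
        b, h, s, d = x.shape
        x = x.reshape(b, h, s, d // 2, 2).transpose(-1, -2).reshape(b, h, s, d)
    # Non-interleaved layout: the two halves are already contiguous.
    return x * cos + rotate_half(x) * sin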

Verify changes: Run the code with the updated config.json and verify that the changes have been applied correctly.

Verification

Run the code with the updated config.json and check that rope_interleave has the value you intended (false if you followed step 1).
Verify that the code runs without errors and that its outputs match a trusted reference implementation.
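The first check can be sketched with the standard library alone. This assumes rope_interleave is a top-level key in config.json, which the issue does not confirm; the function names and the path are hypothetical.

```python
import json
from pathlib import Path

def rope_interleave_from_text(text):
    """Return the top-level rope_interleave value from JSON text, or None if absent."""
    return json.loads(text).get("rope_interleave")

def rope_interleave_from_file(config_path):
    """Read a config.json from disk and return its rope_interleave value."""
    return rope_interleave_from_text(Path(config_path).read_text(encoding="utf-8"))

# Inline example so the sketch runs without a real checkpoint directory.
sample = '{"model_type": "example", "rope_interleave": false}'
print("rope_interleave =", rope_interleave_from_text(sample))
```

For a real checkpoint, call `rope_interleave_from_file` with the path to the config.json that your training or inference code actually loads.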

Extra Tips

Make sure to update the config.json file in all relevant locations, including the examples folder.
If you're using a custom task or dataset, update the code to use the correct logic for rope_interleave in that specific context.
