PR fix notes

PR #45395: Fix IndexError with DeepSpeed ZeRO-3 when kernels rotary is active

Repository: huggingface/transformers
Author: ArthurZucker
State: closed | merged: False
Link: https://github.com/huggingface/transformers/pull/45395

Description (problem / solution / changelog)

Summary

Fixes #45137.

Since #41147, attention layers are decorated with @use_kernelized_func(apply_rotary_pos_emb), which attaches a rotary_fn child nn.Module at init when the kernels library is available.

DeepSpeed ZeRO-3's parameter coordinator traces the module graph at init and expects every registered submodule to run during forward. The attention forward still calls the Python apply_rotary_pos_emb, so rotary_fn is never invoked and the parameter-fetch trace desynchronizes, raising:

IndexError: pop from an empty deque
  at deepspeed/runtime/zero/partitioned_param_coordinator.py

on the second forward (reproducible via TRL's RLOO/GRPO trainers under ZeRO-3, see huggingface/trl#4899).

Changed files

docs/source/en/model_doc/pp_chart2table.md (modified, +1/-1)
docs/source/en/model_doc/slanext.md (modified, +1/-1)
docs/source/en/model_doc/uvdoc.md (modified, +1/-1)
src/transformers/integrations/hub_kernels.py (modified, +9/-0)

PR #45414: Fix IndexError with DeepSpeed ZeRO-3 when kernels rotary is active

Repository: huggingface/transformers
Author: ArthurZucker
State: closed | merged: True
Link: https://github.com/huggingface/transformers/pull/45414

Description (problem / solution / changelog)

Summary

Fixes #45137. Re-opened from #45395 on a same-repo branch so CI can run.

Since #41147, attention layers are decorated with @use_kernelized_func(apply_rotary_pos_emb), which attaches a rotary_fn child nn.Module at init when the kernels library is available. DeepSpeed ZeRO-3's parameter coordinator traces the module graph at init and expects every registered submodule to fire during forward. The attention forward still calls the plain Python apply_rotary_pos_emb, so rotary_fn is never invoked and the parameter-fetch trace desynchronizes, raising:

IndexError: pop from an empty deque
  at deepspeed/runtime/zero/partitioned_param_coordinator.py

on the second forward (reproducible via TRL's RLOO/GRPO trainers under ZeRO-3, see huggingface/trl#4899).
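
The condition described above, a submodule that is registered at init but never called in forward, can be illustrated with a toy module tree. This is a minimal pure-Python sketch: ToyModule, ToyAttention, python_rotary, and never_invoked are hypothetical stand-ins, not torch or DeepSpeed APIs, and the sketch does not model how DeepSpeed's coordinator actually fetches parameters. It only shows the mismatch between what is registered and what runs.

```python
class ToyModule:
    """Hypothetical stand-in for torch.nn.Module: tracks children and call counts."""

    def __init__(self, name):
        self.name = name
        self.children = {}
        self.calls = 0

    def add_child(self, key, module):
        self.children[key] = module
        return module

    def __call__(self, x):
        self.calls += 1
        return self.forward(x)

    def forward(self, x):
        return x

    def walk(self):
        # Yield this module and then every registered descendant.
        yield self
        for child in self.children.values():
            yield from child.walk()


def python_rotary(x):
    # Plain function path, analogous to calling apply_rotary_pos_emb directly.
    return x


class ToyAttention(ToyModule):
    def __init__(self):
        super().__init__("attention")
        self.proj = self.add_child("proj", ToyModule("proj"))
        # Registered at init, like the kernelized rotary_fn submodule.
        self.rotary_fn = self.add_child("rotary_fn", ToyModule("rotary_fn"))

    def forward(self, x):
        x = self.proj(x)
        return python_rotary(x)  # rotary_fn is registered but bypassed


def never_invoked(root):
    """Names of registered modules that did not run since construction."""
    return [module.name for module in root.walk() if module.calls == 0]


attn = ToyAttention()
attn(1.0)
print(never_invoked(attn))
```

Any module listed by never_invoked is registered but unused, which is the situation the PR description identifies as the trigger under ZeRO-3.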

Fix

Skip attaching the kernelized submodule when is_deepspeed_zero3_enabled() is true. Under ZeRO-3 the Python apply_rotary_pos_emb path is used (same behavior as before #41147). Non-ZeRO-3 users are unaffected.

The second commit refreshes dates on three model cards (pp_chart2table, slanext, uvdoc) that were missing them on main — required for check-repository-consistency to pass.

Test plan

Reproducer from huggingface/trl#4899 no longer raises IndexError: pop from an empty deque
Qwen3 forward + kernelize still replaces rotary_fn when not under ZeRO-3
make style + check-repository-consistency pass

Changed files

docs/source/en/model_doc/pp_chart2table.md (modified, +1/-1)
docs/source/en/model_doc/slanext.md (modified, +1/-1)
docs/source/en/model_doc/uvdoc.md (modified, +1/-1)
src/transformers/integrations/hub_kernels.py (modified, +9/-0)

PR #5541: Update tests with zero3 for RLOO and GRPO once fixed in transformers 5.5.4

Repository: huggingface/trl
Author: albertvillanova
State: closed | merged: True
Link: https://github.com/huggingface/trl/pull/5541

Description (problem / solution / changelog)

Update tests with zero3 for RLOO and GRPO once fixed in transformers 5.5.4.

This PR updates the test conditions for ZeRO-3 integration with the transformers library to reflect a recent upstream fix. The tests now only expect failures for a specific range of transformers versions where the issue is known to occur, improving the accuracy of test expectations.

Fix #4899, after the upstream issue in transformers:

https://github.com/huggingface/transformers/issues/45137

has been fixed by:

https://github.com/huggingface/transformers/pull/45414

Follow-up to:

#5420
#5404
#4898
#4899

Changes

Test condition updates:

In both test_rloo and test_grpo in tests/distributed/test_distributed.py, the pytest.mark.xfail condition for the "zero3" parameter is updated to expect failures only when the transformers version is greater than or equal to 5.0.0 and less than 5.5.4, reflecting that the issue is fixed in transformers#45414. The reason message is also updated for clarity.
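
The version window can be expressed as a small predicate. This is a sketch under assumptions, not the code in test_distributed.py: release_tuple and zero3_expected_to_fail are hypothetical helpers, and the parsing is deliberately simplified to the first three numeric components, so a development build such as 5.5.0.dev0 is treated as 5.5.0.

```python
import re


def release_tuple(version):
    """Return the leading numeric components of a version string as a tuple."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def zero3_expected_to_fail(transformers_version):
    """True when the version falls in the window 5.0.0 <= v < 5.5.4."""
    return (5, 0, 0) <= release_tuple(transformers_version) < (5, 5, 4)


for candidate in ("4.57.1", "5.0.0", "5.5.0.dev0", "5.5.3", "5.5.4"):
    print(candidate, zero3_expected_to_fail(candidate))
```

A predicate like this would typically be passed as the condition of pytest.mark.xfail together with strict=True, so that an unexpected pass inside the window is reported as a failure.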

Note (low risk): only adjusts pytest xfail version gating and messages in distributed tests, with no production code changes.

Overview: updates distributed tests so the zero3 parameter is only marked xfail for transformers versions >= 5.0.0 and < 5.5.4, reflecting that the upstream ZeRO-3 issue is fixed in transformers 5.5.4.

Also updates the associated xfail reason strings (and keeps strict=True) in test_rloo and test_grpo to document the fixed upstream PR/reference.

Changed files

tests/distributed/test_distributed.py (modified, +4/-4)

System Info

transformers version: 5.5.0.dev0
Platform: Linux-5.15.0-1048-aws-x86_64-with-glibc2.31
Python version: 3.10.18
Huggingface_hub version: 1.8.0
Safetensors version: 0.6.2
Accelerate version: 1.13.0
Accelerate config: not found
DeepSpeed version: 0.18.8
PyTorch version (accelerator?): 2.8.0+cu128 (CUDA)
GPU type: NVIDIA H100 80GB HBM3

Who can help?

kernels: @MekkCyber @drbh

Reproduction

See related downstream issue in trl, for a full reproducer using TRL's RLOO and GRPO trainers:

https://github.com/huggingface/trl/issues/4899

Expected behavior

It should raise no error.

extent analysis

TL;DR

The issue may be resolved by checking and adjusting the compatibility of the transformers library version with other dependencies, particularly PyTorch and DeepSpeed.

Guidance

Review the versions of PyTorch, DeepSpeed, and transformers to ensure they are compatible, as the transformers version is a development version (5.5.0.dev0).
Check the official documentation for the transformers library to see if there are any known issues or compatibility problems with the current version.
Investigate the related downstream issue in trl (https://github.com/huggingface/trl/issues/4899) for potential clues or workarounds.
Consider testing with a stable version of the transformers library to isolate the issue.

Notes

The provided information lacks details about the specific task or dataset being used, which might be relevant for troubleshooting. Additionally, the transformers version is a development version, which could be a contributing factor to the issue.

Recommendation

Apply workaround: Given the development version of transformers and the lack of information about the task or dataset, it's recommended to try a stable version of transformers or investigate the compatibility with other dependencies as a workaround.

transformers - ✅(Solved) Fix IndexError: pop from an empty deque with DeepSpeed ZeRO3 [3 pull requests, 1 participant]
