transformers - 💡(How to fix) Fix Optimizer step being called 2 times when using deepspeed [2 comments, 2 participants]

GitHub stats
huggingface/transformers#45656 (fetched 2026-04-28 06:24:51)
Comments: 2 · Participants: 2 · Timeline events: 4 · Reactions: 0
Timeline: commented ×2, closed ×1, labeled ×1

System Info

With transformers==4.57.3 and deepspeed==0.18.3, when accelerator.backward is called (screenshot below), the DeepSpeed backward path internally calls engine.step, which performs the optimizer step at the gradient accumulation boundary.

The snapshot below is from trainer.py in the transformers library:

<img width="665" height="288" alt="Image" src="https://github.com/user-attachments/assets/a97f7a72-335a-494f-8b93-163aac363262" />

The Trainer then calls optimizer.step again after this backward, at the same gradient accumulation boundary (screenshot below for reference):

<img width="611" height="62" alt="Image" src="https://github.com/user-attachments/assets/2b4c7b58-4df3-494e-b795-69d0faa611fd" />

As a result, a single iteration performs two optimizer steps, which is wrong. Please fix this bug.

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

The bug currently behaves as if it were an intended feature.

Expected behavior

There should be a single optimizer step, but there are effectively two, which is wrong.

Extent analysis

TL;DR

The issue can be addressed by modifying trainer.py in the transformers library to remove the redundant optimizer.step call that follows accelerator.backward.

Guidance

  • Identify the line of code in trainer.py where optimizer.step is called after accelerator.backward and remove or comment it out to prevent the duplicate optimizer step.
  • Verify that the modification fixes the issue by checking the number of optimizer steps performed during a single iteration.
  • Review the code to ensure that the accelerator.backward call is correctly handling the gradient accumulation step without requiring an additional optimizer.step call.
  • Test the modified code with a simple example to confirm that the fix does not introduce any new issues.
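A minimal sketch of the check in the second bullet: temporarily wrap an object's step method and count the calls during one gradient accumulation cycle. count_calls is a hypothetical helper, not part of transformers, accelerate, or DeepSpeed. In a real DeepSpeed run, the natural target is the optimizer that performs the parameter update (for example, the optimizer attribute of the DeepSpeed engine). Which object that is depends on your versions and ZeRO stage, so treat the target as an assumption.

```python
from contextlib import contextmanager


@contextmanager
def count_calls(obj, method_name):
    """Temporarily wrap obj.<method_name> so every call is counted.

    Yields a dict whose "calls" entry is updated live. The wrapper is set
    on the instance only and is removed on exit, even if the body raises.
    """
    counter = {"calls": 0}
    original = getattr(obj, method_name)
    # Remember whether the instance already had its own attribute, so exit
    # can restore exactly the previous state.
    had_own_attr = method_name in getattr(obj, "__dict__", {})

    def wrapped(*args, **kwargs):
        counter["calls"] += 1
        return original(*args, **kwargs)  # still performs the real call

    setattr(obj, method_name, wrapped)
    try:
        yield counter
    finally:
        if had_own_attr:
            setattr(obj, method_name, original)
        else:
            delattr(obj, method_name)  # falls back to the class method


# Usage with a stand-in optimizer (hypothetical; swap in the real object).
class DummyOptimizer:
    def step(self):
        return "stepped"


opt = DummyOptimizer()
with count_calls(opt, "step") as counter:
    opt.step()
    opt.step()
print(counter["calls"])  # 2
```

With gradient_accumulation_steps=N, wrapping the real optimizer and running N micro-batches should report one call if only a single step occurs per boundary. Patching the instance rather than the class leaves other optimizer objects untouched.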

Example

No patch is shown, because the issue does not quote the specific lines of trainer.py that would need to change.
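Although no patch can be shown, the double-step concern can be modeled in plain Python to see what decides the outcome. Every class below is a hypothetical stand-in, not a transformers, accelerate, or DeepSpeed API. FakeEngine plays the DeepSpeed engine whose backward also steps at the accumulation boundary, and the two wrappers represent the two possibilities for what the Trainer's optimizer.step() actually does. If it reaches the real optimizer, each boundary gets two steps; if it is a no-op, there is one. Recent Accelerate releases wrap the optimizer in DeepSpeedOptimizerWrapper, whose step() is a no-op for exactly this reason, so it is worth confirming what the installed accelerate version does before patching trainer.py.

```python
# A stdlib-only model of the control flow described in the issue.
# All names here are hypothetical stand-ins.


class CountingOptimizer:
    """Stands in for the real optimizer; counts how often it steps."""

    def __init__(self):
        self.steps = 0

    def step(self):
        self.steps += 1


class FakeEngine:
    """Models an engine whose backward also steps the optimizer at the
    gradient accumulation boundary (the behavior the issue reports)."""

    def __init__(self, optimizer, grad_accum_steps):
        self.optimizer = optimizer
        self.grad_accum_steps = grad_accum_steps
        self.micro_steps = 0

    def backward(self):
        self.micro_steps += 1
        if self.micro_steps % self.grad_accum_steps == 0:
            self.optimizer.step()  # engine-internal step


class PassThroughWrapper:
    """Trainer-facing optimizer whose step() reaches the real optimizer."""

    def __init__(self, optimizer):
        self.optimizer = optimizer

    def step(self):
        self.optimizer.step()


class NoOpStepWrapper(PassThroughWrapper):
    """Trainer-facing optimizer whose step() does nothing, because the
    engine already stepped during backward."""

    def step(self):
        pass


def count_real_steps(wrapper_cls, num_micro_batches, grad_accum_steps):
    """Run a toy accumulation loop and return the number of real steps."""
    real_opt = CountingOptimizer()
    engine = FakeEngine(real_opt, grad_accum_steps)
    trainer_opt = wrapper_cls(real_opt)
    for i in range(1, num_micro_batches + 1):
        engine.backward()  # models accelerator.backward(loss)
        if i % grad_accum_steps == 0:
            trainer_opt.step()  # models the Trainer's optimizer.step()
    return real_opt.steps


print(count_real_steps(PassThroughWrapper, 8, 4))  # 4: two steps per boundary
print(count_real_steps(NoOpStepWrapper, 8, 4))     # 2: one step per boundary
```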

Notes

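The fix assumes that accelerator.backward is correctly implemented and that removing the redundant optimizer.step call will not affect the rest of the code. It also assumes the change is limited to the DeepSpeed path: the same training loop serves non-DeepSpeed runs, where the optimizer.step call after backward is the only step.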

Recommendation

Apply workaround: Modify the trainer.py file to remove the redundant optimizer.step call, as this is a specific fix for the identified issue and does not require upgrading to a new version.
