transformers - ✅(Solved) Fix inverse_sqrt scheduler ignores lr_scheduler_kwargs (timescale not passed) [1 pull requests, 1 comments, 2 participants]

GitHub stats
huggingface/transformers#44908, fetched 2026-04-08 01:07:52
Comments: 1, Participants: 2, Timeline events: 11, Reactions: 0
Timeline (top): referenced ×6, cross-referenced ×2, closed ×1, commented ×1

Root Cause

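get_scheduler recognizes the inverse_sqrt scheduler type but does not forward scheduler_specific_kwargs to get_inverse_sqrt_schedule, so values supplied through lr_scheduler_kwargs never reach the scheduler. A training job with "inverse_sqrt" therefore runs without complaint even when timescale is set to an invalid value such as 'random'.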

Fix Action

Fixed

PR fix notes

PR #44909: Fix: Update optimization.py

Description (problem / solution / changelog)

The get_scheduler function was identifying the inverse_sqrt scheduler type but failing to pass **scheduler_specific_kwargs to the underlying get_inverse_sqrt_schedule function.

This caused user-defined parameters like timescale to be silently ignored. This commit adds the missing kwargs to the function call at line 664.

Fixes #44908

Changed files

  • src/transformers/optimization.py (modified, +1/-1)
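
The failure mode described in the PR notes can be sketched with a toy dispatcher. This is not the library's code: toy_inverse_sqrt_schedule, dispatch_buggy, and dispatch_fixed are hypothetical names, and the default timescale is an assumption made only for illustration. The point is the pattern: a branch that checks the scheduler name but drops the extra keyword arguments versus one that forwards them.

```python
def toy_inverse_sqrt_schedule(num_warmup_steps, timescale=None):
    """Hypothetical stand-in that reports the settings a schedule would use."""
    if timescale is None:
        # Assumed default for illustration only.
        timescale = num_warmup_steps or 10_000
    return {"num_warmup_steps": num_warmup_steps, "timescale": timescale}


def dispatch_buggy(name, num_warmup_steps, scheduler_specific_kwargs=None):
    """Accepts extra kwargs but never forwards them (the reported pattern)."""
    if name == "inverse_sqrt":
        return toy_inverse_sqrt_schedule(num_warmup_steps)
    raise ValueError(f"unknown scheduler: {name}")


def dispatch_fixed(name, num_warmup_steps, scheduler_specific_kwargs=None):
    """Forwards the extra kwargs to the schedule function."""
    kwargs = scheduler_specific_kwargs or {}
    if name == "inverse_sqrt":
        return toy_inverse_sqrt_schedule(num_warmup_steps, **kwargs)
    raise ValueError(f"unknown scheduler: {name}")


user_kwargs = {"timescale": 2000}
print(dispatch_buggy("inverse_sqrt", 500, user_kwargs))
# {'num_warmup_steps': 500, 'timescale': 500}
print(dispatch_fixed("inverse_sqrt", 500, user_kwargs))
# {'num_warmup_steps': 500, 'timescale': 2000}
```

In the buggy variant the user's timescale is silently replaced by the default, which is why no error or warning appears.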
Raw issue text

System Info

Incomplete arguments passed for schedulers where name is explicitly checked.

https://github.com/huggingface/transformers/blob/v5.3.0/src/transformers/optimization.py#L664

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Trigger a training job with "inverse_sqrt" and set timescale = 'random' in lr_scheduler_kwargs. The job still runs, because the arguments are never passed to the scheduler.

Expected behavior

timescale should take effect.

extent analysis

Fix Plan

The fix modifies get_scheduler in optimization.py so that the inverse_sqrt branch, where the scheduler name is explicitly checked, forwards scheduler_specific_kwargs to get_inverse_sqrt_schedule. No new argument is needed: timescale travels inside scheduler_specific_kwargs.

Code Changes

To fix the issue, update the inverse_sqrt branch of get_scheduler along these lines (an abridged sketch; the surrounding code is elided and may differ from the actual source):

from typing import Optional

import torch

def get_scheduler(
    name: str,
    optimizer: torch.optim.Optimizer,
    num_warmup_steps: Optional[int] = None,
    num_training_steps: Optional[int] = None,
    scheduler_specific_kwargs: Optional[dict] = None,
):
    # ... (rest of the function remains the same)

    if name == "inverse_sqrt":
        # Forward user-supplied options such as timescale instead of dropping them.
        # get_inverse_sqrt_schedule lives in the same module (optimization.py).
        return get_inverse_sqrt_schedule(
            optimizer,
            num_warmup_steps=num_warmup_steps,
            **(scheduler_specific_kwargs or {}),
        )
    # ... (rest of the function remains the same)

Verification

To verify the fix, run a training job with "inverse_sqrt" and pass timescale through lr_scheduler_kwargs. Different timescale values should now produce different learning rates once warmup has finished, and an invalid value such as "random" should no longer be silently ignored.
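
What the scheduler argument "taking effect" looks like numerically can be sketched in pure Python. The formula below is an assumed inverse square root decay with linear warmup, written for illustration; it is not copied from the library, and inverse_sqrt_multiplier and timescale_takes_effect are hypothetical helper names.

```python
import math


def inverse_sqrt_multiplier(step, num_warmup_steps, timescale=None):
    """Learning-rate multiplier at `step` (assumed formula, for illustration)."""
    if timescale is None:
        timescale = num_warmup_steps or 10_000  # assumed default
    if step < num_warmup_steps:
        # Linear warmup from 0 to 1.
        return step / max(1, num_warmup_steps)
    # Shift so the multiplier equals 1.0 exactly when warmup ends.
    shift = timescale - num_warmup_steps
    return 1.0 / math.sqrt((step + shift) / timescale)


def timescale_takes_effect(num_warmup_steps, timescale_a, timescale_b, probe_step):
    """True if the two timescales give different multipliers at probe_step."""
    a = inverse_sqrt_multiplier(probe_step, num_warmup_steps, timescale_a)
    b = inverse_sqrt_multiplier(probe_step, num_warmup_steps, timescale_b)
    return not math.isclose(a, b)


# Probe after warmup: a larger timescale decays more slowly.
print(timescale_takes_effect(100, 100, 400, probe_step=1000))  # True
# Probe during warmup: timescale has no influence yet.
print(timescale_takes_effect(100, 100, 400, probe_step=50))    # False
```

Under this assumed formula the probe step has to lie beyond the warmup phase, since the multiplier during warmup does not depend on timescale at all.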

Extra Tips

  • Make sure to update get_scheduler in the correct file (src/transformers/optimization.py) and commit the change.
  • Test the fix with several timescale values to confirm that each one changes the schedule.
  • Consider adding validation and logging so that invalid entries in scheduler_specific_kwargs are reported clearly.
