pytorch - ✅(Solved) Fix Muon documentation lacks minimal example [1 pull requests, 2 comments, 2 participants]

pytorch · 2026-03-10 14:49:27


GitHub stats

pytorch/pytorch#177029 • Fetched 2026-04-08 00:22:38

Author: wendlerc

Participants: Echen1246, wendlerc

Timeline (top): labeled ×4, referenced ×4, mentioned ×3, subscribed ×3

Fix Action

Fixed

Fixed by PR #177262 (still open and not merged at fetch time): Add minimal usage example to Muon optimizer docstring (#177029) (https://github.com/pytorch/pytorch/pull/177262)

PR fix notes

PR #177262: Add minimal usage example to Muon optimizer docstring (#177029)

Repository: pytorch/pytorch
Author: Echen1246
State: open | merged: False
Link: https://github.com/pytorch/pytorch/pull/177262

Description (problem / solution / changelog)

Fixes #177029. Adds an Example: section to the torch.optim.Muon docstring showing how to send 2D parameters to Muon and biases/embeddings to AdamW. This mirrors the pattern of MuonWithAuxAdam from the external Muon repository (KellerJordan/Muon) but uses only native PyTorch optimizers.

Changed files

torch/optim/_muon.py (modified, +23/-0)


Issue description

Hi,

I was trying to switch from https://github.com/KellerJordan/Muon to https://docs.pytorch.org/docs/stable/generated/torch.optim.Muon.html and could not help but notice that the torch doc is lacking a minimal example like:

# optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, betas=(0.90, 0.95), weight_decay=0.01)

# To replace the above, do the following:

from muon import MuonWithAuxAdam  # external package: https://github.com/KellerJordan/Muon
# Assumes `model` has `body`, `head`, and `embed` submodules.
hidden_weights = [p for p in model.body.parameters() if p.ndim >= 2]  # Muon
hidden_gains_biases = [p for p in model.body.parameters() if p.ndim < 2]  # AdamW
nonhidden_params = [*model.head.parameters(), *model.embed.parameters()]  # AdamW
param_groups = [
    dict(params=hidden_weights, use_muon=True,
         lr=0.02, weight_decay=0.01),
    dict(params=hidden_gains_biases + nonhidden_params, use_muon=False,
         lr=3e-4, betas=(0.9, 0.95), weight_decay=0.01),
]
optimizer = MuonWithAuxAdam(param_groups)

Best, Chris

cc @svekars @sekyondaMeta @AlannaBurke

Extended analysis

Problem Summary

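Switching from the external Muon implementation (https://github.com/KellerJordan/Muon, which provides MuonWithAuxAdam) to PyTorch's native torch.optim.Muon, whose documentation has no minimal usage example.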

Root Cause Analysis

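The torch.optim.Muon documentation has no minimal example showing how to reproduce the MuonWithAuxAdam setup. MuonWithAuxAdam takes a single list of parameter groups and uses a use_muon flag to decide which groups get Muon and which get AdamW. torch.optim.Muon has no such flag and is meant for 2D weight matrices only, so users must split the parameters themselves and hand biases, gains, embeddings, and the output head to a separate optimizer such as torch.optim.AdamW.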

Fix Plan

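If you keep using the external muon package, the setup from the issue works as written; the steps below walk through it. With native PyTorch, the same parameter split applies, but the two groups go to torch.optim.Muon and torch.optim.AdamW respectively, which is the approach taken in PR #177262.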

Step 1: Import necessary modules

# External package from https://github.com/KellerJordan/Muon
from muon import MuonWithAuxAdam
import torch
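If you plan to drop the external package, torch.optim.Muon only exists in recent PyTorch releases. A small guard, sketched here with a hypothetical helper name, makes that dependency explicit before any optimizer is built.

```python
import torch


def native_muon_available() -> bool:
    """True if this PyTorch build ships torch.optim.Muon."""
    return hasattr(torch.optim, "Muon")


if native_muon_available():
    print("torch.optim.Muon is available in torch", torch.__version__)
else:
    print("torch.optim.Muon not found; upgrade PyTorch or keep the external muon package")
```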

Step 2: Separate model parameters into different groups

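# Assumes `model` has `body`, `head`, and `embed` submodules; 2D body weights go to Muon.
hidden_weights = [p for p in model.body.parameters() if p.ndim >= 2]
# Gains and biases inside the body fall back to AdamW.
hidden_gains_biases = [p for p in model.body.parameters() if p.ndim < 2]
# Head and embedding parameters also use AdamW, even though their weights are 2D.
nonhidden_params = [*model.head.parameters(), *model.embed.parameters()]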
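Step 2 relies on the model exposing body, head, and embed attributes, which many models do not. One alternative, sketched below under that assumption, keys the split off module types instead: every nn.Linear weight goes to Muon except layers you explicitly exclude (such as the output head). The function name and the exclude parameter are hypothetical.

```python
import torch.nn as nn


def split_by_module_type(model: nn.Module, exclude: tuple = ()):
    """Return (muon_params, adamw_params).

    Weights of nn.Linear modules go to Muon unless the module is listed in
    `exclude`; every other parameter (biases, norms, embeddings) goes to AdamW.
    """
    excluded = {id(m) for m in exclude}
    muon_params = [
        m.weight
        for m in model.modules()
        if isinstance(m, nn.Linear) and id(m) not in excluded
    ]
    muon_ids = {id(p) for p in muon_params}
    # model.parameters() de-duplicates shared (tied) tensors, so each appears once.
    adamw_params = [p for p in model.parameters() if id(p) not in muon_ids]
    return muon_params, adamw_params


# Usage: embedding -> hidden Linear -> LayerNorm -> output Linear (treated as the head).
model = nn.Sequential(nn.Embedding(10, 8), nn.Linear(8, 8), nn.LayerNorm(8), nn.Linear(8, 10))
muon_params, adamw_params = split_by_module_type(model, exclude=(model[3],))
```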

Step 3: Create parameter groups for the optimizer

param_groups = [
    dict(params=hidden_weights, use_muon=True,
         lr=0.02, weight_decay=0.01),
    dict(params=hidden_gains_biases + nonhidden_params, use_muon=False,
         lr=3e-4, betas=(0.9, 0.95), weight_decay=0.01),
]

Step 4: Create the optimizer

optimizer = MuonWithAuxAdam(param_groups)

Verification

To verify the setup, check that every trainable parameter belongs to exactly one optimizer group, that the optimizer is constructed without errors, and that parameters actually change (and the loss trends downward) over a few training steps.

Extra Tips

The lasting fix is on the documentation side: the torch.optim.Muon docstring should include a minimal example of splitting parameters between Muon and AdamW, which is what PR #177262 adds.


