ollama - 💡(How to fix) Fix [Feature Request] Support weightless RMSNorm (for FlashNorm weight folding trick) [1 participants]

ollama/ollama#15913 (fetched 2026-05-02 05:27:47)
Comments: 0 · Participants: 1 · Timeline: 1 · Reactions: 0

Please add support for RMSNorm without normalization weights.

This is to support FlashNorm — a mathematically equivalent variant of RMSNorm that folds norm weights into the subsequent linear layer. See explainer video.

We have applied this weight folding trick to a few LLMs (Llama, Qwen, SmolLM) here: https://huggingface.co/models?other=weightless-rmsnorm
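The folding step can be written out as a short derivation. This is a sketch under assumed notation that does not appear in the issue: x is the hidden vector of size n, g the norm weights, W the weight matrix of the linear layer that follows the norm, and W* the folded matrix. Where the epsilon term sits may differ between implementations.

```latex
\begin{align*}
\mathrm{RMS}(x) &= \sqrt{\tfrac{1}{n}\sum_{i=1}^{n} x_i^2 + \epsilon} \\
\mathrm{RMSNorm}(x) &= \frac{x}{\mathrm{RMS}(x)} \odot g \\
W\,\mathrm{RMSNorm}(x) &= W\,\mathrm{diag}(g)\,\frac{x}{\mathrm{RMS}(x)}
  = W^{*}\,\frac{x}{\mathrm{RMS}(x)}, \qquad W^{*} = W\,\mathrm{diag}(g)
\end{align*}
```

Because W* can be computed once offline, the runtime only needs the weightless normalization x / RMS(x), which is why the model files no longer carry norm weights.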

<img width="332" height="183" alt="Image" src="https://github.com/user-attachments/assets/d97b50ba-1092-4d44-ad70-ff2bca448b1d" />

Motivation

FlashNorm's removal of norm weights reduces inference overhead at zero accuracy cost, and we'd like to share these optimized models with the broader community.

Possible Implementation

Remove norm weights from your RMSNorm implementation. E.g., just skip norm weight multiplication if there are no norm weights provided.

extent analysis

TL;DR

Modify the RMSNorm implementation to support the FlashNorm variant by making norm weights optional: when a model provides none, skip the norm weight multiplication.

Guidance

  • Review the RMSNorm implementation and identify where norm weights are being used.
  • Consider adding a conditional check to skip norm weight multiplication when no norm weights are provided.
  • Study the example models (Llama, Qwen, SmolLM) on Hugging Face to understand how the weight folding trick is applied.
  • Evaluate the potential performance benefits of using FlashNorm in your specific use case.

Example

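# Sketch of an RMSNorm with optional norm weights (NumPy; names are illustrative)
import numpy as np

def rms_norm(x, norm_weights=None, eps=1e-6):
    # Root mean square over the last (hidden) dimension
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    normalized = x / rms
    if norm_weights is None:
        # Weightless (FlashNorm) case: skip the norm weight multiplication
        return normalized
    # Original RMSNorm: multiply (not divide) the normalized input by the norm weights
    return normalized * norm_weights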

Notes

The implementation details may vary depending on the specific RMSNorm implementation being used. It's essential to carefully review and test the modified implementation to ensure correctness and performance.
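One way to test a modified implementation is a numerical check that the weighted path and the folded, weightless path agree. The sketch below uses plain Python lists so it runs without dependencies. The helper names (`rms_norm`, `linear`, `fold_norm_weights`), the sizes, and the random data are all hypothetical, and it assumes a bias-free linear layer directly after the norm.

```python
import math
import random


def rms_norm(x, norm_weights=None, eps=1e-6):
    """RMSNorm over one vector; norm_weights=None gives the weightless variant."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    normalized = [v / rms for v in x]
    if norm_weights is None:
        return normalized
    return [v * w for v, w in zip(normalized, norm_weights)]


def linear(weight, x):
    """Bias-free linear layer; weight has shape (out_features, in_features)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def fold_norm_weights(weight, norm_weights):
    """Scale each input column of the linear weight by its norm weight."""
    return [[w * g for w, g in zip(row, norm_weights)] for row in weight]


random.seed(0)
hidden, out_features = 8, 4
x = [random.gauss(0, 1) for _ in range(hidden)]
g = [random.gauss(0, 1) for _ in range(hidden)]
W = [[random.gauss(0, 1) for _ in range(hidden)] for _ in range(out_features)]

# Original path: weighted RMSNorm followed by the original linear layer.
reference = linear(W, rms_norm(x, g))
# Folded path: weightless RMSNorm followed by the linear layer with folded weights.
folded = linear(fold_norm_weights(W, g), rms_norm(x))

max_abs_diff = max(abs(a - b) for a, b in zip(reference, folded))
assert max_abs_diff < 1e-9
```

The two paths differ only by floating-point rounding, so a small tolerance is used rather than exact equality. A real test would run the same comparison against the runtime's own kernels and dtypes.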

Recommendation

Apply workaround: Make the norm weights optional in the RMSNorm implementation so that weight-folded FlashNorm models can run. According to the issue, dropping the norm weight multiplication reduces inference overhead without affecting accuracy.
