vllm - 💡(How to fix) Fix [Bug]: MFU statistics on the Step-3.5-Flash model are inaccurate [1 comment, 2 participants]

GitHub stats: vllm-project/vllm#38170 (fetched 2026-04-08 01:31:58) · 1 comment · 2 participants · 4 timeline events (commented, labeled, mentioned, subscribed) · 0 reactions

Root Cause

The MFU estimate produced by enable-mfu-metrics in vLLM v0.17.1 does not account for Sliding Window Attention (SWA) or Multi-Token Prediction (MTP). On a single machine with 8 H20 GPUs, the reporter suspects this inflates the average MFU reported for Step-3.5-Flash (55.81%) relative to MiniMax-M2.5 (32.61%); the issue presents this as a guess, not a confirmed diagnosis.


Your current environment

Due to environmental constraints, I was unable to run the script. Here are the details of my testing environment:

GPU host: 8 H20 GPUs, each with 141 GB of memory
vLLM version: v0.17.1

🐛 Describe the bug

I ran performance tests on MiniMax-M2.5 and Step-3.5-Flash using vLLM v0.17.1 on a single machine with 8 H20 GPUs. According to the enable-mfu-metrics statistics, the average MFU is 55.81% for Step-3.5-Flash but only 32.61% for MiniMax-M2.5. Because the gap is large, I analyzed possible causes and found the following issues:

  1. The current MFU statistics logic does not account for Sliding Window Attention (SWA)
  2. The current MFU statistics logic does not account for Multi-Token Prediction (MTP)

My guess is that these omissions make the MFU reported for Step-3.5-Flash higher than its actual value.
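To see why ignoring SWA would push MFU up, here is a rough sketch with a hypothetical layer shape (not Step-3.5-Flash's real configuration). A decode token in an SWA layer attends to at most window keys, so charging it for the full context overstates the attention FLOPs in the MFU numerator. The helper name attn_flops_per_token is illustrative, and the factor 4 assumes 2 FLOPs per multiply-add for both QK^T and the weighted sum over V.

```python
from typing import Optional


def attn_flops_per_token(context_len: int, num_heads: int, head_dim: int,
                         window: Optional[int] = None) -> int:
    """FLOPs one decode token spends in one attention layer (QK^T plus weights @ V)."""
    # Hypothetical helper: with SWA the token attends to at most `window` keys.
    attended = context_len if window is None else min(context_len, window)
    return 4 * attended * num_heads * head_dim


# Hypothetical layer shape, not Step-3.5-Flash's real configuration.
ctx, heads, dim, win = 32768, 64, 128, 4096
full = attn_flops_per_token(ctx, heads, dim)      # charged as full attention
swa = attn_flops_per_token(ctx, heads, dim, win)  # work an SWA layer actually does
print(f"overcount factor for this layer: {full / swa:.1f}x")  # 8.0x
```

The overcount grows linearly with context length once the context exceeds the window, so long-context benchmarks would be affected the most.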



Fix Plan

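To fix the MFU statistics, the FLOP accounting must model Sliding Window Attention (SWA) and Multi-Token Prediction (MTP). Under SWA, each token attends to at most the window size of keys rather than the full context, so charging SWA layers for full-context attention overstates the FLOPs in the MFU numerator. Under MTP, the counted FLOPs must match the work the GPU actually performs in each step: the draft tokens the main model verifies plus the MTP module's own forward passes.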
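One way to write the corrected accounting, as a sketch under stated assumptions: attention is approximated decode-style (each of the N tokens processed in a step attends to min(L, W_ℓ) keys), and the symbols are introduced here rather than taken from vLLM. N counts every token the main model processes in the step, including verified draft tokens; F_dense is the non-attention FLOPs per token; L is the context length; W_ℓ, h_ℓ, d_ℓ are the window, head count, and head dimension of attention layer ℓ (W_ℓ = ∞ for full-attention layers); F_MTP is the FLOPs spent in the MTP module; t_step is the step's wall time and P_peak the hardware peak.

```latex
\mathrm{MFU} = \frac{F_{\mathrm{step}}}{t_{\mathrm{step}}\, P_{\mathrm{peak}}},
\qquad
F_{\mathrm{step}} = N\, F_{\mathrm{dense}}
  + \sum_{\ell} 4\, N \min(L, W_\ell)\, h_\ell\, d_\ell
  + F_{\mathrm{MTP}}
```

Setting W_ℓ = ∞ for every layer and F_MTP = 0 recovers the accounting the issue describes, which is why it can only overstate the attention term for SWA models.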

Steps to Fix:

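  • Update the FLOP estimator behind enable-mfu-metrics so that SWA layers are charged for min(context length, window size) keys instead of the full context
  • Count every token processed in a step, including MTP draft tokens that are later rejected, and add the FLOPs of the MTP module itself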
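The MTP part of these steps can be sketched as follows. It assumes that, per sequence, the MTP module proposes num_draft tokens one pass at a time and the main model then verifies the last token plus all drafts in a single pass; vLLM's actual scheduling may differ. The function name and the toy per-token costs are hypothetical. The point is that FLOPs are spent on every draft whether or not it is accepted, so the count should follow processed tokens, not accepted ones.

```python
def mtp_step_flops(num_seqs: int, num_draft: int,
                   main_flops_per_token: float,
                   mtp_flops_per_token: float) -> float:
    """Hypothetical FLOP count for one speculative decode step with MTP."""
    # Assumption: the main model verifies the last token plus every draft
    # token in one pass, whether or not the drafts are later accepted.
    main_tokens = num_seqs * (1 + num_draft)
    # Assumption: the MTP module runs once per draft token it proposes.
    mtp_tokens = num_seqs * num_draft
    return main_tokens * main_flops_per_token + mtp_tokens * mtp_flops_per_token


# Toy numbers: 8 sequences, 2 draft tokens each.
with_mtp = mtp_step_flops(8, 2, main_flops_per_token=100.0, mtp_flops_per_token=10.0)
naive = mtp_step_flops(8, 0, main_flops_per_token=100.0, mtp_flops_per_token=10.0)
print(with_mtp, naive)  # 2560.0 800.0
```

Which way an MTP-unaware estimator errs depends on how it counts tokens, so the safer rule is simply to count what the hardware executed.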

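Example Code (a hypothetical sketch: the names, arguments, and FLOP formulas are illustrative assumptions, not vLLM's implementation):

from typing import Optional, Sequence, Tuple


def attention_flops(num_tokens: int, context_len: int, num_heads: int,
                    head_dim: int, window: Optional[int] = None) -> float:
    # Decode-style approximation: each query attends to `context_len` keys.
    # QK^T and the weighted sum over V each cost 2 * head_dim FLOPs per
    # (query, key) pair per head. Under Sliding Window Attention (SWA) a
    # query sees at most `window` keys, not the whole context.
    attended = context_len if window is None else min(context_len, window)
    return 4.0 * num_tokens * attended * num_heads * head_dim


def calculate_mfu(num_tokens: int, context_len: int,
                  dense_flops_per_token: float,
                  layers: Sequence[Tuple[int, int, Optional[int]]],
                  step_seconds: float, peak_flops: float,
                  mtp_flops: float = 0.0) -> float:
    # `layers` holds (num_heads, head_dim, window) per attention layer;
    # window=None marks a full-attention layer.
    # `num_tokens` should include any draft tokens verified this step, and
    # `mtp_flops` is the work done by the MTP module(s).
    flops = num_tokens * dense_flops_per_token + mtp_flops
    for num_heads, head_dim, window in layers:
        flops += attention_flops(num_tokens, context_len,
                                 num_heads, head_dim, window)
    # MFU is achieved FLOP/s over peak FLOP/s. Sum the FLOPs first;
    # averaging separate per-feature MFU values is not meaningful.
    return flops / (step_seconds * peak_flops)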

Verification

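To verify the fix, rerun the same performance tests on MiniMax-M2.5 and Step-3.5-Flash with enable-mfu-metrics and the updated FLOP accounting, then compare the average MFU values. If the reporter's guess is correct, the MFU reported for Step-3.5-Flash should drop and the gap between the two models should narrow.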
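Before rerunning the benchmarks, one way to sanity-check the result is to estimate from the model config how much the SWA omission alone could inflate the reported value. For a fixed step time MFU is proportional to the counted FLOPs, so dividing the reported MFU by the FLOP over-count ratio gives a rough corrected figure. The helper, the toy layer list, and the placeholder MFU below are hypothetical, and the estimate ignores MTP and prefill effects.

```python
from typing import Optional, Sequence, Tuple


def swa_overcount_ratio(dense_flops_per_token: float, context_len: int,
                        layers: Sequence[Tuple[int, int, Optional[int]]]) -> float:
    """Ratio of per-token FLOPs counted without SWA to FLOPs counted with SWA."""
    naive = corrected = dense_flops_per_token
    for num_heads, head_dim, window in layers:
        per_key = 4 * num_heads * head_dim
        naive += per_key * context_len  # every layer charged for the full context
        attended = context_len if window is None else min(context_len, window)
        corrected += per_key * attended
    return naive / corrected


# Toy model: one full-attention layer and three SWA layers with window 4.
layers = [(1, 1, None), (1, 1, 4), (1, 1, 4), (1, 1, 4)]
ratio = swa_overcount_ratio(100.0, 16, layers)
reported_mfu = 0.5  # placeholder, not a measured value
print(f"ratio={ratio:.3f}, corrected MFU ~ {reported_mfu / ratio:.3f}")
```

When the context is no longer than the window, the ratio is exactly 1, which is a useful check that the corrected path matches the old one in that regime.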

Extra Tips

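  • Check boundary cases: contexts shorter than the sliding window (where SWA and full attention cost the same), prefill versus decode steps, and MTP steps in which no draft tokens are accepted.
  • Log the per-step FLOP breakdown (dense, full attention, SWA attention, MTP) so each term can be checked; an MFU above 100% means some term is over-counted.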
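A minimal sketch of such per-step logging, using only the standard logging and dataclasses modules. The FlopBreakdown class, its field names, the logger name, and the toy peak value are hypothetical and not part of vLLM; the warning encodes the one hard invariant available, that MFU cannot exceed 1.0.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("mfu_debug")  # hypothetical logger name


@dataclass
class FlopBreakdown:
    """Per-step FLOP terms; field names are illustrative, not vLLM's."""
    dense: float
    full_attn: float
    swa_attn: float
    mtp: float

    def total(self) -> float:
        return self.dense + self.full_attn + self.swa_attn + self.mtp

    def mfu(self, step_seconds: float, peak_flops: float) -> float:
        value = self.total() / (step_seconds * peak_flops)
        logger.debug("dense=%.3e full_attn=%.3e swa_attn=%.3e mtp=%.3e mfu=%.4f",
                     self.dense, self.full_attn, self.swa_attn, self.mtp, value)
        if value > 1.0:
            # MFU above 100% is impossible, so some term is over-counted.
            logger.warning("MFU %.2f exceeds 1.0; check the FLOP terms", value)
        return value


logging.basicConfig(level=logging.DEBUG)
step = FlopBreakdown(dense=1e12, full_attn=2e11, swa_attn=1e11, mtp=5e10)
step.mfu(step_seconds=0.5, peak_flops=2e13)  # toy peak, not an H20 spec
```

Keeping the terms separate makes it easy to see, for example, whether a suspicious MFU comes from the SWA term or from MTP.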

