vllm - 💡(How to fix) Fix [Bug]: MFU statistics on the Step-3.5-Flash model are inaccurate [1 comments, 2 participants]

vllm · 2026-03-26 01:15:33


GitHub stats

vllm-project/vllm#38170 • Fetched 2026-04-08 01:31:58

Author: kangxiaoning

Participants: kangxiaoning, Saad-Mallebhari

Timeline (top): commented ×1, labeled ×1, mentioned ×1, subscribed ×1

Root Cause

I ran performance tests on MiniMax-M2.5 and Step-3.5-Flash using vLLM v0.17.1 on a single machine with 8 H20 GPUs. According to the statistics from enable-mfu-metrics, the average MFU is 55.81% for Step-3.5-Flash and 32.61% for MiniMax-M2.5. Because the gap is large, I analyzed the possible causes and found the following issues:


Your current environment

Due to environmental constraints, I was unable to run the script. Here are the details of my testing environment:

GPU host: 8 H20 GPUs, each with 141 GB of memory
vLLM version: v0.17.1

🐛 Describe the bug

The current MFU statistics logic does not account for Sliding Window Attention (SWA)
The current MFU statistics logic does not account for Multi-Token Prediction (MTP)

My guess is that these two gaps cause the MFU reported for Step-3.5-Flash to be higher than the actual value.
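To see why ignoring SWA could inflate the figure, the sketch below counts query-key pairs for one sequence processed from scratch, with and without a window. It assumes a FLOP estimator that treats every attention layer as full causal attention; the function names are illustrative and are not taken from vLLM or from the Step-3.5-Flash configuration.

```python
def causal_pairs(seq_len: int) -> int:
    """Query-key pairs for full causal attention: token i attends to i + 1 keys."""
    return seq_len * (seq_len + 1) // 2


def sliding_window_pairs(seq_len: int, window: int) -> int:
    """Query-key pairs when each token attends to at most `window` recent keys."""
    if window <= 0:
        raise ValueError("window must be positive")
    return sum(min(i + 1, window) for i in range(seq_len))


def overcount_ratio(seq_len: int, window: int) -> float:
    """Factor by which a full-attention estimate overstates the attention work."""
    return causal_pairs(seq_len) / sliding_window_pairs(seq_len, window)
```

With these definitions, a 4-token sequence and a window of 2 gives 10 pairs under full attention against 7 under SWA, and the two counts coincide whenever the window is at least as long as the sequence. Attention FLOPs scale with the pair count, so an estimator that assumes full attention overstates the work of SWA layers more and more as sequences grow past the window.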


extent analysis

Fix Plan

To address the issue with MFU statistics logic, we need to update the logic to account for Sliding Window Attention (SWA) and Multi-Token Prediction (MTP).

Steps to Fix:

Update the MFU calculation that enable-mfu-metrics turns on so that it includes SWA and MTP
Modify the MFU statistics logic so that the FLOP count reflects sliding-window attention layers and MTP steps

Example Code:

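# Sketch only: every name below is hypothetical and is not a vLLM API.
# MFU = FLOPs actually executed / (elapsed time * peak FLOP/s).

def calculate_mfu(dense_flops, attn_flops, elapsed_s, peak_flops_per_s,
                  swa_ratio=None, mtp_flops_delta=None):
    # Baseline FLOP count: full attention everywhere, no MTP
    total_flops = dense_flops + attn_flops

    # Correct the FLOP count for SWA and MTP if the inputs are available
    if swa_ratio is not None:
        total_flops = update_flops_with_swa(total_flops, attn_flops, swa_ratio)
    if mtp_flops_delta is not None:
        total_flops = update_flops_with_mtp(total_flops, mtp_flops_delta)

    return total_flops / (elapsed_s * peak_flops_per_s)

def update_flops_with_swa(total_flops, attn_flops, swa_ratio):
    # swa_ratio: fraction (0 to 1) of the full-attention FLOPs that the
    # model really executes once sliding-window layers are counted
    return total_flops - attn_flops * (1.0 - swa_ratio)

def update_flops_with_mtp(total_flops, mtp_flops_delta):
    # mtp_flops_delta: signed correction for MTP; its sign depends on how
    # the baseline counts speculative tokens, which this sketch does not assume
    return total_flops + mtp_flops_delta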

Verification

To verify the fix, rerun the performance tests on MiniMax-M2.5 and Step-3.5-Flash with enable-mfu-metrics turned on after the update and compare the average MFU values. If the diagnosis above is right, the Step-3.5-Flash figure should come down.
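One way to make that comparison concrete is to recompute MFU by hand from a FLOP estimate and set it against the reported value. The following is a minimal sketch, assuming MFU is defined as executed FLOPs divided by elapsed time times aggregate peak throughput. The function names and parameters are hypothetical, and the peak FLOP/s value has to come from the hardware specification for the precision in use.

```python
def mfu_percent(flops: float, elapsed_s: float,
                peak_flops_per_s: float, num_gpus: int = 1) -> float:
    """MFU in percent for `flops` executed in `elapsed_s` seconds on `num_gpus` devices."""
    if elapsed_s <= 0 or peak_flops_per_s <= 0 or num_gpus <= 0:
        raise ValueError("elapsed_s, peak_flops_per_s and num_gpus must be positive")
    return 100.0 * flops / (elapsed_s * peak_flops_per_s * num_gpus)


def gap_points(mfu_a: float, mfu_b: float) -> float:
    """Difference between two MFU readings, in percentage points."""
    return mfu_a - mfu_b


# Averages reported in the issue for Step-3.5-Flash and MiniMax-M2.5.
reported_gap = gap_points(55.81, 32.61)
```

Tracking the gap in percentage points before and after the change gives a single number to watch between runs, as long as both runs use the same workload and hardware.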

Extra Tips

Check that the updated logic handles edge cases and boundary conditions, such as sequences shorter than the sliding window or runs with MTP disabled.
Consider logging the intermediate FLOP counts so the corrected values can be checked against a hand calculation.
