vllm - [Performance]: DeepSeek-V3.2 performance on 8xH20 does not match official data

GitHub stats
vllm-project/vllm#40592 (fetched 2026-04-23 07:24:06)
Comments: 0 · Participants: 1 · Timeline: 1 · Reactions: 0
Timeline (top): labeled ×1

Root Cause

My device is 8xH20 (96G). I used the scripts from https://docs.vllm.ai/projects/recipes/en/latest/DeepSeek/DeepSeek-V3_2.html#benchmarking but they fail to run because of OOM.

my env

vllm 0.19.1, deep_gemm 2.1.1+local, 8xH20 (96G)

Was the official performance measured on 8xH200? https://docs.vllm.ai/projects/recipes/en/latest/DeepSeek/DeepSeek-V3_2.html#tp8-benchmark-output

<img width="1302" height="1667" alt="Image" src="https://github.com/user-attachments/assets/ad3162c8-eb29-4dd6-a808-bdafc0abc0bb" />

Extended analysis

TL;DR

The most likely fix is to reduce memory usage, for example by adjusting the model configuration or lowering the batch size, so that the benchmark no longer hits Out-of-Memory (OOM) errors on 8xH20.
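A rough memory budget helps show why the same script might fit on 8xH200 but not on 8xH20. The sketch below is only an estimate under stated assumptions: about 685B parameters stored at roughly one byte each (FP8), weights sharded evenly across 8 GPUs, 141 GB per H200, and a usable fraction of 0.9 (vLLM's default for `--gpu-memory-utilization`). None of these figures comes from the issue, and the calculation ignores activations, CUDA graphs, and framework overhead.

```python
def per_gpu_headroom_gb(params_billion: float, bytes_per_param: float,
                        gpu_mem_gb: float, num_gpus: int,
                        mem_utilization: float = 0.9) -> float:
    """Memory left per GPU (GB) after evenly sharded weights are loaded."""
    weights_per_gpu = params_billion * bytes_per_param / num_gpus
    usable = gpu_mem_gb * mem_utilization
    return usable - weights_per_gpu

# Assumptions, not taken from the issue: parameter count and weight precision.
PARAMS_B = 685.0
BYTES_PER_PARAM = 1.0

h20 = per_gpu_headroom_gb(PARAMS_B, BYTES_PER_PARAM, gpu_mem_gb=96.0, num_gpus=8)
h200 = per_gpu_headroom_gb(PARAMS_B, BYTES_PER_PARAM, gpu_mem_gb=141.0, num_gpus=8)

print(f"H20 headroom per GPU:  {h20:.1f} GB")
print(f"H200 headroom per GPU: {h200:.1f} GB")
```

Under these assumptions the H20 setup has under 1 GB per GPU left for KV cache and activations, against roughly 41 GB on H200, which would be consistent with an OOM when the recipe settings are reused unchanged.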

Guidance

  • Review the benchmarking scripts from the provided URL to identify potential memory-intensive operations or parameters that can be adjusted.
  • Consider reducing the batch size or the model's memory footprint, since the official performance tests may have been run on hardware with more memory per GPU (8xH200).
  • Verify the device's memory usage and available resources before and after adjustments to ensure the changes are effective.
  • Check the documentation for any specific guidelines on running the scripts on 8xH20 devices, as there may be known limitations or workarounds.
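To compare memory usage before and after an adjustment, one option is to query `nvidia-smi` and record the numbers per GPU. The sketch below is a minimal helper, not part of vLLM or the recipe. The function names are hypothetical, and the sample text fed to the parser is made up for illustration rather than captured from an H20.

```python
import subprocess

# nvidia-smi query that prints one CSV row per GPU, values in MiB without units.
QUERY = ["nvidia-smi",
         "--query-gpu=index,memory.used,memory.total",
         "--format=csv,noheader,nounits"]

def parse_gpu_memory(csv_text: str) -> list[dict]:
    """Parse 'index, used, total' rows into dicts with a derived free_mib field."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, used, total = (field.strip() for field in line.split(","))
        rows.append({
            "index": int(index),
            "used_mib": int(used),
            "total_mib": int(total),
            "free_mib": int(total) - int(used),
        })
    return rows

def read_gpu_memory() -> list[dict]:
    """Run nvidia-smi on this machine and return the parsed rows."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_memory(out)

# Made-up sample output, used here so the parser can be tried without a GPU.
sample = "0, 1000, 4000\n1, 3000, 4000\n"
for gpu in parse_gpu_memory(sample):
    print(gpu["index"], gpu["free_mib"])
```

On the actual machine, calling `read_gpu_memory()` once before launching the server and once after the failure (or after a successful start) gives a per-GPU view of how close each device is to its limit.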

Notes

The provided information lacks specific details about the scripts and model configurations, so the suggested adjustments are general and may require further experimentation to find the optimal solution.

Recommendation

Apply workaround: adjust the model configuration or batch size to reduce memory usage. This is the most immediate option given the likely memory limits of the hardware.
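As one way to try the workaround, the sketch below assembles a `vllm serve` command with tighter memory-related settings. The flags are existing vLLM options, but the model identifier and every value here are illustrative assumptions: they are not taken from the recipe or the issue and have not been tuned or verified on 8xH20.

```python
import shlex

def build_serve_command(model: str, tensor_parallel_size: int,
                        max_model_len: int, max_num_seqs: int,
                        gpu_memory_utilization: float) -> list[str]:
    """Build an argument list for `vllm serve` with memory-related limits."""
    return [
        "vllm", "serve", model,
        "--tensor-parallel-size", str(tensor_parallel_size),
        # Shorter maximum context reduces the KV cache needed per sequence.
        "--max-model-len", str(max_model_len),
        # Fewer concurrent sequences reduces the batch size.
        "--max-num-seqs", str(max_num_seqs),
        # Fraction of each GPU's memory that vLLM is allowed to use.
        "--gpu-memory-utilization", str(gpu_memory_utilization),
    ]

# Illustrative values only (assumptions, not recommendations).
cmd = build_serve_command(
    model="deepseek-ai/DeepSeek-V3.2",  # assumed model identifier
    tensor_parallel_size=8,
    max_model_len=8192,
    max_num_seqs=16,
    gpu_memory_utilization=0.92,
)
print(shlex.join(cmd))
```

Changing one setting at a time and rerunning the benchmark makes it easier to see which limit actually removes the OOM.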
