vllm - ✅(Solved) Fix [torch.compile] E2E correctness testing for fusions [1 pull request, 1 comment, 2 participants]

GitHub stats
vllm-project/vllm#39428 · Fetched 2026-04-10 03:40:41
Comments: 1 · Participants: 2 · Timeline events: 7 · Reactions: 0
Timeline (top): labeled ×2, project_v2_item_status_changed ×2, added_to_project_v2 ×1, commented ×1

PR fix notes

PR #39480: [CI/Build][Model][DeepSeek] Add e2e correctness tests for compile fusions

Description (problem / solution / changelog)

Purpose

WIP for issue #39428 part 1: loading partial layers in deepseek

  • Fixes weight loading in DeepSeek_v2 so that --hf-overrides.num_hidden_layers no longer raises KeyErrors when layers are removed via hf_overrides. This is a proof of concept and a prerequisite for running few-layer correctness tests on all models in tests/compile/fusions_e2e/models.py.
  • Do the same for all three DeepSeek models in tests/compile/fusions_e2e/models.py.
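The idea behind the fix can be sketched as follows. This is a minimal illustration, not the change made in deepseek_v2.py: the helper names (`is_removed_layer`, `filter_weights`) and the checkpoint weight names are hypothetical, and the sketch assumes decoder-layer weights carry a `layers.<index>.` segment in their names. Weights for layers at or beyond the overridden `num_hidden_layers` are skipped instead of being looked up in the (smaller) parameter dict, which is where a KeyError would otherwise arise.

```python
import re

# Hypothetical helper, NOT the code from the PR: matches the decoder-layer
# index in a checkpoint weight name such as "model.layers.3.mlp.gate.weight".
_LAYER_RE = re.compile(r"(?:^|\.)layers\.(\d+)\.")


def is_removed_layer(weight_name: str, num_hidden_layers: int) -> bool:
    """Return True if the weight belongs to a layer dropped by the override."""
    match = _LAYER_RE.search(weight_name)
    if match is None:
        # Not a per-layer weight (embeddings, final norm, lm_head, ...).
        return False
    return int(match.group(1)) >= num_hidden_layers


def filter_weights(weights, num_hidden_layers):
    """Yield only the (name, tensor) pairs whose layer still exists."""
    for name, tensor in weights:
        if is_removed_layer(name, num_hidden_layers):
            continue  # skip instead of failing on a missing parameter
        yield name, tensor


# Illustrative checkpoint entries; strings stand in for tensors.
checkpoint = [
    ("model.embed_tokens.weight", "t0"),
    ("model.layers.0.self_attn.q_proj.weight", "t1"),
    ("model.layers.1.mlp.gate.weight", "t2"),
    ("model.layers.2.mlp.experts.0.up_proj.weight", "t3"),
    ("model.layers.26.input_layernorm.weight", "t4"),
    ("lm_head.weight", "t5"),
]
kept = [name for name, _ in filter_weights(checkpoint, num_hidden_layers=2)]
print(kept)
```

Filtering by layer index keeps non-layer weights such as embeddings and the output head untouched, so a truncated model can still produce logits.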

Test plan

I wrote a quick smoke test script that loads 2 layers of deepseek-ai/DeepSeek-Coder-V2-Lite-Base and RedHatAI/DeepSeek-Coder-V2-Lite-Instruct-FP8, both of which use deepseek_v2.py, to check whether the KeyError is triggered. On this branch, loading works:

<img width="1791" height="313" alt="image" src="https://github.com/user-attachments/assets/7694b8a6-625b-4ec5-9de2-6fbdf581611e" />

If I switch back to the main branch and run the same smoke test on the same models, we get the expected KeyError: <img width="1786" height="313" alt="Screenshot from 2026-04-16 00-23-20" src="https://github.com/user-attachments/assets/15f3d088-28b0-4616-bba9-bef2f1ebd761" />

<details> <summary>Smoke test script (test_deepseek_hf_override.py)</summary>
from vllm import LLM, SamplingParams

# Load only the first 2 decoder layers by overriding the HF config.
llm = LLM(
  model="deepseek-ai/DeepSeek-Coder-V2-Lite-Base",
  tensor_parallel_size=1,
  max_model_len=128,
  enforce_eager=True,
  hf_overrides={"num_hidden_layers": 2},
  trust_remote_code=True,
)
# Generate a few tokens to confirm that weight loading succeeded.
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=10))
for output in outputs:
  print(f"Prompt: {output.prompt}")
  print(f"Output: {output.outputs[0].text}")
</details>

Changed files

  • vllm/model_executor/models/deepseek_v2.py (modified, +13/-5)
Issue description

E2E tests for fusions (tests/compile/fusions_e2e) have done a great job preventing fusion regressions where model/forward code changes break a custom torch.compile fusion pass. However, we currently have no way of testing correctness for these fusion configurations.

It would be good to investigate an approach where we only run a few layers of a model and compare the outputs. This would be helpful for correctness testing in general, and we could compare the outputs to both a baseline vLLM configuration and the huggingface baseline.
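A comparison of this kind could look roughly like the sketch below. It assumes that per-token log-probabilities for the same prompt have already been extracted from each run (baseline and fused) as plain lists of floats; how they are obtained from vLLM or Hugging Face is left out. The helper names and the tolerance `atol=1e-2` are illustrative assumptions, not values taken from the issue or the test suite.

```python
def max_abs_diff(baseline, candidate):
    """Largest absolute per-token difference between two logprob lists."""
    if len(baseline) != len(candidate):
        raise ValueError("runs produced a different number of tokens")
    return max((abs(b - c) for b, c in zip(baseline, candidate)), default=0.0)


def outputs_match(baseline, candidate, atol=1e-2):
    """True if every per-token logprob agrees within the tolerance.

    atol is a hypothetical threshold; a real test would tune it per dtype
    and quantization scheme.
    """
    return max_abs_diff(baseline, candidate) <= atol


# Made-up numbers for illustration only.
baseline_logprobs = [-0.11, -1.52, -0.03]
fused_logprobs = [-0.112, -1.518, -0.031]   # small numerical drift
broken_logprobs = [-0.11, -2.40, -0.03]     # one token clearly off

print(outputs_match(baseline_logprobs, fused_logprobs))
print(outputs_match(baseline_logprobs, broken_logprobs))
```

Comparing log-probabilities rather than generated text makes the check less sensitive to a single near-tie flipping the sampled token, which matters when only a few layers of a model are run.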

This would likely require some work to fix weight loading for models like DeepSeek when --hf-overrides.num_hidden_layers is overridden.

extent analysis

TL;DR

Implementing a correctness testing approach by running a subset of model layers and comparing outputs to baselines may help ensure fusion configurations are correct.

Guidance

  • Investigate modifying the existing E2E tests to support running a limited number of layers for models, potentially by introducing a new test parameter to control the number of layers executed.
  • Develop a weight loading fix for models like DeepSeek when the number of hidden layers is overridden, to ensure consistent and accurate testing.
  • Compare the outputs of the modified model runs to both a baseline vLLM configuration and the Hugging Face baseline to verify correctness.
  • Consider the potential need for additional test infrastructure or utilities to support this new testing approach.
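One way to expose the layer count as a test parameter is sketched below. The helper `few_layer_kwargs` and the `MODELS` list are hypothetical; only the `hf_overrides={"num_hidden_layers": ...}` and `max_model_len=128` settings come from the smoke test in the PR. The sketch builds plain keyword dictionaries and does not import or start vLLM.

```python
# Hypothetical helper: builds engine kwargs for a truncated model.
def few_layer_kwargs(model: str, num_layers: int = 2) -> dict:
    if num_layers < 1:
        raise ValueError("num_layers must be at least 1")
    return {
        "model": model,
        "max_model_len": 128,
        # Same override the smoke test uses to drop trailing layers.
        "hf_overrides": {"num_hidden_layers": num_layers},
    }


# Model names taken from the PR test plan; the layer counts are illustrative.
MODELS = [
    "deepseek-ai/DeepSeek-Coder-V2-Lite-Base",
    "RedHatAI/DeepSeek-Coder-V2-Lite-Instruct-FP8",
]
cases = [few_layer_kwargs(m, n) for m in MODELS for n in (1, 2)]
print(len(cases))
```

Each dictionary could then be passed as `LLM(**kwargs)` inside a parametrized test, once for the baseline configuration and once for the fusion configuration under test.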

Notes

The implementation details of this approach are not fully specified and may require further investigation and experimentation to determine the best way to modify the existing tests and weight loading mechanisms.

Recommendation

Apply workaround: Implement a subset of the proposed correctness testing approach, starting with a simple model and a limited number of layers, to validate the feasibility of this method before expanding to more complex models and configurations.
