vllm - ✅(Solved) Fix [torch.compile] E2E correctness testing for fusions [1 pull requests, 1 comments, 2 participants]

vllm · 2026-04-09 16:05:24


GitHub stats

vllm-project/vllm#39428 • Fetched 2026-04-10 03:40:41


Author

ProExpertProg

Participants

ihmdika

ProExpertProg

Timeline (top)

labeled ×2, project_v2_item_status_changed ×2, added_to_project_v2 ×1, commented ×1

PR fix notes

PR #39480: [CI/Build][Model][DeepSeek] Add e2e correctness tests for compile fusions

Repository: vllm-project/vllm
Author: xikronz
State: open | merged: False
Link: https://github.com/vllm-project/vllm/pull/39480

Description (problem / solution / changelog)

Purpose

WIP for issue #39428, part 1: loading a subset of layers in DeepSeek.

Fixes weight loading in DeepSeek-V2 (deepseek_v2.py) when layers are removed via --hf-overrides.num_hidden_layers, which previously raised KeyErrors. This is a proof of concept and a prerequisite for running few-layer correctness tests on all models in tests/compile/fusions_e2e/models.py.
Remaining work: do the same for all three DeepSeek models in tests/compile/fusions_e2e/models.py.
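The PR diff is not reproduced on this page. As a minimal sketch of the general idea only, assuming checkpoint weight names follow the common `model.layers.<idx>.` convention, a loader can drop checkpoint entries for layers that were removed by the override instead of looking them up and hitting a KeyError. vLLM model classes receive weights in `load_weights` as an iterable of `(name, tensor)` pairs; `filter_truncated_layer_weights` is a hypothetical helper, not a vLLM API and not necessarily how the PR solves it.

```python
import re
from collections.abc import Iterable, Iterator
from typing import Any

# Hypothetical naming convention: captures the decoder-layer index from names
# such as "model.layers.17.mlp.experts.3.w1.weight".
_LAYER_IDX_RE = re.compile(r"(?:^|\.)layers\.(\d+)\.")


def filter_truncated_layer_weights(
    weights: Iterable[tuple[str, Any]],
    num_hidden_layers: int,
) -> Iterator[tuple[str, Any]]:
    """Yield only the (name, tensor) pairs for layers the truncated model has.

    Entries for layers with index >= num_hidden_layers (removed via
    hf_overrides) are skipped; non-layer weights such as embeddings and
    lm_head always pass through unchanged.
    """
    for name, tensor in weights:
        match = _LAYER_IDX_RE.search(name)
        if match is not None and int(match.group(1)) >= num_hidden_layers:
            continue  # this layer was not instantiated, so skip its weights
        yield name, tensor
```

With `num_hidden_layers=2`, weights for `model.layers.0.*` and `model.layers.1.*` are kept, while `model.layers.2.*` and higher are silently dropped.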

Test plan

I wrote a quick smoke-test script that loads only 2 layers of deepseek-ai/DeepSeek-Coder-V2-Lite-Base and RedHatAI/DeepSeek-Coder-V2-Lite-Instruct-FP8 (both use deepseek_v2.py) to check whether the KeyError is triggered. With this change, it works.

In contrast, running the same smoke test on the same models from the main branch raises the expected KeyError:
[Screenshot from 2026-04-16 00-23-20: KeyError raised on main]

Smoke test script (test_deepseek_hf_override.py):

```python
from vllm import LLM, SamplingParams

# Load only the first 2 decoder layers of DeepSeek-Coder-V2-Lite to check that
# weight loading no longer raises a KeyError for the removed layers.
llm = LLM(
    model="deepseek-ai/DeepSeek-Coder-V2-Lite-Base",
    tensor_parallel_size=1,
    max_model_len=128,
    enforce_eager=True,  # skip compilation and CUDA graphs for a fast check
    hf_overrides={"num_hidden_layers": 2},
    trust_remote_code=True,
)
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=10))
for output in outputs:
    print(f"Prompt: {output.prompt}")
    print(f"Output: {output.outputs[0].text}")
```

Changed files

vllm/model_executor/models/deepseek_v2.py (modified, +13/-5)

Issue description (vllm-project/vllm#39428)

The E2E tests for fusions (tests/compile/fusions_e2e) have done a great job of preventing fusion regressions, where changes to model/forward code break a custom torch.compile fusion pass. However, we currently have no way to test the correctness of these fusion configurations.

It would be good to investigate an approach that runs only a few layers of a model and compares the outputs. This would help with correctness testing in general, and we could compare the outputs against both a baseline vLLM configuration and the Hugging Face baseline.

This would likely require some work to fix weight loading for models like DeepSeek when num_hidden_layers is overridden via --hf-overrides.
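To illustrate the output comparison the issue proposes, here is a minimal sketch of a hypothetical helper, `check_greedy_outputs_close`. It assumes both runs (for example, the fused configuration and a baseline) use greedy decoding and return, per position, a `{token_id: logprob}` map of the top-k candidates; that input format is an assumption, not vLLM's output type. Exact token equality can be too strict because fused kernels may change numerics slightly, so a divergence is accepted when each run ranked the other's token among its own top-k, and comparison stops at the first divergence because later tokens are conditioned on different prefixes.

```python
# Hypothetical format for one generated sequence: greedily chosen token IDs,
# plus for every position a {token_id: logprob} map of the top-k candidates.
GreedyOutput = tuple[list[int], list[dict[int, float]]]


def check_greedy_outputs_close(
    baseline: list[GreedyOutput],
    candidate: list[GreedyOutput],
) -> None:
    """Raise AssertionError if candidate diverges from baseline in a way
    that cannot be explained by a near-tie in the top-k logprobs."""
    if len(baseline) != len(candidate):
        raise ValueError("baseline and candidate cover different prompt counts")
    for prompt_idx, ((ref_ids, ref_top), (cand_ids, cand_top)) in enumerate(
        zip(baseline, candidate)
    ):
        for pos, (ref_tok, cand_tok) in enumerate(zip(ref_ids, cand_ids)):
            if ref_tok == cand_tok:
                continue
            # Tokens differ: accept only if each run considered the other's
            # choice among its own top-k candidates at this position.
            if ref_tok not in cand_top[pos] or cand_tok not in ref_top[pos]:
                raise AssertionError(
                    f"prompt {prompt_idx}, position {pos}: baseline token "
                    f"{ref_tok} vs candidate token {cand_tok} is not a near-tie"
                )
            # After a divergence the prefixes differ, so stop comparing.
            break
```

In a real test, the token IDs and top-k logprobs would be collected from vLLM (e.g. with `SamplingParams(temperature=0, logprobs=k)`) and from the Hugging Face model, then converted into this format.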

Extent analysis

TL;DR

Running only a few layers of each model and comparing the outputs against baselines could provide a cheap correctness check for fusion configurations.

Guidance

Investigate extending the existing E2E tests to run only a limited number of layers per model, for example via a new test parameter that controls the layer count.
Fix weight loading for models such as DeepSeek when the number of hidden layers is overridden, so that truncated models load without errors.
Compare the outputs of the truncated runs against both a baseline vLLM configuration and the Hugging Face baseline to verify correctness.
Add test infrastructure or utilities as needed to support this approach.
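The first guidance item, a test parameter that controls how many layers run, could be sketched as below. `FusionCorrectnessCase`, `all_cases`, and the layer counts are hypothetical and not part of tests/compile/fusions_e2e; the model names come from the smoke test above, and the only vLLM-specific piece is the `hf_overrides={"num_hidden_layers": n}` argument that the smoke test already uses.

```python
from dataclasses import dataclass
from itertools import product
from typing import Any

# Models from the smoke test; the layer counts are illustrative assumptions.
MODELS = [
    "deepseek-ai/DeepSeek-Coder-V2-Lite-Base",
    "RedHatAI/DeepSeek-Coder-V2-Lite-Instruct-FP8",
]
NUM_LAYERS = [2, 4]


@dataclass(frozen=True)
class FusionCorrectnessCase:
    """Hypothetical test case: one model truncated to `num_layers` layers."""

    model: str
    num_layers: int

    def llm_kwargs(self, **extra: Any) -> dict[str, Any]:
        """Build keyword arguments for vllm.LLM; `extra` can add or override
        settings such as a compilation config for the fused run."""
        kwargs: dict[str, Any] = {
            "model": self.model,
            "max_model_len": 128,
            # Truncate the model so only the first num_layers layers are built.
            "hf_overrides": {"num_hidden_layers": self.num_layers},
            "trust_remote_code": True,
        }
        kwargs.update(extra)
        return kwargs


def all_cases() -> list[FusionCorrectnessCase]:
    """Every (model, layer count) combination to test."""
    return [FusionCorrectnessCase(m, n) for m, n in product(MODELS, NUM_LAYERS)]
```

Each case's kwargs could then be passed as `LLM(**case.llm_kwargs(compilation_config=...))` for the fused run and again with a baseline configuration, so that both runs see the same truncated model and only the compilation settings differ.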

Notes

The implementation details are not yet specified; finding the best way to modify the existing tests and the weight-loading code will require further investigation and experimentation.

Recommendation

Start small: implement the correctness testing approach for a simple model with a limited number of layers to validate its feasibility before expanding to more complex models and configurations.
