vllm - ✅(Solved) Fix [RFC]: maybe add PR deduplication CI workflow ? [45 pull requests, 6 comments, 5 participants]

GitHub stats
vllm-project/vllm#39694 · Fetched 2026-04-14 05:38:03
Comments: 6
Participants: 5
Timeline: 26
Reactions: 5
Timeline (top): mentioned ×9, subscribed ×9, commented ×6, cross-referenced ×1

Error Message

  1. Comment (warn) about the potential duplication if the similarity score is high

Fix Action

Fixed

PR fix notes

PR #39695: Introduce De-dup CI Workflow for PR/Issue

Description (problem / solution / changelog)

Purpose

Example to explain https://github.com/vllm-project/vllm/issues/39694

  • Scoring: 0.75 * text_similarity + 0.25 * file_overlap.
  • Threshold used for report: 0.75.
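
The scoring rule above can be sketched as follows. This is a minimal illustration, not the code from the PR: the source gives only the weights and the threshold, so the choice of `difflib.SequenceMatcher` for text similarity and Jaccard overlap for changed files is an assumption, and all function names are hypothetical.

```python
from difflib import SequenceMatcher

TEXT_WEIGHT = 0.75  # weight from the PR description
FILE_WEIGHT = 0.25  # weight from the PR description
THRESHOLD = 0.75    # report threshold from the PR description


def text_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; difflib is an assumed metric, not the PR's."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def file_overlap(files_a: set, files_b: set) -> float:
    """Jaccard overlap of changed-file paths (assumed metric)."""
    union = files_a | files_b
    if not union:
        return 0.0
    return len(files_a & files_b) / len(union)


def weighted(text_sim: float, file_sim: float) -> float:
    """Combine the two component scores with the weights from the PR."""
    return TEXT_WEIGHT * text_sim + FILE_WEIGHT * file_sim


def combined_score(text_a: str, text_b: str, files_a: set, files_b: set) -> float:
    """Score a pair of PRs from their text and changed-file sets."""
    return weighted(text_similarity(text_a, text_b), file_overlap(files_a, files_b))


def is_high_similarity(score: float) -> bool:
    """A pair is reported when its score reaches the threshold."""
    return score >= THRESHOLD
```

With these weights, two PRs with identical text reach the 0.75 threshold even when they share no files, while file overlap alone contributes at most 0.25 and can never trigger a report on its own.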

Test Plan

Tested the similarity check on 1000 recent PRs:

High-similarity pairs (>= 0.75): 26

Test Result

PR Similarity Report

  • Repo: vllm-project/vllm
  • PR count: 1000
  • Candidate pairs: 17375
  • High-similarity pairs (>= 0.75): 26

| Score | Text | Files | PR A | PR B |
|---|---|---|---|---|
| 100% | 100% | 100% | #39553 Okakarpa shadow clone | #39577 Okakarpa shadow clone |
| 99% | 99% | 100% | #37929 [Core] Use standalone autograd_cache_key for compilation dedup optimization | #39517 [Core] Use standalone autograd_cache_key for compilation dedup optimization |
| 96% | 95% | 100% | #37947 [DRAFT][XPU] Upgrade torch 2.11 for xpu | #39257 [XPU] update triton version for torch 2.11 upgrade |
| 96% | 95% | 100% | #37947 [DRAFT][XPU] Upgrade torch 2.11 for xpu | #39313 [XPU] upgrade to triton-xpu 3.7.0 |
| 95% | 97% | 88% | #38249 [Misc] Organize NixlConnector into own directory | #39354 [KVConnector][NIXL] Organize NIXL connector into its own directory |
| 95% | 93% | 100% | #39410 [XPU] Disable fusion passes on XPU Platform | #39671 use spawn multiproc method on xpu |
| 94% | 92% | 100% | #38856 [LMCache] vLLM Block Allocation Event | #39719 fix(lmcache): correct store for cached requests while enable prefix cache |
| 94% | 91% | 100% | #39606 Pass extra_config to the constructor of LMCacheMPXXXAdapter | #39719 fix(lmcache): correct store for cached requests while enable prefix cache |
| 94% | 91% | 100% | #39257 [XPU] update triton version for torch 2.11 upgrade | #39313 [XPU] upgrade to triton-xpu 3.7.0 |
| 91% | 100% | 67% | #39432 Gfx1250 wip | #39437 Gfx1250 wip rebase test |
| 90% | 92% | 85% | #36823 [vLLM IR] 3/N fused_add_rms_norm and maybe_inplace | #38775 [vLLM IR] 4/N Compile native implementation |
| 90% | 86% | 100% | #39402 [kv_offload+HMA[10/N]: Support load with multiple KV groups | #39403 [kv_offload+HMA][11/N]: Support store with multiple KV groups |
| 86% | 98% | 50% | #23995 Feature/deepseek v31 lora support | #39661 [DOC] Update Gemma 4 |
| 82% | 76% | 100% | #39110 [Core] Disable HMA for eagle/MTP with sliding window models | #39376 [Core] Disable HMA for eagle/MTP with sliding window models |
| 82% | 76% | 100% | #39401 [kv_offload+HMA][9/N]: Support lookup with multiple KV groups | #39402 [kv_offload+HMA[10/N]: Support load with multiple KV groups |
| 82% | 76% | 100% | #39401 [kv_offload+HMA][9/N]: Support lookup with multiple KV groups | #39403 [kv_offload+HMA][11/N]: Support store with multiple KV groups |
| 80% | 96% | 33% | #26583 add log for request trace | #39646 V0.12.0 support n sampling delay split to eliminate redundant prefill computation and memory |
| 79% | 97% | 22% | #35721 [LoRA] Support dual CUDA streams-Linear Layer | #37297 [LoRA] Support FP8 LoRA E2E inference-dense model |
| 79% | 94% | 32% | #39153 [Frontend][4/n] Improve pooling entrypointspooling. | |
| 79% | 74% | 91% | #38775 [vLLM IR] 4/N Compile native implementation | #39453 Port activations to IR op 1/3 |
| 79% | 88% | 50% | #39312 [Mergify] Update model vendor auto-label rules | #39429 [CI/Build] Update auto-rebase rule |
| 78% | 100% | 13% | #39723 [SimpleCPUOffloadConnector]: Add support for reset_cache() | #39726 [SimpleCPUOffloadConnector]: Add support for reset_cache() |
| 77% | 98% | 14% | #38780 [vLLM IR][RMSNorm] Port GemmaRMSNorm to vLLM IR Ops | #38798 [vLLM IR][RMSNorm] Port RMSNormGated to vLLM IR Ops |
| 77% | 69% | 100% | #39744 [v1] Expose num_prompt_tokens in CommonAttentionMetadata | #39745 [v1] Expose num_prompt_tokens in CommonAttentionMetadata |
| 77% | 81% | 62% | #23133 Split compressed_tensors_moe.py into separate wna16, int8, fp8, nvfp4 | #29427 [Refactor] Split up compressed_tensors_moe.py into separate files per method |
| 76% | 82% | 59% | #39267 [vllm IR] 1/N Port FP8 Quantization to vLLM IR Ops | #39481 [vllm IR] Port FP8 Quantization to vLLM IR Ops |
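
The report lists 17375 candidate pairs for 1000 PRs, far fewer than the 499500 unordered pairs that 1000 items can form, which suggests that some pre-filter runs before full scoring. The source does not say what that filter is. The sketch below shows one possible approach, purely as an assumption: only pairs that touch at least one common file become candidates. The function name and input shape are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import comb


def candidate_pairs(files_by_pr: dict) -> set:
    """Return PR-number pairs that share at least one changed file.

    files_by_pr maps a PR number to the set of file paths it changes.
    This pre-filter is an assumption, not the method used in the PR.
    """
    # Inverted index: file path -> PR numbers that touch it.
    prs_by_file = defaultdict(set)
    for number, files in files_by_pr.items():
        for path in files:
            prs_by_file[path].add(number)

    pairs = set()
    for numbers in prs_by_file.values():
        # Sorting keeps each pair in (low, high) order so the set deduplicates.
        pairs.update(combinations(sorted(numbers), 2))
    return pairs


# Upper bound on pairs without any pre-filter.
ALL_PAIRS_FOR_1000_PRS = comb(1000, 2)
```

A filter like this would never propose a pair with zero file overlap, so whether it fits depends on whether text-only duplicates should be caught.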

<details> <summary> Essential Elements of an Effective PR Description Checklist </summary>
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.
</details>

Changed files

  • .github/workflows/detect-duplicate-issues.yml (added, +29/-0)
  • .github/workflows/detect-duplicate-prs.yml (added, +30/-0)
  • .github/workflows/scripts/detect_duplicate_issues.py (added, +149/-0)
  • .github/workflows/scripts/detect_duplicate_prs.py (added, +249/-0)


Motivation.

With more and more AI-assisted (or even vibe) coding, PRs are now created faster than humans can review them. This has led to a large and growing backlog of open PRs: reviewers are overwhelmed, and more PRs keep arriving.

To break this vicious cycle, maybe (sorry to jump in; I believe the committee is already discussing this) we can push on a few fronts in parallel:

  1. Add a duplicate-detection CI workflow (e.g., text and changed-file similarity) to reduce redundant PRs. [Using an LLM may cost too much, so I chose text similarity for now.]
  2. Apply the same de-dup to issues? (But I'm not sure plain similarity can work.)
  3. Like some other open source communities, introduce a contributor ladder to motivate contributors to participate in triage and review, expanding review capacity beyond a small core group. For example, we already have an AI-assisted reviewer, but it sometimes needs a committer "+1" to trigger it; those new reviewers could do that instead.

Ladder:

Member -> Triager -> Reviewer -> Approver -> Maintainer / Committer -> Tech Lead / Steering Committee

Proposed Change.

Add a quick check for PR duplication:

  1. Fetch the content of the most recent 100 or so PRs.
  2. Compare those historical PRs with the current PR by text similarity.
  3. Comment (warn) about the potential duplication if the similarity score is high.
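
The three steps above can be sketched without any network access, assuming the recent PRs have already been fetched. This is an illustrative sketch only: the `PullRequest` class, the function names, and the use of `difflib.SequenceMatcher` and Jaccard file overlap are assumptions, and posting the resulting comment to GitHub is left out.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class PullRequest:
    """Hypothetical container for the fields the check needs."""
    number: int
    title: str
    body: str = ""
    files: frozenset = field(default_factory=frozenset)


def score(a: PullRequest, b: PullRequest) -> float:
    """0.75 * text similarity + 0.25 * file overlap (metrics are assumed)."""
    text = SequenceMatcher(
        None, f"{a.title}\n{a.body}".lower(), f"{b.title}\n{b.body}".lower()
    ).ratio()
    union = a.files | b.files
    overlap = len(a.files & b.files) / len(union) if union else 0.0
    return 0.75 * text + 0.25 * overlap


def find_potential_duplicates(current, recent, threshold=0.75):
    """Step 2: compare the current PR with each recent PR, best match first."""
    scored = [(score(current, pr), pr) for pr in recent if pr.number != current.number]
    matches = [m for m in scored if m[0] >= threshold]
    return sorted(matches, key=lambda m: m[0], reverse=True)


def format_warning(matches):
    """Step 3: build the warning comment, or None when nothing matched."""
    if not matches:
        return None
    lines = ["Potential duplicate PRs detected:"]
    lines += [f"- #{pr.number} {pr.title} (score {s:.0%})" for s, pr in matches]
    return "\n".join(lines)
```

Returning `None` when nothing matches lets the calling workflow skip the comment step entirely rather than post an empty warning.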

See https://github.com/vllm-project/vllm/pull/39695 for a quick example (throwing out a brick to attract jade...).

For example

If a similar PR is detected, a "de-dup" warning pops up with a link to it.

<img width="1868" height="1188" alt="Image" src="https://github.com/user-attachments/assets/4277d266-39d1-4743-9d12-391cc27bcb99" />

The run takes only about 1.5 minutes to complete, but it uses a lot of GitHub API budget...

<img width="2992" height="1144" alt="Image" src="https://github.com/user-attachments/assets/ab22e81a-4915-4334-a0a3-334d2a1decdc" />

Apology

My apologies to the contributors below: when I was testing in my own repo, the script wrongly cherry-picked your PRs and added references to them. Sorry for any confusion! @tlrmchlsmth @markmc @gcanlin @Harry-Chen @fxmarty-amd @chaunceyjiang @KimuGenie @yma11

Feedback Period.

No response

CC List.

No response

Any Other Things.

No response

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

extent analysis

TL;DR

Implement a duplicate-detection CI workflow using text similarity to reduce redundant PRs and alleviate reviewer burden.

Guidance

  • Consider using a CI workflow that checks for duplicate PRs based on text similarity to reduce the number of redundant PRs.
  • Introduce a contributors ladder to motivate contributors to participate in triage and review, expanding review capacity beyond a small core group.
  • Implement a quick check for PR duplication by comparing the content of recent PRs with the current PR using text-similarity, and comment with a warning if a potential duplication is detected.
  • Be mindful of GitHub API budget usage when implementing the duplicate-detection workflow.
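
As a rough way to reason about API budget, the request count for one run can be estimated up front. The sketch below assumes the GitHub REST API is used, that listing PRs returns up to 100 per page, and that changed files cost one extra request per PR (each PR fitting in a single page of files). The function name is hypothetical, and the actual workflow may fetch data differently.

```python
from math import ceil


def estimate_api_calls(n_prs: int, per_page: int = 100, fetch_files: bool = True) -> int:
    """Estimate REST requests needed to load n_prs PRs for comparison.

    Assumes one paginated list request per `per_page` PRs, plus one
    changed-files request per PR when file overlap is scored.
    """
    if n_prs <= 0:
        return 0
    list_calls = ceil(n_prs / per_page)
    file_calls = n_prs if fetch_files else 0
    return list_calls + file_calls
```

Under these assumptions the per-PR file requests dominate the cost, so caching file lists between runs or scoring text first and fetching files only for close matches would be the natural places to save budget.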

Example

No code snippet is provided as it is not clearly supported by the issue, but an example of a contributors ladder is given:

ladder :
Member -> Triager ->  Reviewer -> Approver --> Maintainer / Committer --> Tech Lead  / Steering Committee

Notes

The proposed solution may have limitations, such as the accuracy of text-similarity checks and the potential for false positives. Additionally, the implementation of the contributors ladder and duplicate-detection workflow may require significant changes to the existing workflow and API usage.

Recommendation

Apply a workaround by implementing a duplicate-detection CI workflow using text similarity to reduce redundant PRs, as this is a relatively low-risk and low-effort solution that can help alleviate reviewer burden.
