Fix Action

PR fix notes

PR #39695: Introduce De-dup CI Workflow for PR/Issue

panpan0000 · 2026-04-13T10:45:10Z


Repository: vllm-project/vllm
Author: panpan0000
State: open | merged: False
Link: https://github.com/vllm-project/vllm/pull/39695

Description (problem / solution / changelog)

Purpose

Example to explain https://github.com/vllm-project/vllm/issues/39694

Scoring: 0.75 * text_similarity + 0.25 * file_overlap.
Threshold used for report: 0.75.
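
As a rough illustration of the scoring rule above, here is a minimal sketch. The weights and threshold come from the PR description; how `file_overlap` is actually computed is not stated there, so the Jaccard definition below is an assumption, and the function names are hypothetical.

```python
TEXT_WEIGHT = 0.75  # weight on text similarity (from the PR description)
FILE_WEIGHT = 0.25  # weight on changed-file overlap (from the PR description)
THRESHOLD = 0.75    # report threshold (from the PR description)


def file_overlap(files_a: set[str], files_b: set[str]) -> float:
    """Jaccard overlap of two changed-file sets (assumed definition)."""
    if not files_a or not files_b:
        return 0.0
    return len(files_a & files_b) / len(files_a | files_b)


def combined_score(text_similarity: float, overlap: float) -> float:
    """Weighted sum of the two signals, both expected in [0, 1]."""
    return TEXT_WEIGHT * text_similarity + FILE_WEIGHT * overlap


def is_high_similarity(score: float) -> bool:
    """True when a pair would be listed in the report."""
    return score >= THRESHOLD


if __name__ == "__main__":
    # Inputs taken from one report row: text 98%, files 50%.
    score = combined_score(0.98, 0.50)
    print(f"{score:.0%}", is_high_similarity(score))
```

With text similarity 0.98 and file overlap 0.50 the weighted sum is 0.735 + 0.125 = 0.86, which agrees with the 86% row in the report below. Because text carries three quarters of the weight, a pair with near-identical text can cross the threshold even when the changed files barely overlap.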

Test Plan

Ran the similarity check on the 1000 most recent PRs:

High-similarity pairs (>= 0.75): 26

Test Result

PR Similarity Report

Repo: vllm-project/vllm
PR count: 1000
Candidate pairs: 17375
High-similarity pairs (>= 0.75): 26

Score	Text	Files	PR A	PR B
100%	100%	100%	#39553 Okakarpa shadow clone	#39577 Okakarpa shadow clone
99%	99%	100%	#37929 [Core] Use standalone autograd_cache_key for compilation dedup optimization	#39517 [Core] Use standalone autograd_cache_key for compilation dedup optimization
96%	95%	100%	#37947 [DRAFT][XPU] Upgrade torch 2.11 for xpu	#39257 [XPU] update triton version for torch 2.11 upgrade
96%	95%	100%	#37947 [DRAFT][XPU] Upgrade torch 2.11 for xpu	#39313 [XPU] upgrade to triton-xpu 3.7.0
95%	97%	88%	#38249 [Misc] Organize NixlConnector into own directory	#39354 [KVConnector][NIXL] Organize NIXL connector into its own directory
95%	93%	100%	#39410 [XPU] Disable fusion passes on XPU Platform	#39671 use spawn multiproc method on xpu
94%	92%	100%	#38856 [LMCache] vLLM Block Allocation Event	#39719 fix(lmcache): correct store for cached requests while enable prefix cache
94%	91%	100%	#39606 Pass extra_config to the constructor of LMCacheMPXXXAdapter	#39719 fix(lmcache): correct store for cached requests while enable prefix cache
94%	91%	100%	#39257 [XPU] update triton version for torch 2.11 upgrade	#39313 [XPU] upgrade to triton-xpu 3.7.0
91%	100%	67%	#39432 Gfx1250 wip	#39437 Gfx1250 wip rebase test
90%	92%	85%	#36823 [vLLM IR] 3/N fused_add_rms_norm and maybe_inplace	#38775 [vLLM IR] 4/N Compile native implementation
90%	86%	100%	#39402 [kv_offload+HMA[10/N]: Support load with multiple KV groups	#39403 [kv_offload+HMA][11/N]: Support store with multiple KV groups
86%	98%	50%	#23995 Feature/deepseek v31 lora support	#39661 [DOC] Update Gemma 4
82%	76%	100%	#39110 [Core] Disable HMA for eagle/MTP with sliding window models	#39376 [Core] Disable HMA for eagle/MTP with sliding window models
82%	76%	100%	#39401 [kv_offload+HMA][9/N]: Support lookup with multiple KV groups	#39402 [kv_offload+HMA[10/N]: Support load with multiple KV groups
82%	76%	100%	#39401 [kv_offload+HMA][9/N]: Support lookup with multiple KV groups	#39403 [kv_offload+HMA][11/N]: Support store with multiple KV groups
80%	96%	33%	#26583 add log for request trace	#39646 V0.12.0 support n sampling delay split to eliminate redundant prefill computation and memory
79%	97%	22%	#35721 [LoRA] Support dual CUDA streams-Linear Layer	#37297 [LoRA] Support FP8 LoRA E2E inference-dense model
79%	94%	32%	#39153 [Frontend][4/n] Improve pooling entrypoints	pooling.
79%	74%	91%	#38775 [vLLM IR] 4/N Compile native implementation	#39453 Port activations to IR op 1/3
79%	88%	50%	#39312 [Mergify] Update model vendor auto-label rules	#39429 [CI/Build] Update auto-rebase rule
78%	100%	13%	#39723 [SimpleCPUOffloadConnector]: Add support for `reset_cache()`	#39726 [SimpleCPUOffloadConnector]: Add support for reset_cache()
77%	98%	14%	#38780 [vLLM IR][RMSNorm] Port GemmaRMSNorm to vLLM IR Ops	#38798 [vLLM IR][RMSNorm] Port RMSNormGated to vLLM IR Ops
77%	69%	100%	#39744 [v1] Expose num_prompt_tokens in CommonAttentionMetadata	#39745 [v1] Expose num_prompt_tokens in CommonAttentionMetadata
77%	81%	62%	#23133 Split compressed_tensors_moe.py into separate wna16, int8, fp8, nvfp4	#29427 [Refactor] Split up compressed_tensors_moe.py into separate files per method
76%	82%	59%	#39267 [vllm IR] 1/N Port FP8 Quantization to vLLM IR Ops	#39481 [vllm IR] Port FP8 Quantization to vLLM IR Ops


Changed files

.github/workflows/detect-duplicate-issues.yml (added, +29/-0)
.github/workflows/detect-duplicate-prs.yml (added, +30/-0)
.github/workflows/scripts/detect_duplicate_issues.py (added, +149/-0)
.github/workflows/scripts/detect_duplicate_prs.py (added, +249/-0)

Motivation.

With AI-assisted coding, and even vibe coding, on the rise, PRs are now created faster than humans can review them. Open PRs keep piling up, reviewers are overwhelmed, and still more PRs keep arriving.

To break this vicious cycle, maybe (sorry to jump in; I believe the committee is already discussing this) we can push on a few fronts in parallel:

Add a duplicate-detection CI workflow (e.g., text and changed-file similarity) to reduce redundant PRs. [Using an LLM may cost too much, so I chose text similarity for now.]
Apply the same de-dup to issues? (But I'm not sure plain similarity can work there.)
Like some other open source communities, introduce a contributor ladder to motivate contributors to participate in triage and review, expanding review capacity beyond a small core group. For example, we already have an AI-assisted reviewer, but it sometimes needs a committer's "+1" to trigger it; the new reviewers could do that instead.

Ladder:

Member -> Triager -> Reviewer -> Approver -> Maintainer / Committer -> Tech Lead / Steering Committee

Proposed Change.

Add a quick check for PR duplication:

Fetch the content of the most recent 100 or so PRs.
Compare those historical PRs with the current PR by text similarity.
Post a warning comment about potential duplication if the similarity score is high.
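
The three steps above can be sketched as follows. This is only an illustration under stated assumptions, not the script from the PR: the PR records are plain dicts with hypothetical `number`, `title`, and `text` keys, `difflib.SequenceMatcher` stands in for whatever text-similarity measure the real workflow uses, and the 0.75 cutoff is borrowed from the report threshold.

```python
from difflib import SequenceMatcher


def text_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two texts (stand-in measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_similar(current: dict, recent: list[dict],
                 threshold: float = 0.75) -> list[tuple[float, dict]]:
    """Return (score, pr) pairs at or above the threshold, best first."""
    hits = []
    for pr in recent:
        if pr["number"] == current["number"]:
            continue  # never compare a PR with itself
        score = text_similarity(current["text"], pr["text"])
        if score >= threshold:
            hits.append((score, pr))
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits


def format_warning(hits: list[tuple[float, dict]]) -> str:
    """Build the body of the warning comment, with one line per similar PR."""
    lines = ["Possible duplicate PRs detected:"]
    for score, pr in hits:
        lines.append(f"- #{pr['number']} {pr['title']} (similarity {score:.0%})")
    return "\n".join(lines)
```

In a real workflow the `recent` list would come from the GitHub API and the formatted text would be posted as a PR comment; both of those steps are left out here.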

https://github.com/vllm-project/vllm/pull/39695 is a quick example (throwing out a brick to attract jade...).

For example:

If a similar PR is detected, a "de-dup" warning pops up with links to the similar PRs.

The check takes only about 1.5 minutes to complete, but it uses a lot of GitHub API budget.

Apology

My apologies to the contributors below: while I was testing in my own repo, the script wrongly cherry-picked your PRs and added references to them. Sorry for any confusion! @tlrmchlsmth @markmc @gcanlin @Harry-Chen @fxmarty-amd @chaunceyjiang @KimuGenie @yma11

Feedback Period.

No response

CC List.

No response

Any Other Things.

No response

Before submitting a new issue...

Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

extent analysis

TL;DR

Implement a duplicate-detection CI workflow using text similarity to reduce redundant PRs and alleviate reviewer burden.

Guidance

Consider using a CI workflow that checks for duplicate PRs based on text similarity to reduce the number of redundant PRs.
Introduce a contributors ladder to motivate contributors to participate in triage and review, expanding review capacity beyond a small core group.
Implement a quick check for PR duplication by comparing the content of recent PRs with the current PR using text-similarity, and comment with a warning if a potential duplication is detected.
Be mindful of GitHub API budget usage when implementing the duplicate-detection workflow.
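
To make the API-budget point concrete, here is a rough back-of-the-envelope sketch. It assumes one list request per page of PRs (GitHub's REST list endpoints return at most 100 items per page) plus one changed-files request per PR. The function names and any budget value passed in are hypothetical, and the real workflow may issue more or fewer calls.

```python
import math


def estimate_requests(pr_count: int, per_page: int = 100) -> int:
    """Estimate API calls: paged PR listing plus one files call per PR."""
    list_requests = math.ceil(pr_count / per_page)
    file_requests = pr_count  # assumes each PR's file list fits in one page
    return list_requests + file_requests


def fits_budget(pr_count: int, budget: int) -> bool:
    """True when the estimated call count stays within the given budget."""
    return estimate_requests(pr_count) <= budget
```

Under these assumptions, scanning 1000 PRs costs about 10 + 1000 = 1010 requests per run, while the "recent 100 or so" window from the proposal costs about 101, which is one reason to keep the comparison window small or to cache fetched PR data between runs.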

Example

The issue itself gives no code snippet, but it does give an example of a contributor ladder:

Ladder:
Member -> Triager -> Reviewer -> Approver -> Maintainer / Committer -> Tech Lead / Steering Committee

Notes

The proposed solution may have limitations, such as the accuracy of text-similarity checks and the potential for false positives. Additionally, the implementation of the contributors ladder and duplicate-detection workflow may require significant changes to the existing workflow and API usage.

Recommendation

Apply a workaround by implementing a duplicate-detection CI workflow using text similarity to reduce redundant PRs, as this is a relatively low-risk and low-effort solution that can help alleviate reviewer burden.

vllm - ✅(Solved) Fix [RFC]: maybe add PR deduplication CI workflow ? [45 pull requests, 6 comments, 5 participants]
