vllm - ✅(Solved) Fix [Bug/Regression]: CPU spikes to 100% on Multi-node (PP=2, TP=8) | Regression starting from v0.12.0 [2 pull requests, 5 comments, 4 participants]

GitHub stats
vllm-project/vllm#36943 (fetched 2026-04-08 00:43:27)
Comments: 5
Participants: 4
Timeline events: 15
Reactions: 0
Timeline (top): commented ×5, mentioned ×4, subscribed ×4, cross-referenced ×1

Fix Action

Fixed

PR fix notes

PR #36977: chore(pre-commit): make bash hooks runnable on Windows

Description (problem / solution / changelog)

Fixes #36943

Purpose

Test Plan

Test Result


<details> <summary> Essential Elements of an Effective PR Description Checklist </summary>
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.
</details>

Changed files

  • tools/pre_commit/png-lint.sh (modified, +1/-1)
  • tools/pre_commit/shellcheck.sh (modified, +2/-2)
  • tools/pre_commit/update-dockerfile-graph.sh (modified, +1/-1)

PR #37743: [CI] replace shellcheck script with shellcheck-py hook

Description (problem / solution / changelog)

Purpose

Related to #36977 (no activity since 2026-03-13).

Replace the custom tools/pre_commit/shellcheck.sh with the shellcheck-py pre-commit hook. The old script only worked on Linux x86_64; shellcheck-py ships shellcheck as a Python wheel and is cross-platform.

Also fixes real issues surfaced by the new hook: SC2089/SC2090 in run-multi-node-test.sh (string-quoted GPU args → bash array), SC2048 in spec_decode_acceptance_test.sh, and various SC2086 quote fixes. Remaining shellcheck warnings will be addressed in follow-up PRs.

error message

<img width="1283" height="988" alt="image" src="https://github.com/user-attachments/assets/518e3f87-0337-47ed-8081-dea2dac454a1" />

Test Plan

pre-commit run -a

Test Result

<img width="1531" height="629" alt="image" src="https://github.com/user-attachments/assets/c9073057-1816-432f-8fe6-93da56b31804" />

Changed files

  • .buildkite/scripts/run-multi-node-test.sh (modified, +13/-19)
  • .buildkite/scripts/tool_call/run-bfcl-eval.sh (modified, +1/-0)
  • .pre-commit-config.yaml (modified, +6/-5)
  • tests/v1/kv_connector/nixl_integration/run_xpu_disagg_accuracy_test.sh (modified, +14/-10)
  • tests/v1/kv_connector/nixl_integration/spec_decode_acceptance_test.sh (modified, +23/-21)
  • tools/pre_commit/shellcheck.sh (removed, +0/-24)

Code Example

The output of `python collect_env.py`

Proposal to improve performance

We are observing a severe regression in CPU management when running vLLM in a multi-node setup on OpenShift. While the deployment is perfectly stable on v0.11.0, any version from v0.12.0 up to v0.17.0 triggers complete CPU saturation.

Even under light inference load, CPU utilization on both nodes spikes to 100%, followed by a rapid decline in throughput and, eventually, unresponsive nodes. This behavior is consistent across various models using a TP=8, PP=2 topology on two nodes with 8×H100 each, deployed via LeaderWorkerSet (LWS).

The fact that the exact same configuration and infrastructure work seamlessly on v0.11.0 strongly suggests a regression introduced in v0.12.0, possibly related to how the distributed backend or the scheduler interacts with OpenShift's resource constraints.

Questions: Were there specific changes in the Ray/Distributed backend or scheduling logic in v0.12.0 that could cause this busy-wait loop or thread contention? Are there recommended environment variables or flags to tune Ray/vLLM resource management in hardened environments like OpenShift?
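One way to gather evidence for the busy-wait question is to check which threads of a worker process are accumulating CPU time. The sketch below is a minimal, Linux-only illustration that reads /proc/<pid>/task/<tid>/stat; the helper names and the sample thread name "ray worker" are hypothetical and do not come from the issue or from vLLM.

```python
import os


def parse_thread_stat(stat_text: str) -> tuple[str, int]:
    """Return (thread_name, cpu_ticks) from one /proc/<pid>/task/<tid>/stat line."""
    # The name sits in parentheses and may itself contain spaces or ')',
    # so split on the last ')' rather than on whitespace.
    head, _, tail = stat_text.rpartition(")")
    name = head.split("(", 1)[1]
    fields = tail.split()
    # After the name, index 0 is the state (field 3 in proc(5)),
    # so utime (field 14) and stime (field 15) sit at indexes 11 and 12.
    return name, int(fields[11]) + int(fields[12])


def thread_cpu_ticks(pid: int) -> dict[int, tuple[str, int]]:
    """Map each thread id of `pid` to (name, utime + stime in clock ticks)."""
    result = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as f:
                result[int(tid)] = parse_thread_stat(f.read())
        except (FileNotFoundError, ProcessLookupError):
            continue  # the thread exited between listdir() and open()
    return result


if __name__ == "__main__" and os.path.isdir("/proc/self/task"):
    # Demo on the current process; pass a worker PID in real use.
    ticks = thread_cpu_ticks(os.getpid())
    for tid, (name, used) in sorted(ticks.items(), key=lambda kv: -kv[1][1]):
        print(tid, name, used)
```

Sampling twice a few seconds apart and comparing the tick counts shows which threads stay busy while the server is nominally idle. Comparing that list between v0.11.0 and v0.12.0 would narrow down where the extra CPU time goes.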

Report of performance regression

No response

Misc discussion on performance

Infrastructure: OpenShift Container Platform (LWS). Topology: 2 Nodes, 8xH100 per node. Config: tensor_parallel_size=8, pipeline_parallel_size=2. Versions affected: v0.12.0 through v0.17.0.
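As a quick sanity check on the reported topology, the arithmetic can be written out explicitly. This is a minimal sketch: the Topology class is a hypothetical helper, not part of vLLM, and it assumes one worker per GPU with each pipeline stage placed on a single node, which the issue does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topology:
    nodes: int
    gpus_per_node: int
    tensor_parallel_size: int
    pipeline_parallel_size: int

    @property
    def world_size(self) -> int:
        # One worker per GPU: TP ranks inside each of the PP stages.
        return self.tensor_parallel_size * self.pipeline_parallel_size

    @property
    def total_gpus(self) -> int:
        return self.nodes * self.gpus_per_node

    def stage_fits_on_one_node(self) -> bool:
        # Keeping a whole TP group on one node keeps its traffic off the network.
        return self.tensor_parallel_size <= self.gpus_per_node


# Values taken from the issue: 2 nodes, 8 GPUs each, TP=8, PP=2.
reported = Topology(nodes=2, gpus_per_node=8,
                    tensor_parallel_size=8, pipeline_parallel_size=2)
print(reported.world_size, reported.total_gpus, reported.stage_fits_on_one_node())
```

With the reported values, the 16 workers use all 16 GPUs and each tensor-parallel group fits on one node, so only pipeline-stage traffic needs to cross nodes under this assumption.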

Your current environment (if you think it is necessary)

The output of `python collect_env.py`

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

extent analysis

Fix Plan

To address the CPU saturation in the multi-node OpenShift setup, this plan focuses on tuning Ray and vLLM resource management. The issue thread does not confirm a root cause, so treat these steps as mitigations to try, not as a verified fix.

Step-by-Step Solution:

  1. Adjust Resource Allocation: Set CPU and memory requests and limits (resources.requests and resources.limits) on the leader and worker pods, and tell Ray how many CPUs it may use (num_cpus in ray.init, or --num-cpus for ray start) so that it matches the pod limits.
  2. Tune Scheduling Parameters: Ray does not expose a user-defined scheduler class for tasks and actors. Placement is controlled per task or actor through the scheduling_strategy option (for example "SPREAD") or through placement groups.
  3. Object Store Memory: Size the object store with the object_store_memory argument of ray.init (or --object-store-memory for ray start), given in bytes.

Example Code Snippets:

import ray

# Start a local Ray instance with explicit resource limits.
# All values are placeholders; match them to the pod's actual CPU/GPU limits.
ray.init(
    num_cpus=16,                       # Adjust based on your node's CPU count
    num_gpus=8,                        # Adjust based on your node's GPU count
    object_store_memory=10 * 1024**3,  # Object store size in bytes (10 GiB)
    resources={"CustomResource": 1},   # Define custom resources if needed
)

# Spread tasks across nodes with Ray's built-in "SPREAD" strategy.
# ray.tune schedulers manage hyperparameter-tuning trials only and have
# no effect on how vLLM workers are placed.
@ray.remote(num_cpus=1, scheduling_strategy="SPREAD")
def probe():
    # Report which node this task ran on.
    return ray.get_runtime_context().get_node_id()

print(ray.get([probe.remote() for _ in range(4)]))

Verification

To verify the fix, monitor CPU utilization and throughput after applying the configuration changes. You can use tools like top, htop, or OpenShift's built-in monitoring dashboards to track performance metrics.
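The same measurement that top and htop show can be scripted for before/after comparisons. The sketch below is a minimal, Linux-only example that derives overall CPU utilization from two samples of the aggregate "cpu" line in /proc/stat; the function names are illustrative. Inside a container, /proc/stat usually reflects the whole host, not the pod's cgroup, so the number describes the node.

```python
import os
import time


def parse_cpu_line(line: str) -> tuple[int, int]:
    """Return (idle_ticks, total_ticks) from the aggregate 'cpu' line of /proc/stat."""
    fields = [int(x) for x in line.split()[1:]]
    # Field order: user nice system idle iowait irq softirq steal guest guest_nice
    idle = fields[3] + (fields[4] if len(fields) > 4 else 0)
    total = sum(fields[:8])  # guest time is already included in user/nice
    return idle, total


def utilization(before: str, after: str) -> float:
    """Percentage of non-idle CPU time between two 'cpu' line samples."""
    idle0, total0 = parse_cpu_line(before)
    idle1, total1 = parse_cpu_line(after)
    delta_total = total1 - total0
    if delta_total <= 0:
        return 0.0
    return 100.0 * (1.0 - (idle1 - idle0) / delta_total)


def read_cpu_line(path: str = "/proc/stat") -> str:
    with open(path) as f:
        return f.readline()


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    first = read_cpu_line()
    time.sleep(0.5)
    print(f"CPU busy: {utilization(first, read_cpu_line()):.1f}%")
```

Running this on each node under the same light load, once on v0.11.0 and once on an affected version, gives a comparable number for the regression instead of a visual reading from a dashboard.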

Extra Tips

  • Regularly review and adjust resource allocation and scheduling parameters based on your workload's requirements.
  • Prefer Ray's built-in scheduling strategies and placement groups over custom scheduling logic when optimizing task distribution.
  • The regression is reported for v0.12.0 through v0.17.0, so v0.11.0 is the last version reported stable in this issue. Verify any upgrade against the same TP=8, PP=2 workload before rolling it out.
