vllm: [Tracking][NUMA] Replace hard-coded Granite Rapids PCT detection with a generic, root-free path

Summary

PR #43270 adds opt-in NUMA pinning to Intel Priority Core Turbo (PCT) "priority" cores on Xeon 6776P / 6774P / 6962P, gated by a hard-coded SKU table in vllm/utils/numa_utils.py::_PCT_CAPABLE_SKUS. The workaround delivered a measurable end-to-end win on DGX B300 (Qwen3.5-397B-A17B-NVFP4, TP=8, 32K/2 prompts, MC=128: total token throughput 46.1k -> 75.8k tok/s, +64.4 %), but it is intentionally narrow:

  • New PCT-capable SKUs require a vLLM patch that adds them to the table.
  • The cpu_id % stride in (0, 1) priority-core filter is empirical, not derived from the kernel.
  • The CPPC highest_perf value is hard-coded per SKU (4.6 GHz -> 46, 4.4 GHz -> 44).

This issue tracks removing that workaround once we have a way to discover PCT priority cores without root and without per-SKU enumeration.
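To make the last two limitations concrete, here is a minimal sketch of a filter with this shape. It is not the code from PR #43270: the helper names and the stride value used in the usage note are hypothetical. Only the `cpu_id % stride in (0, 1)` rule and the two GHz-to-`highest_perf` pairs (4.6 GHz -> 46, 4.4 GHz -> 44) come from the description above. Reading those pairs as "units of 100 MHz" is an inference from the two values.

```python
def highest_perf_from_ghz(max_turbo_ghz: float) -> int:
    """Map a max turbo frequency in GHz to a CPPC highest_perf value.

    Assumption: highest_perf is the frequency in 100 MHz units, which
    matches the two pairs quoted in the issue (4.6 -> 46, 4.4 -> 44).
    """
    return round(max_turbo_ghz * 10)


def priority_cpus(cpu_ids, stride: int) -> list[int]:
    """Keep the CPUs that land in the first two slots of every stride.

    This mirrors the empirical `cpu_id % stride in (0, 1)` filter; the
    stride itself would have to come from a per-SKU table.
    """
    if stride < 2:
        raise ValueError("stride must be at least 2")
    return [cpu for cpu in cpu_ids if cpu % stride in (0, 1)]
```

With a hypothetical stride of 4, `priority_cpus(range(8), stride=4)` keeps CPUs 0, 1, 4, and 5. Both inputs (the stride and the turbo frequency) are values the kernel does not hand to an unprivileged process today, which is why they end up in a table.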

Why the workaround exists today

Quoting #43270 (comment by @vadiklyutiy):

1. Kernel doesn't help

In a perfect world the Linux scheduler would prefer PCT priority cores when there are fewer hot threads than priority cores, and vLLM wouldn't need to care. Two recent kernels were tested:

  • 6.8.0-90-generic (Ubuntu, kernel from Mar 2024)
  • 6.14.0-37-generic (Ubuntu, kernel from Mar 2025)

Neither preferentially schedules work on PCT priority cores out of the box.

2. PCT discovery is root-only

The only kernel interface that reports PCT / CLOS membership is /dev/isst_interface, used by intel-speed-select:

$ ls -l /dev/isst_interface
crw------- 1 root root 10, 118 May 22 11:00 /dev/isst_interface

There is no unprivileged sysfs path that exposes per-CPU PCT membership. Intel's own guidance is to use intel-speed-select, which has the same root requirement. Production environments (shared clusters, managed cloud, prebuilt containers) typically can't grant root or rely on intel-speed-select being installed.
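The permission problem can be checked from Python before attempting any ISST-based discovery. The sketch below uses only standard-library calls; the function name `isst_access` and its three return strings are illustrative, not part of vLLM.

```python
import os

ISST_DEVICE = "/dev/isst_interface"


def isst_access(path: str = ISST_DEVICE) -> str:
    """Classify access to the ISST character device for the current user.

    Returns "missing" if the node does not exist, "denied" if it exists
    but the process cannot open it read/write, and "ok" otherwise.
    """
    if not os.path.exists(path):
        return "missing"
    if not os.access(path, os.R_OK | os.W_OK):
        return "denied"
    return "ok"
```

On a host with the `crw-------` root-owned node shown above, an unprivileged process would get "denied". A caller could use that result to skip ISST-based discovery rather than fail at open time.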

3. The two stop-gap alternatives are insufficient

  • "Document the manual --numa-bind-cpus recipe" — the users who need PCT pinning the most are also the ones least likely to read NUMA/PCT docs and act on them.
  • "Add a --numa-bind=pct flag that shells out to intel-speed-select" — still root-only, still requires the tool in the image, and leaves the default path slow.

See PR #33222 (intel-ai-tce) for the second variant explored as standalone scripts plus docker-compose. It is complementary to #43270 (manual workflow vs. zero-config auto-detection) and has the same root-only limitation.

Definition of done

The workaround can be removed when PCT priority cores can be discovered:

  1. without root, and
  2. without enumerating individual SKUs in vLLM,

so that _PCT_CAPABLE_SKUS and _pct_sku_config() can be deleted and replaced by a generic detector. The detector should ideally also support hosts where PCT is dynamically reconfigured, since the priority-core set is configurable in the BIOS and at runtime, per Intel Speed Select Technology.

Candidate paths forward

In rough order of preference (each independent):

A. Kernel-side: PCT-aware scheduler

If the upstream Linux scheduler learns to bias work toward PCT priority cores when CPU contention is low, the entire vLLM-side mechanism becomes unnecessary. Track Intel / kernel mailing-list patches; once a stable kernel ships PCT-aware scheduling, drop --numa-bind PCT logic on those kernels.

B. Kernel-side: unprivileged sysfs for PCT membership

A simpler change than scheduler patches: expose per-CPU PCT membership via /sys/devices/system/cpu/cpuN/... (similar to cpufreq/scaling_* or topology/core_id). vLLM would read it in user mode, no root, no intel-speed-select. Worth raising upstream as a feature request even if PCT-aware scheduling is far off.
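If such an attribute existed, the user-mode reader could be very small. The sketch below is written against a hypothetical attribute, `topology/pct_priority`, containing `1` for priority cores. No such file exists in current kernels, and the name and format are assumptions made only for illustration.

```python
from pathlib import Path


def read_pct_priority_cpus(
    sysfs_root: str = "/sys/devices/system/cpu",
    attr: str = "topology/pct_priority",  # hypothetical attribute name
) -> list[int]:
    """Return the CPU ids whose (hypothetical) PCT attribute reads "1".

    CPUs where the attribute is missing or unreadable are skipped, so the
    function returns an empty list on kernels without the attribute.
    """
    cpus = []
    for cpu_dir in Path(sysfs_root).glob("cpu[0-9]*"):
        suffix = cpu_dir.name[3:]
        if not suffix.isdigit():
            continue  # ignore anything that is not a cpuN directory
        try:
            value = (cpu_dir / attr).read_text().strip()
        except OSError:
            continue
        if value == "1":
            cpus.append(int(suffix))
    return sorted(cpus)
```

Because an empty list is returned when the attribute is absent, a detector built this way could fall back to another path on older kernels without special-casing them.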

C. Runtime probe via cpufreq

When PCT is active, priority cores reach a noticeably higher scaling_max_freq or observed turbo frequency than the remaining cores. A short calibration loop at startup could classify cores on that basis. This is fragile (frequency is workload-dependent) but root-free, so it might be useful as a fallback when other paths are unavailable.
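A minimal sketch of the static half of such a probe follows. It reads the standard `cpufreq/scaling_max_freq` file (in kHz) for each CPU and treats the CPUs at the top frequency as candidates. It omits the calibration loop that would measure observed turbo frequency. The function names and the `min_gap_khz` threshold are hypothetical, and whether `scaling_max_freq` actually separates priority cores on a given host is the assumption this path depends on.

```python
from pathlib import Path


def read_max_freq_khz(sysfs_root: str = "/sys/devices/system/cpu") -> dict[int, int]:
    """Read scaling_max_freq (kHz) for every cpuN directory that has it."""
    freqs = {}
    for cpu_dir in Path(sysfs_root).glob("cpu[0-9]*"):
        suffix = cpu_dir.name[3:]
        if not suffix.isdigit():
            continue
        try:
            text = (cpu_dir / "cpufreq" / "scaling_max_freq").read_text()
            freqs[int(suffix)] = int(text)
        except (OSError, ValueError):
            continue  # offline CPU, no cpufreq driver, or malformed file
    return freqs


def classify_fast_cpus(freq_khz: dict[int, int], min_gap_khz: int = 100_000) -> list[int]:
    """Return CPUs at the highest frequency, or [] if all CPUs look alike.

    min_gap_khz is a hypothetical threshold: the top and bottom frequencies
    must differ by at least this much before any core is called "fast".
    """
    if not freq_khz:
        return []
    top, low = max(freq_khz.values()), min(freq_khz.values())
    if top - low < min_gap_khz:
        return []
    return sorted(cpu for cpu, freq in freq_khz.items() if freq == top)
```

Returning an empty list when the frequencies are uniform keeps the probe from binding to an arbitrary subset on hosts where PCT is disabled.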

Acceptance criteria

  • PCT priority cores can be detected on a fresh Granite Rapids host without root.
  • No hard-coded SKU list is needed (or, if a list is unavoidable, it lives in a kernel/OS-provided file rather than vLLM source).
  • vllm/utils/numa_utils.py::_PCT_CAPABLE_SKUS, _PctSku, and _pct_sku_config can be deleted, and the existing tests in tests/utils_/test_numa_utils.py::test_pct_binding_* can be replaced with the new detector's tests.
  • DGX B300 + Xeon 6776P E2E throughput remains within noise of the +64.4 % observed in PR #43270.
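The throughput criterion can be phrased as a small check. The sketch below recomputes the relative gain from the two throughput figures quoted in the summary and compares a new measurement against it. The issue does not define "within noise", so the `tolerance` default is a placeholder assumption, as are the function names.

```python
def relative_gain(baseline_tok_s: float, candidate_tok_s: float) -> float:
    """Relative throughput gain of candidate over baseline (0.644 == +64.4 %)."""
    if baseline_tok_s <= 0:
        raise ValueError("baseline throughput must be positive")
    return (candidate_tok_s - baseline_tok_s) / baseline_tok_s


def within_noise(observed_gain: float,
                 reference_gain: float = 0.644,
                 tolerance: float = 0.03) -> bool:
    """True if observed_gain is within +/- tolerance of the reference gain.

    tolerance is a hypothetical noise band; pick it from repeated runs.
    """
    return abs(observed_gain - reference_gain) <= tolerance
```

For the figures in the summary, `relative_gain(46.1, 75.8)` is about 0.644, which matches the reported +64.4 %.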

References

  • PR #43270 — the workaround being tracked here.
  • PR #33222 — alternative approach via intel-speed-select and standalone scripts.
  • PR #38635 — the original --numa-bind plumbing the workaround sits on top of.
  • Intel ARK SKU pages (used to derive the table values in PR #43270):
  • intel-speed-select(8) — the root-only canonical interface today.
