vllm - ✅(Solved) Fix [RFC][XPU]: Enable Intel XPU CI for vLLM [2 pull requests, 1 participants]

GitHub stats
vllm-project/vllm#37305 · Fetched 2026-04-08 00:53:35
Comments: 0 · Participants: 1 · Timeline: 6 · Reactions: 1
Timeline (top): labeled ×2, subscribed ×2, cross-referenced ×1, renamed ×1

PR fix notes

PR #37447: [CI/Build] enable Intel XPU test flow with prebuilt image

Description (problem / solution / changelog)

This PR enables a standalone Intel CI pipeline.

Purpose

Add the XPU image build and the CI pipeline.

Design

<img width="969" height="717" alt="image" src="https://github.com/user-attachments/assets/1fae149e-5280-45c0-bbf6-fbef769c8570" />

Test Plan

Run the pipeline 5 times to confirm that it is stable.

Test Result

Depends on

ci-infra PR: https://github.com/vllm-project/ci-infra/pull/306/



Changed files

  • .buildkite/ci_config_intel.yaml (added, +23/-0)
  • .buildkite/image_build/image_build_xpu.sh (added, +34/-0)
  • .buildkite/intel_jobs/test-intel.yaml (added, +63/-0)
  • .buildkite/scripts/hardware_ci/run-intel-test.sh (added, +276/-0)

PR #306: Add intel ci in case generator

Description (problem / solution / changelog)

(No description)

Changed files

  • buildkite/bootstrap-intel.sh (added, +310/-0)
  • buildkite/pipeline_generator/buildkite_step.py (modified, +13/-0)
  • buildkite/pipeline_generator/step.py (modified, +2/-0)

Motivation.

1. Summary

This RFC proposes enabling a dedicated Intel XPU CI pipeline for vLLM.
The goal is to ensure that updates to vLLM maintain correctness and performance on Intel XPU devices, while improving test efficiency, parallelism, and scalability of CI.


2. Motivation / Background

Currently, the vLLM CI on Intel XPU is limited:

  • A single simple script triggers both build and sanity tests.
  • Build and tests execute on the same machine, leading to low device utilization.
  • Tests are not executed in parallel, reducing efficiency.
  • Test case management and expansion are inefficient.
  • The current workflow does not follow ci-infra’s design standards.

With increasing contributions targeting Intel XPU in vLLM, a dedicated Intel CI pipeline is necessary to:

  • Guarantee correctness and performance on Intel XPU.
  • Improve test parallelism and device utilization.
  • Enable scalable, maintainable test case management.

3. Problem Statement

  1. Inefficient device usage: build and tests share the same Intel XPU machine sequentially.
  2. Non-parallel test execution: limits throughput and increases CI runtime.
  3. Limited test case management: adding or enabling new cases is not efficient.
  4. Non-standard CI workflow: current CI does not follow the ci-infra design pattern.

Proposed Change.

4. Proposal / Design

<img width="687" height="568" alt="Image" src="https://github.com/user-attachments/assets/7c4969e3-b2aa-49ec-b719-00e6115b092c" />

We propose a staged approach to enable Intel XPU CI:

Stage 1: Stable Intel CI Implementation (~40% of UTs enabled)

Stage 2: Gradual Test Case Expansion (~60% of UTs enabled)

  • Incrementally enable additional unit tests (UT) for Intel XPU.
  • Adjust machine allocation to ensure that a full CI run completes within ~1 hour.
  • Monitor stability and runtime metrics to balance load.

Stage 3: Expanded Test Coverage (~85% of UTs enabled)

  • Continue enabling more test cases, focusing on high-priority or high-risk features.
  • Maintain test parallelism and optimize machine allocation.
  • Goal: 85% of test cases enabled on Intel XPU.

Stage 4: Full Test Coverage (~95% of UTs enabled) & Mirror GPU CI

  • Enable ~95% of total test cases on Intel XPU.
  • Begin integrating mirror GPU tests to support gating CI workflows.
  • Achieve a fully functional, maintainable, and scalable Intel XPU CI pipeline.

5. Detailed Design Considerations

  1. CI Infrastructure

    • Use Buildkite agents or GitHub Actions runners with Intel XPU support.
    • Separate build and test stages with dedicated agents.
    • Ensure proper device isolation and environment setup (Docker / oneAPI / IPEX).
  2. Test Execution

    • Use existing pytest framework with proper -k filtering for XPU-relevant tests.
    • Enable parallel execution where possible (pytest-xdist or similar).
    • Track and log XPU-specific failures separately for easier triage.
  3. Test Case Management

    • Maintain a case enablement matrix to track which tests are active on XPU.
    • Stage-wise increase in enabled tests to manage CI runtime.
  4. Metrics & Monitoring

    • Track CI runtime, machine utilization, and device load.
    • Adjust machine allocation dynamically based on runtime statistics.
  5. Integration with ci-infra

    • Follow existing ci-infra patterns for pipeline structure, logging, and artifact management.
    • Ensure the new Intel XPU CI can be maintained alongside existing pipelines (CPU/NVIDIA).
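The test-execution points above (keyword filtering, parallel workers, separate failure logs) can be sketched as a small helper that assembles a pytest command line. This is only an illustration under assumptions: the helper name, the test names, and the report file name are hypothetical, and the real pipeline may select tests differently. The flags themselves are standard: `-k` is pytest's keyword filter, `-n` comes from pytest-xdist, and `--junitxml` writes a JUnit XML report.

```python
import shlex


def build_pytest_command(enabled_tests, workers, report_path):
    """Return a pytest argv list for the enabled XPU tests.

    enabled_tests: test-name substrings to select via -k (hypothetical names).
    workers: number of pytest-xdist worker processes (-n).
    report_path: JUnit XML file so XPU failures can be triaged separately.
    """
    if not enabled_tests:
        raise ValueError("no tests enabled")
    # pytest's -k option accepts a boolean expression over test names.
    keyword_expr = " or ".join(enabled_tests)
    return [
        "pytest",
        "-k", keyword_expr,
        "-n", str(workers),
        f"--junitxml={report_path}",
    ]


cmd = build_pytest_command(["test_attention", "test_sampler"], 4, "xpu-results.xml")
print(shlex.join(cmd))
```

A long `-k` expression becomes unwieldy as the enabled set grows, so a marker or a per-directory selection may be a better fit at later stages.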

6. Impact

  • Improved device utilization: separate build/test stages and parallel execution.
  • Scalable test case management: incremental enabling of UTs.
  • Faster feedback for Intel XPU contributors: reduced CI runtime and higher reliability.
  • CI maintenance cost: additional runners and monitoring required, but staged approach mitigates risk.

7. Open Questions / Discussion Points

  1. Should Stage 2 initially enable only critical tests or a broader selection?
  2. How many machines are optimal for Stage 1 and Stage 2 to ensure <1 hour CI runtime?
  3. Should mirror GPU CI be run in parallel with Intel XPU CI or sequentially?
  4. Any additional metrics or monitoring requirements for XPU-specific tests?
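For the machine-count question, a rough lower bound can be worked out from the total serial test time and the wall-clock target. The sketch below assumes tests shard evenly and ignores image pull and setup overhead; the function name and all figures are hypothetical, not measurements from the vLLM CI.

```python
import math


def machines_needed(total_serial_minutes, target_minutes, longest_test_minutes):
    """Estimate the minimum machine count for a wall-clock target.

    Assumes tests shard evenly and ignores image pull and setup overhead.
    """
    if longest_test_minutes > target_minutes:
        # No amount of sharding helps if one test alone exceeds the target.
        raise ValueError("a single test exceeds the target; split it first")
    return math.ceil(total_serial_minutes / target_minutes)


# Hypothetical figures: 400 minutes of serial test time, 60-minute target,
# and a longest single test of 25 minutes.
print(machines_needed(400, 60, 25))  # 7
```

Real allocations would likely need headroom above this bound, since shards are rarely perfectly even.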

extent analysis

Fix Plan

To address the issues with the current Intel XPU CI pipeline, we will implement a staged approach:

  1. Stage 1: Stable Intel CI Implementation

    • Separate build and test stages using Buildkite agents or GitHub Actions runners.
    • Implement parallel test execution using pytest-xdist.
    • Create a case enablement matrix to track enabled tests.
  2. Stage 2: Gradual Test Case Expansion

    • Incrementally enable additional unit tests for Intel XPU.
    • Adjust machine allocation to ensure CI runtime is within 1 hour.
    • Monitor stability and runtime metrics.
  3. Stage 3: Expanded Test Coverage

    • Continue enabling test cases, focusing on high-priority features.
    • Maintain test parallelism and optimize machine allocation.
  4. Stage 4: Full Test Coverage & Mirror GPU CI

    • Enable ~95% of total test cases on Intel XPU.
    • Integrate mirror GPU tests to support gating CI workflows.

Example Code

To implement parallel test execution using pytest-xdist:

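# pytest.ini
[pytest]
# Run tests across 4 pytest-xdist worker processes (requires the pytest-xdist plugin).
# The comment sits on its own line because a trailing "# ..." may be parsed as part of addopts.
addopts = -n 4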

To create a case enablement matrix:

# test_enablement_matrix.py
import pandas as pd

# Define test cases and their status
test_cases = [
    {"test_name": "test1", "enabled": True},
    {"test_name": "test2", "enabled": False},
]

# Create a DataFrame
df = pd.DataFrame(test_cases)

# Save to CSV
df.to_csv("test_enablement_matrix.csv", index=False)
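Once such a matrix exists, it can be checked against the stage targets from the proposal. The following is a minimal sketch that assumes the CSV has the `test_name` and `enabled` columns written above, with booleans serialized as `True`/`False`; the function name is hypothetical, and it uses the standard `csv` module so it runs without pandas.

```python
import csv
import io


def enabled_fraction(csv_text):
    """Return the fraction of rows whose 'enabled' column is 'True'."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return 0.0
    enabled = sum(1 for row in rows if row["enabled"].strip() == "True")
    return enabled / len(rows)


# Same shape as the CSV written above: booleans appear as True/False.
sample = "test_name,enabled\ntest1,True\ntest2,False\n"
fraction = enabled_fraction(sample)
# Stage 1 in the proposal targets roughly 40% of UTs enabled.
print(f"{fraction:.0%} enabled; Stage 1 target met: {fraction >= 0.40}")
```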

To adjust machine allocation based on CI runtime:

# ci_runtime_monitor.py
import time

# CI runtime threshold in seconds (1 hour)
THRESHOLD_SECONDS = 3600

# Time the CI pipeline run with a monotonic clock
start_time = time.monotonic()
# Run CI pipeline here (placeholder)
ci_runtime = time.monotonic() - start_time

# A run that exceeds the threshold needs more machines, not fewer
if ci_runtime > THRESHOLD_SECONDS:
    print("CI runtime exceeded 1 hour: increase machine allocation")

Verification

To verify the fix, monitor CI runtime, machine utilization, and device load. Check that:

  • CI runtime is within 1 hour
  • Machine utilization is optimized
  • Device load is balanced
  • Test cases are enabled and running in parallel
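One way to reason about the "balanced load" and "within 1 hour" checks is to shard tests by their recorded durations and inspect the per-machine totals. The sketch below uses a greedy longest-first assignment, which is a common heuristic and not necessarily what the ci-infra generator does; the function name, test names, and durations are hypothetical.

```python
import heapq


def shard_tests(durations, num_machines):
    """Assign tests longest first, always onto the least-loaded machine.

    durations: mapping of test name to runtime in minutes.
    Returns (shards, loads), where loads[i] is the total minutes on machine i.
    """
    # Min-heap of (current load, machine index).
    heap = [(0.0, i) for i in range(num_machines)]
    heapq.heapify(heap)
    shards = [[] for _ in range(num_machines)]
    ordered = sorted(durations.items(), key=lambda kv: kv[1], reverse=True)
    for name, minutes in ordered:
        load, idx = heapq.heappop(heap)
        shards[idx].append(name)
        heapq.heappush(heap, (load + minutes, idx))
    loads = [0.0] * num_machines
    for load, idx in heap:
        loads[idx] = load
    return shards, loads


durations = {"test_a": 30, "test_b": 20, "test_c": 20, "test_d": 10}
shards, loads = shard_tests(durations, 2)
# The slowest shard determines the wall-clock time of the test stage.
print(shards, loads, max(loads) <= 60)
```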

Extra Tips

  • Use existing ci-infra patterns for pipeline structure, logging, and artifact management.
  • Ensure the new Intel XPU CI can be maintained alongside existing pipelines (CPU/NVIDIA).
  • Continuously monitor and adjust machine allocation based on runtime statistics.
