vllm [CI Failure]: PyTorch Fullgraph Smoke Test: counter assertions and AOT cache collision in tests/compile/fullgraph/

Name of failing test

tests/compile/fullgraph/test_toy_llama.py::test_toy_llama[eager-False]
tests/compile/fullgraph/test_toy_llama.py::test_toy_llama[inductor-False]
tests/compile/fullgraph/test_toy_llama.py::test_toy_llama[inductor-True]
tests/compile/fullgraph/test_multiple_graphs.py::test_multi_graph_piecewise_compile[True-False]
tests/compile/fullgraph/test_multiple_graphs.py::test_multi_graph_piecewise_compile[True-True]
tests/compile/fullgraph/test_multiple_graphs.py::test_multi_graph_piecewise_compile[False-False]
tests/compile/fullgraph/test_multiple_graphs.py::test_multi_graph_piecewise_compile[False-True]
tests/compile/fullgraph/test_simple.py::test_simple_piecewise_compile[False-inductor]

Basic information

Can reproduce locally
Flaky test
Caused by external libraries

🧪 Describe the failing test

The "PyTorch Fullgraph Smoke Test" Buildkite step fails with two distinct error patterns (CI build #64888).

Pattern 1 — counter assertion (7 of 8):

File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/counter.py", line 51, in expect
    assert getattr(self, k) - getattr(old, k) == v, (
AssertionError: num_graphs_seen not as expected, before it is 0, after it is 0, expected diff is 1

The test sees a compilation_counter whose num_graphs_seen stays at 0 even after torch.compile runs and VllmBackend.__call__ increments it.

Root cause: tests/utils.py:spawn_new_process_for_each_test cloudpickles the inner test function f and runs it in a child interpreter via python -c. The decorator hides f behind the wrapper (module.test_foo resolves to the wrapper, not f), so cloudpickle cannot pickle by reference and falls back to by-value pickling — which serializes f.__globals__. Module-level singletons in that dict (notably vllm.compilation.counter.compilation_counter) get pickled as NEWOBJ + state and reconstructed as fresh clones in the child. Production code (VllmBackend.__call__) increments the real singleton; the test's reference is the stale clone, so the diff is always 0.

Other module-level singletons (vllm.lora.resolver.LoRAResolverRegistry, vllm.tokenizers.registry.TokenizerRegistry) are at risk for the same reason.
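The identity loss described above can be reproduced without vLLM or cloudpickle. The sketch below is only an illustration under stated assumptions: the `compilation_counter` here is a hypothetical stand-in built from `types.SimpleNamespace`, `backend_call` stands in for production code, and a plain `pickle` round trip approximates what by-value serialization of a function's globals does to the objects stored there.

```python
import pickle
import types

# Hypothetical stand-in for a module-level singleton such as compilation_counter.
compilation_counter = types.SimpleNamespace(num_graphs_seen=0)


def backend_call():
    """Stand-in for production code that increments the real singleton."""
    compilation_counter.num_graphs_seen += 1


def run_with_cloned_global():
    # Serializing globals by value stores object state, not identity.
    # A pickle round trip has the same effect: it yields a fresh object.
    clone = pickle.loads(pickle.dumps(compilation_counter))
    before = clone.num_graphs_seen
    backend_call()  # mutates the original, never the clone
    return clone is compilation_counter, clone.num_graphs_seen - before


is_same, diff_seen_by_test = run_with_cloned_global()
print(is_same, diff_seen_by_test, compilation_counter.num_graphs_seen)
```

The code that holds the clone observes a diff of 0 while the original object has been incremented, which matches the shape of the assertion failure in Pattern 1.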

Pattern 2 — CUDA illegal access (1 of 8):

[10:24:38.514559] coredump: Detected an exception of type CUDBG_EXCEPTION_WARP_ILLEGAL_ADDRESS (14)
[10:24:38.514906] coredump:   #0	0x7fd8132723e0	triton_poi_fused_add_0

test_simple_piecewise_compile parametrizes over intermediate_unbacked (True/False), which gates a control-flow branch in SillyModel.forward. The AOT-compile cache key (vllm/compilation/decorators.py:_model_hash_key) only hashes vllm.__version__ + fn.__qualname__ + fn.__code__.co_firstlineno — per-instance attributes aren't included. Both parametrize variants share the same cache slot but Dynamo traces them differently. Whichever variant runs first persists its compiled Triton kernel; the other loads the stale artifact and crashes with an illegal memory access.

Demonstrated locally:

| Order | Result |
| --- | --- |
| `[True-inductor]` (fresh cache) | PASS |
| `[False-inductor]` (loads `[True-inductor]`'s cache) | FAIL: CUDA illegal access |
| `[False-inductor]` (fresh cache) | PASS |
| `[True-inductor]` (loads `[False-inductor]`'s cache) | FAIL: CUDA illegal access |
| Either, with `VLLM_DISABLE_COMPILE_CACHE=1` | PASS |
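Why both variants land in the same cache slot can be sketched in a few lines. This is not vLLM's `_model_hash_key`; `coarse_key` and `ToyModel` are hypothetical names, and the key simply combines the three inputs listed above (a version string, `__qualname__`, and `co_firstlineno`) to show that none of them depends on instance state.

```python
import hashlib


def coarse_key(version, fn):
    """Hypothetical key built only from version, qualname and first line number."""
    raw = f"{version}:{fn.__qualname__}:{fn.__code__.co_firstlineno}"
    return hashlib.sha256(raw.encode()).hexdigest()


class ToyModel:
    def __init__(self, intermediate_unbacked):
        self.intermediate_unbacked = intermediate_unbacked

    def forward(self, x):
        # The branch taken depends on per-instance state.
        if self.intermediate_unbacked:
            return x + 1
        return x * 2


model_a = ToyModel(intermediate_unbacked=True)
model_b = ToyModel(intermediate_unbacked=False)

# Both instances share one forward function, so the key is identical.
key_a = coarse_key("0.0.0", type(model_a).forward)
key_b = coarse_key("0.0.0", type(model_b).forward)

print(key_a == key_b)
print(model_a.forward(3), model_b.forward(3))
```

The two instances produce the same key but compute different results, so an artifact cached for one is not valid for the other.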

Sibling tests (test_toy_llama, test_multi_graph_piecewise_compile, test_simple_inductor_graph_partition) already disable the compile cache via monkeypatch.setenv("VLLM_DISABLE_COMPILE_CACHE", "1"). test_simple_piecewise_compile is the only spawn-mode inductor-backend test that doesn't.

📝 History of failing test

Both regressions were introduced by #41423 (ee38750a7, merged 2026-05-06):

- The PR rewrote spawn_new_process_for_each_test to use cloudpickle + python -c child_script, replacing the prior python -m {module_name} approach. The prior approach was effectively a no-op (the subprocess only imported the module, so the test never ran), which silently masked these test bodies. After #41423 the tests actually run, exposing the singleton-identity bug and the AOT cache collision.
- Before #41423 the AOT cache collision in test_simple_piecewise_compile was already latent but never triggered, because the spawn helper didn't run the test body in the first place.

A separate but complementary regression (mp.set_start_method("spawn") for XPU/ROCm) is being addressed in #41895.

Fix

PR #41953 fixes both:

1. Replace cloudpickle-by-value of f with by-name resolution in the child: send f.__module__ + f.__qualname__ and look the function up via importlib.import_module + getattr. A VLLM_TEST_SPAWN_CHILD env var lets the wrapper short-circuit when the child re-resolves itself. Args/kwargs are still cloudpickled. This preserves singleton identity for every module-level global the test touches.
2. Add monkeypatch.setenv("VLLM_DISABLE_COMPILE_CACHE", "1") to test_simple_piecewise_compile, matching its sibling tests.

The underlying vLLM AOT-cache key is too coarse for any model whose forward branches on an __init__ arg (production-side risk), but tightening it is non-trivial and out of scope for this fix.
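A minimal sketch of the by-name approach, assuming nothing about the actual code in PR #41953: `resolve_by_name` and `spawn_like` are hypothetical helpers, the value `"1"` for VLLM_TEST_SPAWN_CHILD is an assumption, and the parent branch only returns the names it would send instead of launching a subprocess.

```python
import functools
import importlib
import json
import os


def resolve_by_name(module_name, qualname):
    """Import the module, then walk the dotted qualname with getattr."""
    obj = importlib.import_module(module_name)
    for part in qualname.split("."):
        obj = getattr(obj, part)
    return obj


def spawn_like(f):
    """Hypothetical wrapper: run f directly in the child, describe the spawn otherwise."""
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        # Assumption: the child process is marked by this env var being "1".
        if os.environ.get("VLLM_TEST_SPAWN_CHILD") == "1":
            return f(*args, **kwargs)
        # A real helper would start a child here and pass only these two names.
        return ("would spawn", f.__module__, f.__qualname__)
    return wrapper


@spawn_like
def sample_test(x):
    return x * 2


# Resolution by name returns the module's own object, not a copy.
print(resolve_by_name("json", "JSONDecoder.decode") is json.JSONDecoder.decode)
```

Because the child imports the module itself, every global the function touches is the child's own module-level object rather than an unpickled copy. The env var check matters because looking up the test name in the child returns the wrapper, which would otherwise try to spawn again.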

CC List

@dzhengAP @youkaichao
