vllm - 💡(How to fix) Fix [CI Failure]: PyTorch Fullgraph Smoke Test — counter assertions and AOT cache collision in tests/compile/fullgraph/

Name of failing test

tests/compile/fullgraph/test_toy_llama.py::test_toy_llama[eager-False]
tests/compile/fullgraph/test_toy_llama.py::test_toy_llama[inductor-False]
tests/compile/fullgraph/test_toy_llama.py::test_toy_llama[inductor-True]
tests/compile/fullgraph/test_multiple_graphs.py::test_multi_graph_piecewise_compile[True-False]
tests/compile/fullgraph/test_multiple_graphs.py::test_multi_graph_piecewise_compile[True-True]
tests/compile/fullgraph/test_multiple_graphs.py::test_multi_graph_piecewise_compile[False-False]
tests/compile/fullgraph/test_multiple_graphs.py::test_multi_graph_piecewise_compile[False-True]
tests/compile/fullgraph/test_simple.py::test_simple_piecewise_compile[False-inductor]

Basic information

  • Can reproduce locally
  • Flaky test
  • Caused by external libraries

🧪 Describe the failing test

The "PyTorch Fullgraph Smoke Test" Buildkite step fails with two distinct error patterns (CI build #64888).

Pattern 1 — counter assertion (7 of 8):

File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/counter.py", line 51, in expect
    assert getattr(self, k) - getattr(old, k) == v, (
AssertionError: num_graphs_seen not as expected, before it is 0, after it is 0, expected diff is 1

The test sees a compilation_counter whose num_graphs_seen stays at 0 even after torch.compile runs and VllmBackend.__call__ increments it.

Root cause: tests/utils.py:spawn_new_process_for_each_test cloudpickles the inner test function f and runs it in a child interpreter via python -c. The decorator hides f behind the wrapper (module.test_foo resolves to the wrapper, not f), so cloudpickle cannot pickle by reference and falls back to by-value pickling — which serializes f.__globals__. Module-level singletons in that dict (notably vllm.compilation.counter.compilation_counter) get pickled as NEWOBJ + state and reconstructed as fresh clones in the child. Production code (VllmBackend.__call__) increments the real singleton; the test's reference is the stale clone, so the diff is always 0.

Other module-level singletons (vllm.lora.resolver.LoRAResolverRegistry, vllm.tokenizers.registry.TokenizerRegistry) are at risk for the same reason.
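
The clone effect can be reproduced in miniature without vLLM. The sketch below is a simplified stand-in, not the actual test helper: it uses the standard-library pickle on a dict of globals to imitate what by-value cloudpickle does to f.__globals__. The names compilation_counter, backend_call, and test_view are hypothetical placeholders.

```python
import pickle
import types

# Hypothetical stand-in for vllm.compilation.counter.compilation_counter.
compilation_counter = types.SimpleNamespace(num_graphs_seen=0)


def backend_call():
    """Stand-in for production code that increments the real singleton."""
    compilation_counter.num_graphs_seen += 1


# By-value pickling of a function ships the globals it references.
# Imitate that with a pickle round-trip of the relevant globals dict.
shipped_globals = pickle.loads(
    pickle.dumps({"compilation_counter": compilation_counter})
)
test_view = shipped_globals["compilation_counter"]  # what the test body sees

before = test_view.num_graphs_seen
backend_call()  # increments the original object, not the clone
after = test_view.num_graphs_seen

print(test_view is compilation_counter)     # False: the test holds a clone
print(compilation_counter.num_graphs_seen)  # 1: the real singleton moved
print(after - before)                       # 0: the diff the test observes
```

The observed diff of 0 against an expected diff of 1 is the same shape as the AssertionError in the traceback.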

Pattern 2 — CUDA illegal access (1 of 8):

[10:24:38.514559] coredump: Detected an exception of type CUDBG_EXCEPTION_WARP_ILLEGAL_ADDRESS (14)
[10:24:38.514906] coredump:   #0	0x7fd8132723e0	triton_poi_fused_add_0

test_simple_piecewise_compile parametrizes over intermediate_unbacked (True/False), which gates a control-flow branch in SillyModel.forward. The AOT-compile cache key (vllm/compilation/decorators.py:_model_hash_key) only hashes vllm.__version__ + fn.__qualname__ + fn.__code__.co_firstlineno — per-instance attributes aren't included. Both parametrize variants share the same cache slot but Dynamo traces them differently. Whichever variant runs first persists its compiled Triton kernel; the other loads the stale artifact and crashes with an illegal memory access.

Demonstrated locally:

| Order | Result |
| --- | --- |
| [True-inductor] (fresh cache) | PASS |
| [False-inductor] (loads [True-inductor]'s cache) | FAIL (CUDA illegal access) |
| [False-inductor] (fresh cache) | PASS |
| [True-inductor] (loads [False-inductor]'s cache) | FAIL (CUDA illegal access) |
| Either, with VLLM_DISABLE_COMPILE_CACHE=1 | PASS |
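
The ordering results above follow from the key ignoring per-instance state. The following is a minimal sketch of that collision, assuming a key built only from a version string, fn.__qualname__, and fn.__code__.co_firstlineno as described. The helper names coarse_key and compile_or_load, the VERSION placeholder, and the exact hash layout are illustrative assumptions, not the contents of _model_hash_key.

```python
import hashlib

VERSION = "0.0.0"  # placeholder standing in for vllm.__version__


class SillyModel:
    """Toy model whose forward branches on an __init__ argument."""

    def __init__(self, intermediate_unbacked: bool):
        self.intermediate_unbacked = intermediate_unbacked

    def forward(self, x: int) -> int:
        if self.intermediate_unbacked:
            return x + 1
        return x * 2


def coarse_key(model: SillyModel) -> str:
    """Hash only version + qualname + first line number (assumed layout)."""
    fn = type(model).forward
    payload = f"{VERSION}:{fn.__qualname__}:{fn.__code__.co_firstlineno}"
    return hashlib.sha256(payload.encode()).hexdigest()


cache: dict[str, str] = {}  # stands in for the on-disk AOT cache


def compile_or_load(model: SillyModel) -> str:
    """Return the cached artifact if the key exists, else 'compile' one."""
    key = coarse_key(model)
    if key not in cache:
        cache[key] = (
            f"traced with intermediate_unbacked={model.intermediate_unbacked}"
        )
    return cache[key]


first = compile_or_load(SillyModel(True))    # fresh cache: traces the True branch
second = compile_or_load(SillyModel(False))  # same key: loads the True artifact
print(second)
```

Both instances map to one cache slot, so whichever variant runs first decides which artifact the other loads. In the toy version the result is only a wrong string; in the real test it is a Triton kernel compiled for a different trace.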

Sibling tests (test_toy_llama, test_multi_graph_piecewise_compile, test_simple_inductor_graph_partition) already disable the compile cache via monkeypatch.setenv("VLLM_DISABLE_COMPILE_CACHE", "1"). test_simple_piecewise_compile is the only spawn-mode inductor-backend test that doesn't.

📝 History of failing test

Both regressions were introduced by #41423 (ee38750a7, merged 2026-05-06):

  • The PR rewrote spawn_new_process_for_each_test to use cloudpickle + python -c child_script, replacing the prior python -m {module_name} approach. The prior approach was effectively a no-op: the subprocess only imported the module and never ran the test, which silently masked these test bodies. After #41423 the tests actually run, exposing both the singleton-identity bug and the AOT cache collision.
  • The AOT cache collision in test_simple_piecewise_compile was already latent before #41423; it never triggered because the spawn helper did not run the test body.

A separate but complementary regression (mp.set_start_method("spawn") for XPU/ROCm) is being addressed in #41895.

Fix

PR #41953 fixes both:

  1. Replace cloudpickle-by-value of f with by-name resolution in the child (send f.__module__ + f.__qualname__, look up via importlib.import_module + getattr; use a VLLM_TEST_SPAWN_CHILD env var so the wrapper short-circuits when the child re-resolves itself). Args/kwargs are still cloudpickled. This preserves singleton identity for every module-level global the test touches.
  2. Add monkeypatch.setenv("VLLM_DISABLE_COMPILE_CACHE", "1") to test_simple_piecewise_compile, matching its sibling tests.
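
The by-name approach in item 1 can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in PR #41953: resolve_by_name, spawn_sketch, and sample_test are hypothetical names, the env var value "1" is assumed, and the parent branch returns the target name instead of launching a subprocess so that the example stays self-contained.

```python
import functools
import importlib
import os

CHILD_ENV = "VLLM_TEST_SPAWN_CHILD"  # name from the fix; the value "1" is assumed


def resolve_by_name(module_name: str, qualname: str):
    """Import the module and walk the dotted qualname with getattr."""
    obj = importlib.import_module(module_name)
    for part in qualname.split("."):
        obj = getattr(obj, part)
    return obj


def spawn_sketch(f):
    """Hypothetical decorator: describes the child job instead of launching it."""

    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        if os.environ.get(CHILD_ENV) == "1":
            # Child side: the name resolved to this wrapper, so run the body.
            return f(*args, **kwargs)
        # Parent side: a real helper would start a subprocess here and pass
        # only these two strings (plus pickled args/kwargs) to the child.
        return (f.__module__, f.__qualname__)

    return wrapper


@spawn_sketch
def sample_test():
    return "body ran"


target = sample_test()  # parent mode: a (module, qualname) pair, no pickled code
```

Because the child imports the module normally, every global the test body touches is the module's own object rather than an unpickled copy. For example, resolve_by_name("os.path", "join") returns the very same object as os.path.join.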

The underlying vLLM AOT-cache key is too coarse for any model whose forward branches on an __init__ arg (production-side risk), but tightening it is non-trivial and out of scope for this fix.

CC List

@dzhengAP @youkaichao
