pytorch - [vllm] [torch nightly] AOTAutogradCache.save: _pickle.PicklingError on triton launcher (DFlashDraftModel speculative-decode path)

GitHub stats
pytorch/pytorch#181869 (fetched 2026-04-30 06:18:02)
Comments: 0 · Participants: 1 · Timeline events: 87 (mentioned ×39, subscribed ×39, labeled ×9) · Reactions: 0

On a recent torch nightly snapshot, vLLM's test_can_initialize_large_subset[DFlashDraftModel] fails during engine initialization because AOTAutogradCache.save cannot pickle a triton-generated launcher function:

_pickle.PicklingError: Can't pickle <function launcher at 0x7fc56cd2aca0>: attribute lookup launcher on __main__ failed

The failure is on the speculative-decode DFlashDraftModel path (Qwen3-DFlash). The other 182 of the 183 model-initialization tests in the same job pass.

Root Cause

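The AOTAutograd cache entry for the compiled qwen3_dflash graph contains a triton-generated launcher function defined in a __main__-scoped JIT module. pickle serializes functions by reference (module name plus attribute name), and the lookup of launcher on __main__ fails, so AOTAutogradCache.save raises _pickle.PicklingError. On the affected nightly the save path re-raises that error instead of bypassing the cache, which aborts engine initialization.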

Fix Action

Fix / Workaround

  • Commit fcea8749c22f (PR pytorch/pytorch#181463), which closed pytorch/pytorch#180911, is a bypass workaround: it catches PicklingError from triton kernels in AOTAutogradCache.save, logs a warning, and continues (the cache becomes a no-op for that entry).
  • Effect when the fix reaches the wheel: this hard error should become:
    W AOTAutograd cache unable to serialize compiled graph:
      Can't pickle <function launcher at 0x...>: attribute lookup launcher on __main__ failed
    and the test should pass.
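
The bypass behaviour described above can be pictured with a small sketch. This is not the code from fcea8749c22f. The names save_entry_or_bypass and make_unreachable_function are hypothetical, and the stand-in function only mimics a launcher that pickle cannot locate by name.

```python
import pickle
import warnings


def make_unreachable_function():
    """Build a function that claims to live in __main__ but is not bound there."""
    namespace = {"__name__": "__main__"}
    exec("def launcher(*args):\n    return args\n", namespace)
    return namespace["launcher"]


def save_entry_or_bypass(entry):
    """Return pickled bytes, or None (cache bypass) if the entry cannot be pickled."""
    try:
        return pickle.dumps(entry)
    except pickle.PicklingError as exc:
        # Soft failure: warn and skip caching instead of aborting compilation.
        warnings.warn(f"cache unable to serialize compiled graph: {exc}")
        return None


# An ordinary entry is serialized as usual.
picklable = save_entry_or_bypass({"graph": "ok"})

# An entry holding the unreachable function is skipped with a warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    bypassed = save_entry_or_bypass({"launcher": make_unreachable_function()})
```

The cost of this pattern is that the affected entry is recompiled on every run, since nothing is written to the cache for it.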

Environment

  • torch: nightly 2.13.0.dev20260428+cu130 (bundled in vLLM build v0.20.1rc1.dev48+g8a8c9b564.d20260429)
  • CUDA: 13.0 / Driver: 570.133.20
  • Python: 3.12.13
  • Platform: Ubuntu 22.04 x86_64
  • vLLM main, commit 8a8c9b564e

Reproduction

Failing test:

tests/models/test_initialization.py::test_can_initialize_large_subset[DFlashDraftModel]

Pytest summary: 1 failed, 182 passed, 5 deselected, 200 warnings in 3522.34s

The test creates a vLLM LLM instance with DFlashDraftModel as the spec-decode draft. During the dummy-run profiling, vLLM compiles qwen3_dflash via torch.compile / AOTAutograd, and the cache-save step on the resulting AOTAutograd entry trips the pickling error.

Traceback (key frames, abridged)

File "vllm/v1/engine/core.py", line 250, in _initialize_kv_caches
  available_gpu_memory = self.model_executor.determine_available_memory()
File "vllm/v1/worker/gpu_worker.py", line 370, in determine_available_memory
  self.model_runner.profile_run()
File "vllm/v1/worker/gpu_model_runner.py", line 5587, in _dummy_run
  self.drafter.dummy_run(...)
File "vllm/v1/spec_decode/dflash.py", line 246, in dummy_run
  self.model(...)
File "vllm/model_executor/models/qwen3_dflash.py", line 549, in forward
  return self.model(input_ids, positions, inputs_embeds)
File "vllm/compilation/decorators.py", line 623, in __call__
  self.aot_compiled_fn = self.aot_compile(*args, **kwargs)
File "vllm/compilation/wrapper.py", line 183, in aot_compile
  return self._compiled_callable.aot_compile((args, kwargs))
File "torch/_dynamo/eval_frame.py", line 868, in aot_compile
  return aot_compile_fullgraph(...)
File "torch/_dynamo/aot_compile.py", line 387, in aot_compile_fullgraph
  compiled_fn = backend(...)
...
File "torch/_inductor/standalone_compile.py", line 462, in standalone_compile
  compiled_fn = compile_fx(...)
File "torch/_inductor/compile_fx.py", line 3022, in _compile_fx_main
  return dynamo_common.aot_autograd(...)
File "torch/_functorch/aot_autograd.py", line 1234, in aot_module_simplified
  compiled_fn, _ = aot_stage2_compile(...)
File "torch/_functorch/_aot_autograd/graph_compile.py", line 482, in aot_stage2_inference
  entry = _cache_inference_info(...)
File "torch/_functorch/_aot_autograd/graph_compile.py", line 536, in _cache_inference_info
  AOTAutogradCache.save(...)
File "torch/_functorch/_aot_autograd/autograd_cache.py", line 1299, in save
  AOTAutogradCache._handle_save_error(e, remote, is_bypass=False)
File "torch/_functorch/_aot_autograd/autograd_cache.py", line 1271, in _handle_save_error
  raise e
File "torch/_functorch/_aot_autograd/autograd_cache.py", line 1279, in save
  content = AOTAutogradCache._pickle_entry(entry, remote)
File "torch/_functorch/_aot_autograd/autograd_cache.py", line 1237, in _pickle_entry
  return pickle.dumps(entry)
_pickle.PicklingError: Can't pickle <function launcher at 0x7fc56cd2aca0>:
                       attribute lookup launcher on __main__ failed

Reproducibility

Related

A similar _pickle.PicklingError on triton launcher was filed against the torch 2.12 test wheel in pytorch/pytorch#180911 (now closed). That issue never reproduced in the 2.12 test-channel CI runs. This nightly failure has the same exception family but is exercised through the DFlashDraftModel speculative-decode path specifically — possibly a code path the earlier fix didn't cover, or a regression in nightly post-fix.

Diagnosis request

The cached AOTAutograd entry contains a triton-generated launcher function defined in a child __main__-scoped JIT module that pickle can't reach. Could a maintainer:

  1. Confirm whether the launcher codegen now produces a non-pickleable function by default on nightly (regression vs. release branch).
  2. Check whether the AOTAutogradCache save path should be defensively bypassing entries that contain triton functions, or whether the launcher should be made pickleable.
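
The failure mode in the diagnosis can be reproduced without torch or triton. The sketch below assumes the launcher is created by exec-ing generated source into a private namespace that reports its module as __main__. That is an assumption for illustration and is not taken from the triton or inductor sources.

```python
import pickle
import sys

# Generated source, standing in for launcher code produced at JIT time.
source = "def launcher(*args):\n    return len(args)\n"

# exec into a private namespace that reports its module name as "__main__".
namespace = {"__name__": "__main__"}
exec(source, namespace)
fn = namespace["launcher"]

# pickle stores plain functions by reference: (module name, qualified name).
# The function is only picklable if that lookup returns the same object.
module = sys.modules.get(fn.__module__)
reachable = getattr(module, fn.__qualname__, None) is fn

try:
    pickle.dumps(fn)
    error = None
except pickle.PicklingError as exc:
    error = exc  # the attribute lookup on __main__ failed
```

Here reachable is False because the function was never bound as an attribute of the real __main__ module, which is the condition pickle checks before serializing a function.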

Links

cc @chauhang @penguinwu @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @aakhundov @coconutruben @jataylo @bdhirsh @bobrenjc93 @aorenste


Update: bisect + bundled-torch-nightly evidence

Bisect across vLLM main nightly Full CI runs

Build | Date (UTC)       | vLLM commit  | Test result
61210 | 2026-04-14 12:34 | (branch run) | ✅ pass (user-reported, pre-regression)
61575 | 2026-04-16 06:00 | 7845379230   | ❌ infra (image manifest 404; not a real test signal)
61784 | 2026-04-17 06:00 | bf45e6d0a5   | ❌ PicklingError (first confirmed real failure)
61996 | 2026-04-19 06:00 | 4353c9cb4a   | ❌ PicklingError
62456 | 2026-04-22 06:00 | aad88f8486   | ❌ PicklingError
62949 | 2026-04-25 06:00 | 95995bbef8   | ❌ PicklingError
63061 | 2026-04-27 06:00 | 2cc008e7b4   | ❌ PicklingError
63246 | 2026-04-28 06:00 | ed57f77192   | ❌ PicklingError (image built before torch fix landed at 15:33 UTC)
63379 | 2026-04-28 21:00 | e9f8f31e9a   | ❌ PicklingError (image built ~6h after torch fix UTC, but nightly wheel still pre-fix)
63497 | 2026-04-29 06:00 | 8a8c9b564e   | ❌ PicklingError (this report)

The PicklingError reproduces in every run with a real test signal from 2026-04-17 onwards. The last passing run was on 2026-04-14, so the regression was introduced by a torch nightly picked up between 2026-04-14 and 2026-04-17.
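
The regression window can be derived mechanically from the bisect table above. The sketch below transcribes the build numbers, dates, and outcomes from that table. The function name regression_window and the three result labels are illustrative choices.

```python
# (build, UTC date, result) rows transcribed from the bisect table.
runs = [
    (61210, "2026-04-14", "pass"),
    (61575, "2026-04-16", "infra"),  # image manifest 404, no test signal
    (61784, "2026-04-17", "fail"),
    (61996, "2026-04-19", "fail"),
    (62456, "2026-04-22", "fail"),
    (62949, "2026-04-25", "fail"),
    (63061, "2026-04-27", "fail"),
    (63246, "2026-04-28", "fail"),
    (63379, "2026-04-28", "fail"),
    (63497, "2026-04-29", "fail"),
]


def regression_window(rows):
    """Return (last passing run, first failing run), ignoring infra-only failures."""
    signal = [row for row in rows if row[2] != "infra"]
    first_fail_index = next(i for i, row in enumerate(signal) if row[2] == "fail")
    if first_fail_index == 0 or signal[first_fail_index - 1][2] != "pass":
        raise ValueError("no passing run immediately before the first failure")
    return signal[first_fail_index - 1], signal[first_fail_index]


last_pass, first_fail = regression_window(runs)
print(last_pass[1], "->", first_fail[1])
```

Dropping the infra row before searching is what widens the window to three days. With a test signal on 2026-04-16 the window could have been narrowed further.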

Confirmed bundled torch nightly version in 63497

Inspecting the postmerge Docker image config (ECR public manifest → image config blob → history) and cross-referencing the :docker: build image torch nightly job log shows that the image bundles:

torch==2.13.0.dev20260428+cu130
torchvision==0.27.0.dev20260428+cu130
torchaudio==2.11.0.dev20260428+cu130

Image build timeline:

Event | UTC timestamp
PyTorch nightly wheel dev20260428 cut from main HEAD | ≈ 2026-04-28 00:00 (typical nightly cut time)
Fix commit fcea8749c22f (PR pytorch/pytorch#181463, closing pytorch/pytorch#180911) merged to main | 2026-04-28 15:33:57
Docker image-build for vLLM 8a8c9b564e (uv pip install ... --pre --index-url ...nightly/cu130) | 2026-04-29 06:18:46
vLLM build 63497 test execution | 2026-04-29 06:00–07:42

The wheel dev20260428 was cut ~15 hours before fcea8749c22f landed. At image-build time (April 29 06:18 UTC) the April 29 nightly hadn't published yet, so dev20260428 was the most recent available — and it predates the fix.

Relationship to pytorch/pytorch#180911 (closed)

  • The closing commit fcea8749c22f (PR pytorch/pytorch#181463) is a bypass workaround: it catches PicklingError from triton kernels in AOTAutogradCache.save and converts the failure to a warning + continue (cache becomes a no-op for that entry).
  • Effect when the fix reaches the wheel: this hard error should become:
    W AOTAutograd cache unable to serialize compiled graph:
      Can't pickle <function launcher at 0x...>: attribute lookup launcher on __main__ failed
    and the test should pass.

Expected fix arrival in vLLM CI

The first PyTorch nightly that should include fcea8749c22f is 2.13.0.dev20260429+cu130 (cut from April 29 main HEAD). Once that wheel publishes and the next vLLM postmerge :docker: build image torch nightly job picks it up, this failure should convert from fatal to a soft warning.
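
A version check along these lines could gate the expectation. It is a sketch: the 20260429 threshold is the report's expectation rather than a verified fact, and the helper names are hypothetical. It parses only the date stamp of nightly version strings and treats anything without a .dev stamp as unknown.

```python
import re

# First nightly expected (per this report) to contain fcea8749c22f; not verified.
FIRST_EXPECTED_FIXED_NIGHTLY = 20260429


def nightly_date(version):
    """Extract the YYYYMMDD stamp from a version like '2.13.0.dev20260428+cu130'."""
    match = re.search(r"\.dev(\d{8})", version)
    return int(match.group(1)) if match else None


def expected_to_include_fix(version):
    """True if the nightly date is on or after the first expected fixed nightly."""
    stamp = nightly_date(version)
    return stamp is not None and stamp >= FIRST_EXPECTED_FIXED_NIGHTLY


bundled = expected_to_include_fix("2.13.0.dev20260428+cu130")
next_nightly = expected_to_include_fix("2.13.0.dev20260429+cu130")
```

In a CI image the same function could be called with torch.__version__. The date stamp is only a proxy, so confirming that the commit is actually in the wheel remains the stronger check.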

Real (non-bypass) fix

The bypass ships, but the underlying issue is that triton's codegen now produces a launcher function defined in a __main__-scoped JIT module that pickle can't reach. A proper fix would either move the launcher definition into a stable importable module so attribute lookup launcher on __main__ succeeds, or have AOTAutogradCache._pickle_entry exclude triton launcher functions explicitly.
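
The first option, making the launcher reachable through an importable module, can be sketched in plain Python. The module name hypothetical_generated_launchers is invented for illustration, and the sketch says nothing about how triton or inductor would actually host generated code.

```python
import pickle
import sys
import types

source = "def launcher(*args):\n    return len(args)\n"

# Hypothetical module name; registering it makes the function reachable by name.
module = types.ModuleType("hypothetical_generated_launchers")
exec(source, module.__dict__)
sys.modules[module.__name__] = module

fn = module.launcher
payload = pickle.dumps(fn)        # stored as a (module, name) reference
restored = pickle.loads(payload)  # resolved through sys.modules
```

This round-trips only within the same process. A cache entry read back in a different process would need the same module to be importable there, which is part of why a persistent cache makes the real fix harder than the bypass.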

Where the issue lives

  • Torch-side. The exception originates entirely inside torch/_functorch/_aot_autograd/autograd_cache.py. vLLM's only role is invoking torch.compile(..., fullgraph=True) on qwen3_dflash, which produces a graph that contains triton-launched kernels — a normal supported usage. No vLLM code change is needed.

extent analysis

TL;DR

The most likely resolution is to pick up a PyTorch nightly wheel that includes commit fcea8749c22f (expected from 2.13.0.dev20260429+cu130 onward). That commit turns the PicklingError on triton-generated launcher functions into a cache-bypass warning.

Guidance

  1. Verify the PyTorch nightly version: Ensure that the PyTorch nightly wheel bundled in the vLLM CI image is 2.13.0.dev20260429+cu130 or later, the first nightly expected to include the fix commit fcea8749c22f.
  2. Check the vLLM CI job logs: Monitor the vLLM CI job logs to see if the failure converts to a soft warning after the fix is included in the PyTorch nightly wheel.
  3. Test with the updated PyTorch nightly wheel: Once the updated PyTorch nightly wheel is available, test the vLLM build with the new wheel to confirm that the issue is resolved.
  4. Investigate a real fix: While the bypass fix resolves the immediate issue, investigate a real fix that addresses the underlying problem of triton-generated launcher functions being non-pickleable.
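
Step 2 can be partly automated with a substring scan of the job log. The sketch below is a hypothetical helper. The two marker strings are taken from the traceback and the expected warning quoted in this report, and the real log format may differ.

```python
FATAL_MARKER = "_pickle.PicklingError"
SOFT_MARKER = "AOTAutograd cache unable to serialize compiled graph"


def classify_log(text):
    """Classify a CI log as 'fatal', 'soft-warning', or 'clean' by substring match."""
    if FATAL_MARKER in text:
        # The exception line from the traceback is present: still a hard error.
        return "fatal"
    if SOFT_MARKER in text:
        # Only the warning is present: the bypass is in effect.
        return "soft-warning"
    return "clean"


fatal_log = (
    "_pickle.PicklingError: Can't pickle <function launcher at 0x7fc56cd2aca0>: "
    "attribute lookup launcher on __main__ failed"
)
soft_log = (
    "W AOTAutograd cache unable to serialize compiled graph:\n"
    "    Can't pickle <function launcher at 0x...>: "
    "attribute lookup launcher on __main__ failed"
)
```

The fatal marker is checked first so that a log containing both strings is not misread as fixed.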

Notes

  • The issue is specific to torch nightly wheels that predate fcea8749c22f (observed with 2.13.0.dev20260428+cu130) and to vLLM builds that bundle them.
  • The fix commit fcea8749c22f is a bypass workaround that converts the hard error to a soft warning.
  • A real fix would require changes to the PyTorch code to make the triton-generated launcher functions pickleable or to exclude them from the AOTAutograd cache.

Recommendation

Wait for a PyTorch nightly wheel that includes the fix commit fcea8749c22f and let the vLLM CI image rebuild against it. This is the most straightforward way to resolve the immediate failure.
