pytorch - [vllm] [torch nightly] AOTAutogradCache.save: _pickle.PicklingError on triton launcher (DFlashDraftModel speculative-decode path)

pytorch2026-04-29 13:47:25


GitHub stats

pytorch/pytorch#181869•Fetched 2026-04-30 06:18:02


Author

atalman


On a recent torch nightly snapshot, vLLM's test_can_initialize_large_subset[DFlashDraftModel] fails during engine initialization because AOTAutogradCache.save cannot pickle a triton-generated launcher function:

_pickle.PicklingError: Can't pickle <function launcher at 0x7fc56cd2aca0>: attribute lookup launcher on __main__ failed

The failure is on the speculative-decode DFlashDraftModel path (Qwen3-DFlash). The remaining 182/183 model-initialization tests in the same job pass.

Error Message

File "vllm/v1/engine/core.py", line 250, in _initialize_kv_caches
  available_gpu_memory = self.model_executor.determine_available_memory()
File "vllm/v1/worker/gpu_worker.py", line 370, in determine_available_memory
  self.model_runner.profile_run()
File "vllm/v1/worker/gpu_model_runner.py", line 5587, in _dummy_run
  self.drafter.dummy_run(...)
File "vllm/v1/spec_decode/dflash.py", line 246, in dummy_run
  self.model(...)
File "vllm/model_executor/models/qwen3_dflash.py", line 549, in forward
  return self.model(input_ids, positions, inputs_embeds)
File "vllm/compilation/decorators.py", line 623, in __call__
  self.aot_compiled_fn = self.aot_compile(*args, **kwargs)
File "vllm/compilation/wrapper.py", line 183, in aot_compile
  return self._compiled_callable.aot_compile((args, kwargs))
File "torch/_dynamo/eval_frame.py", line 868, in aot_compile
  return aot_compile_fullgraph(...)
File "torch/_dynamo/aot_compile.py", line 387, in aot_compile_fullgraph
  compiled_fn = backend(...)
...
File "torch/_inductor/standalone_compile.py", line 462, in standalone_compile
  compiled_fn = compile_fx(...)
File "torch/_inductor/compile_fx.py", line 3022, in _compile_fx_main
  return dynamo_common.aot_autograd(...)
File "torch/_functorch/aot_autograd.py", line 1234, in aot_module_simplified
  compiled_fn, _ = aot_stage2_compile(...)
File "torch/_functorch/_aot_autograd/graph_compile.py", line 482, in aot_stage2_inference
  entry = _cache_inference_info(...)
File "torch/_functorch/_aot_autograd/graph_compile.py", line 536, in _cache_inference_info
  AOTAutogradCache.save(...)
File "torch/_functorch/_aot_autograd/autograd_cache.py", line 1299, in save
  AOTAutogradCache._handle_save_error(e, remote, is_bypass=False)
File "torch/_functorch/_aot_autograd/autograd_cache.py", line 1271, in _handle_save_error
  raise e
File "torch/_functorch/_aot_autograd/autograd_cache.py", line 1279, in save
  content = AOTAutogradCache._pickle_entry(entry, remote)
File "torch/_functorch/_aot_autograd/autograd_cache.py", line 1237, in _pickle_entry
  return pickle.dumps(entry)
_pickle.PicklingError: Can't pickle <function launcher at 0x7fc56cd2aca0>:
                       attribute lookup launcher on __main__ failed

Fix / Workaround

The closing commit fcea8749c22f (PR pytorch/pytorch#181463) is a bypass workaround: it catches PicklingError from triton kernels in AOTAutogradCache.save and converts the failure to a warning + continue (cache becomes a no-op for that entry).

Effect when the fix reaches the wheel: this hard error should become:

W AOTAutograd cache unable to serialize compiled graph:
  Can't pickle <function launcher at 0x...>: attribute lookup launcher on __main__ failed

and the test should pass.

Summary

_pickle.PicklingError: Can't pickle <function launcher at 0x7fc56cd2aca0>: attribute lookup launcher on __main__ failed

The failure is on the speculative-decode DFlashDraftModel path (Qwen3-DFlash). The remaining 182/183 model-initialization tests in the same job pass.

Environment

torch: nightly 2.13.0.dev20260428+cu130 (vLLM build v0.20.1rc1.dev48+g8a8c9b564.d20260429)
CUDA: 13.0 / Driver: 570.133.20
Python: 3.12.13
Platform: Ubuntu 22.04 x86_64
vLLM main, commit 8a8c9b564e

Reproduction

Failing test:

tests/models/test_initialization.py::test_can_initialize_large_subset[DFlashDraftModel]

Pytest summary: 1 failed, 182 passed, 5 deselected, 200 warnings in 3522.34s

The test creates a vLLM LLM instance with DFlashDraftModel as the spec-decode draft. During the dummy-run profiling, vLLM compiles qwen3_dflash via torch.compile / AOTAutograd, and the cache-save step on the resulting AOTAutograd entry trips the pickling error.

Traceback (key frames, abridged)

File "vllm/v1/engine/core.py", line 250, in _initialize_kv_caches
  available_gpu_memory = self.model_executor.determine_available_memory()
File "vllm/v1/worker/gpu_worker.py", line 370, in determine_available_memory
  self.model_runner.profile_run()
File "vllm/v1/worker/gpu_model_runner.py", line 5587, in _dummy_run
  self.drafter.dummy_run(...)
File "vllm/v1/spec_decode/dflash.py", line 246, in dummy_run
  self.model(...)
File "vllm/model_executor/models/qwen3_dflash.py", line 549, in forward
  return self.model(input_ids, positions, inputs_embeds)
File "vllm/compilation/decorators.py", line 623, in __call__
  self.aot_compiled_fn = self.aot_compile(*args, **kwargs)
File "vllm/compilation/wrapper.py", line 183, in aot_compile
  return self._compiled_callable.aot_compile((args, kwargs))
File "torch/_dynamo/eval_frame.py", line 868, in aot_compile
  return aot_compile_fullgraph(...)
File "torch/_dynamo/aot_compile.py", line 387, in aot_compile_fullgraph
  compiled_fn = backend(...)
...
File "torch/_inductor/standalone_compile.py", line 462, in standalone_compile
  compiled_fn = compile_fx(...)
File "torch/_inductor/compile_fx.py", line 3022, in _compile_fx_main
  return dynamo_common.aot_autograd(...)
File "torch/_functorch/aot_autograd.py", line 1234, in aot_module_simplified
  compiled_fn, _ = aot_stage2_compile(...)
File "torch/_functorch/_aot_autograd/graph_compile.py", line 482, in aot_stage2_inference
  entry = _cache_inference_info(...)
File "torch/_functorch/_aot_autograd/graph_compile.py", line 536, in _cache_inference_info
  AOTAutogradCache.save(...)
File "torch/_functorch/_aot_autograd/autograd_cache.py", line 1299, in save
  AOTAutogradCache._handle_save_error(e, remote, is_bypass=False)
File "torch/_functorch/_aot_autograd/autograd_cache.py", line 1271, in _handle_save_error
  raise e
File "torch/_functorch/_aot_autograd/autograd_cache.py", line 1279, in save
  content = AOTAutogradCache._pickle_entry(entry, remote)
File "torch/_functorch/_aot_autograd/autograd_cache.py", line 1237, in _pickle_entry
  return pickle.dumps(entry)
_pickle.PicklingError: Can't pickle <function launcher at 0x7fc56cd2aca0>:
                       attribute lookup launcher on __main__ failed

Reproducibility

Buildkite: https://buildkite.com/vllm/ci/builds/63497#019dd7d4-1952-41f4-826c-0b8dfe51be22
Job: Torch Nightly Basic Models Tests (Extra Initialization) 1 (soft-failed; the build itself passed because the job is non-blocking)
vLLM main commit: 8a8c9b564e (2026-04-29 nightly)

A similar _pickle.PicklingError on triton launcher was filed against the torch 2.12 test wheel in pytorch/pytorch#180911 (now closed). That issue never reproduced in the 2.12 test-channel CI runs. This nightly failure has the same exception family but is exercised through the DFlashDraftModel speculative-decode path specifically — possibly a code path the earlier fix didn't cover, or a regression in nightly post-fix.

Diagnosis request

The cached AOTAutograd entry contains a triton-generated launcher function defined in a child __main__-scoped JIT module that pickle can't reach. Could a maintainer:

Confirm whether the launcher codegen now produces a non-pickleable function by default on nightly (regression vs. release branch).
Check whether the AOTAutogradCache save path should be defensively bypassing entries that contain triton functions, or whether the launcher should be made pickleable.
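
To make the failure mode concrete, here is a minimal sketch of how pickle ends up with this kind of error. It does not use triton or torch; the helper `make_unreachable_function` is hypothetical and only imitates a function whose `__module__` says `__main__` while the name is never bound on that module. Pickle serializes plain functions by reference (module name plus qualified name), so the lookup fails at save time.

```python
import pickle


def make_unreachable_function(name="launcher"):
    """Build a function that claims to live in __main__ but is not bound there.

    Hypothetical helper for illustration only; it is not how triton
    generates its launcher.
    """
    namespace = {"__name__": "__main__"}  # sets fn.__module__ to "__main__"
    exec(f"def {name}():\n    return 0\n", namespace)
    return namespace[name]


fn = make_unreachable_function()

# pickle stores functions as "module.qualname" and verifies the lookup,
# so a function that cannot be found on its module is rejected.
try:
    pickle.dumps(fn)
except pickle.PicklingError as exc:
    error = exc
else:
    error = None

print(type(error).__name__, "for", fn.__module__ + "." + fn.__qualname__)
```

The exact wording of the message can differ between Python versions, so the sketch only inspects the exception type.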

Links

vLLM build: https://buildkite.com/vllm/ci/builds/63497
Job: https://buildkite.com/vllm/ci/builds/63497#019dd7d4-1952-41f4-826c-0b8dfe51be22
Related closed issue: pytorch/pytorch#180911
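First failure April 16 (later classified as an infra failure in the bisect table below; the first confirmed PicklingError is build 61784 on April 17): https://buildkite.com/vllm/ci/builds/61575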
Success on April 14: https://buildkite.com/vllm/ci/builds/61210

cc @chauhang @penguinwu @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @aakhundov @coconutruben @jataylo @bdhirsh @bobrenjc93 @aorenste

Update: bisect + bundled-torch-nightly evidence

Bisect across vLLM main nightly Full CI runs

| Build | Date (UTC) | vLLM commit | Test result |
| --- | --- | --- | --- |
| 61210 | 2026-04-14 12:34 | (branch run) | ✅ pass (user-reported, pre-regression) |
| 61575 | 2026-04-16 06:00 | 7845379230 | ❌ infra (image manifest 404; not a real test signal) |
| 61784 | 2026-04-17 06:00 | bf45e6d0a5 | ❌ PicklingError (first confirmed real failure) |
| 61996 | 2026-04-19 06:00 | 4353c9cb4a | ❌ PicklingError |
| 62456 | 2026-04-22 06:00 | aad88f8486 | ❌ PicklingError |
| 62949 | 2026-04-25 06:00 | 95995bbef8 | ❌ PicklingError |
| 63061 | 2026-04-27 06:00 | 2cc008e7b4 | ❌ PicklingError |
| 63246 | 2026-04-28 06:00 | ed57f77192 | ❌ PicklingError (image built before torch fix landed at 15:33 UTC) |
| 63379 | 2026-04-28 21:00 | e9f8f31e9a | ❌ PicklingError (image built ~6h after torch fix UTC, but nightly wheel still pre-fix) |
| 63497 | 2026-04-29 06:00 | 8a8c9b564e | ❌ PicklingError (this report) |

The PicklingError reliably reproduces from 2026-04-17 onwards. The regression itself was introduced in the torch nightly built between 2026-04-14 and 2026-04-17.

Confirmed bundled torch nightly version in 63497

By inspecting the postmerge docker image config (ECR public manifest → image config blob → history), and cross-referencing the :docker: build image torch nightly job log:

torch==2.13.0.dev20260428+cu130
torchvision==0.27.0.dev20260428+cu130
torchaudio==2.11.0.dev20260428+cu130

Image build timeline:

| Event | UTC timestamp |
| --- | --- |
| PyTorch nightly wheel `dev20260428` cut from `main` HEAD | ≈ 2026-04-28 00:00 (typical nightly cut time) |
| Fix commit `fcea8749c22f` (PR pytorch/pytorch#181463, closing pytorch/pytorch#180911) merged to `main` | 2026-04-28 15:33:57 |
| Docker image-build for vLLM `8a8c9b564e` (`uv pip install ... --pre --index-url ...nightly/cu130`) | 2026-04-29 06:18:46 |
| vLLM build 63497 test execution | 2026-04-29 06:00–07:42 |

The wheel dev20260428 was cut ~15 hours before fcea8749c22f landed. At image-build time (April 29 06:18 UTC) the April 29 nightly hadn't published yet, so dev20260428 was the most recent available — and it predates the fix.
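
The timeline arithmetic can be checked with a few lines of Python. This is only a sketch: the wheel cut time is the approximate 00:00 UTC value from the table, not a measured timestamp, and the variable names are illustrative.

```python
from datetime import datetime, timezone

# Timestamps from the image build timeline (UTC).
wheel_cut = datetime(2026, 4, 28, 0, 0, 0, tzinfo=timezone.utc)  # approximate cut time
fix_merged = datetime(2026, 4, 28, 15, 33, 57, tzinfo=timezone.utc)
image_built = datetime(2026, 4, 29, 6, 18, 46, tzinfo=timezone.utc)

gap = fix_merged - wheel_cut                   # how long before the fix the wheel was cut
wheel_predates_fix = wheel_cut < fix_merged    # the wheel cannot contain the fix
image_built_after_fix = image_built > fix_merged

print(gap)  # 15:33:57
```

So the image was built after the fix merged, but the only wheel available at that point had been cut before the merge.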

Relationship to pytorch/pytorch#180911 (closed)

The closing commit fcea8749c22f (PR pytorch/pytorch#181463) is a bypass workaround: it catches PicklingError from triton kernels in AOTAutogradCache.save and converts the failure to a warning + continue (cache becomes a no-op for that entry).

Effect when the fix reaches the wheel: this hard error should become:

W AOTAutograd cache unable to serialize compiled graph:
  Can't pickle <function launcher at 0x...>: attribute lookup launcher on __main__ failed

and the test should pass.
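
The bypass pattern described above can be sketched in plain Python. This is not the code from the PyTorch commit; `try_serialize` and the logger name are hypothetical, and the sketch only shows the general idea of turning a pickling failure into a warning so the caller can skip the cache write.

```python
import logging
import pickle

log = logging.getLogger("aot_cache_sketch")  # hypothetical logger name


def try_serialize(entry):
    """Return pickled bytes, or None if the entry cannot be pickled.

    Only pickle.PicklingError is downgraded to a warning here; other
    exceptions still propagate.
    """
    try:
        return pickle.dumps(entry)
    except pickle.PicklingError as exc:
        log.warning("cache unable to serialize compiled graph: %s", exc)
        return None


def make_unreachable_function(name="launcher"):
    """Illustrative stand-in for a function pickle cannot look up by name."""
    namespace = {"__name__": "__main__"}
    exec(f"def {name}():\n    return 0\n", namespace)
    return namespace[name]


bad_entry = {"graph": "compiled", "launcher": make_unreachable_function()}
good_entry = {"graph": "compiled"}

bad_result = try_serialize(bad_entry)    # warning logged, nothing cached
good_result = try_serialize(good_entry)  # normal path, bytes returned
```

With this shape, a failed save costs only a cache miss on the next run; compilation itself is unaffected.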

Expected fix arrival in vLLM CI

The first PyTorch nightly that should include fcea8749c22f is 2.13.0.dev20260429+cu130 (cut from April 29 main HEAD). Once that wheel publishes and the next vLLM postmerge :docker: build image torch nightly job picks it up, this failure should convert from fatal to a soft warning.
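
A quick way to check an environment is to compare the nightly date embedded in the version string against the expected first fixed nightly. The sketch below assumes version strings of the form shown in this report (`2.13.0.dev20260428+cu130`); the helper names are hypothetical, and the date comparison is a heuristic based on the timeline above, not a check for the commit itself.

```python
import re
from datetime import date

# First nightly expected to contain the fix, per the timeline in this report.
FIRST_FIXED_NIGHTLY = date(2026, 4, 29)


def nightly_date(version):
    """Extract the .devYYYYMMDD date from a version string, or None if absent."""
    match = re.search(r"\.dev(\d{4})(\d{2})(\d{2})", version)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)


def nightly_includes_fix(version):
    """Heuristic: True when the nightly date is on or after the first fixed nightly.

    Returns False for versions without a .dev date, since nothing can be
    inferred from them.
    """
    cut = nightly_date(version)
    return cut is not None and cut >= FIRST_FIXED_NIGHTLY


print(nightly_includes_fix("2.13.0.dev20260428+cu130"))  # False
print(nightly_includes_fix("2.13.0.dev20260429+cu130"))  # True
```

In a live environment the same check can be run on `torch.__version__`.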

Real (non-bypass) fix

The bypass ships, but the underlying issue is that triton's codegen now produces a launcher function defined in a __main__-scoped JIT module that pickle can't reach. A proper fix would either move the launcher definition into a stable importable module so attribute lookup launcher on __main__ succeeds, or have AOTAutogradCache._pickle_entry exclude triton launcher functions explicitly.
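
The second option (excluding launcher functions at pickling time) could look roughly like the sketch below. It is an assumption about one possible shape, not PyTorch's implementation: `SkippingPickler`, `PLACEHOLDER_TAG`, and the helpers are hypothetical names, and the sketch replaces any function that pickle cannot find by reference with a placeholder tuple, using the standard `Pickler.reducer_override` hook.

```python
import io
import pickle
import sys
import types

PLACEHOLDER_TAG = "__dropped_function__"  # hypothetical marker value


def is_reachable_by_reference(fn):
    """True if looking up fn.__qualname__ on its module returns fn itself."""
    obj = sys.modules.get(fn.__module__)
    for part in fn.__qualname__.split("."):
        obj = getattr(obj, part, None)
        if obj is None:
            return False
    return obj is fn


class SkippingPickler(pickle.Pickler):
    """Pickler that swaps unreachable functions for a placeholder tuple."""

    def reducer_override(self, obj):
        if isinstance(obj, types.FunctionType) and not is_reachable_by_reference(obj):
            # Reconstructs as (PLACEHOLDER_TAG, qualname) when unpickled.
            return (tuple, ((PLACEHOLDER_TAG, obj.__qualname__),))
        return NotImplemented  # fall back to normal pickling


def dumps_skipping_unreachable(entry):
    buffer = io.BytesIO()
    SkippingPickler(buffer).dump(entry)
    return buffer.getvalue()


def make_unreachable_function(name="launcher"):
    """Illustrative stand-in for a function pickle cannot look up by name."""
    namespace = {"__name__": "__main__"}
    exec(f"def {name}():\n    return 0\n", namespace)
    return namespace[name]


entry = {"graph": "compiled", "launcher": make_unreachable_function()}
restored = pickle.loads(dumps_skipping_unreachable(entry))
print(restored["launcher"])  # ('__dropped_function__', 'launcher')
```

A real cache would also need a way to rebuild the dropped launcher on load, which this sketch does not attempt.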

Where the issue lives

Torch-side. The exception originates entirely inside torch/_functorch/_aot_autograd/autograd_cache.py. vLLM's only role is invoking torch.compile(..., fullgraph=True) on qwen3_dflash, which produces a graph that contains triton-launched kernels — a normal supported usage. No vLLM code change is needed.

extent analysis

TL;DR

The failure should resolve once vLLM CI picks up a PyTorch nightly that includes commit fcea8749c22f, which bypasses the pickling error for triton-generated launcher functions by downgrading it to a warning. No vLLM code change is needed.

Guidance

Verify the PyTorch nightly version: Ensure that the nightly used in vLLM CI includes the fix commit fcea8749c22f. Per the timeline above, 2.13.0.dev20260429+cu130 is the first nightly expected to contain it.
Check the vLLM CI job logs: Confirm that the failure converts to the "AOTAutograd cache unable to serialize compiled graph" warning once the fixed wheel is bundled.
Test with the updated PyTorch nightly wheel: Once the updated wheel is available, rerun tests/models/test_initialization.py::test_can_initialize_large_subset[DFlashDraftModel] to confirm that the issue is resolved.
Investigate a real fix: The bypass resolves the immediate failure but disables caching for the affected entry; the underlying problem of triton-generated launcher functions being non-pickleable remains.

Notes

The issue affects torch nightlies cut before fcea8749c22f landed (confirmed on 2.13.0.dev20260428+cu130) and any vLLM build that bundles one of them.
The fix commit fcea8749c22f is a bypass workaround that converts the hard error to a soft warning.
A real fix would require changes to the PyTorch code to make the triton-generated launcher functions pickleable or to exclude them from the AOTAutograd cache.

Recommendation

Wait for a PyTorch nightly wheel that includes the fix commit fcea8749c22f and rerun the job; this is the most straightforward way to resolve the immediate failure.
