pytorch: [vllm] [triton 3.7] AOTAutogradCache.save: _pickle.PicklingError on triton launcher function

Under torch 2.12.0 with triton 3.7.0, AOTAutogradCache.save blows up because a triton-generated launcher function cannot be pickled:

_pickle.PicklingError: Can't pickle <function launcher at 0x...>: attribute lookup launcher on __main__ failed

Blocking the torch 2.12 upgrade for vLLM (vllm-project/vllm#40077).

Environment

  • torch: 2.12.0+cu130
  • triton: 3.7.0
  • CUDA: 13.0
  • Python: 3.12
  • GPU: H200

Traceback

File ".../torch/_functorch/_aot_autograd/graph_compile.py", line 466, in aot_stage2_inference
    entry = _cache_inference_info(
File ".../torch/_functorch/_aot_autograd/graph_compile.py", line 525, in _cache_inference_info
    AOTAutogradCache.save(...)
File ".../torch/_functorch/_aot_autograd/autograd_cache.py", line 1257, in save
    AOTAutogradCache._handle_save_error(e, remote, is_bypass=False)
File ".../torch/_functorch/_aot_autograd/autograd_cache.py", line 1229, in _handle_save_error
    raise e
File ".../torch/_functorch/_aot_autograd/autograd_cache.py", line 1237, in save
    content = AOTAutogradCache._pickle_entry(entry, remote)
File ".../torch/_functorch/_aot_autograd/autograd_cache.py", line 1195, in _pickle_entry
    return pickle.dumps(entry)
_pickle.PicklingError: Can't pickle <function launcher at 0x7f3605c8f880>: attribute lookup launcher on __main__ failed

Question / diagnosis

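The error suggests that the launcher generated under triton 3.7 reports __main__ as its module but is not reachable there as the attribute launcher. Pickle serializes functions by reference (module name plus qualified name), so it cannot round-trip such a function. Two possible fixes: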

  1. triton: give the generated launcher a stable module/qualname so pickle works, or make ExecutableMetadata/CompiledKernel pickleable via __reduce__.
  2. AOTAutograd: treat pickling errors inside AOT cache save as a bypass (warn + continue) instead of hard-raising, at least for triton-produced artifacts.

Currently _handle_save_error(..., is_bypass=False) re-raises, which kills the compile.
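The same class of failure can be reproduced without torch or triton. The sketch below assumes, purely for illustration, that a launcher is built from generated source with exec; it imitates a code generator and is not triton's actual code. The helper names make_launcher and can_pickle are hypothetical.

```python
import json
import pickle


def make_launcher():
    """Build a function named `launcher` from source text in a private namespace.

    Assumption: this only imitates a code generator; it is not triton's code.
    """
    namespace = {"__name__": "__main__"}  # the function will claim to live in __main__
    exec("def launcher(*args):\n    return args", namespace)
    return namespace["launcher"]


def can_pickle(obj):
    """Return True if pickle.dumps(obj) succeeds, False on a pickling failure."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False


generated = make_launcher()
print(generated.__module__, generated.__qualname__)
print("json.dumps picklable:", can_pickle(json.dumps))  # found again as json.dumps
print("generated picklable:", can_pickle(generated))    # __main__ has no `launcher`
```

The generated function is stored under a different variable name on purpose: binding it to a module-level name launcher in a script would make the lookup succeed and hide the problem.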

Affected test

  • compile/fullgraph/test_full_graph.py::test_custom_compile_config[compilation_config8-meta-llama/Llama-3.2-1B-Instruct-model_kwargs8]

Links

cc @chauhang @penguinwu @bertmaher @int3 @davidberard98 @nmacchioni @embg @peterbell10 @aakhundov @jataylo @iupaikov-amd @bdhirsh @bobrenjc93 @aorenste

extent analysis

TL;DR

The most likely fix is to modify AOTAutograd to treat pickling errors inside AOT cache save as a bypass for triton-produced artifacts.

Guidance

  • Investigate modifying AOTAutograd to catch and handle _pickle.PicklingError exceptions specifically for triton-generated launchers, allowing the compilation to continue with a warning.
  • Consider adding a check for the type of error and the origin of the launcher function to ensure that only triton-produced artifacts are bypassed.
  • Review the ExecutableMetadata and CompiledKernel classes in triton to see if making them pickleable via __reduce__ is a viable alternative solution.
  • Verify that the proposed fix does not introduce any security vulnerabilities related to pickling and unpickling of arbitrary functions.
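To illustrate the __reduce__ route mentioned above, here is a minimal sketch under stated assumptions. SourceLauncher and LAUNCHER_SOURCE are hypothetical names, and nothing here reflects how triton's ExecutableMetadata or CompiledKernel are implemented. The idea is to pickle the recipe (source text and function name) rather than the function object.

```python
import pickle

# Hypothetical generated source; a real launcher would be far more involved.
LAUNCHER_SOURCE = "def launcher(*args):\n    return args"


class SourceLauncher:
    """Hypothetical wrapper that pickles the launcher's source instead of the function."""

    def __init__(self, source, name):
        self.source = source
        self.name = name
        namespace = {}
        exec(source, namespace)  # regenerate the function from its source text
        self._fn = namespace[name]

    def __call__(self, *args):
        return self._fn(*args)

    def __reduce__(self):
        # Tell pickle to rebuild the object by calling SourceLauncher(source, name).
        return (SourceLauncher, (self.source, self.name))


original = SourceLauncher(LAUNCHER_SOURCE, "launcher")
clone = pickle.loads(pickle.dumps(original))
print(clone(1, 2))
```

This design re-executes the stored source at unpickle time, which is exactly the kind of behavior the security point above asks about: such a cache entry should only be loaded from a trusted location.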

Example

import pickle
import warnings

# Sketch only: entry and remote come from the surrounding AOTAutogradCache.save call.
try:
    content = AOTAutogradCache._pickle_entry(entry, remote)
except pickle.PicklingError as e:
    if "attribute lookup launcher on __main__ failed" in str(e):
        # Treat the unpicklable triton launcher as a cache bypass: warn and skip the save.
        warnings.warn(f"Skipping AOTAutogradCache save; triton launcher is not picklable: {e}")
        content = None  # assumption: the caller skips writing when there is no content
    else:
        raise
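The bypass idea can also be shown as a self-contained, runnable sketch that does not depend on torch internals. The function pickle_or_bypass and the stand-in cache entries are hypothetical; the unpicklable function is again generated with exec as an assumption, not taken from triton.

```python
import pickle
import warnings


def pickle_or_bypass(entry):
    """Return the pickled bytes, or None (with a warning) if the entry cannot be pickled."""
    try:
        return pickle.dumps(entry)
    except pickle.PicklingError as exc:
        # Bypass: report the problem and let the caller continue without a cache entry.
        warnings.warn(f"Skipping cache save, entry is not picklable: {exc}")
        return None


# Stand-in for a generated launcher (assumption: created from source via exec).
namespace = {"__name__": "__main__"}
exec("def launcher(*args):\n    return args", namespace)

good_entry = {"graph": "fx-graph-placeholder"}
bad_entry = {"graph": "fx-graph-placeholder", "launcher": namespace["launcher"]}

print(pickle_or_bypass(good_entry) is not None)
print(pickle_or_bypass(bad_entry) is None)
```

Catching the exception type rather than matching on the message text keeps the bypass independent of the exact wording, which can differ between Python versions; whether a broader bypass is acceptable for non-triton artifacts is a design decision for the maintainers.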

Notes

The provided solution focuses on handling the pickling error as a bypass within AOTAutograd. However, it's essential to ensure that this fix does not compromise the security or functionality of the system. Further investigation into making triton-generated launchers pickleable or finding an alternative serialization method might be necessary for a more robust solution.

Recommendation

Apply the workaround by modifying AOTAutograd to treat pickling errors for triton-produced artifacts as a bypass, as it seems to be the most direct approach to resolve the immediate issue blocking the torch 2.12 upgrade.
