pytorch - [vllm] [2.12 regression] Sequence Parallel test_tp_sp_generation: RayChannelTimeoutError on tp=2 setups (Llama)

pytorch/pytorch#181632 (fetched 2026-04-28 06:24:19) · 2 comments · 1 participant · 50 timeline events (mentioned ×20, subscribed ×20, labeled ×5, commented ×2)

Summary

Under torch 2.12.0 + triton 3.7.0, vLLM's test_tp_sp_generation fails on three parametrized configurations because the Ray-backed engine times out waiting for an inter-process object:

ray.exceptions.RayChannelTimeoutError: System error: Timed out waiting for object available to read.

Three parametrizations fail (tp_size=2, ray distributed backend):

test_tp_sp_generation[False-True-hmellor/tiny-random-LlamaForCausalLM-parallel_setup1-ray-auto-test_options1]
test_tp_sp_generation[False-True-hmellor/tiny-random-LlamaForCausalLM-parallel_setup3-ray-auto-test_options3]
test_tp_sp_generation[False-True-RedHatAI/Meta-Llama-3.1-8B-Instruct-FP8-parallel_setup17-ray-auto-test_options17]
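Re-running only these three cases is quicker than the full parametrized sweep. A minimal sketch that turns the IDs above into pytest node IDs (`file::test[param-id]`); the test file path is a hypothetical placeholder, since the issue does not name it.

```python
# Hypothetical path: the issue does not say which vLLM file defines
# test_tp_sp_generation, so adjust this before use.
TEST_FILE = "tests/distributed/test_sequence_parallel.py"

# Parametrization IDs copied verbatim from the failing runs.
FAILING_IDS = [
    "False-True-hmellor/tiny-random-LlamaForCausalLM-parallel_setup1-ray-auto-test_options1",
    "False-True-hmellor/tiny-random-LlamaForCausalLM-parallel_setup3-ray-auto-test_options3",
    "False-True-RedHatAI/Meta-Llama-3.1-8B-Instruct-FP8-parallel_setup17-ray-auto-test_options17",
]


def failing_node_ids(test_file: str = TEST_FILE) -> list[str]:
    """Build pytest node IDs (file::test[param-id]) for the failing cases."""
    return [f"{test_file}::test_tp_sp_generation[{pid}]" for pid in FAILING_IDS]
```

The list can be passed as `pytest.main(["-x", "-s", *failing_node_ids()])` or as arguments to the `pytest` CLI; `-s` turns off output capture so logs forwarded to the test process stay visible.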

On the pytest side, the failure surfaces only as the test wrapper's assertion:

AssertionError: function test_tp_sp_generation failed when called with args () and kwargs {'model_id': 'hmellor/tiny-random-LlamaForCausalLM', 'parallel_setup': ParallelSetup(tp_size=2, pp_size=1, ...), 'distributed_backend': 'ray', ...}
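The wrapper assertion hides the real exception because the test body runs in a separate process and the parent only sees its exit status. A minimal sketch of that pattern follows; `run_in_subprocess` is a hypothetical name modeled only on the message format above, and vLLM's actual test helper may differ. It uses the POSIX-only `fork` start method so the child can run the function without pickling it.

```python
import functools
import multiprocessing


def run_in_subprocess(func):
    """Hypothetical per-test process wrapper (not vLLM's actual helper)."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # "fork" lets the child inherit func directly (no pickling); POSIX only.
        ctx = multiprocessing.get_context("fork")
        proc = ctx.Process(target=func, args=args, kwargs=kwargs)
        proc.start()
        proc.join()
        # The child's traceback goes to its own stderr; the parent only sees
        # the exit code, so the pytest-side failure is this generic message.
        if proc.exitcode != 0:
            raise AssertionError(
                f"function {func.__name__} failed when called with "
                f"args {args} and kwargs {kwargs}"
            )

    return wrapper
```

For this issue, that means the `RayChannelTimeoutError` lives in the child's stderr (the engine-core log), so that log, not the pytest summary, is where diagnosis starts.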

Passes on torch 2.11. Blocking the torch 2.12 upgrade for vLLM (vllm-project/vllm#40077).
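Because the same cases pass on 2.11 and fail on 2.12, a temporary guard would key on the torch minor version. A minimal sketch, with `torch_at_least` as a hypothetical helper (not part of torch or vLLM):

```python
import re


def torch_at_least(version: str, major: int, minor: int) -> bool:
    """Return True if a version string such as '2.12.0+cu130' is >= major.minor."""
    # Only the leading "X.Y" matters here; local tags (+cu130) and
    # pre-release or dev suffixes are ignored.
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return (int(match.group(1)), int(match.group(2))) >= (major, minor)
```

With pytest, the result could drive a non-strict marker such as `pytest.mark.xfail(torch_at_least(torch.__version__, 2, 12), reason="pytorch/pytorch#181632", strict=False)`, which keeps the job usable while still reporting XPASS once the regression is fixed.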

Environment

  • torch: 2.12.0+cu130 (test channel)
  • triton: 3.7.0
  • CUDA: 13.0
  • Python: 3.12.13
  • GPU: 2× NVIDIA H100 (test name says 2xH100)
  • Distributed backend: ray
  • Models: hmellor/tiny-random-LlamaForCausalLM, RedHatAI/Meta-Llama-3.1-8B-Instruct-FP8
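The list above does not record the Ray or vLLM versions, both of which matter for a Ray channel timeout. A minimal sketch that gathers them alongside torch and Triton; PyTorch's own `python -m torch.utils.collect_env` gives a fuller report, and this helper is only an illustration.

```python
import platform
from importlib import metadata


def collect_versions(
    packages=("torch", "triton", "pytorch-triton", "ray", "vllm"),
) -> dict:
    """Map each distribution name to its installed version, or None if absent."""
    # Test/nightly torch builds may ship Triton as "pytorch-triton".
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    info = {"python": platform.python_version(), **collect_versions()}
    try:
        import torch  # optional: CUDA and GPU details need torch itself

        info["cuda"] = torch.version.cuda
        info["gpus"] = [
            torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())
        ]
    except ImportError:
        pass
    for key, value in info.items():
        print(f"{key}: {value}")
```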

Traceback (abridged)

File "vllm/v1/engine/core.py", line 1129, in run_engine_core
  engine_core.run_busy_loop()
...
File "python/ray/_raylet.pyx", line 3194, in ray._raylet.CoreWorker.get_objects
File "python/ray/includes/common.pxi", line 106, in ray._raylet.check_status
ray.exceptions.RayChannelTimeoutError: System error: Timed out waiting for object available to read.
ObjectID: 003d5d11e415fe881ac38fca360ef9053beab8b0010000000ae1f505
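The error means a reader gave up waiting for a peer to publish an object. A minimal analogy in plain Python, using threads and `queue.Queue` rather than Ray's actual channel implementation: the reader's deadline is fixed, so whether the read succeeds depends only on how long the writer spends before its first write (for example, compiling).

```python
import queue
import threading
import time


def timed_read(channel: queue.Queue, timeout_s: float):
    """Block until the peer writes, or raise TimeoutError after timeout_s."""
    try:
        return channel.get(timeout=timeout_s)
    except queue.Empty:
        raise TimeoutError("Timed out waiting for object available to read.") from None


def slow_writer(channel: queue.Queue, startup_s: float, value) -> None:
    # Stand-in for a worker that spends startup_s seconds before producing
    # its first result.
    time.sleep(startup_s)
    channel.put(value)


def run(startup_s: float, timeout_s: float):
    channel: queue.Queue = queue.Queue()
    writer = threading.Thread(target=slow_writer, args=(channel, startup_s, "ok"))
    writer.start()
    try:
        return timed_read(channel, timeout_s)
    finally:
        writer.join()
```

`run(0.01, 1.0)` returns `"ok"`, while `run(0.3, 0.05)` raises `TimeoutError`: the same code fails purely because the writer got slower.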

Reproducibility

Diagnosis request

RayChannelTimeoutError on tp=2 ray-backed engine startup suggests something in the engine's worker-to-worker communication path is now significantly slower or hanging on torch 2.12 — possibly compile/AOT-cache time spent on each worker before they synchronize. Could a maintainer (or the Ray + torch.compile interaction owner) look at whether torch 2.12 introduces additional per-worker compilation latency that pushes Ray's default channel timeout over the limit?
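One way to test the per-worker latency hypothesis is to time each worker's startup phases under torch 2.11 and 2.12 and compare. A minimal sketch of such instrumentation; `log_phase` is a hypothetical helper, and where to place it inside vLLM's worker startup is not specified in the issue.

```python
import logging
import os
import time
from contextlib import contextmanager
from typing import Iterator, Optional

logger = logging.getLogger("startup_timing")


@contextmanager
def log_phase(name: str, budget_s: Optional[float] = None) -> Iterator[dict]:
    """Time a block and log the result tagged with this process's PID.

    Yields a dict that is filled in when the block exits, so callers can
    also inspect the measurement programmatically.
    """
    result: dict = {"phase": name, "pid": os.getpid()}
    start = time.perf_counter()
    try:
        yield result
    finally:
        elapsed = time.perf_counter() - start
        result["elapsed_s"] = elapsed
        result["over_budget"] = budget_s is not None and elapsed > budget_s
        # Escalate to WARNING when the phase alone exceeds the budget,
        # e.g. a timeout you suspect it is running into.
        level = logging.WARNING if result["over_budget"] else logging.INFO
        logger.log(level, "pid=%d phase=%s took %.2fs", result["pid"], name, elapsed)
```

Wrapping a worker's compile or warm-up call in `with log_phase("compile_and_warmup", budget_s=...)` on each rank gives per-PID durations; a large jump between 2.11 and 2.12 would support the hypothesis, and no jump would point elsewhere.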

Links

  • vLLM PR: vllm-project/vllm#40077
  • Umbrella: pytorch/pytorch#180899

cc @chauhang @penguinwu @oulgen @jamesjwu @aorenste @anijain2305 @laithsakka @masnesral @coconutruben @aditvenk

Analysis

TL;DR

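The issue does not yet identify a root cause. The reporter's working hypothesis is that torch 2.12 adds per-worker compilation latency that pushes engine startup past Ray's channel timeout; if so, raising that timeout is a short-term mitigation while the latency itself is investigated.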

Guidance

  • Review the Ray documentation for how the channel timeout is configured, and consider raising it as a temporary workaround.
  • Measure per-worker compilation time under torch 2.11 and torch 2.12 to confirm or rule out the added-latency hypothesis.
  • Check the torch 2.12 release notes for changes to compilation or AOT caching that could delay worker startup before the workers synchronize.
  • Since the tests pass on torch 2.11, narrow the regression down to a specific change between 2.11 and 2.12, for example by testing intermediate nightly builds.

Example

There is no failing snippet to correct: the failure comes from a version and configuration interaction (torch 2.12 with the Ray-backed engine), not from a bug in a specific piece of test code.

Notes

So far the failure is specific to torch 2.12 with the Ray-backed engine, and the exact cause is still under investigation. Raising the Ray channel timeout has a cost: a genuinely hung worker then takes longer to surface as an error, so a larger value should be treated as a temporary setting rather than a fix.

Recommendation

Raise the Ray channel timeout as a workaround while the root cause is investigated. If the engine then starts reliably, that supports the compilation-latency hypothesis; if it still times out, the workers are more likely hanging than merely slow, and the timeout is not the fix.
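A minimal sketch of that workaround, under a loudly stated assumption: that the installed Ray release reads the `RAY_CGRAPH_get_timeout` environment variable (seconds) as the timeout for reading Ray Compiled Graph results, and reads it when its DAG module is imported. Confirm the variable name and default in the Ray documentation for your version; `raise_ray_channel_timeout` is a hypothetical helper.

```python
import os
from collections.abc import MutableMapping

# Assumption: the Compiled Graph read-timeout variable in recent Ray releases.
CGRAPH_GET_TIMEOUT_VAR = "RAY_CGRAPH_get_timeout"


def raise_ray_channel_timeout(
    seconds: int, env: MutableMapping = os.environ
) -> str:
    """Set the read timeout unless it is already set; return the effective value.

    Call this before Ray and vLLM are imported, since the value is assumed
    to be read at import time; child processes inherit os.environ.
    """
    # setdefault keeps any value the user or CI already exported.
    return env.setdefault(CGRAPH_GET_TIMEOUT_VAR, str(seconds))
```

From the shell, the equivalent is exporting the variable for the pytest run, e.g. `RAY_CGRAPH_get_timeout=600 pytest ...`; 600 is an arbitrary illustrative value, not a recommendation.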
