pytorch: `torch.cuda.ExternalStream(0, device=...)` silently returns a fresh pooled stream instead of wrapping handle `0x0`

pytorch · 2026-05-08 15:47:43

Error Message

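No exception is raised; the failure is silent. ExternalStream(0x0, device=0).cuda_stream reports a nonzero handle (0x43f49520 in the run below) instead of 0x0.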

Root Cause

The C++ constructor THCPStream_pynew in torch/csrc/cuda/Stream.cpp selects which CUDAStream to build via a three-way ternary:

at::cuda::CUDAStream stream = (stream_id || device_index || device_type)
    ? at::cuda::CUDAStream::unpack3(stream_id, device_index, device_type)
    : stream_ptr ? at::cuda::getStreamFromExternal(
                       reinterpret_cast<cudaStream_t>(stream_ptr), current_device)
                 : at::cuda::getStreamFromPool(priority);

Whether you hit the bug depends on which kwargs the Python wrapper forwards.

Path A: `ExternalStream(0, device=0)` — broken

The Python wrapper at torch/cuda/streams.py forwards only stream_ptr:

class ExternalStream(Stream):
    def __new__(cls, stream_ptr, device=None, **kwargs):
        with torch.cuda.device(device):
            return super().__new__(cls, stream_ptr=stream_ptr, **kwargs)

The other kwargs reach the C++ layer at their declared defaults of 0:

stream_id=0, device_index=0, device_type=0, stream_ptr=0

PyArg_ParseTupleAndKeywords with the format "|iLLLK" does not track whether a kwarg was supplied, so passing stream_ptr=0 explicitly is indistinguishable from omitting it. The outer condition (0 || 0 || 0) is false, so control falls through to the inner ternary. There stream_ptr is also 0, which is falsy, so the call lands in getStreamFromPool and returns a fresh pooled stream with an unrelated handle.
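The fall-through can be traced with a small pure-Python model of the ternary. This is only a sketch: `select_stream_path` is a hypothetical helper written for illustration, not part of PyTorch, and it returns the name of the branch that would be taken rather than building a real stream.

```python
def select_stream_path(stream_id=0, device_index=0, device_type=0, stream_ptr=0):
    """Hypothetical model of the nested ternary in THCPStream_pynew.

    Returns the name of the C++ function the ternary would call.
    All defaults are 0, mirroring the kwargs the Python wrapper omits.
    """
    if stream_id or device_index or device_type:
        return "unpack3"
    if stream_ptr:  # 0 is falsy, so handle 0x0 never reaches this branch
        return "getStreamFromExternal"
    return "getStreamFromPool"


for ptr in (0x0, 0x2):
    print(f"stream_ptr={ptr:#x} -> {select_stream_path(stream_ptr=ptr)}")
# stream_ptr=0x0 -> getStreamFromPool
# stream_ptr=0x2 -> getStreamFromExternal
```

The model makes the ambiguity visible: a truthiness test on stream_ptr cannot tell "no pointer given" apart from "pointer 0x0 given".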

Fix / Workaround

torch.cuda.ExternalStream(stream_ptr, device=...) is documented as wrapping "an externally allocated CUDA stream". That holds for every handle except 0x0, the legacy default-stream (NULL stream) handle: when stream_ptr == 0, the call silently returns a freshly allocated pooled CUDA stream whose cuda_stream attribute is nonzero, instead of wrapping the NULL stream.

The companion API torch.cuda.get_stream_from_external(0, device=...) (introduced in 2.7) wraps 0x0 correctly, so a workaround exists and the underlying C bindings are fine; only construction through ExternalStream is affected. CuPy's analogous constructor is also faithful: cp.cuda.ExternalStream(0).ptr == 0.

Workaround (torch >= 2.7):

fixed = torch.cuda.get_stream_from_external(0x0, device=dev)
print(f"get_stream_from_external(0).cuda_stream = {fixed.cuda_stream:#x}")
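For code that must run across several torch versions, the workaround could be wrapped in a small helper. The sketch below is an assumption, not an official recipe: `wrap_external_stream` is a hypothetical name, and the module is passed in as `cuda_mod` (normally `torch.cuda`) so that the function detects the newer API by attribute rather than by parsing version strings. On builds without get_stream_from_external it refuses handle 0 instead of silently returning a pooled stream.

```python
def wrap_external_stream(cuda_mod, stream_ptr, device=None):
    """Hypothetical helper: wrap a raw CUDA stream handle without the 0x0 pitfall.

    cuda_mod is expected to be torch.cuda; it is a parameter so the
    dispatch logic can be exercised without a GPU.
    """
    getter = getattr(cuda_mod, "get_stream_from_external", None)
    if getter is not None:
        # Available in torch >= 2.7 per the report; wraps 0x0 faithfully.
        return getter(stream_ptr, device=device)
    if stream_ptr == 0:
        # ExternalStream(0) would hand back a pooled stream, so fail loudly.
        raise RuntimeError(
            "cannot wrap handle 0x0: get_stream_from_external is unavailable"
        )
    # Nonzero handles are wrapped correctly by ExternalStream.
    return cuda_mod.ExternalStream(stream_ptr, device=device)


# Usage on a CUDA machine (not executed here):
#   import torch
#   s = wrap_external_stream(torch.cuda, 0x0, device=0)
#   assert s.cuda_stream == 0
```

Raising on the old-API path is a deliberate choice in this sketch: a loud failure is easier to diagnose than work being enqueued on an unexpected stream.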

Code Example

import torch

dev = 0
print(f"torch version: {torch.__version__}")
print(f"current_stream(device={dev}).cuda_stream    = {torch.cuda.current_stream(device=dev).cuda_stream:#x}")

# Bug: ExternalStream(0) does not wrap handle 0.
es0 = torch.cuda.ExternalStream(0x0, device=dev)
print(f"ExternalStream(0).cuda_stream                = {es0.cuda_stream:#x}")
print(f"ExternalStream(0) wraps handle 0?           : {es0.cuda_stream == 0}")

# Re-construct: a different fresh stream each time.
es0_again = torch.cuda.ExternalStream(0x0, device=dev)
print(f"second ExternalStream(0).cuda_stream         = {es0_again.cuda_stream:#x}")

# Counter-cases that work correctly:
es_pt = torch.cuda.ExternalStream(0x2, device=dev)
print(f"ExternalStream(0x2).cuda_stream              = {es_pt.cuda_stream:#x}")  # 0x2

real = torch.cuda.Stream(device=dev)
es_real = torch.cuda.ExternalStream(real.cuda_stream, device=dev)
print(f"ExternalStream(real_stream).cuda_stream      = {es_real.cuda_stream:#x}  (matches real: {es_real.cuda_stream == real.cuda_stream})")

# Workaround (torch >= 2.7).
fixed = torch.cuda.get_stream_from_external(0x0, device=dev)
print(f"get_stream_from_external(0).cuda_stream      = {fixed.cuda_stream:#x}")

Observed output (on `torch == 2.10.0+cu130`):

torch version: 2.10.0+cu130
current_stream(device=0).cuda_stream    = 0x0
ExternalStream(0).cuda_stream                = 0x43f49520
ExternalStream(0) wraps handle 0?           : False
second ExternalStream(0).cuda_stream         = 0x43f53060
ExternalStream(0x2).cuda_stream              = 0x2
ExternalStream(real_stream).cuda_stream      = 0x43f53220  (matches real: True)
get_stream_from_external(0).cuda_stream      = 0x0

The two interesting lines:

ExternalStream(0).cuda_stream returns 0x43f49520, not 0x0, and each further call returns a different fresh handle (0x43f53060, ...).
get_stream_from_external(0).cuda_stream correctly returns 0x0, so the underlying C path is fine; only construction through ExternalStream goes wrong.

Expected output

ExternalStream(0).cuda_stream should be 0x0

Versions

I confirm the code path is identical at every released version I checked, through current main:

| Version | `torch/cuda/streams.py::ExternalStream.__new__` | `torch/csrc/cuda/Stream.cpp::THCPStream_pynew` ternary |
| --- | --- | --- |
| `v2.10.0` | only forwards `stream_ptr` (streams.py) | falls to `getStreamFromPool` when all kwargs are `0` (Stream.cpp) |
| `v2.11.0` | identical (streams.py) | identical (Stream.cpp) |
| `v2.12.0-rc9` | identical (streams.py) | identical (Stream.cpp) |
| `main` | identical (streams.py) | identical (Stream.cpp) |
