vllm - ✅(Solved) Fix [CI Failure]: LoRA TP (Distributed) _get_lora_aux_cuda_stream is not defined [1 pull requests, 1 participants]

vllm, 2026-04-14 15:06:52


GitHub stats

vllm-project/vllm#39804•Fetched 2026-04-16 06:36:34

Author

elvircrn

Participants

elvircrn

Timeline (top)

cross-referenced ×2, added_to_project_v2 ×1, labeled ×1

Error Message

Loading safetensors checkpoint shards: 100% 3/3 [00:16<00:00, 5.49s/it]

[2026-04-14T06:24:28Z] (EngineCore pid=5105) ERROR 04-14 06:24:28 [core.py:1110] EngineCore failed to start.
[2026-04-14T06:24:28Z] (EngineCore pid=5105) ERROR 04-14 06:24:28 [core.py:1110] Traceback (most recent call last):
(intermediate frames omitted; the complete traceback appears under Code Example)
[2026-04-14T06:24:28Z] (EngineCore pid=5105) ERROR 04-14 06:24:28 [core.py:1110]   File "/usr/local/lib/python3.12/dist-packages/vllm/lora/layers/base_linear.py", line 87, in __init__
[2026-04-14T06:24:28Z] (EngineCore pid=5105) ERROR 04-14 06:24:28 [core.py:1110]     self._init_lora_stream_context()
[2026-04-14T06:24:28Z] (EngineCore pid=5105) ERROR 04-14 06:24:28 [core.py:1110]   File "/usr/local/lib/python3.12/dist-packages/vllm/lora/layers/base_linear.py", line 96, in _init_lora_stream_context
[2026-04-14T06:24:28Z] (EngineCore pid=5105) ERROR 04-14 06:24:28 [core.py:1110]     self._lora_stream = _get_lora_aux_cuda_stream()
[2026-04-14T06:24:28Z] (EngineCore pid=5105) ERROR 04-14 06:24:28 [core.py:1110] NameError: name '_get_lora_aux_cuda_stream' is not defined
[2026-04-14T06:24:28Z] (EngineCore pid=5105) Process EngineCore:

Root Cause

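Per the description of PR #39834, _get_lora_aux_cuda_stream was defined only inside a block guarded by envs.VLLM_LORA_ENABLE_DUAL_STREAM at module import time, while the call site re-reads the same env var when the layer object is created. When the two reads disagree, the call reaches a name that was never defined.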

Fix Action

Fixed

Fixed by PR: fix(lora): define _get_lora_aux_cuda_stream unconditionally to prevent NameError (https://github.com/vllm-project/vllm/pull/39834). The PR was still open and unmerged when this page was fetched.

PR fix notes

PR #39834: fix(lora): define _get_lora_aux_cuda_stream unconditionally to prevent NameError

Repository: vllm-project/vllm
Author: ssam18
State: open | merged: False
Link: https://github.com/vllm-project/vllm/pull/39834

Description (problem / solution / changelog)

Problem: _get_lora_aux_cuda_stream was defined inside an "if envs.VLLM_LORA_ENABLE_DUAL_STREAM:" block that runs at module import time, but self._enable_aux_cuda_stream re-reads the same env var at object creation time. Since the EngineCore subprocess is launched via fork by default, the env var value at fork time can differ from its value when base_linear.py was first imported, causing a NameError when _init_lora_stream_context tries to call the undefined function.

Solution: Moving _get_lora_aux_cuda_stream outside the conditional block ensures it is always available; the lora_linear_async custom op registration stays conditional since it is a one-time side effect that should only happen when dual stream is enabled.

Fixes #39804
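The failure mode described in the PR can be reproduced in miniature without vLLM. The sketch below is an illustration under assumptions, not the actual base_linear.py code: DEMO_ENABLE_DUAL_STREAM, flag_enabled, _get_aux_stream, and Layer are hypothetical stand-ins, and exec is used only to mimic a module body running once at import time.

```python
import os

# Hypothetical flag name; it stands in for VLLM_LORA_ENABLE_DUAL_STREAM.
FLAG = "DEMO_ENABLE_DUAL_STREAM"


def flag_enabled() -> bool:
    """Re-read the environment on every call, mimicking a setting read at call time."""
    return os.environ.get(FLAG, "0") == "1"


# Buggy layout: the helper exists only if the flag was set when the module body ran.
BUGGY_SOURCE = '''
if flag_enabled():
    def _get_aux_stream():
        return "aux-stream"

class Layer:
    def __init__(self):
        # The flag is read again here, possibly with a different value.
        self.stream = _get_aux_stream() if flag_enabled() else None
'''

# Fixed layout: the helper is always defined; only its use stays conditional.
FIXED_SOURCE = '''
def _get_aux_stream():
    return "aux-stream"

class Layer:
    def __init__(self):
        self.stream = _get_aux_stream() if flag_enabled() else None
'''


def simulate_import(source: str) -> dict:
    """Run the module body once, as an import would, and return its namespace."""
    namespace = {"flag_enabled": flag_enabled}
    exec(source, namespace)
    return namespace


def construct_after_flag_flip(source: str):
    """Import with the flag off, turn it on, then build a Layer."""
    os.environ[FLAG] = "0"
    module = simulate_import(source)
    os.environ[FLAG] = "1"
    try:
        return module["Layer"]().stream
    except NameError as exc:
        return exc
    finally:
        os.environ.pop(FLAG, None)


if __name__ == "__main__":
    print(repr(construct_after_flag_flip(BUGGY_SOURCE)))
    print(repr(construct_after_flag_flip(FIXED_SOURCE)))
```

With the buggy layout the constructor returns a NameError for the helper; with the fixed layout it returns the stream value. Keeping the definition unconditional removes the dependency on when the flag was first read, which is the design choice the PR describes.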

Changed files

vllm/lora/layers/base_linear.py (modified, +42/-39)

Code Example

pytest -v -s -x lora/test_olmoe_tp.py

---

Loading safetensors checkpoint shards: 100% 3/3 [00:16<00:00,  5.49s/it]
--
[2026-04-14T06:24:27Z] (EngineCore pid=5105) INFO 04-14 06:24:27 [default_loader.py:384] Loading weights took 16.75 seconds
[2026-04-14T06:24:27Z] (EngineCore pid=5105) INFO 04-14 06:24:27 [unquantized.py:340] Using MoEPrepareAndFinalizeNoDPEPModular
[2026-04-14T06:24:27Z] (EngineCore pid=5105) INFO 04-14 06:24:27 [utils.py:99] MoE model detected. Using fused MoE LoRA implementation.
[2026-04-14T06:24:27Z] (EngineCore pid=5105) INFO 04-14 06:24:27 [punica_selector.py:20] Using PunicaWrapperGPU.
[2026-04-14T06:24:28Z] (EngineCore pid=5105) ERROR 04-14 06:24:28 [core.py:1110] EngineCore failed to start.
[2026-04-14T06:24:28Z] (EngineCore pid=5105) ERROR 04-14 06:24:28 [core.py:1110] Traceback (most recent call last):
[2026-04-14T06:24:28Z] (EngineCore pid=5105) ERROR 04-14 06:24:28 [core.py:1110]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1084, in run_engine_core
[2026-04-14T06:24:28Z] (EngineCore pid=5105) ERROR 04-14 06:24:28 [core.py:1110]     engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
[2026-04-14T06:24:28Z] (EngineCore pid=5105) ERROR 04-14 06:24:28 [core.py:1110]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2026-04-14T06:24:28Z] (EngineCore pid=5105) ERROR 04-14 06:24:28 [core.py:1110]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 850, in __init__
[2026-04-14T06:24:28Z] (EngineCore pid=5105) ERROR 04-14 06:24:28 [core.py:1110]     super().__init__(
[2026-04-14T06:24:28Z] (EngineCore pid=5105) ERROR 04-14 06:24:28 [core.py:1110]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 116, in __init__
[2026-04-14T06:24:28Z] (EngineCore pid=5105) ERROR 04-14 06:24:28 [core.py:1110]     self.model_executor = executor_class(vllm_config)
[2026-04-14T06:24:28Z] (EngineCore pid=5105) ERROR 04-14 06:24:28 [core.py:1110]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2026-04-14T06:24:28Z] (EngineCore pid=5105) ERROR 04-14 06:24:28 [core.py:1110]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 109, in __init__
[2026-04-14T06:24:28Z] (EngineCore pid=5105) ERROR 04-14 06:24:28 [core.py:1110]     self._init_executor()
[2026-04-14T06:24:28Z] (EngineCore pid=5105) ERROR 04-14 06:24:28 [core.py:1110]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/uniproc_executor.py", line 52, in _init_executor
[2026-04-14T06:24:28Z] (EngineCore pid=5105) ERROR 04-14 06:24:28 [core.py:1110]     self.driver_worker.load_model()
[2026-04-14T06:24:28Z] (EngineCore pid=5105) ERROR 04-14 06:24:28 [core.py:1110]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 323, in load_model
[2026-04-14T06:24:28Z] (EngineCore pid=5105) ERROR 04-14 06:24:28 [core.py:1110]     self.model_runner.load_model(load_dummy_weights=load_dummy_weights)
[2026-04-14T06:24:28Z] (EngineCore pid=5105) ERROR 04-14 06:24:28 [core.py:1110]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 4770, in load_model
[2026-04-14T06:24:28Z] (EngineCore pid=5105) ERROR 04-14 06:24:28 [core.py:1110]     self.model = self.load_lora_model(
[2026-04-14T06:24:28Z] (EngineCore pid=5105) ERROR 04-14 06:24:28 [core.py:1110]                  ^^^^^^^^^^^^^^^^^^^^^
[2026-04-14T06:24:28Z] (EngineCore pid=5105) ERROR 04-14 06:24:28 [core.py:1110]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/lora_model_runner_mixin.py", line 46, in load_lora_model
[2026-04-14T06:24:28Z] (EngineCore pid=5105) ERROR 04-14 06:24:28 [core.py:1110]     return self.lora_manager.create_lora_manager(model, vllm_config)
[2026-04-14T06:24:28Z] (EngineCore pid=5105) ERROR 04-14 06:24:28 [core.py:1110]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2026-04-14T06:24:28Z] (EngineCore pid=5105) ERROR 04-14 06:24:28 [core.py:1110]   File "/usr/local/lib/python3.12/dist-packages/vllm/lora/worker_manager.py", line 265, in create_lora_manager
[2026-04-14T06:24:28Z] (EngineCore pid=5105) ERROR 04-14 06:24:28 [core.py:1110]     lora_manager = create_lora_manager(
[2026-04-14T06:24:28Z] (EngineCore pid=5105) ERROR 04-14 06:24:28 [core.py:1110]                    ^^^^^^^^^^^^^^^^^^^^
[2026-04-14T06:24:28Z] (EngineCore pid=5105) ERROR 04-14 06:24:28 [core.py:1110]   File "/usr/local/lib/python3.12/dist-packages/vllm/lora/model_manager.py", line 944, in create_lora_manager
[2026-04-14T06:24:28Z] (EngineCore pid=5105) ERROR 04-14 06:24:28 [core.py:1110]     lora_manager = lora_manager_cls(
[2026-04-14T06:24:28Z] (EngineCore pid=5105) ERROR 04-14 06:24:28 [core.py:1110]                    ^^^^^^^^^^^^^^^^^
[2026-04-14T06:24:28Z] (EngineCore pid=5105) ERROR 04-14 06:24:28 [core.py:1110]   File "/usr/local/lib/python3.12/dist-packages/vllm/lora/model_manager.py", line 856, in __init__
[2026-04-14T06:24:28Z] (EngineCore pid=5105) ERROR 04-14 06:24:28 [core.py:1110]     super().__init__(
[2026-04-14T06:24:28Z] (EngineCore pid=5105) ERROR 04-14 06:24:28 [core.py:1110]   File "/usr/local/lib/python3.12/dist-packages/vllm/lora/model_manager.py", line 115, in __init__
[2026-04-14T06:24:28Z] (EngineCore pid=5105) ERROR 04-14 06:24:28 [core.py:1110]     self._create_lora_modules()
[2026-04-14T06:24:28Z] (EngineCore pid=5105) ERROR 04-14 06:24:28 [core.py:1110]   File "/usr/local/lib/python3.12/dist-packages/vllm/lora/model_manager.py", line 406, in _create_lora_modules
[2026-04-14T06:24:28Z] (EngineCore pid=5105) ERROR 04-14 06:24:28 [core.py:1110]     from_layer(
[2026-04-14T06:24:28Z] (EngineCore pid=5105) ERROR 04-14 06:24:28 [core.py:1110]   File "/usr/local/lib/python3.12/dist-packages/vllm/lora/utils.py", line 119, in from_layer
[2026-04-14T06:24:28Z] (EngineCore pid=5105) ERROR 04-14 06:24:28 [core.py:1110]     instance_layer = lora_cls(layer)
[2026-04-14T06:24:28Z] (EngineCore pid=5105) ERROR 04-14 06:24:28 [core.py:1110]                      ^^^^^^^^^^^^^^^
[2026-04-14T06:24:28Z] (EngineCore pid=5105) ERROR 04-14 06:24:28 [core.py:1110]   File "/usr/local/lib/python3.12/dist-packages/vllm/lora/layers/column_parallel_linear.py", line 368, in __init__
[2026-04-14T06:24:28Z] (EngineCore pid=5105) ERROR 04-14 06:24:28 [core.py:1110]     super().__init__(base_layer)
[2026-04-14T06:24:28Z] (EngineCore pid=5105) ERROR 04-14 06:24:28 [core.py:1110]   File "/usr/local/lib/python3.12/dist-packages/vllm/lora/layers/column_parallel_linear.py", line 186, in __init__
[2026-04-14T06:24:28Z] (EngineCore pid=5105) ERROR 04-14 06:24:28 [core.py:1110]     super().__init__(base_layer)
[2026-04-14T06:24:28Z] (EngineCore pid=5105) ERROR 04-14 06:24:28 [core.py:1110]   File "/usr/local/lib/python3.12/dist-packages/vllm/lora/layers/column_parallel_linear.py", line 85, in __init__
[2026-04-14T06:24:28Z] (EngineCore pid=5105) ERROR 04-14 06:24:28 [core.py:1110]     super().__init__(base_layer)
[2026-04-14T06:24:28Z] (EngineCore pid=5105) ERROR 04-14 06:24:28 [core.py:1110]   File "/usr/local/lib/python3.12/dist-packages/vllm/lora/layers/base_linear.py", line 87, in __init__
[2026-04-14T06:24:28Z] (EngineCore pid=5105) ERROR 04-14 06:24:28 [core.py:1110]     self._init_lora_stream_context()
[2026-04-14T06:24:28Z] (EngineCore pid=5105) ERROR 04-14 06:24:28 [core.py:1110]   File "/usr/local/lib/python3.12/dist-packages/vllm/lora/layers/base_linear.py", line 96, in _init_lora_stream_context
[2026-04-14T06:24:28Z] (EngineCore pid=5105) ERROR 04-14 06:24:28 [core.py:1110]     self._lora_stream = _get_lora_aux_cuda_stream()
[2026-04-14T06:24:28Z] (EngineCore pid=5105) ERROR 04-14 06:24:28 [core.py:1110]                         ^^^^^^^^^^^^^^^^^^^^^^^^^
[2026-04-14T06:24:28Z] (EngineCore pid=5105) ERROR 04-14 06:24:28 [core.py:1110] NameError: name '_get_lora_aux_cuda_stream' is not defined
[2026-04-14T06:24:28Z] (EngineCore pid=5105) Process EngineCore:
 

Name of failing test

lora/test_olmoe_tp.py

Basic information

Flaky test
Can reproduce locally
Caused by external libraries (e.g. bug in transformers)

🧪 Describe the failing test

LoRA TP (Distributed) test failure

pytest -v -s -x lora/test_olmoe_tp.py

due to


[2026-04-14T06:24:28Z] (EngineCore pid=5105) ERROR 04-14 06:24:28 [core.py:1110] NameError: name '_get_lora_aux_cuda_stream' is not defined

📝 History of failing test

Did not fail in the previous nightly.

CC List.

No response

extent analysis

TL;DR

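The most likely fix is to define _get_lora_aux_cuda_stream unconditionally in vllm/lora/layers/base_linear.py. It is currently defined only when VLLM_LORA_ENABLE_DUAL_STREAM is set at import time, so a later call can raise a NameError.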

Guidance

Check the base_linear.py file for any missing imports or function definitions related to _get_lora_aux_cuda_stream.
Verify that the _get_lora_aux_cuda_stream function is defined in the correct scope and is accessible from the base_linear.py file.
If the function is defined in another file, ensure that it is properly imported and accessible.
Review the code changes made recently to identify any potential causes for the _get_lora_aux_cuda_stream function to become undefined.
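The first two guidance items amount to asking whether a name exists at module level after import. A minimal sketch of such a check follows; has_symbol is a hypothetical helper, and the runnable example targets the standard library so it works without vLLM installed. The vLLM module path in the comment is taken from the traceback, and the result there would depend on the env var value at import time.

```python
import importlib


def has_symbol(module_name: str, symbol: str) -> bool:
    """Return True if the imported module has a module-level attribute named symbol."""
    module = importlib.import_module(module_name)
    return hasattr(module, symbol)


if __name__ == "__main__":
    # Stand-in check against the standard library so the sketch runs anywhere.
    print(has_symbol("os.path", "join"))
    # In a vLLM environment, the analogous check would be:
    # has_symbol("vllm.lora.layers.base_linear", "_get_lora_aux_cuda_stream")
```

A False result for the vLLM module would point at the conditional definition rather than at a missing import in the caller.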

Example

The issue report includes no code beyond the failing command and its log; the change itself is in PR #39834 (vllm/lora/layers/base_linear.py).

Notes

The traceback shows the NameError is raised in _init_lora_stream_context (base_linear.py, line 96) while LoRA layers are being created during model load, so EngineCore fails to start before any request is served. The function is not missing from the codebase; its definition is skipped when the guarding env var is unset at import time.

Recommendation

Apply the change proposed in PR #39834, which defines _get_lora_aux_cuda_stream unconditionally, then re-run pytest -v -s -x lora/test_olmoe_tp.py to confirm the failure is gone.
