vllm - 💡(How to fix) Fix [Bug]: QWEN3.5-27B fails to mount LoRA [1 participants]

vllm · 2026-04-25 08:38:32

GitHub stats

vllm-project/vllm#40869•Fetched 2026-04-26 05:06:19

Author

xiaoshutong27

Participants

xiaoshutong27

Timeline (top)

labeled ×1

Error Message

Root Cause

(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932] WorkerProc hit an exception.
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932] Traceback (most recent call last):
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]   File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 927, in worker_busy_loop
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]     output = func(*args, **kwargs)
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]              ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]   File "/usr/local/python3.11.14/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]     return func(*args, **kwargs)
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]   File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker.py", line 343, in determine_available_memory
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]     self.model_runner.profile_run()
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]   File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 2550, in profile_run
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]     super().profile_run()
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]   File "/vllm-workspace/vllm/vllm/v1/worker/gpu_model_runner.py", line 5516, in profile_run
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]     hidden_states, last_hidden_states = self._dummy_run(
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]                                         ^^^^^^^^^^^^^^^^
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]   File "/usr/local/python3.11.14/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]     return func(*args, **kwargs)
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]   File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 2416, in _dummy_run
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]     with self.maybe_dummy_run_with_lora(
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]   File "/usr/local/python3.11.14/lib/python3.11/contextlib.py", line 137, in __enter__
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]     return next(self.gen)
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]            ^^^^^^^^^^^^^^
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]   File "/vllm-workspace/vllm/vllm/v1/worker/lora_model_runner_mixin.py", line 254, in maybe_dummy_run_with_lora
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]     with (
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]   File "/usr/local/python3.11.14/lib/python3.11/contextlib.py", line 137, in __enter__
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]     return next(self.gen)
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]            ^^^^^^^^^^^^^^
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]   File "/vllm-workspace/vllm/vllm/v1/worker/lora_model_runner_mixin.py", line 224, in maybe_select_dummy_loras
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]     self._set_active_loras(
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]   File "/vllm-workspace/vllm/vllm/v1/worker/lora_model_runner_mixin.py", line 67, in _set_active_loras
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]     self.lora_manager.set_active_adapters(lora_requests, lora_mapping)
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]   File "/vllm-workspace/vllm/vllm/lora/worker_manager.py", line 176, in set_active_adapters
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]     self._apply_adapters(requests)
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]   File "/vllm-workspace/vllm/vllm/lora/worker_manager.py", line 263, in _apply_adapters
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]     self.add_adapter(lora)
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]   File "/vllm-workspace/vllm/vllm/lora/worker_manager.py", line 298, in add_adapter
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]     self._adapter_manager.activate_adapter(lora_request.lora_int_id)
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]   File "/vllm-workspace/vllm/vllm/lora/model_manager.py", line 851, in activate_adapter
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]     result = super().activate_adapter(lora_id)
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]   File "/vllm-workspace/vllm/vllm/lora/model_manager.py", line 261, in activate_adapter
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]     module.set_lora(
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]   File "/vllm-workspace/vllm/vllm/lora/layers/column_parallel_linear.py", line 265, in set_lora
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]     lora_b = self.slice_lora_b(lora_b)
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]              ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]   File "/vllm-workspace/vllm/vllm/lora/layers/column_parallel_linear.py", line 249, in slice_lora_b
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]     if (lora_b_i := lora_b[i]) is not None:
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]                     ~~~~~~^^^
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932] IndexError: list index out of range
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932] Traceback (most recent call last):
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]   File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 927, in worker_busy_loop
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]     output = func(*args, **kwargs)
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]              ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]   File "/usr/local/python3.11.14/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]     return func(*args, **kwargs)
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]   File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker.py", line 343, in determine_available_memory
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]     self.model_runner.profile_run()
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]   File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 2550, in profile_run
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]     super().profile_run()
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]   File "/vllm-workspace/vllm/vllm/v1/worker/gpu_model_runner.py", line 5516, in profile_run
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]     hidden_states, last_hidden_states = self._dummy_run(
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]                                         ^^^^^^^^^^^^^^^^
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]   File "/usr/local/python3.11.14/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]     return func(*args, **kwargs)
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]   File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 2416, in _dummy_run
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]     with self.maybe_dummy_run_with_lora(
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]   File "/usr/local/python3.11.14/lib/python3.11/contextlib.py", line 137, in __enter__
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]     return next(self.gen)
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]            ^^^^^^^^^^^^^^
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]   File "/vllm-workspace/vllm/vllm/v1/worker/lora_model_runner_mixin.py", line 254, in maybe_dummy_run_with_lora
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]     with (
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]   File "/usr/local/python3.11.14/lib/python3.11/contextlib.py", line 137, in __enter__
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]     return next(self.gen)
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]            ^^^^^^^^^^^^^^
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]   File "/vllm-workspace/vllm/vllm/v1/worker/lora_model_runner_mixin.py", line 224, in maybe_select_dummy_loras
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]     self._set_active_loras(
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]   File "/vllm-workspace/vllm/vllm/v1/worker/lora_model_runner_mixin.py", line 67, in _set_active_loras
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]     self.lora_manager.set_active_adapters(lora_requests, lora_mapping)
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]   File "/vllm-workspace/vllm/vllm/lora/worker_manager.py", line 176, in set_active_adapters
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]     self._apply_adapters(requests)
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]   File "/vllm-workspace/vllm/vllm/lora/worker_manager.py", line 263, in _apply_adapters
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]     self.add_adapter(lora)
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]   File "/vllm-workspace/vllm/vllm/lora/worker_manager.py", line 298, in add_adapter
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]     self._adapter_manager.activate_adapter(lora_request.lora_int_id)
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]   File "/vllm-workspace/vllm/vllm/lora/model_manager.py", line 851, in activate_adapter
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]     result = super().activate_adapter(lora_id)
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]   File "/vllm-workspace/vllm/vllm/lora/model_manager.py", line 261, in activate_adapter
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]     module.set_lora(
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]   File "/vllm-workspace/vllm/vllm/lora/layers/column_parallel_linear.py", line 265, in set_lora
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]     lora_b = self.slice_lora_b(lora_b)
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]              ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]   File "/vllm-workspace/vllm/vllm/lora/layers/column_parallel_linear.py", line 249, in slice_lora_b
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]     if (lora_b_i := lora_b[i]) is not None:
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]                     ~~~~~~^^^
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932] IndexError: list index out of range
(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]
(EngineCore pid=1243) ERROR 04-22 02:12:02 [core.py:1099] EngineCore failed to start.
(EngineCore pid=1243) ERROR 04-22 02:12:02 [core.py:1099] Traceback (most recent call last):
(EngineCore pid=1243) ERROR 04-22 02:12:02 [core.py:1099]   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 1073, in run_engine_core
(EngineCore pid=1243) ERROR 04-22 02:12:02 [core.py:1099]     engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore pid=1243) ERROR 04-22 02:12:02 [core.py:1099]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=1243) ERROR 04-22 02:12:02 [core.py:1099]   File "/vllm-workspace/vllm/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=1243) ERROR 04-22 02:12:02 [core.py:1099]     return func(*args, **kwargs)
(EngineCore pid=1243) ERROR 04-22 02:12:02 [core.py:1099]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=1243) ERROR 04-22 02:12:02 [core.py:1099]   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 839, in __init__
(EngineCore pid=1243) ERROR 04-22 02:12:02 [core.py:1099]     super().__init__(
(EngineCore pid=1243) ERROR 04-22 02:12:02 [core.py:1099]   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 122, in __init__
(EngineCore pid=1243) ERROR 04-22 02:12:02 [core.py:1099]     kv_cache_config = self._initialize_kv_caches(vllm_config)
(EngineCore pid=1243) ERROR 04-22 02:12:02 [core.py:1099]                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=1243) ERROR 04-22 02:12:02 [core.py:1099]   File "/vllm-workspace/vllm/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=1243) ERROR 04-22 02:12:02 [core.py:1099]     return func(*args, **kwargs)
(EngineCore pid=1243) ERROR 04-22 02:12:02 [core.py:1099]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=1243) ERROR 04-22 02:12:02 [core.py:1099]   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 245, in _initialize_kv_caches
(EngineCore pid=1243) ERROR 04-22 02:12:02 [core.py:1099]     available_gpu_memory = self.model_executor.determine_available_memory()
(EngineCore pid=1243) ERROR 04-22 02:12:02 [core.py:1099]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=1243) ERROR 04-22 02:12:02 [core.py:1099]   File "/vllm-workspace/vllm/vllm/v1/executor/abstract.py", line 136, in determine_available_memory
(EngineCore pid=1243) ERROR 04-22 02:12:02 [core.py:1099]     return self.collective_rpc("determine_available_memory")
(EngineCore pid=1243) ERROR 04-22 02:12:02 [core.py:1099]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=1243) ERROR 04-22 02:12:02 [core.py:1099]   File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 397, in collective_rpc
(EngineCore pid=1243) ERROR 04-22 02:12:02 [core.py:1099]     return aggregate(get_response())
(EngineCore pid=1243) ERROR 04-22 02:12:02 [core.py:1099]                      ^^^^^^^^^^^^^^
(EngineCore pid=1243) ERROR 04-22 02:12:02 [core.py:1099]   File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 380, in get_response
(EngineCore pid=1243) ERROR 04-22 02:12:02 [core.py:1099]     raise RuntimeError(
(EngineCore pid=1243) ERROR 04-22 02:12:02 [core.py:1099] RuntimeError: Worker failed with error 'list index out of range', please check the stack trace above for the root cause
(Worker_TP0 pid=1256) WARNING 04-22 02:12:02 [multiproc_executor.py:866] WorkerProc was terminated
(Worker_TP2 pid=1258) WARNING 04-22 02:12:02 [multiproc_executor.py:866] WorkerProc was terminated
(Worker_TP1 pid=1257) WARNING 04-22 02:12:02 [multiproc_executor.py:866] WorkerProc was terminated
(Worker_TP3 pid=1259) WARNING 04-22 02:12:02 [multiproc_executor.py:866] WorkerProc was terminated
(EngineCore pid=1243) ERROR 04-22 02:12:12 [multiproc_executor.py:273] Worker proc VllmWorker-1 died unexpectedly, shutting down executor.
(EngineCore pid=1243) Process EngineCore:
(EngineCore pid=1243) Traceback (most recent call last):
(EngineCore pid=1243)   File "/usr/local/python3.11.14/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore pid=1243)     self.run()
(EngineCore pid=1243)   File "/usr/local/python3.11.14/lib/python3.11/multiprocessing/process.py", line 108, in run
(EngineCore pid=1243)     self._target(*self._args, **self._kwargs)
(EngineCore pid=1243)   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 1103, in run_engine_core
(EngineCore pid=1243)     raise e
(EngineCore pid=1243)   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 1073, in run_engine_core
(EngineCore pid=1243)     engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore pid=1243)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=1243)   File "/vllm-workspace/vllm/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=1243)     return func(*args, **kwargs)
(EngineCore pid=1243)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=1243)   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 839, in __init__
(EngineCore pid=1243)     super().__init__(
(EngineCore pid=1243)   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 122, in __init__
(EngineCore pid=1243)     kv_cache_config = self._initialize_kv_caches(vllm_config)
(EngineCore pid=1243)                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=1243)   File "/vllm-workspace/vllm/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=1243)     return func(*args, **kwargs)
(EngineCore pid=1243)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=1243)   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 245, in _initialize_kv_caches
(EngineCore pid=1243)     available_gpu_memory = self.model_executor.determine_available_memory()
(EngineCore pid=1243)                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=1243)   File "/vllm-workspace/vllm/vllm/v1/executor/abstract.py", line 136, in determine_available_memory
(EngineCore pid=1243)     return self.collective_rpc("determine_available_memory")
(EngineCore pid=1243)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=1243)   File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 397, in collective_rpc
(EngineCore pid=1243)     return aggregate(get_response())
(EngineCore pid=1243)                      ^^^^^^^^^^^^^^
(EngineCore pid=1243)   File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 380, in get_response
(EngineCore pid=1243)     raise RuntimeError(
(EngineCore pid=1243) RuntimeError: Worker failed with error 'list index out of range', please check the stack trace above for the root cause
(APIServer pid=1224) Traceback (most recent call last):
(APIServer pid=1224)   File "/usr/local/python3.11.14/bin/vllm", line 6, in <module>
(APIServer pid=1224)     sys.exit(main())
(APIServer pid=1224)              ^^^^^^
(APIServer pid=1224)   File "/vllm-workspace/vllm/vllm/entrypoints/cli/main.py", line 75, in main
(APIServer pid=1224)     args.dispatch_function(args)
(APIServer pid=1224)   File "/vllm-workspace/vllm/vllm/entrypoints/cli/serve.py", line 118, in cmd
(APIServer pid=1224)     uvloop.run(run_server(args))
(APIServer pid=1224)   File "/usr/local/python3.11.14/lib/python3.11/site-packages/uvloop/__init__.py", line 92, in run
(APIServer pid=1224)     return runner.run(wrapper())
(APIServer pid=1224)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1224)   File "/usr/local/python3.11.14/lib/python3.11/asyncio/runners.py", line 118, in run
(APIServer pid=1224)     return self._loop.run_until_complete(task)
(APIServer pid=1224)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1224)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=1224)   File "/usr/local/python3.11.14/lib/python3.11/site-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=1224)     return await main
(APIServer pid=1224)            ^^^^^^^^^^
(APIServer pid=1224)   File "/vllm-workspace/vllm/vllm/entrypoints/openai/api_server.py", line 656, in run_server
(APIServer pid=1224)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=1224)   File "/vllm-workspace/vllm/vllm/entrypoints/openai/api_server.py", line 670, in run_server_worker
(APIServer pid=1224)     async with build_async_engine_client(
(APIServer pid=1224)   File "/usr/local/python3.11.14/lib/python3.11/contextlib.py", line 210, in __aenter__
(APIServer pid=1224)     return await anext(self.gen)
(APIServer pid=1224)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1224)   File "/vllm-workspace/vllm/vllm/entrypoints/openai/api_server.py", line 103, in build_async_engine_client
(APIServer pid=1224)     async with build_async_engine_client_from_engine_args(
(APIServer pid=1224)   File "/usr/local/python3.11.14/lib/python3.11/contextlib.py", line 210, in __aenter__
(APIServer pid=1224)     return await anext(self.gen)
(APIServer pid=1224)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1224)   File "/vllm-workspace/vllm/vllm/entrypoints/openai/api_server.py", line 144, in build_async_engine_client_from_engine_args
(APIServer pid=1224)     async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=1224)                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1224)   File "/vllm-workspace/vllm/vllm/v1/engine/async_llm.py", line 225, in from_vllm_config
(APIServer pid=1224)     return cls(
(APIServer pid=1224)            ^^^^
(APIServer pid=1224)   File "/vllm-workspace/vllm/vllm/v1/engine/async_llm.py", line 154, in __init__
(APIServer pid=1224)     self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=1224)                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1224)   File "/vllm-workspace/vllm/vllm/tracing/otel.py", line 178, in sync_wrapper
(APIServer pid=1224)     return func(*args, **kwargs)
(APIServer pid=1224)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1224)   File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 128, in make_async_mp_client
(APIServer pid=1224)     return AsyncMPClient(*client_args)
(APIServer pid=1224)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1224)   File "/vllm-workspace/vllm/vllm/tracing/otel.py", line 178, in sync_wrapper
(APIServer pid=1224)     return func(*args, **kwargs)
(APIServer pid=1224)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1224)   File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 924, in __init__
(APIServer pid=1224)     super().__init__(
(APIServer pid=1224)   File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 583, in __init__
(APIServer pid=1224)     with launch_core_engines(
(APIServer pid=1224)   File "/usr/local/python3.11.14/lib/python3.11/contextlib.py", line 144, in __exit__
(APIServer pid=1224)     next(self.gen)
(APIServer pid=1224)   File "/vllm-workspace/vllm/vllm/v1/engine/utils.py", line 972, in launch_core_engines
(APIServer pid=1224)     wait_for_engine_startup(
(APIServer pid=1224)   File "/vllm-workspace/vllm/vllm/v1/engine/utils.py", line 1031, in wait_for_engine_startup
(APIServer pid=1224)     raise RuntimeError(
(APIServer pid=1224) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
(APIServer pid=1224) [ERROR] 2026-04-22-02:12:15 (PID:1224, Device:-1, RankID:-1) ERR99999 UNKNOWN applicaiton exception
(APIServer pid=1224) sys:1: DeprecationWarning: builtin type swigvarlink has no __module__ attribute
root@hqcnhuawei01l:/home/AI_Model# /usr/local/python3.11.14/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 4 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

Fix Action

Fix / Workaround

Your current environment

I used an Atlas 800I A2 (910B4, 32G, 4 cards) to run Qwen3.5-27B with a LoRA adapter loaded. The vLLM-ascend version is v0.18.0rc1.

🐛 Describe the bug

The content of vllm_serve_qwen35.sh is as follows:

#!/bin/sh

# Load model from ModelScope to speed up download
export VLLM_USE_MODELSCOPE=true

# To reduce memory fragmentation and avoid out of memory
export PYTORCH_NPU_ALLOC_CONF=expandable_segments:True
export HCCL_BUFFSIZE=512
export OMP_PROC_BIND=false
export OMP_NUM_THREADS=1
export TASK_QUEUE_ENABLE=1

vllm serve Eco-Tech/Qwen3.5-27B-w8a8-mtp \
--host 0.0.0.0 \
--port 8000 \
--data-parallel-size 1 \
--tensor-parallel-size 2 \
--seed 1024 \
--quantization ascend \
--served-model-name qwen3.5 \
--max-num-seqs 32 \
--max-model-len 133000 \
--max-num-batched-tokens 8096 \
--trust-remote-code \
--gpu-memory-utilization 0.90 \
--no-enable-prefix-caching \
--speculative_config '{"method": "qwen3_5_mtp", "num_speculative_tokens": 3, "enforce_eager": true}' \
--compilation-config '{"cudagraph_mode":"FULL_DECODE_ONLY"}' \
--additional-config '{"enable_cpu_binding":true}' \
--async-scheduling \
--enable-lora --lora-modules mix-lora=checkpoint

extent analysis

TL;DR

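Engine startup fails with "IndexError: list index out of range", raised in slice_lora_b (vllm/lora/layers/column_parallel_linear.py, line 249) when it reads lora_b[i]. The call path runs through profile_run, _dummy_run, and maybe_select_dummy_loras, so the crash occurs while LoRA adapters are activated for the startup profiling run, and the lora_b list appears to hold fewer entries than the layer iterates over. The underlying cause is not established; it may lie in the LoRA configuration or in model loading.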
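To make the failure mode concrete, here is a minimal sketch of the access pattern visible in the last traceback frame. It is not vLLM's implementation: the function `slice_per_output` and the parameter `n_slices` are hypothetical stand-ins, and only the expression `lora_b[i]` is taken from the log.

```python
def slice_per_output(lora_b, n_slices):
    """Hypothetical stand-in for a per-slice loop over LoRA B weights."""
    sliced = []
    for i in range(n_slices):
        # Same access pattern as the failing line in the traceback: lora_b[i]
        # is read for every expected slice, so a list shorter than n_slices
        # raises IndexError before the None check can run.
        if (lora_b_i := lora_b[i]) is not None:
            sliced.append(lora_b_i)
        else:
            sliced.append(None)
    return sliced


def try_slice(lora_b, n_slices):
    """Run the loop and report an IndexError as text instead of raising."""
    try:
        return slice_per_output(lora_b, n_slices)
    except IndexError as exc:
        return f"IndexError: {exc}"


print(try_slice(["b0", "b1"], 2))  # list length matches the slice count
print(try_slice(["b0", "b1"], 3))  # one slice more than the list provides
```

The second call reproduces the same exception type and message as the log, which is consistent with a mismatch between the number of weight entries supplied and the number of slices the layer expects. Which side of that mismatch is wrong cannot be told from the log alone.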

Guidance

Check the LoRA configuration and the model loading process to confirm that the lora_b list is fully populated before slice_lora_b indexes it; the traceback shows lora_b[i] being read past the end of the list.
Verify that the adapter passed as mix-lora=checkpoint exists at that path and is compatible with the model being served.
Review the model architecture and the LoRA layer implementation (set_lora and slice_lora_b in vllm/lora/layers/column_parallel_linear.py) to confirm that the LoRA weights are handled correctly for this model.
Consider checking the model's documentation, or asking the model's developers, about the intended way to use LoRA with it.
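One way to start on the adapter check is to read the adapter's own configuration file. The sketch below assumes the directory passed as mix-lora=checkpoint is a PEFT-style adapter containing an adapter_config.json with the usual `r`, `lora_alpha`, and `target_modules` keys; the issue does not confirm that layout, and `summarize_adapter` is a hypothetical helper name.

```python
import json
from pathlib import Path


def summarize_adapter(adapter_dir):
    """Summarize a PEFT-style adapter directory (assumed layout, see lead-in)."""
    config_path = Path(adapter_dir) / "adapter_config.json"
    if not config_path.is_file():
        # Assumption: a usable adapter directory ships this file.
        return {"error": f"missing {config_path}"}
    config = json.loads(config_path.read_text(encoding="utf-8"))
    targets = config.get("target_modules") or []
    if isinstance(targets, str):
        # PEFT also allows a single string here; normalize it to a list.
        targets = [targets]
    return {
        "rank": config.get("r"),
        "lora_alpha": config.get("lora_alpha"),
        "target_modules": sorted(targets),
    }


if __name__ == "__main__":
    # "checkpoint" is the relative path used in the serve script above.
    print(summarize_adapter("checkpoint"))
```

Comparing the reported target_modules with the layer names of the served model is a quick sanity check. Because the crash happens during the startup profiling run, a clean result here does not rule out a problem elsewhere.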

Notes

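The traceback ties the failure to LoRA handling during startup, but the exact cause cannot be determined without more information about the model and its implementation. Note also that the script sets --tensor-parallel-size 2 while the log shows four workers (Worker_TP0 to Worker_TP3), so the log may come from a run with different settings than the script shown.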
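When triaging a long multi-process log like the one in this issue, it can help to reduce it to the first exception and the innermost frame that precedes it. The following is a heuristic sketch, not a vLLM tool: `root_cause` and both regular expressions are assumptions made for illustration, and the sample lines are copied from the log in this issue.

```python
import re

# Traceback frame header, e.g.: File "/path/x.py", line 249, in slice_lora_b
FRAME_RE = re.compile(r'File "([^"]+)", line (\d+), in (\S+)')
# First "SomeError: message" entry; the message stops at a newline or at "(",
# which is where the next process prefix starts in a flattened log.
ERROR_RE = re.compile(r"\b(\w*Error): ([^\n(]*)")


def root_cause(log_text):
    """Return the first exception and the last frame logged before it."""
    err = ERROR_RE.search(log_text)
    if err is None:
        return None
    # Only frames that appear before the exception line belong to it.
    frames = list(FRAME_RE.finditer(log_text, 0, err.start()))
    last = frames[-1] if frames else None
    return {
        "exception": err.group(1),
        "message": err.group(2).strip(),
        "file": last.group(1) if last else None,
        "line": int(last.group(2)) if last else None,
        "function": last.group(3) if last else None,
    }


SAMPLE = (
    '(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]   File "/vllm-workspace/vllm/vllm/lora/layers/column_parallel_linear.py", line 265, in set_lora\n'
    '(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932]   File "/vllm-workspace/vllm/vllm/lora/layers/column_parallel_linear.py", line 249, in slice_lora_b\n'
    "(Worker_TP0 pid=1256) ERROR 04-22 02:12:02 [multiproc_executor.py:932] IndexError: list index out of range\n"
)

print(root_cause(SAMPLE))
```

On the sample, this reports IndexError in slice_lora_b at line 249, the frame to look at first. The later RuntimeError entries from EngineCore and APIServer only relay the worker failure.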

Recommendation

Start by re-checking the LoRA configuration and the model loading process. If both look correct, seek help from the model's developers.

#api #model loading #dependency error #configuration error #environment variable
