vllm - 💡(How to fix) Fix [Bug]: Multi-LoRA Initialization Failure on Cloud TPU v6e-4 (AssertionError: LoRA is not enabled) [1 comments, 1 participants]

vllm 2026-04-10 09:08:55


vllm-project/vllm#39490 • Fetched 2026-04-11 06:13:20


Author

engineerbharath12

Participants

engineerbharath12

Timeline (top)

closed ×1, commented ×1

When serving a model with Multi-LoRA enabled on Cloud TPU v6e-4 using the vllm/vllm-tpu:latest image, the EngineCore fails to initialize. The API server parses the LoRA modules correctly, but the underlying TPU JAX worker (EngineCore) crashes during the graph capture phase with AssertionError: LoRA is not enabled.


Description

Environment

Hardware: Cloud TPU v6e-4 (4 chips)
vLLM Image: vllm/vllm-tpu:latest (v0.13.0)
Model: Qwen/Qwen3-4B-Instruct-2507
TP Size: 4
Engine: V1 (TPU native JAX backend)

Reproduction

vllm serve Qwen/Qwen3-4B-Instruct-2507 \
    --tensor-parallel-size 4 \
    --dtype bfloat16 \
    --enable-lora \
    --max-loras 3 \
    --max-lora-rank 64 \
    --lora-modules \
        adapter1=voidful/llm-codec-fisher-no-init \
        adapter2=voidful/auv-codec-librispeech \
        adapter3=voidful/unicodec-fisher

Error Stack

(EngineCore_DP0 pid=305) ERROR 04-02 07:26:44 [core.py:866] AssertionError: LoRA is not enabled
(EngineCore_DP0 pid=305) Traceback (most recent call last):
(EngineCore_DP0 pid=305)   File "/workspace/tpu_inference/tpu_inference/worker/tpu_worker.py", line 369, in compile_or_warm_up_model
(EngineCore_DP0 pid=305)     self.model_runner.capture_model()
(EngineCore_DP0 pid=305)   File "/workspace/tpu_inference/tpu_inference/runner/tpu_runner.py", line 556, in capture_model
(EngineCore_DP0 pid=305)     self.compilation_manager.capture_model()
(EngineCore_DP0 pid=305)   File "/workspace/tpu_inference/tpu_inference/runner/compilation_manager.py", line 92, in capture_model
(EngineCore_DP0 pid=305)     with self.runner.maybe_setup_dummy_loras(self.runner.lora_config):
(EngineCore_DP0 pid=305)   File "/workspace/vllm/vllm/v1/worker/lora_model_runner_mixin.py", line 97, in maybe_setup_dummy_loras
(EngineCore_DP0 pid=305)     assert self.lora_manager is not None, "LoRA is not enabled"
(EngineCore_DP0 pid=305) AssertionError: LoRA is not enabled
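
To make the call chain in the stack above easier to read, the sketch below strips the EngineCore log prefix and lists the traceback frames in order. It is a minimal illustration, not part of vLLM: the function name extract_frames and both regular expressions are assumptions based only on the log format shown in this issue.

```python
import re

# Assumption: every EngineCore log line starts with "(EngineCore_DP<n> pid=<pid>)".
PREFIX = re.compile(r"^\(EngineCore_DP\d+ pid=\d+\)\s?")
# Standard Python traceback frame: File "<path>", line <n>, in <function>
FRAME = re.compile(r'File "(?P<path>[^"]+)", line (?P<line>\d+), in (?P<func>\w+)')


def extract_frames(log_text):
    """Return (path, line, function) tuples for each traceback frame found."""
    frames = []
    for raw in log_text.splitlines():
        line = PREFIX.sub("", raw).strip()
        match = FRAME.search(line)
        if match:
            frames.append((match["path"], int(match["line"]), match["func"]))
    return frames


# Excerpt copied from the error stack in this issue.
LOG = '''(EngineCore_DP0 pid=305) Traceback (most recent call last):
(EngineCore_DP0 pid=305)   File "/workspace/tpu_inference/tpu_inference/runner/compilation_manager.py", line 92, in capture_model
(EngineCore_DP0 pid=305)     with self.runner.maybe_setup_dummy_loras(self.runner.lora_config):
(EngineCore_DP0 pid=305)   File "/workspace/vllm/vllm/v1/worker/lora_model_runner_mixin.py", line 97, in maybe_setup_dummy_loras
(EngineCore_DP0 pid=305)     assert self.lora_manager is not None, "LoRA is not enabled"
(EngineCore_DP0 pid=305) AssertionError: LoRA is not enabled'''

for path, lineno, func in extract_frames(LOG):
    print(f"{func} ({path}:{lineno})")
```

The last frame is where the assertion fires; the frame before it shows that the call originates in the tpu_inference compilation manager during warm-up.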

Full Logs

Full uninterrupted initialization and crash logs can be found here: https://gist.github.com/engineerbharath12/34214b2ca1ebc595dc9564d077a902e4

Extended analysis

TL;DR

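The assertion fires because self.lora_manager is None when the TPU JAX worker reaches graph capture, even though --enable-lora was passed. The most likely fix is to verify that the LoRA configuration is actually enabled and applied on the TPU JAX worker, not only parsed by the API server.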

Guidance

Review the --enable-lora flag and its associated options (--max-loras, --max-lora-rank, --lora-modules) and confirm that they are set correctly and supported by the TPU JAX worker.
Read maybe_setup_dummy_loras in lora_model_runner_mixin.py and trace where the lora_manager attribute is supposed to be assigned; the assertion fails because it is still None when capture_model runs.
Verify that the vllm/vllm-tpu:latest image (v0.13.0) supports Multi-LoRA on Cloud TPU v6e-4, and check for known issues or limitations.
Serve the model with a simpler LoRA configuration to determine whether the failure is specific to the current setup.
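
The failure mode behind the second point can be sketched with a toy class. This is not vLLM code: ToyRunner, warm_up, and the create_manager switch are hypothetical names, and the early return for a missing lora_config is an assumption. Only the assert line itself is taken from the traceback.

```python
from contextlib import contextmanager


class ToyRunner:
    """Hypothetical stand-in for a model runner; not the vLLM implementation."""

    def __init__(self, lora_config, create_manager):
        self.lora_config = lora_config
        # Assumption: a backend that never wires up LoRA leaves this as None.
        self.lora_manager = object() if create_manager else None

    @contextmanager
    def maybe_setup_dummy_loras(self, lora_config):
        if lora_config is None:
            # Assumed behaviour: nothing to set up when LoRA was not requested.
            yield
            return
        # Same check as line 97 of lora_model_runner_mixin.py in the traceback.
        assert self.lora_manager is not None, "LoRA is not enabled"
        yield


def warm_up(runner):
    """Mimic the warm-up call site and report what happens."""
    try:
        with runner.maybe_setup_dummy_loras(runner.lora_config):
            return "ok"
    except AssertionError as exc:
        return f"AssertionError: {exc}"


# A config is present but no manager was created: the state the traceback implies.
print(warm_up(ToyRunner({"max_loras": 3}, create_manager=False)))
print(warm_up(ToyRunner({"max_loras": 3}, create_manager=True)))
```

In this sketch the assertion is reached only when a LoRA config is passed while the manager is missing, which is why the guidance is to look for where the TPU runner should create lora_manager.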

Example

The issue does not include a code fix; the failure is triggered by the serve command under Reproduction, and the analysis treats it as a configuration and compatibility problem rather than a specific code error.

Notes

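The API server parses the LoRA modules, so the command-line flags themselves are accepted. The assertion fires later, inside EngineCore, because self.lora_manager is None when compilation_manager.capture_model calls maybe_setup_dummy_loras. This suggests that the TPU JAX worker is not applying the LoRA configuration, which points to a configuration or compatibility issue with this backend and image rather than a malformed command.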

Recommendation

As a first isolation step, simplify the LoRA configuration (for example, one adapter instead of three) and retry, then try a smaller model or a different TPU setup. If the assertion persists with a single adapter, the problem is unlikely to be specific to the current configuration and is more likely a general compatibility issue between Multi-LoRA and the TPU JAX backend.
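
One way to carry out that isolation step is to generate one serve command per adapter and run them one at a time. The sketch below only builds the command strings; it reuses the flags and adapter names from the Reproduction section, while the helper name single_adapter_commands and the choice of --max-loras 1 are assumptions made for illustration.

```python
import shlex

# Flags and values copied from the Reproduction command in this issue.
BASE = [
    "vllm", "serve", "Qwen/Qwen3-4B-Instruct-2507",
    "--tensor-parallel-size", "4",
    "--dtype", "bfloat16",
    "--enable-lora",
]

ADAPTERS = {
    "adapter1": "voidful/llm-codec-fisher-no-init",
    "adapter2": "voidful/auv-codec-librispeech",
    "adapter3": "voidful/unicodec-fisher",
}


def single_adapter_commands(adapters, max_lora_rank=64):
    """Build one shell command per adapter so each can be tested in isolation."""
    commands = []
    for name, path in adapters.items():
        argv = BASE + [
            "--max-loras", "1",  # assumption: one slot is enough for one adapter
            "--max-lora-rank", str(max_lora_rank),
            "--lora-modules", f"{name}={path}",
        ]
        commands.append(shlex.join(argv))
    return commands


for command in single_adapter_commands(ADAPTERS):
    print(command)
```

If every single-adapter command fails with the same assertion, the adapters themselves are an unlikely cause.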
