vllm: [Bug]: Multi-LoRA Initialization Failure on Cloud TPU v6e-4 (AssertionError: LoRA is not enabled)

GitHub stats
vllm-project/vllm#39490 · Fetched 2026-04-11 06:13:20
Comments: 1 · Participants: 1 · Timeline events: 2 (closed ×1, commented ×1) · Reactions: 0

Description

When attempting to serve a model with Multi-LoRA enabled on Cloud TPU v6e-4 using the latest vllm/vllm-tpu:latest image, the EngineCore fails to initialize. The API server correctly parses the LoRA modules, but the underlying TPU JAX worker (EngineCore) crashes with an AssertionError: LoRA is not enabled during the graph capture phase.

Environment

  • Hardware: Cloud TPU v6e-4 (4 chips)
  • vLLM Image: vllm/vllm-tpu:latest (v0.13.0)
  • Model: Qwen/Qwen3-4B-Instruct-2507
  • TP Size: 4
  • Engine: V1 (TPU native JAX backend)

Reproduction

vllm serve Qwen/Qwen3-4B-Instruct-2507 \
    --tensor-parallel-size 4 \
    --dtype bfloat16 \
    --enable-lora \
    --max-loras 3 \
    --max-lora-rank 64 \
    --lora-modules \
        adapter1=voidful/llm-codec-fisher-no-init \
        adapter2=voidful/auv-codec-librispeech \
        adapter3=voidful/unicodec-fisher

Error Stack

(EngineCore_DP0 pid=305) ERROR 04-02 07:26:44 [core.py:866] AssertionError: LoRA is not enabled
(EngineCore_DP0 pid=305) Traceback (most recent call last):
(EngineCore_DP0 pid=305)   File "/workspace/tpu_inference/tpu_inference/worker/tpu_worker.py", line 369, in compile_or_warm_up_model
(EngineCore_DP0 pid=305)     self.model_runner.capture_model()
(EngineCore_DP0 pid=305)   File "/workspace/tpu_inference/tpu_inference/runner/tpu_runner.py", line 556, in capture_model
(EngineCore_DP0 pid=305)     self.compilation_manager.capture_model()
(EngineCore_DP0 pid=305)   File "/workspace/tpu_inference/tpu_inference/runner/compilation_manager.py", line 92, in capture_model
(EngineCore_DP0 pid=305)     with self.runner.maybe_setup_dummy_loras(self.runner.lora_config):
(EngineCore_DP0 pid=305)   File "/workspace/vllm/vllm/v1/worker/lora_model_runner_mixin.py", line 97, in maybe_setup_dummy_loras
(EngineCore_DP0 pid=305)     assert self.lora_manager is not None, "LoRA is not enabled"
(EngineCore_DP0 pid=305) AssertionError: LoRA is not enabled
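
When triaging a log like the one above, it can help to reduce the traceback to its call chain. The following is a minimal sketch, assuming the log lines keep the standard Python traceback frame format behind the EngineCore prefix; extract_frames and SAMPLE_LOG are illustrative names, not part of vLLM.

```python
import re

# Matches the 'File "...", line N, in func' part of a traceback frame.
# The "(EngineCore_DP0 pid=305)" prefix is skipped because the pattern
# is searched anywhere in the text rather than anchored to line starts.
FRAME_RE = re.compile(
    r'File "(?P<path>[^"]+)", line (?P<line>\d+), in (?P<func>\w+)'
)


def extract_frames(log_text):
    """Return (path, line_number, function) tuples in call order."""
    return [
        (m["path"], int(m["line"]), m["func"])
        for m in FRAME_RE.finditer(log_text)
    ]


# Two frames copied from the error stack in this report.
SAMPLE_LOG = '''\
(EngineCore_DP0 pid=305)   File "/workspace/tpu_inference/tpu_inference/runner/compilation_manager.py", line 92, in capture_model
(EngineCore_DP0 pid=305)     with self.runner.maybe_setup_dummy_loras(self.runner.lora_config):
(EngineCore_DP0 pid=305)   File "/workspace/vllm/vllm/v1/worker/lora_model_runner_mixin.py", line 97, in maybe_setup_dummy_loras
(EngineCore_DP0 pid=305)     assert self.lora_manager is not None, "LoRA is not enabled"
'''

frames = extract_frames(SAMPLE_LOG)
for path, line, func in frames:
    print(f"{func} ({path}:{line})")
```

The last frame is the one that raised, which here is maybe_setup_dummy_loras in lora_model_runner_mixin.py, called from the TPU compilation manager.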

Full Logs

Full uninterrupted initialization and crash logs can be found here: https://gist.github.com/engineerbharath12/34214b2ca1ebc595dc9564d077a902e4

Analysis

TL;DR

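The traceback shows that self.lora_manager is still None when the TPU compilation manager starts graph capture, so the LoRA manager was apparently never created on the worker even though --enable-lora was passed. The most likely fix is to confirm that the TPU JAX worker in this image supports Multi-LoRA, and then to recheck that the LoRA flags are applied on the worker side.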

Guidance

  • Review --enable-lora and its associated options (--max-loras, --max-lora-rank, --lora-modules) to confirm they are set consistently and are accepted by the TPU JAX worker, not only by the API server.
  • Read maybe_setup_dummy_loras in lora_model_runner_mixin.py (line 97 in the traceback) and find where the lora_manager attribute is expected to be assigned; the assertion fails because it is None at graph capture.
  • Verify that the vllm/vllm-tpu:latest image (v0.13.0) supports Multi-LoRA on Cloud TPU v6e-4, and check for known issues or limitations.
  • Serve the model with a simpler LoRA configuration, for example a single adapter, to determine whether the failure is specific to the three-adapter setup.
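
One way to try a simpler LoRA configuration is to generate the serve command with a single adapter. This is a sketch only: build_serve_command is a hypothetical helper, it uses just the flags that appear in the reproduction command, and whether the single-adapter run succeeds on this image is not known.

```python
import shlex


def build_serve_command(model, adapters, tensor_parallel_size=4, max_lora_rank=64):
    """Build a `vllm serve` argv from the flags used in the reproduction.

    `adapters` maps an adapter name to its path or Hugging Face repo id.
    --max-loras is set to the number of adapters supplied.
    """
    if not adapters:
        raise ValueError("at least one adapter is required")
    argv = [
        "vllm", "serve", model,
        "--tensor-parallel-size", str(tensor_parallel_size),
        "--dtype", "bfloat16",
        "--enable-lora",
        "--max-loras", str(len(adapters)),
        "--max-lora-rank", str(max_lora_rank),
        "--lora-modules",
    ]
    argv += [f"{name}={path}" for name, path in adapters.items()]
    return argv


# Reduced variant of the reproduction: one adapter instead of three.
single_adapter = build_serve_command(
    "Qwen/Qwen3-4B-Instruct-2507",
    {"adapter1": "voidful/llm-codec-fisher-no-init"},
)
print(shlex.join(single_adapter))
```

If the single-adapter command fails with the same assertion, the adapter count is unlikely to be the cause, and attention can shift to LoRA support in the TPU backend itself.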

Example

No fix snippet is provided: the failure surfaces at startup from the command shown under Reproduction, not from user code, so the issue appears to be one of configuration or compatibility.

Notes

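The API server parsed the LoRA flags from the reproduction command, so the error stack and logs point to a configuration or compatibility issue on the TPU side rather than a malformed command. The assertion in maybe_setup_dummy_loras fires because self.lora_manager is None at the moment the compilation manager passes self.runner.lora_config into it, which suggests that the LoRA configuration is not being applied by the TPU JAX worker.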
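
The failure mode in the error stack can be modeled in a few lines. This is a simplified stand-in, not the vLLM implementation: only the assert statement is taken from the traceback, while LoRAMixinSketch, its constructor, and the early exit for a missing config are assumptions made for illustration.

```python
from contextlib import contextmanager


class LoRAMixinSketch:
    """Hypothetical stand-in for a runner that mixes in LoRA support."""

    def __init__(self, lora_config, lora_manager=None):
        self.lora_config = lora_config
        # Stays None unless some load step assigns a manager.
        self.lora_manager = lora_manager

    @contextmanager
    def maybe_setup_dummy_loras(self, lora_config):
        if lora_config is None:
            # Assumed behavior: without a LoRA config there is nothing to set up.
            yield
            return
        # Same assertion as line 97 of lora_model_runner_mixin.py in the traceback.
        assert self.lora_manager is not None, "LoRA is not enabled"
        yield


def capture_model(runner):
    """Mimics the warm-up call that enters the context manager."""
    with runner.maybe_setup_dummy_loras(runner.lora_config):
        return "captured"


# A config is present but no manager was ever assigned.
try:
    outcome = capture_model(LoRAMixinSketch(lora_config={"max_loras": 3}))
except AssertionError as exc:
    outcome = str(exc)
print(outcome)
```

Under these assumptions, the assertion is reached only when a LoRA config is present and the manager is missing, which matches the combination seen in the report.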

Recommendation

As a workaround, simplify the LoRA configuration (for example, a single adapter), then retry with a smaller model or a different TPU setup. If the assertion persists in every variant, the problem is more likely a general compatibility issue than something specific to this configuration.
