vllm - 💡(How to fix) Fix [Usage]: HunyuanImage-3.0 text-to-image fails due to expert parallel group not initialized on Ascend NPU Description [1 comments, 2 participants]

Error Message

Error 1: expert parallel group not initialized
Error 2: world group not initialized (after modifying get_ep_group() to return get_world_group())
Error 3: UnboundLocalError when trying to manually create _WORLD and _EP in parallel_state.py
Error 4: No backend type associated with device type npu (after creating dummy EP group)
Error 5: Not support device type: npu (when running the official image_to_text.py example)

The model can be started successfully if the stage is configured as stage_type: "llm" or model_stage: thinker (image-to-text mode). However, in this mode, calling /v1/images/generations returns an error: "No diffusion stage found in multi-stage pipeline." This confirms that the thinker mode bypasses the MoE initialization issue but cannot be used for text-to-image generation.

Log excerpt:

    (APIServer pid=xxxx) ERROR ... ValueError: Free memory on device (10.44/60.96 GiB) on startup is less than desired GPU memory utilization...

Root Cause

Possible root cause: the vLLM-Ascend fused MoE implementation expects get_ep_group() to return a valid GroupCoordinator with an NPU backend, but in pure diffusion mode either the distributed environment is not fully initialized or _EP is never created. The thinker mode avoids this path because it does not trigger the same MoE communication setup.
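The failure mode described above can be illustrated with a minimal sketch of a lazily initialized module-level group and its getter. The class and function bodies here are hypothetical stand-ins written for illustration, not vLLM's actual code; only the names get_ep_group, _EP, and GroupCoordinator come from the issue. "hccl" is used as the example backend name because that is the collective library on Ascend.

```python
from typing import Optional


class GroupCoordinator:
    """Hypothetical stand-in for a process-group wrapper (not vLLM's class)."""

    def __init__(self, name: str, backend: str) -> None:
        self.name = name
        self.backend = backend


# Module-level handle that stays None until some init path fills it in.
_EP: Optional[GroupCoordinator] = None


def get_ep_group() -> GroupCoordinator:
    # Callers (e.g. MoE layers) hit this assertion if no init path ran first.
    assert _EP is not None, "expert parallel group is not initialized"
    return _EP


def init_ep_group(backend: str) -> None:
    # Assumed init step; in the issue, this step apparently never happens
    # when the stage runs in pure diffusion mode.
    global _EP
    _EP = GroupCoordinator(name="ep", backend=backend)
```

If the diffusion stage never runs the equivalent of init_ep_group(), any layer that calls get_ep_group() fails with the assertion quoted in Error 1, regardless of how the model itself is configured.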

Fix Action

Fix / Workaround

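Attempted workaround (did not resolve the issue): patching get_ep_group() in vllm/distributed/parallel_state.py to return get_world_group(). This only moved the failure to Error 2, "world group is not initialized".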

Request: please advise on the correct way to initialize the expert parallel group for HunyuanImage-3.0 on Ascend NPU, or provide a workaround to disable the EP requirement for pure diffusion mode. Alternatively, confirm whether this is a known limitation.

Your current environment

Hardware: Ascend NPU (8 cards, 60 GB each)
CANN version: 8.5.1 (V100R001C25SPC002B220)
PyTorch version: 2.9.0+cpu
Docker image: quay.io/ascend/vllm-omni:v0.18.0
Model: Tencent-Hunyuan/HunyuanImage-3.0 (downloaded from HuggingFace/ModelScope)
OS: Linux aarch64

How would you like to use vllm

Steps to Reproduce

Clone and install vLLM, vLLM-Ascend, vLLM-Omni (latest main branches).

Download the model:

    git lfs clone https://huggingface.co/Tencent-Hunyuan/HunyuanImage-3.0

Run the following command (pure diffusion mode):

    vllm serve /path/to/HunyuanImage-3.0 --omni --port 8091

Or with a YAML config (stage_type: diffusion):

    stage_args:
      - stage_id: 0
        stage_type: "diffusion"
        runtime:
          process: true
          devices: "0,1,2,3,4,5,6,7"
        engine_args:
          trust_remote_code: true
          enforce_eager: true
          model: /path/to/HunyuanImage-3.0
          max_num_seqs: 1
          gpu_memory_utilization: 0.9
          tensor_parallel_size: 8
        final_output: true
        final_output_type: image

Then run:

    vllm serve /path/to/HunyuanImage-3.0 --omni --port 8091 --stage-configs-path config.yaml

Expected Behavior

The service starts successfully and the /v1/images/generations endpoint works.
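For reference, a request against the endpoint named in the expected behavior could be built roughly as follows. This is a sketch under assumptions: the host (localhost) is a guess, the port matches the serve command in the reproduction steps, and the payload fields (prompt, n, size) follow the OpenAI-style images API shape rather than anything confirmed in this issue.

```python
import json
import urllib.request


def build_image_request(
    prompt: str, base_url: str = "http://localhost:8091"
) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/images/generations."""
    # Payload fields are assumed from the OpenAI-style images API.
    payload = {"prompt": prompt, "n": 1, "size": "1024x1024"}
    return urllib.request.Request(
        url=f"{base_url}/v1/images/generations",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_image_request("a red fox in the snow")

# Once the server is actually running, the request could be sent with:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```

In thinker mode the same request is what returns "No diffusion stage found in multi-stage pipeline", so it is also a quick way to confirm which stage type the server actually loaded.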

Actual Behavior

The service crashes with one of the following errors (depending on the attempt):

Error 1: expert parallel group not initialized

    AssertionError: expert parallel group is not initialized. EP group is only created for MoE models with num_experts > 0. This function should only be called for MoE models.

Error 2: world group not initialized (after modifying get_ep_group() to return get_world_group())

    AssertionError: world group is not initialized

Error 3: UnboundLocalError when trying to manually create _WORLD and _EP in parallel_state.py.

Error 4: No backend type associated with device type npu (after creating dummy EP group)

    RuntimeError: No backend type associated with device type npu

Error 5: Not support device type: npu (when running the official image_to_text.py example)

    RuntimeError: Not support device type: npu

Additional Context

The model can be started successfully if the stage is configured as stage_type: "llm" or model_stage: thinker (image-to-text mode). However, in this mode, calling /v1/images/generations returns an error: "No diffusion stage found in multi-stage pipeline." This confirms that the thinker mode bypasses the MoE initialization issue but cannot be used for text-to-image generation.

The same hardware and vLLM stack work fine for non-MoE models like Qwen-Image.

Attempted solutions include:

Modifying model config.json (num_experts=0, is_moe=false, num_local_experts=0)

Adding --enable-expert-parallel flag

Using YAML with stage_type: "diffusion" and without enable_expert_parallel

Patching get_ep_group() in vllm/distributed/parallel_state.py to return get_world_group()

Manually initializing _WORLD and _EP in parallel_state.py

Setting distributed_executor_backend="mp" in YAML

Lowering tensor_parallel_size to 1

None of these resolved the issue. The core problem seems to be that the diffusion MoE layers in vllm_ascend/ops/fused_moe/ strictly require a valid ep_group with proper NPU backend, which is not being created in pure diffusion mode.
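The UnboundLocalError from the manual-initialization attempt (Error 3) is a common Python pitfall when assigning to a module-level name from inside a function. The issue does not show the actual patch, so whether this was the cause is an assumption; the sketch below only demonstrates the general mechanism with a hypothetical _WORLD variable and placeholder values.

```python
# Module-level handle, analogous in spirit to a global group variable.
_WORLD = None


def init_world_broken() -> None:
    # Because _WORLD is assigned below, Python treats it as a local name
    # throughout this function, so the read in the `if` raises
    # UnboundLocalError before the assignment ever runs.
    if _WORLD is None:
        _WORLD = "world-group-placeholder"


def init_world_fixed() -> None:
    # Declaring the name global makes both the read and the write
    # refer to the module-level variable.
    global _WORLD
    if _WORLD is None:
        _WORLD = "world-group-placeholder"


def get_world():
    return _WORLD
```

Fixing the scoping error would only get past Error 3; the group object created this way still needs a communication backend that supports the npu device, which is what Error 4 complains about.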

Logs (excerpt)

    (APIServer pid=xxxx) ERROR ... ValueError: Free memory on device (10.44/60.96 GiB) on startup is less than desired GPU memory utilization...
    (Worker_TPx_EPx pid=xxxx) ... AssertionError: expert parallel group is not initialized.

Full logs available upon request.
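The ValueError in the log is independent of the EP group problem and can be checked with simple arithmetic. The sketch below is a simplified version of the comparison the message implies (free memory versus total memory times the requested utilization); the real check inside vLLM may differ in detail, and the function names are made up for illustration.

```python
def required_gib(total_gib: float, utilization: float) -> float:
    """Memory the engine would try to claim at the given utilization."""
    return total_gib * utilization


def startup_check_passes(free_gib: float, total_gib: float, utilization: float) -> bool:
    """Simplified form of the startup check implied by the error message."""
    return free_gib >= required_gib(total_gib, utilization)


total_gib = 60.96  # from the log line
free_gib = 10.44   # from the log line

print(round(required_gib(total_gib, 0.9), 3))          # 54.864
print(startup_check_passes(free_gib, total_gib, 0.9))  # False

# Largest utilization value that would still fit in the free memory.
max_fitting_utilization = free_gib / total_gib
```

With gpu_memory_utilization: 0.9 the engine wants about 54.9 GiB while only 10.44 GiB is free, so most of the device memory was already occupied when the server started.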

Before submitting a new issue...

Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

Extended analysis

TL;DR

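The most likely fix is to make sure the distributed environment, including the world group and an expert parallel group with an NPU-capable backend, is initialized for HunyuanImage-3.0 when it runs as a diffusion stage on Ascend NPU. Simply patching get_ep_group() to return get_world_group() was already tried and fails because the world group is not initialized either.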

Guidance

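Verify that get_ep_group() returns a valid GroupCoordinator with an NPU backend, as the vLLM-Ascend fused MoE implementation expects. Error 4 (No backend type associated with device type npu) suggests the dummy EP group was created without one.
Check how the distributed environment is initialized in pure diffusion mode and confirm whether _WORLD and _EP are ever created on that path. Error 2 shows that the world group itself was missing.
Bypassing the EP requirement through config.json edits (num_experts=0, is_moe=false, num_local_experts=0) or by patching get_ep_group() has already been tried without success, so any bypass may have to be made in the MoE layers under vllm_ascend/ops/fused_moe/ instead.
Treat the memory error as a separate problem: the log reports only 10.44 GiB free out of 60.96 GiB at startup, below what gpu_memory_utilization: 0.9 requests, which may point to memory already in use on the device.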
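Before digging into group initialization, it can help to confirm that the Python packages involved are importable in the serving environment. The environment lists a CPU build of PyTorch (2.9.0+cpu), and on Ascend the npu device is provided by the torch_npu plugin, so a missing plugin would be one possible explanation for the "device type npu" errors. The sketch below only probes for top-level modules; the module list is an assumption based on the packages named in this issue.

```python
import importlib.util
from typing import Dict, Iterable


def probe_modules(names: Iterable[str]) -> Dict[str, bool]:
    """Report which top-level modules can be located, without importing them."""
    found: Dict[str, bool] = {}
    for name in names:
        try:
            found[name] = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec can raise for broken or partially installed packages.
            found[name] = False
    return found


# Module names assumed from the stack described in the issue.
report = probe_modules(["torch", "torch_npu", "vllm", "vllm_ascend"])
for name, ok in report.items():
    print(f"{name}: {'found' if ok else 'missing'}")
```

This check does not prove the NPU backend is registered correctly; it only rules out the simplest case where one of the packages is absent from the container.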

Example

No verified fix snippet is available: a working patch would depend on how get_ep_group() and the distributed environment are set up for the diffusion stage, and the issue does not include that code.

Notes

The issue appears to be specific to running the MoE-based HunyuanImage-3.0 model as a diffusion stage on Ascend NPU; the same hardware and vLLM stack work for non-MoE models such as Qwen-Image. None of the attempted solutions resolved it, which suggests the diffusion-stage initialization path itself needs a closer look rather than further model or configuration tweaks.

Recommendation

Focus on getting the expert parallel group initialized properly in the diffusion stage, since that is the most likely cause of the failure. The simple patches to get_ep_group() described above have already failed, so any workaround will need additional debugging and testing to ensure it does not introduce new problems. If no initialization path exists for pure diffusion mode, this may be a current limitation that the maintainers need to confirm.
