vllm - [Usage]: HunyuanImage-3.0 text-to-image fails due to expert parallel group not initialized on Ascend NPU

GitHub stats
vllm-project/vllm#40452 (fetched 2026-04-22 07:45:32)
Comments: 1 · Participants: 2 · Timeline events: 3 · Reactions: 0
Timeline (top): closed ×1, commented ×1, labeled ×1

Error Message

Error 1: expert parallel group not initialized
Error 2: world group not initialized (after modifying get_ep_group() to return get_world_group())
Error 3: UnboundLocalError when trying to manually create _WORLD and _EP in parallel_state.py
Error 4: No backend type associated with device type npu (after creating dummy EP group)
Error 5: Not support device type: npu (when running the official image_to_text.py example)

The model can be started successfully if the stage is configured as stage_type: "llm" or model_stage: thinker (image-to-text mode). However, in this mode, calling /v1/images/generations returns an error: "No diffusion stage found in multi-stage pipeline." This confirms that the thinker mode bypasses the MoE initialization issue but cannot be used for text-to-image generation.

(APIServer pid=xxxx) ERROR ... ValueError: Free memory on device (10.44/60.96 GiB) on startup is less than desired GPU memory utilization...

Root Cause

Possible root cause: the vLLM-Ascend fused MoE implementation expects get_ep_group() to return a valid GroupCoordinator with NPU backend, but in pure diffusion mode the distributed environment is either not fully initialized or _EP is never created. The thinker mode avoids this path because it doesn't trigger the same MoE communication setup.

Fix Action

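Fix / Workaround

No confirmed fix is reported. One attempted workaround, patching get_ep_group() in vllm/distributed/parallel_state.py to return get_world_group(), did not resolve the issue: it fails with "world group is not initialized" (Error 2).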

Request: Please advise on the correct way to initialize the expert parallel group for HunyuanImage-3.0 on Ascend NPU, or provide a workaround to disable the EP requirement for pure diffusion mode. Alternatively, confirm if this is a known limitation.

RAW_BUFFER

Your current environment

Hardware: Ascend NPU (8 cards, 60GB each)
CANN version: 8.5.1 (V100R001C25SPC002B220)
PyTorch version: 2.9.0+cpu
Docker image: quay.io/ascend/vllm-omni:v0.18.0
Model: Tencent-Hunyuan/HunyuanImage-3.0 (downloaded from HuggingFace/ModelScope)
OS: Linux aarch64

How would you like to use vllm

Steps to Reproduce

Clone and install vLLM, vLLM-Ascend, vLLM-Omni (latest main branches).

Download the model:

    git lfs clone https://huggingface.co/Tencent-Hunyuan/HunyuanImage-3.0

Run the following command (pure diffusion mode):

    vllm serve /path/to/HunyuanImage-3.0 --omni --port 8091

Or with a YAML config (stage_type: diffusion):

    stage_args:
      - stage_id: 0
        stage_type: "diffusion"
        runtime:
          process: true
          devices: "0,1,2,3,4,5,6,7"
        engine_args:
          trust_remote_code: true
          enforce_eager: true
          model: /path/to/HunyuanImage-3.0
          max_num_seqs: 1
          gpu_memory_utilization: 0.9
          tensor_parallel_size: 8
        final_output: true
        final_output_type: image

Then run:

    vllm serve /path/to/HunyuanImage-3.0 --omni --port 8091 --stage-configs-path config.yaml

Expected Behavior

The service starts successfully and the /v1/images/generations endpoint works.
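To check the expected behavior once the service is up, a request to /v1/images/generations can be sketched as below. This is a minimal sketch, not something taken from the issue: it assumes the server listens on localhost:8091 as in the commands above and accepts an OpenAI-style Images payload (model, prompt, n, size). The helper names build_image_request and generate_image are hypothetical.

```python
import json
import urllib.request

# Assumption: the server from the steps above listens on localhost:8091 and
# accepts an OpenAI-style Images payload; adjust the fields if it does not.
BASE_URL = "http://localhost:8091"


def build_image_request(prompt: str, model: str, size: str = "1024x1024") -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/images/generations."""
    payload = {"model": model, "prompt": prompt, "n": 1, "size": size}
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/images/generations",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_image(prompt: str, model: str) -> dict:
    """Send the request and return the decoded JSON response."""
    request = build_image_request(prompt, model)
    # Image generation can be slow, so use a generous timeout.
    with urllib.request.urlopen(request, timeout=600) as response:
        return json.load(response)


# Example call (needs a running server):
# generate_image("a red bicycle", "/path/to/HunyuanImage-3.0")
```

Building the request separately from sending it makes it easy to inspect the URL and body before pointing the client at a live server.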

Actual Behavior

The service crashes with one of the following errors (depending on the attempt):

Error 1: expert parallel group not initialized

    AssertionError: expert parallel group is not initialized. EP group is only created for MoE models with num_experts > 0. This function should only be called for MoE models.

Error 2: world group not initialized (after modifying get_ep_group() to return get_world_group())

    AssertionError: world group is not initialized

Error 3: UnboundLocalError when trying to manually create _WORLD and _EP in parallel_state.py.

Error 4: No backend type associated with device type npu (after creating dummy EP group)

    RuntimeError: No backend type associated with device type npu

Error 5: Not support device type: npu (when running the official image_to_text.py example)

    RuntimeError: Not support device type: npu

Additional Context

The model can be started successfully if the stage is configured as stage_type: "llm" or model_stage: thinker (image-to-text mode). However, in this mode, calling /v1/images/generations returns an error: "No diffusion stage found in multi-stage pipeline." This confirms that the thinker mode bypasses the MoE initialization issue but cannot be used for text-to-image generation.

The same hardware and vLLM stack work fine for non-MoE models like Qwen-Image.

Attempted solutions include:

Modifying model config.json (num_experts=0, is_moe=false, num_local_experts=0)

Adding --enable-expert-parallel flag

Using YAML with stage_type: "diffusion" and without enable_expert_parallel

Patching get_ep_group() in vllm/distributed/parallel_state.py to return get_world_group()

Manually initializing _WORLD and _EP in parallel_state.py

Setting distributed_executor_backend="mp" in YAML

Lowering tensor_parallel_size to 1

None of these resolved the issue. The core problem seems to be that the diffusion MoE layers in vllm_ascend/ops/fused_moe/ strictly require a valid ep_group with proper NPU backend, which is not being created in pure diffusion mode.
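Errors 1 and 3 can be reproduced in miniature with a toy module-level variable. The sketch below is not vLLM code; the names only mirror the issue text, and it is an assumption that Error 3 came from assigning _EP or _WORLD inside a function without a global declaration, which is one common way an UnboundLocalError arises when patching module state.

```python
# Toy stand-in for a module that keeps its process group in a module-level
# variable. None of this is vLLM code; the names only mirror the issue text.
_EP = None


def get_ep_group():
    """Return the group, failing the way Error 1 does when it is unset."""
    assert _EP is not None, "expert parallel group is not initialized"
    return _EP


def init_ep_broken():
    """Assigns _EP without `global`, so _EP is local to this function."""
    if _EP is None:  # reads the local _EP before it has been assigned
        _EP = "dummy-group"


def init_ep_fixed():
    """Declares `global`, so the assignment updates the module-level variable."""
    global _EP
    if _EP is None:
        _EP = "dummy-group"


def demo():
    """Walk through the unset guard, the scoping pitfall, and the fixed init."""
    results = []
    try:
        get_ep_group()
    except AssertionError as exc:
        results.append(type(exc).__name__)
    try:
        init_ep_broken()
    except UnboundLocalError as exc:
        results.append(type(exc).__name__)
    init_ep_fixed()
    results.append(get_ep_group())
    return results
```

Even with the scoping fixed, a placeholder object only satisfies the assertion. It does not provide a communication backend for NPU tensors, which is consistent with Error 4 appearing after a dummy EP group was created.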

Logs (excerpt)

    (APIServer pid=xxxx) ERROR ... ValueError: Free memory on device (10.44/60.96 GiB) on startup is less than desired GPU memory utilization...
    (Worker_TPx_EPx pid=xxxx) ... AssertionError: expert parallel group is not initialized.

Full logs available upon request.
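The ValueError in the excerpt looks independent of the EP assertion. A quick calculation with the numbers from the log and gpu_memory_utilization: 0.9 from the YAML shows the gap, assuming the startup check compares free device memory against total memory times the utilization fraction. The function names below are hypothetical.

```python
# Numbers taken from the log excerpt and the YAML config in the issue.
TOTAL_GIB = 60.96
FREE_GIB = 10.44
GPU_MEMORY_UTILIZATION = 0.9


def requested_gib(total_gib: float, utilization: float) -> float:
    """Memory the engine would try to reserve on one device."""
    return total_gib * utilization


def fits(free_gib: float, total_gib: float, utilization: float) -> bool:
    """True if the free memory covers the requested reservation."""
    return free_gib >= requested_gib(total_gib, utilization)


def max_utilization(free_gib: float, total_gib: float) -> float:
    """Largest utilization fraction that the current free memory allows."""
    return free_gib / total_gib


requested = requested_gib(TOTAL_GIB, GPU_MEMORY_UTILIZATION)
print(f"requested: {requested:.2f} GiB, free: {FREE_GIB:.2f} GiB")
print(f"fits: {fits(FREE_GIB, TOTAL_GIB, GPU_MEMORY_UTILIZATION)}")
print(f"max utilization with current free memory: {max_utilization(FREE_GIB, TOTAL_GIB):.3f}")
```

With roughly 50 GiB of the device already in use, the 0.9 setting cannot be satisfied. One possible explanation, not confirmed in the issue, is that worker processes from an earlier crashed attempt were still holding device memory.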

Possible Root Cause

The vLLM-Ascend fused MoE implementation expects get_ep_group() to return a valid GroupCoordinator with NPU backend, but in pure diffusion mode the distributed environment is either not fully initialized or _EP is never created. The thinker mode avoids this path because it doesn't trigger the same MoE communication setup.

Request

Please advise on the correct way to initialize the expert parallel group for HunyuanImage-3.0 on Ascend NPU, or provide a workaround to disable the EP requirement for pure diffusion mode. Alternatively, confirm if this is a known limitation.


extent analysis

TL;DR

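The most likely fix is to make sure the expert parallel group is created with an NPU-capable backend for HunyuanImage-3.0 in pure diffusion mode. Redirecting get_ep_group() to get_world_group() was already tried and failed with "world group is not initialized" (Error 2), which suggests the distributed environment itself is not set up before the MoE layers are built.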

Guidance

  • Verify that the get_ep_group() function is correctly returning a valid GroupCoordinator with NPU backend, as expected by the vLLM-Ascend fused MoE implementation.
  • Check the distributed environment initialization and ensure that _EP is created properly in pure diffusion mode.
  • Consider modifying the model configuration or the get_ep_group() function to bypass the EP requirement for pure diffusion mode, if possible.
  • Review the logs to identify any memory utilization issues that may be contributing to the problem.
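Before digging into the group initialization, it can help to confirm which parts of the stack are importable inside the container. The sketch below only checks for the presence of packages and imports nothing. The import names torch_npu, vllm_ascend and vllm_omni are assumptions about how the packages in the issue are installed, and find_installed is a hypothetical helper.

```python
import importlib.util

# Assumed import names for the stack described in the issue; adjust as needed.
PACKAGES = ("torch", "torch_npu", "vllm", "vllm_ascend", "vllm_omni")


def find_installed(packages=PACKAGES) -> dict:
    """Map each top-level package name to whether it can be found on sys.path."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


def format_report(status: dict) -> str:
    """Render the lookup result as one line per package."""
    lines = []
    for name, present in status.items():
        lines.append(f"{name}: {'found' if present else 'MISSING'}")
    return "\n".join(lines)


print(format_report(find_installed()))
```

A missing NPU plugin package would be one plausible explanation for the "device type npu" errors, though the issue does not confirm this.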

Example

No issue-specific fix snippet is available: a real fix depends on how get_ep_group() and the distributed environment are initialized in the installed vLLM, vLLM-Ascend, and vLLM-Omni versions.

Notes

The issue seems to be specific to the HunyuanImage-3.0 model and the Ascend NPU hardware, and may require a customized solution. The provided attempted solutions did not resolve the issue, suggesting that a more in-depth analysis of the model and environment configuration is needed.

Recommendation

Focus on getting the distributed environment to initialize the expert parallel group correctly in pure diffusion mode, since the missing EP group is the most likely cause of the issue. Patching get_ep_group() alone has already been tried without success, so any workaround will need additional debugging and testing to ensure that it does not introduce new problems.
