vllm - ✅(Solved) Fix [CI Failure]: Kernels FusedMoE Layer Test (2 H100s): test_moe_layer.py::test_moe_layer [2 pull requests, 2 comments, 2 participants]

GitHub stats
vllm-project/vllm#40637 (fetched 2026-04-23 07:23:41)
Comments: 2
Participants: 2
Timeline: 8
Reactions: 0
Timeline (top): commented ×2, cross-referenced ×2, added_to_project_v2 ×1, closed ×1

Error Message

=================================================================== short test summary info ====================================================================

FAILED kernels/moe/test_moe_layer.py::test_moe_layer[False-deepep_high_throughput-2-1-True] - torch.multiprocessing.spawn.ProcessExitedException: process 0 terminated with signal SIGABRT
FAILED kernels/moe/test_moe_layer.py::test_moe_layer[False-deepep_low_latency-2-1-True] - torch.multiprocessing.spawn.ProcessExitedException: process 0 terminated with signal SIGABRT
============================================= 2 failed, 183 passed, 335 skipped, 18 warnings in 687.34s (0:11:27) ==============================================
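
The message "terminated with signal SIGABRT" indicates that the spawned worker was killed by a signal instead of failing with an ordinary Python exception, which is why the summary carries no Python traceback. The following is a minimal sketch of that distinction using only the standard library (not torch); the helper names `describe_exit` and `run_aborting_child` are made up for this illustration, and the negative-return-code convention assumes a POSIX system.

```python
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Turn a child process return code into a short description."""
    if returncode < 0:
        # On POSIX, a return code of -N means the child was killed by signal N.
        return f"terminated with signal {signal.Signals(-returncode).name}"
    return f"exited with code {returncode}"


def run_aborting_child() -> int:
    """Start a child interpreter that calls os.abort() and return its return code."""
    child = subprocess.run(
        [sys.executable, "-c", "import os; os.abort()"],
        stderr=subprocess.DEVNULL,  # hide any abort diagnostics from the child
    )
    return child.returncode


# A child that aborts is reported by signal, not by a normal exit code.
print(describe_exit(run_aborting_child()))
# A child that fails normally would be reported by its exit code instead.
print(describe_exit(1))
```

Because the abort happens below the Python level, the useful diagnostics are usually whatever the worker printed before it died, so the full CI log matters more than the pytest summary.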

Root Cause

  • Flaky test
  • Can reproduce locally
  • Caused by external libraries (e.g. bug in transformers)

Fix Action

Fixed

PR fix notes

PR #35077: [Bugfix] LoRA for DeepSeek V3.2

Description (problem / solution / changelog)

Purpose

This PR fixes LoRA regressions seen with DeepSeek V3.2/DSA:

  1. LoRA module registration failed for fused_qkv_a_proj with an assertion that the module was not a BaseLayerWithLoRA.
  2. After that fix, MLA weight post-processing failed with AttributeError: 'ColumnParallelLinearWithLoRA' object has no attribute 'quant_method'.

   File "/usr/local/lib/python3.12/site-packages/vllm/v1/worker/lora_model_runner_mixin.py", line 46, in load_lora_model
     return self.lora_manager.create_lora_manager(model, vllm_config)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   File "/usr/local/lib/python3.12/site-packages/vllm/lora/worker_manager.py", line 227, in create_lora_manager
     lora_manager = create_lora_manager(
                    ^^^^^^^^^^^^^^^^^^^^
   File "/usr/local/lib/python3.12/site-packages/vllm/lora/model_manager.py", line 895, in create_lora_manager
     lora_manager = lora_manager_cls(
                    ^^^^^^^^^^^^^^^^^
   File "/usr/local/lib/python3.12/site-packages/vllm/lora/model_manager.py", line 807, in __init__
     super().__init__(
   File "/usr/local/lib/python3.12/site-packages/vllm/lora/model_manager.py", line 111, in __init__
     self._create_lora_modules()
   File "/usr/local/lib/python3.12/site-packages/vllm/lora/model_manager.py", line 407, in _create_lora_modules
     self.register_module(module_name, new_module)
   File "/usr/local/lib/python3.12/site-packages/vllm/lora/model_manager.py", line 414, in register_module
     assert isinstance(module, BaseLayerWithLoRA), (
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 AssertionError: Module model.layers.0.self_attn.fused_qkv_a_proj must be a BaseLayerWithLoRA instance, got <class 'vllm.model_executor.models.deepseek_v2.DeepSeekV2FusedQkvAProj'>

   File "/usr/local/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 858, in worker_busy_loop
     output = func(*args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^
   File "/mnt/data/user/songlin/verl/verl/workers/rollout/vllm_rollout/utils.py", line 273, in update_weights_from_ipc
     process_weights_after_loading(model, model_config, self.device)
   File "/usr/local/lib/python3.12/site-packages/vllm/model_executor/model_loader/utils.py", line 117, in process_weights_after_loading
     module.process_weights_after_loading(model_config.dtype)
   File "/usr/local/lib/python3.12/site-packages/vllm/model_executor/layers/attention/mla_attention.py", line 655, in process_weights_after_loading
     kv_b_proj_weight = get_and_maybe_dequant_weights(
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   File "/usr/local/lib/python3.12/site-packages/vllm/model_executor/layers/quantization/utils/quant_utils.py", line 333, in get_and_maybe_dequant_weights
     if layer.quant_method is None or isinstance(
        ^^^^^^^^^^^^^^^^^^
   File "/usr/local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1965, in __getattr__
     raise AttributeError(
 AttributeError: 'ColumnParallelLinearWithLoRA' object has no attribute 'quant_method'

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 771, in worker_main
    worker = WorkerProc(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 597, in __init__
    self.worker.load_model()
  File "/usr/local/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 336, in load_model
    self.model_runner.load_model(load_dummy_weights=dummy_weights)
  File "/usr/local/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 4222, in load_model
    self.model = self.load_lora_model(
                 ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/vllm/v1/worker/lora_model_runner_mixin.py", line 46, in load_lora_model
    return self.lora_manager.create_lora_manager(model, vllm_config)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/vllm/lora/worker_manager.py", line 227, in create_lora_manager
    lora_manager = create_lora_manager(
                   ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/vllm/lora/model_manager.py", line 895, in create_lora_manager
    lora_manager = lora_manager_cls(
                   ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/vllm/lora/model_manager.py", line 807, in __init__
    super().__init__(
  File "/usr/local/lib/python3.12/site-packages/vllm/lora/model_manager.py", line 111, in __init__
    self._create_lora_modules()
  File "/usr/local/lib/python3.12/site-packages/vllm/lora/model_manager.py", line 407, in _create_lora_modules
    self.register_module(module_name, new_module)
  File "/usr/local/lib/python3.12/site-packages/vllm/lora/model_manager.py", line 414, in
    assert isinstance(module, BaseLayerWithLoRA), (
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: Module model.layers.3.mlp.gate must be a BaseLayerWithLoRA instance, got <class 'vllm.model_executor.layers.fused_moe.router.gate_linear.GateLinear'>
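
The second error in the list above (`'ColumnParallelLinearWithLoRA' object has no attribute 'quant_method'`) is the kind of failure that appears when a wrapper layer replaces a module but does not expose an attribute that other code reads from the original. The sketch below illustrates that pattern in plain Python; all class and function names are hypothetical stand-ins, and this is not the change made in PR #35077, only one common way such an error can arise and be avoided.

```python
class BaseLinear:
    """Hypothetical stand-in for a linear layer that carries a quant_method."""

    def __init__(self, quant_method=None):
        self.quant_method = quant_method


class NaiveLoRAWrapper:
    """Wraps a base layer but does not expose the base layer's attributes."""

    def __init__(self, base_layer):
        self.base_layer = base_layer


class ForwardingLoRAWrapper(NaiveLoRAWrapper):
    """Same wrapper, but forwards quant_method to the wrapped layer."""

    @property
    def quant_method(self):
        return self.base_layer.quant_method


def get_quant_method(layer):
    """Mimics caller code that reads layer.quant_method without knowing about wrappers."""
    return layer.quant_method


base = BaseLinear(quant_method=None)

# The naive wrapper hides quant_method, so the caller fails with AttributeError.
try:
    get_quant_method(NaiveLoRAWrapper(base))
except AttributeError as exc:
    print(f"naive wrapper: {exc}")

# The forwarding wrapper answers the same question from the wrapped layer.
print("forwarding wrapper:", get_quant_method(ForwardingLoRAWrapper(base)))
```

An alternative design is for the caller to unwrap the layer before reading the attribute; which side should adapt depends on how many call sites read it.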

Test Plan

Added unit test cases and ran an end-to-end test manually.

Test Result

All tests pass without the errors above.



Changed files

  • tests/kernels/moe/test_moe_layer.py (modified, +7/-12)
  • tests/lora/test_layers.py (modified, +270/-2)
  • tests/lora/test_lora_manager.py (modified, +150/-0)
  • tests/lora/test_lora_utils.py (modified, +21/-0)
  • vllm/lora/layers/base_linear.py (modified, +9/-0)
  • vllm/lora/layers/column_parallel_linear.py (modified, +42/-10)
  • vllm/lora/layers/replicated_linear.py (modified, +7/-1)
  • vllm/lora/model_manager.py (modified, +52/-7)
  • vllm/lora/utils.py (modified, +25/-3)
  • vllm/lora/worker_manager.py (modified, +8/-1)
  • vllm/model_executor/layers/fused_moe/oracle/unquantized.py (modified, +13/-0)
  • vllm/model_executor/layers/quantization/utils/quant_utils.py (modified, +6/-0)
  • vllm/v1/worker/lora_model_runner_mixin.py (modified, +4/-1)

PR #40639: [CI Bug] Fix ci issue #40637, Kernels FusedMoE Layer Test (2 H100s): test_moe_layer.py::test_moe_layer

Description (problem / solution / changelog)

Purpose

Fixes CI issue #40637, which appears to have been introduced by https://github.com/vllm-project/vllm/pull/35077

Test

Covered in CI

Changed files

  • tests/kernels/moe/test_moe_layer.py (modified, +12/-7)


Name of failing test

FAILED kernels/moe/test_moe_layer.py::test_moe_layer[False-deepep_high_throughput-2-1-True] - torch.multiprocessing.spawn.ProcessExitedException: process 0 terminated with signal SIGABRT
FAILED kernels/moe/test_moe_layer.py::test_moe_layer[False-deepep_low_latency-2-1-True] - torch.multiprocessing.spawn.ProcessExitedException: process 0 terminated with signal SIGABRT

Basic information

  • Flaky test
  • Can reproduce locally
  • Caused by external libraries (e.g. bug in transformers)

🧪 Describe the failing test

https://buildkite.com/vllm/ci/builds/62456#019db5a5-fc65-4fe3-bcd4-62ead4870367


=================================================================== short test summary info ====================================================================
--
FAILED kernels/moe/test_moe_layer.py::test_moe_layer[False-deepep_high_throughput-2-1-True] - torch.multiprocessing.spawn.ProcessExitedException: process 0 terminated with signal SIGABRT
FAILED kernels/moe/test_moe_layer.py::test_moe_layer[False-deepep_low_latency-2-1-True] - torch.multiprocessing.spawn.ProcessExitedException: process 0 terminated with signal SIGABRT
============================================= 2 failed, 183 passed, 335 skipped, 18 warnings in 687.34s (0:11:27) ==============================================

📝 History of failing test

Since happened yesterday

CC List.

https://buildkite.com/vllm/ci/builds?branch=main&query=nightly The failure appears only in today's nightly run; it did not occur yesterday.

Extended analysis

TL;DR

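Two parametrizations of test_moe_layer (deepep_high_throughput and deepep_low_latency) abort with SIGABRT inside a process started by torch.multiprocessing.spawn. PR #40639 attributes the regression to PR #35077, which modified tests/kernels/moe/test_moe_layer.py (+7/-12), and fixes it by changing the same file (+12/-7).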

Guidance

  • Compare the changes PR #35077 made to tests/kernels/moe/test_moe_layer.py with the changes PR #40639 made to the same file; the failure was reported as introduced by the former.
  • Check the Buildkite CI build log linked in the issue for output printed by the worker before the SIGABRT, since the pytest summary only reports the signal.
  • Focus on the two failing parametrizations, both of which use a deepep_* value, rather than on test_moe_layer as a whole.
  • Run the two failing cases locally on a comparable setup (the CI job name mentions 2 H100s) to reproduce the issue and gather more detailed debugging information.
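
To rerun only the failing cases, the node IDs can be lifted straight from the pytest short summary. The sketch below parses the `FAILED` lines quoted in this issue and builds a shell-safe pytest command; the function names are invented for this example, and it assumes the command is run from the repository's `tests/` directory on a machine with the required GPUs and dependencies. `-x` and `-s` are standard pytest flags (stop at first failure, do not capture output).

```python
import re
import shlex

# The two FAILED lines from the short test summary in this issue.
SUMMARY = """\
FAILED kernels/moe/test_moe_layer.py::test_moe_layer[False-deepep_high_throughput-2-1-True] - torch.multiprocessing.spawn.ProcessExitedException: process 0 terminated with signal SIGABRT
FAILED kernels/moe/test_moe_layer.py::test_moe_layer[False-deepep_low_latency-2-1-True] - torch.multiprocessing.spawn.ProcessExitedException: process 0 terminated with signal SIGABRT
"""

# "FAILED <node id> - <reason>" is the layout of pytest's short summary lines.
FAILED_LINE = re.compile(r"^FAILED (?P<node_id>\S+) - (?P<reason>.+)$", re.MULTILINE)


def failed_node_ids(summary: str) -> list[str]:
    """Return the pytest node IDs of all FAILED lines in a short summary."""
    return [match.group("node_id") for match in FAILED_LINE.finditer(summary)]


def pytest_command(node_ids: list[str]) -> str:
    """Build a pytest command line that reruns exactly the given node IDs."""
    # shlex.join quotes the square brackets so the shell does not treat them as globs.
    return shlex.join(["pytest", "-x", "-s", *node_ids])


print(pytest_command(failed_node_ids(SUMMARY)))
```

Quoting matters here because the parametrized IDs contain `[` and `]`, which some shells would otherwise try to expand.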

Example

The issue itself includes no code beyond the pytest summary. The most direct starting points are the test_moe_layer.py changes in PR #35077 and PR #40639 and the worker output in the Buildkite log.

Notes

Only the two deepep_* parametrizations failed in a run where 183 tests passed and 335 were skipped, so the problem appears tied to that specific combination of test parameters rather than to the test environment in general.

Recommendation

Use a revision that includes PR #40639, which this page lists as the fix for the CI failure.

FAIL-SAFE

If the failure persists with PR #40639 applied, follow up on vllm-project/vllm#40637 with the full Buildkite log for the failing job.
