vllm - ✅(Solved) Fix [Bug]: RuntimeError when running Megatron GRPO on the Qwen3-Omni model: Event device index does not match recording stream's device index [1 pull request, 1 participant]

GitHub stats
vllm-project/vllm#37659 · Fetched 2026-04-08 01:04:17
Comments: 0
Participants: 1
Timeline: 7
Reactions: 0
Timeline (top): renamed ×3, cross-referenced ×1, labeled ×1, referenced ×1

Error Message

[rank3]: Traceback (most recent call last):
[rank3]:   File "/home/appuser/lhd/ms-swift/swift/cli/_megatron/rlhf.py", line 7, in <module>
[rank3]:     megatron_rlhf_main()
[rank3]:   File "/home/appuser/lhd/ms-swift/swift/megatron/pipelines/train/rlhf.py", line 73, in megatron_rlhf_main
[rank3]:     return MegatronRLHF(args).main()
[rank3]:            ^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/ms-swift/swift/pipelines/base.py", line 52, in main
[rank3]:     result = self.run()
[rank3]:              ^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/ms-swift/swift/megatron/pipelines/train/sft.py", line 65, in run
[rank3]:     trainer = self.prepare_trainer()
[rank3]:               ^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/ms-swift/swift/megatron/pipelines/train/rlhf.py", line 34, in prepare_trainer
[rank3]:     return trainer_cls(args, self.template, **kwargs)
[rank3]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/ms-swift/swift/megatron/trainers/grpo_trainer.py", line 54, in __init__
[rank3]:     self._init_rollout_engine()
[rank3]:   File "/home/appuser/lhd/ms-swift/swift/megatron/trainers/grpo_trainer.py", line 122, in _init_rollout_engine
[rank3]:     super()._init_rollout_engine()
[rank3]:   File "/home/appuser/lhd/ms-swift/swift/megatron/trainers/rollout_mixin.py", line 222, in _init_rollout_engine
[rank3]:     self.engine = self._prepare_vllm_engine()
[rank3]:                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/ms-swift/swift/megatron/trainers/rollout_mixin.py", line 248, in _prepare_vllm_engine
[rank3]:     engine = GRPOVllmEngine(
[rank3]:              ^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/ms-swift/swift/infer_engine/vllm_engine.py", line 161, in __init__
[rank3]:     self._prepare_engine()
[rank3]:   File "/home/appuser/lhd/ms-swift/swift/infer_engine/vllm_engine.py", line 183, in _prepare_engine
[rank3]:     engine = llm_engine_cls.from_engine_args(self.engine_args)
[rank3]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/llm_engine.py", line 177, in from_engine_args
[rank3]:     return cls(
[rank3]:            ^^^^
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/llm_engine.py", line 111, in __init__
[rank3]:     self.engine_core = EngineCoreClient.make_client(
[rank3]:                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 100, in make_client
[rank3]:     return InprocClient(vllm_config, executor_class, log_stats)
[rank3]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 282, in __init__
[rank3]:     self.engine_core = EngineCore(*args, **kwargs)
[rank3]:                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 120, in __init__
[rank3]:     num_gpu_blocks, num_cpu_blocks, kv_cache_config = self._initialize_kv_caches(
[rank3]:                                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
[rank3]:     return func(*args, **kwargs)
[rank3]:            ^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 279, in _initialize_kv_caches
[rank3]:     self.model_executor.initialize_from_config(kv_cache_configs)
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 118, in initialize_from_config
[rank3]:     compilation_times: list[float] = self.collective_rpc("compile_or_warm_up_model")
[rank3]:                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/executor/uniproc_executor.py", line 76, in collective_rpc
[rank3]:     result = run_method(self.driver_worker, method, args, kwargs)
[rank3]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/serial_utils.py", line 459, in run_method
[rank3]:     return func(*args, **kwargs)
[rank3]:            ^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
[rank3]:     return func(*args, **kwargs)
[rank3]:            ^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 526, in compile_or_warm_up_model
[rank3]:     kernel_warmup(self)
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/model_executor/warmup/kernel_warmup.py", line 46, in kernel_warmup
[rank3]:     flashinfer_autotune(worker.model_runner)
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/model_executor/warmup/kernel_warmup.py", line 103, in flashinfer_autotune
[rank3]:     runner._dummy_run(
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
[rank3]:     return func(*args, **kwargs)
[rank3]:            ^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 4902, in _dummy_run
[rank3]:     with self.synchronize_input_prep():
[rank3]:          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/contextlib.py", line 144, in __exit__
[rank3]:     next(self.gen)
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 3126, in synchronize_input_prep
[rank3]:     self.prepare_inputs_event.record()
[rank3]: RuntimeError: Event device index does not match recording stream's device index .

Ranks 4, 5, and 2 then print the same traceback, frame for frame, each ending with:

[rank4]: RuntimeError: Event device index does not match recording stream's device index
[rank5]: RuntimeError: Event device index does not match recording stream's device index .
[rank2]: RuntimeError: Event device index does not match recording stream's device index

INFO 03-20 16:30:09 [gpu_model_runner.py:5386] Graph capturing finished in 3 secs, took 0.19 GiB
INFO 03-20 16:30:09 [core.py:282] init engine (profile, create kv cache, warmup model) took 22.46 seconds
INFO 03-20 16:30:10 [block_pool.py:472] Successfully reset prefix cache
INFO 03-20 16:30:10 [block_pool.py:472] Successfully reset prefix cache
[rank3]:[W320 16:30:11.295658005 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
[rank4]:[W320 16:30:11.526257707 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
[rank2]:[W320 16:30:11.630842358 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
[rank5]:[W320 16:30:11.736837613 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
pure virtual method called
terminate called without an active exception
W0320 16:30:13.410000 4123455 site-packages/torch/distributed/elastic/multiprocessing/api.py:1010] Sending process 4123531 closing signal SIGTERM
W0320 16:30:13.412000 4123455 site-packages/torch/distributed/elastic/multiprocessing/api.py:1010] Sending process 4123532 closing signal SIGTERM
W0320 16:30:13.414000 4123455 site-packages/torch/distributed/elastic/multiprocessing/api.py:1010] Sending process 4123533 closing signal SIGTERM
W0320 16:30:13.416000 4123455 site-packages/torch/distributed/elastic/multiprocessing/api.py:1010] Sending process 4123535 closing signal SIGTERM
W0320 16:30:13.416000 4123455 site-packages/torch/distributed/elastic/multiprocessing/api.py:1010] Sending process 4123536 closing signal SIGTERM
/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
E0320 16:30:16.899000 4123455 site-packages/torch/distributed/elastic/multiprocessing/api.py:984] failed (exitcode: -6) local_rank: 3 (pid: 4123534) of binary: /home/appuser/lhd/miniconda/envs/metron312/bin/python3.12
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/torch/distributed/run.py", line 995, in <module>
    main()
  File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 362, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/torch/distributed/run.py", line 991, in main
    run(args)
  File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/torch/distributed/run.py", line 982, in run
    elastic_launch(
  File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/torch/distributed/launcher/api.py", line 170, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/torch/distributed/launcher/api.py", line 317, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

/home/appuser/lhd/ms-swift/swift/cli/_megatron/rlhf.py FAILED

Failures:
[1]:
  time      : 2026-03-20_16:30:16
  host      : domainagent-ai8
  rank      : 0 (local_rank: 0)
  exitcode  : -15 (pid: 4123531)
  error_file: <N/A>
  traceback : Signal 15 (SIGTERM) received by PID 4123531
[2]:
  time      : 2026-03-20_16:30:16
  host      : domainagent-ai8
  rank      : 1 (local_rank: 1)
  exitcode  : -15 (pid: 4123532)
  error_file: <N/A>
  traceback : Signal 15 (SIGTERM) received by PID 4123532
[3]:
  time      : 2026-03-20_16:30:16
  host      : domainagent-ai8
  rank      : 2 (local_rank: 2)
  exitcode  : -15 (pid: 4123533)
  error_file: <N/A>
  traceback : Signal 15 (SIGTERM) received by PID 4123533
[4]:
  time      : 2026-03-20_16:30:16
  host      : domainagent-ai8
  rank      : 4 (local_rank: 4)
  exitcode  : -15 (pid: 4123535)
  error_file: <N/A>
  traceback : Signal 15 (SIGTERM) received by PID 4123535
[5]:
  time      : 2026-03-20_16:30:16
  host      : domainagent-ai8
  rank      : 5 (local_rank: 5)
  exitcode  : -15 (pid: 4123536)
  error_file: <N/A>
  traceback : Signal 15 (SIGTERM) received by PID 4123536

Root Cause (first observed failure):
[0]:
  time      : 2026-03-20_16:30:13
  host      : domainagent-ai8
  rank      : 3 (local_rank: 3)
  exitcode  : -6 (pid: 4123534)
  error_file: <N/A>
  traceback : Signal 6 (SIGABRT) received by PID 4123534
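
The exit codes in the launcher summary are negative because each worker was ended by a signal. The log itself pairs -15 with SIGTERM and -6 with SIGABRT, which matches the POSIX convention of reporting such exits as the negated signal number. The following is a minimal sketch, not part of the original report, that decodes those values with the Python standard library. The helper name `describe_exitcode` is hypothetical, and the signal numbers assume a Linux host.

```python
import signal


def describe_exitcode(exitcode: int) -> str:
    """Turn a worker exit code into a short readable description.

    A negative value -N means the process was terminated by signal N,
    the same convention Python's subprocess module documents.
    """
    if exitcode < 0:
        return f"killed by {signal.Signals(-exitcode).name}"
    if exitcode == 0:
        return "exited normally"
    return f"exited with status {exitcode}"


# (rank, exitcode) pairs copied from the failure summary in this log.
for rank, code in [(3, -6), (0, -15), (1, -15), (2, -15), (4, -15), (5, -15)]:
    print(f"rank {rank}: {describe_exitcode(code)}")
```

Read this way, rank 3 is the only worker that aborted on its own (SIGABRT at 16:30:13). The other listed ranks were stopped by the launcher's SIGTERM, which the "Sending process ... closing signal SIGTERM" lines record, so their -15 exit codes are a consequence of the first failure rather than separate errors.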

Root Cause

This error occurs when running Megatron GRPO on the Qwen3-Omni model. What is causing it?

[rank3]: Traceback (most recent call last):
[rank3]:   File "/home/appuser/lhd/ms-swift/swift/cli/_megatron/rlhf.py", line 7, in <module>
[rank3]:     megatron_rlhf_main()
[rank3]:   File "/home/appuser/lhd/ms-swift/swift/megatron/pipelines/train/rlhf.py", line 73, in megatron_rlhf_main
[rank3]:     return MegatronRLHF(args).main()
[rank3]:            ^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/ms-swift/swift/pipelines/base.py", line 52, in main
[rank3]:     result = self.run()
[rank3]:              ^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/ms-swift/swift/megatron/pipelines/train/sft.py", line 65, in run
[rank3]:     trainer = self.prepare_trainer()
[rank3]:               ^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/ms-swift/swift/megatron/pipelines/train/rlhf.py", line 34, in prepare_trainer
[rank3]:     return trainer_cls(args, self.template, **kwargs)
[rank3]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/ms-swift/swift/megatron/trainers/grpo_trainer.py", line 54, in __init__
[rank3]:     self._init_rollout_engine()
[rank3]:   File "/home/appuser/lhd/ms-swift/swift/megatron/trainers/grpo_trainer.py", line 122, in _init_rollout_engine
[rank3]:     super()._init_rollout_engine()
[rank3]:   File "/home/appuser/lhd/ms-swift/swift/megatron/trainers/rollout_mixin.py", line 222, in _init_rollout_engine
[rank3]:     self.engine = self._prepare_vllm_engine()
[rank3]:                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/ms-swift/swift/megatron/trainers/rollout_mixin.py", line 248, in _prepare_vllm_engine
[rank3]:     engine = GRPOVllmEngine(
[rank3]:              ^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/ms-swift/swift/infer_engine/vllm_engine.py", line 161, in __init__
[rank3]:     self._prepare_engine()
[rank3]:   File "/home/appuser/lhd/ms-swift/swift/infer_engine/vllm_engine.py", line 183, in _prepare_engine
[rank3]:     engine = llm_engine_cls.from_engine_args(self.engine_args)
[rank3]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/llm_engine.py", line 177, in from_engine_args
[rank3]:     return cls(
[rank3]:            ^^^^
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/llm_engine.py", line 111, in __init__
[rank3]:     self.engine_core = EngineCoreClient.make_client(
[rank3]:                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 100, in make_client
[rank3]:     return InprocClient(vllm_config, executor_class, log_stats)
[rank3]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 282, in __init__
[rank3]:     self.engine_core = EngineCore(*args, **kwargs)
[rank3]:                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 120, in __init__
[rank3]:     num_gpu_blocks, num_cpu_blocks, kv_cache_config = self._initialize_kv_caches(
[rank3]:                                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
[rank3]:     return func(*args, **kwargs)
[rank3]:            ^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 279, in _initialize_kv_caches
[rank3]:     self.model_executor.initialize_from_config(kv_cache_configs)
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 118, in initialize_from_config
[rank3]:     compilation_times: list[float] = self.collective_rpc("compile_or_warm_up_model")
[rank3]:                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/executor/uniproc_executor.py", line 76, in collective_rpc
[rank3]:     result = run_method(self.driver_worker, method, args, kwargs)
[rank3]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/serial_utils.py", line 459, in run_method
[rank3]:     return func(*args, **kwargs)
[rank3]:            ^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
[rank3]:     return func(*args, **kwargs)
[rank3]:            ^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 526, in compile_or_warm_up_model
[rank3]:     kernel_warmup(self)
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/model_executor/warmup/kernel_warmup.py", line 46, in kernel_warmup
[rank3]:     flashinfer_autotune(worker.model_runner)
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/model_executor/warmup/kernel_warmup.py", line 103, in flashinfer_autotune
[rank3]:     runner._dummy_run(
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
[rank3]:     return func(*args, **kwargs)
[rank3]:            ^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 4902, in _dummy_run
[rank3]:     with self.synchronize_input_prep():
[rank3]:          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/contextlib.py", line 144, in __exit__
[rank3]:     next(self.gen)
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 3126, in synchronize_input_prep
[rank3]:     self.prepare_inputs_event.record()
[rank3]: RuntimeError: Event device index  does not match recording stream's device index .
[rank4]: Traceback (most recent call last):
[rank4]:   File "/home/appuser/lhd/ms-swift/swift/cli/_megatron/rlhf.py", line 7, in <module>
[rank4]:     megatron_rlhf_main()
[rank4]:   File "/home/appuser/lhd/ms-swift/swift/megatron/pipelines/train/rlhf.py", line 73, in megatron_rlhf_main
[rank4]:     return MegatronRLHF(args).main()
[rank4]:            ^^^^^^^^^^^^^^^^^^^^^^^^^
[rank4]:   File "/home/appuser/lhd/ms-swift/swift/pipelines/base.py", line 52, in main
[rank4]:     result = self.run()
[rank4]:              ^^^^^^^^^^
[rank4]:   File "/home/appuser/lhd/ms-swift/swift/megatron/pipelines/train/sft.py", line 65, in run
[rank4]:     trainer = self.prepare_trainer()
[rank4]:               ^^^^^^^^^^^^^^^^^^^^^^
[rank4]:   File "/home/appuser/lhd/ms-swift/swift/megatron/pipelines/train/rlhf.py", line 34, in prepare_trainer
[rank4]:     return trainer_cls(args, self.template, **kwargs)
[rank4]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank4]:   File "/home/appuser/lhd/ms-swift/swift/megatron/trainers/grpo_trainer.py", line 54, in __init__
[rank4]:     self._init_rollout_engine()
[rank4]:   File "/home/appuser/lhd/ms-swift/swift/megatron/trainers/grpo_trainer.py", line 122, in _init_rollout_engine
[rank4]:     super()._init_rollout_engine()
[rank4]:   File "/home/appuser/lhd/ms-swift/swift/megatron/trainers/rollout_mixin.py", line 222, in _init_rollout_engine
[rank4]:     self.engine = self._prepare_vllm_engine()
[rank4]:                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank4]:   File "/home/appuser/lhd/ms-swift/swift/megatron/trainers/rollout_mixin.py", line 248, in _prepare_vllm_engine
[rank4]:     engine = GRPOVllmEngine(
[rank4]:              ^^^^^^^^^^^^^^^
[rank4]:   File "/home/appuser/lhd/ms-swift/swift/infer_engine/vllm_engine.py", line 161, in __init__
[rank4]:     self._prepare_engine()
[rank4]:   File "/home/appuser/lhd/ms-swift/swift/infer_engine/vllm_engine.py", line 183, in _prepare_engine
[rank4]:     engine = llm_engine_cls.from_engine_args(self.engine_args)
[rank4]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank4]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/llm_engine.py", line 177, in from_engine_args
[rank4]:     return cls(
[rank4]:            ^^^^
[rank4]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/llm_engine.py", line 111, in __init__
[rank4]:     self.engine_core = EngineCoreClient.make_client(
[rank4]:                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank4]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 100, in make_client
[rank4]:     return InprocClient(vllm_config, executor_class, log_stats)
[rank4]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank4]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 282, in __init__
[rank4]:     self.engine_core = EngineCore(*args, **kwargs)
[rank4]:                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank4]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 120, in __init__
[rank4]:     num_gpu_blocks, num_cpu_blocks, kv_cache_config = self._initialize_kv_caches(
[rank4]:                                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank4]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
[rank4]:     return func(*args, **kwargs)
[rank4]:            ^^^^^^^^^^^^^^^^^^^^^
[rank4]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 279, in _initialize_kv_caches
[rank4]:     self.model_executor.initialize_from_config(kv_cache_configs)
[rank4]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 118, in initialize_from_config
[rank4]:     compilation_times: list[float] = self.collective_rpc("compile_or_warm_up_model")
[rank4]:                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank4]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/executor/uniproc_executor.py", line 76, in collective_rpc
[rank4]:     result = run_method(self.driver_worker, method, args, kwargs)
[rank4]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank4]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/serial_utils.py", line 459, in run_method
[rank4]:     return func(*args, **kwargs)
[rank4]:            ^^^^^^^^^^^^^^^^^^^^^
[rank4]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
[rank4]:     return func(*args, **kwargs)
[rank4]:            ^^^^^^^^^^^^^^^^^^^^^
[rank4]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 526, in compile_or_warm_up_model
[rank4]:     kernel_warmup(self)
[rank4]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/model_executor/warmup/kernel_warmup.py", line 46, in kernel_warmup
[rank4]:     flashinfer_autotune(worker.model_runner)
[rank4]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/model_executor/warmup/kernel_warmup.py", line 103, in flashinfer_autotune
[rank4]:     runner._dummy_run(
[rank4]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
[rank4]:     return func(*args, **kwargs)
[rank4]:            ^^^^^^^^^^^^^^^^^^^^^
[rank4]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 4902, in _dummy_run
[rank4]:     with self.synchronize_input_prep():
[rank4]:          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank4]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/contextlib.py", line 144, in __exit__
[rank4]:     next(self.gen)
[rank4]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 3126, in synchronize_input_prep
[rank4]:     self.prepare_inputs_event.record()
[rank4]: RuntimeError: Event device index  does not match recording stream's device index 
[rank5]: Traceback (most recent call last):
[rank5]:   File "/home/appuser/lhd/ms-swift/swift/cli/_megatron/rlhf.py", line 7, in <module>
[rank5]:     megatron_rlhf_main()
[rank5]:   File "/home/appuser/lhd/ms-swift/swift/megatron/pipelines/train/rlhf.py", line 73, in megatron_rlhf_main
[rank5]:     return MegatronRLHF(args).main()
[rank5]:            ^^^^^^^^^^^^^^^^^^^^^^^^^
[rank5]:   File "/home/appuser/lhd/ms-swift/swift/pipelines/base.py", line 52, in main
[rank5]:     result = self.run()
[rank5]:              ^^^^^^^^^^
[rank5]:   File "/home/appuser/lhd/ms-swift/swift/megatron/pipelines/train/sft.py", line 65, in run
[rank5]:     trainer = self.prepare_trainer()
[rank5]:               ^^^^^^^^^^^^^^^^^^^^^^
[rank5]:   File "/home/appuser/lhd/ms-swift/swift/megatron/pipelines/train/rlhf.py", line 34, in prepare_trainer
[rank5]:     return trainer_cls(args, self.template, **kwargs)
[rank5]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank5]:   File "/home/appuser/lhd/ms-swift/swift/megatron/trainers/grpo_trainer.py", line 54, in __init__
[rank5]:     self._init_rollout_engine()
[rank5]:   File "/home/appuser/lhd/ms-swift/swift/megatron/trainers/grpo_trainer.py", line 122, in _init_rollout_engine
[rank5]:     super()._init_rollout_engine()
[rank5]:   File "/home/appuser/lhd/ms-swift/swift/megatron/trainers/rollout_mixin.py", line 222, in _init_rollout_engine
[rank5]:     self.engine = self._prepare_vllm_engine()
[rank5]:                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank5]:   File "/home/appuser/lhd/ms-swift/swift/megatron/trainers/rollout_mixin.py", line 248, in _prepare_vllm_engine
[rank5]:     engine = GRPOVllmEngine(
[rank5]:              ^^^^^^^^^^^^^^^
[rank5]:   File "/home/appuser/lhd/ms-swift/swift/infer_engine/vllm_engine.py", line 161, in __init__
[rank5]:     self._prepare_engine()
[rank5]:   File "/home/appuser/lhd/ms-swift/swift/infer_engine/vllm_engine.py", line 183, in _prepare_engine
[rank5]:     engine = llm_engine_cls.from_engine_args(self.engine_args)
[rank5]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank5]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/llm_engine.py", line 177, in from_engine_args
[rank5]:     return cls(
[rank5]:            ^^^^
[rank5]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/llm_engine.py", line 111, in __init__
[rank5]:     self.engine_core = EngineCoreClient.make_client(
[rank5]:                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank5]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 100, in make_client
[rank5]:     return InprocClient(vllm_config, executor_class, log_stats)
[rank5]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank5]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 282, in __init__
[rank5]:     self.engine_core = EngineCore(*args, **kwargs)
[rank5]:                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank5]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 120, in __init__
[rank5]:     num_gpu_blocks, num_cpu_blocks, kv_cache_config = self._initialize_kv_caches(
[rank5]:                                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank5]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
[rank5]:     return func(*args, **kwargs)
[rank5]:            ^^^^^^^^^^^^^^^^^^^^^
[rank5]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 279, in _initialize_kv_caches
[rank5]:     self.model_executor.initialize_from_config(kv_cache_configs)
[rank5]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 118, in initialize_from_config
[rank5]:     compilation_times: list[float] = self.collective_rpc("compile_or_warm_up_model")
[rank5]:                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank5]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/executor/uniproc_executor.py", line 76, in collective_rpc
[rank5]:     result = run_method(self.driver_worker, method, args, kwargs)
[rank5]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank5]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/serial_utils.py", line 459, in run_method
[rank5]:     return func(*args, **kwargs)
[rank5]:            ^^^^^^^^^^^^^^^^^^^^^
[rank5]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
[rank5]:     return func(*args, **kwargs)
[rank5]:            ^^^^^^^^^^^^^^^^^^^^^
[rank5]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 526, in compile_or_warm_up_model
[rank5]:     kernel_warmup(self)
[rank5]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/model_executor/warmup/kernel_warmup.py", line 46, in kernel_warmup
[rank5]:     flashinfer_autotune(worker.model_runner)
[rank5]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/model_executor/warmup/kernel_warmup.py", line 103, in flashinfer_autotune
[rank5]:     runner._dummy_run(
[rank5]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
[rank5]:     return func(*args, **kwargs)
[rank5]:            ^^^^^^^^^^^^^^^^^^^^^
[rank5]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 4902, in _dummy_run
[rank5]:     with self.synchronize_input_prep():
[rank5]:          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank5]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/contextlib.py", line 144, in __exit__
[rank5]:     next(self.gen)
[rank5]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 3126, in synchronize_input_prep
[rank5]:     self.prepare_inputs_event.record()
[rank5]: RuntimeError: Event device index  does not match recording stream's device index .
[rank2]: Traceback (most recent call last):
[rank2]:   File "/home/appuser/lhd/ms-swift/swift/cli/_megatron/rlhf.py", line 7, in <module>
[rank2]:     megatron_rlhf_main()
[rank2]:   File "/home/appuser/lhd/ms-swift/swift/megatron/pipelines/train/rlhf.py", line 73, in megatron_rlhf_main
[rank2]:     return MegatronRLHF(args).main()
[rank2]:            ^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/home/appuser/lhd/ms-swift/swift/pipelines/base.py", line 52, in main
[rank2]:     result = self.run()
[rank2]:              ^^^^^^^^^^
[rank2]:   File "/home/appuser/lhd/ms-swift/swift/megatron/pipelines/train/sft.py", line 65, in run
[rank2]:     trainer = self.prepare_trainer()
[rank2]:               ^^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/home/appuser/lhd/ms-swift/swift/megatron/pipelines/train/rlhf.py", line 34, in prepare_trainer
[rank2]:     return trainer_cls(args, self.template, **kwargs)
[rank2]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/home/appuser/lhd/ms-swift/swift/megatron/trainers/grpo_trainer.py", line 54, in __init__
[rank2]:     self._init_rollout_engine()
[rank2]:   File "/home/appuser/lhd/ms-swift/swift/megatron/trainers/grpo_trainer.py", line 122, in _init_rollout_engine
[rank2]:     super()._init_rollout_engine()
[rank2]:   File "/home/appuser/lhd/ms-swift/swift/megatron/trainers/rollout_mixin.py", line 222, in _init_rollout_engine
[rank2]:     self.engine = self._prepare_vllm_engine()
[rank2]:                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/home/appuser/lhd/ms-swift/swift/megatron/trainers/rollout_mixin.py", line 248, in _prepare_vllm_engine
[rank2]:     engine = GRPOVllmEngine(
[rank2]:              ^^^^^^^^^^^^^^^
[rank2]:   File "/home/appuser/lhd/ms-swift/swift/infer_engine/vllm_engine.py", line 161, in __init__
[rank2]:     self._prepare_engine()
[rank2]:   File "/home/appuser/lhd/ms-swift/swift/infer_engine/vllm_engine.py", line 183, in _prepare_engine
[rank2]:     engine = llm_engine_cls.from_engine_args(self.engine_args)
[rank2]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/llm_engine.py", line 177, in from_engine_args
[rank2]:     return cls(
[rank2]:            ^^^^
[rank2]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/llm_engine.py", line 111, in __init__
[rank2]:     self.engine_core = EngineCoreClient.make_client(
[rank2]:                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 100, in make_client
[rank2]:     return InprocClient(vllm_config, executor_class, log_stats)
[rank2]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 282, in __init__
[rank2]:     self.engine_core = EngineCore(*args, **kwargs)
[rank2]:                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 120, in __init__
[rank2]:     num_gpu_blocks, num_cpu_blocks, kv_cache_config = self._initialize_kv_caches(
[rank2]:                                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
[rank2]:     return func(*args, **kwargs)
[rank2]:            ^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 279, in _initialize_kv_caches
[rank2]:     self.model_executor.initialize_from_config(kv_cache_configs)
[rank2]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 118, in initialize_from_config
[rank2]:     compilation_times: list[float] = self.collective_rpc("compile_or_warm_up_model")
[rank2]:                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/executor/uniproc_executor.py", line 76, in collective_rpc
[rank2]:     result = run_method(self.driver_worker, method, args, kwargs)
[rank2]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/serial_utils.py", line 459, in run_method
[rank2]:     return func(*args, **kwargs)
[rank2]:            ^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
[rank2]:     return func(*args, **kwargs)
[rank2]:            ^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 526, in compile_or_warm_up_model
[rank2]:     kernel_warmup(self)
[rank2]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/model_executor/warmup/kernel_warmup.py", line 46, in kernel_warmup
[rank2]:     flashinfer_autotune(worker.model_runner)
[rank2]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/model_executor/warmup/kernel_warmup.py", line 103, in flashinfer_autotune
[rank2]:     runner._dummy_run(
[rank2]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
[rank2]:     return func(*args, **kwargs)
[rank2]:            ^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 4902, in _dummy_run
[rank2]:     with self.synchronize_input_prep():
[rank2]:          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/contextlib.py", line 144, in __exit__
[rank2]:     next(self.gen)
[rank2]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 3126, in synchronize_input_prep
[rank2]:     self.prepare_inputs_event.record()
[rank2]: RuntimeError: Event device index  does not match recording stream's device index 
INFO 03-20 16:30:09 [gpu_model_runner.py:5386] Graph capturing finished in 3 secs, took 0.19 GiB
INFO 03-20 16:30:09 [core.py:282] init engine (profile, create kv cache, warmup model) took 22.46 seconds
INFO 03-20 16:30:10 [block_pool.py:472] Successfully reset prefix cache
INFO 03-20 16:30:10 [block_pool.py:472] Successfully reset prefix cache
[rank3]:[W320 16:30:11.295658005 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
[rank4]:[W320 16:30:11.526257707 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
[rank2]:[W320 16:30:11.630842358 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
[rank5]:[W320 16:30:11.736837613 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
pure virtual method called
terminate called without an active exception
W0320 16:30:13.410000 4123455 site-packages/torch/distributed/elastic/multiprocessing/api.py:1010] Sending process 4123531 closing signal SIGTERM
W0320 16:30:13.412000 4123455 site-packages/torch/distributed/elastic/multiprocessing/api.py:1010] Sending process 4123532 closing signal SIGTERM
W0320 16:30:13.414000 4123455 site-packages/torch/distributed/elastic/multiprocessing/api.py:1010] Sending process 4123533 closing signal SIGTERM
W0320 16:30:13.416000 4123455 site-packages/torch/distributed/elastic/multiprocessing/api.py:1010] Sending process 4123535 closing signal SIGTERM
W0320 16:30:13.416000 4123455 site-packages/torch/distributed/elastic/multiprocessing/api.py:1010] Sending process 4123536 closing signal SIGTERM
/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
E0320 16:30:16.899000 4123455 site-packages/torch/distributed/elastic/multiprocessing/api.py:984] failed (exitcode: -6) local_rank: 3 (pid: 4123534) of binary: /home/appuser/lhd/miniconda/envs/metron312/bin/python3.12
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/torch/distributed/run.py", line 995, in <module>
    main()
  File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 362, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/torch/distributed/run.py", line 991, in main
    run(args)
  File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/torch/distributed/run.py", line 982, in run
    elastic_launch(
  File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/torch/distributed/launcher/api.py", line 170, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/torch/distributed/launcher/api.py", line 317, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
/home/appuser/lhd/ms-swift/swift/cli/_megatron/rlhf.py FAILED
------------------------------------------------------------
Failures:
[1]:
  time      : 2026-03-20_16:30:16
  host      : domainagent-ai8
  rank      : 0 (local_rank: 0)
  exitcode  : -15 (pid: 4123531)
  error_file: <N/A>
  traceback : Signal 15 (SIGTERM) received by PID 4123531
[2]:
  time      : 2026-03-20_16:30:16
  host      : domainagent-ai8
  rank      : 1 (local_rank: 1)
  exitcode  : -15 (pid: 4123532)
  error_file: <N/A>
  traceback : Signal 15 (SIGTERM) received by PID 4123532
[3]:
  time      : 2026-03-20_16:30:16
  host      : domainagent-ai8
  rank      : 2 (local_rank: 2)
  exitcode  : -15 (pid: 4123533)
  error_file: <N/A>
  traceback : Signal 15 (SIGTERM) received by PID 4123533
[4]:
  time      : 2026-03-20_16:30:16
  host      : domainagent-ai8
  rank      : 4 (local_rank: 4)
  exitcode  : -15 (pid: 4123535)
  error_file: <N/A>
  traceback : Signal 15 (SIGTERM) received by PID 4123535
[5]:
  time      : 2026-03-20_16:30:16
  host      : domainagent-ai8
  rank      : 5 (local_rank: 5)
  exitcode  : -15 (pid: 4123536)
  error_file: <N/A>
  traceback : Signal 15 (SIGTERM) received by PID 4123536
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2026-03-20_16:30:13
  host      : domainagent-ai8
  rank      : 3 (local_rank: 3)
  exitcode  : -6 (pid: 4123534)
  error_file: <N/A>
  traceback : Signal 6 (SIGABRT) received by PID 4123534
============================================================
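The launcher summary above reports one exit code per rank. As a minimal sketch for reading it (the dictionary is copied by hand from the summary; `signal_name` is a hypothetical helper, not part of torchelastic), a negative exit code can be mapped back to the signal that ended the process. This separates the rank that aborted on its own from the ranks the launcher terminated afterwards:

```python
import signal

# Exit codes copied by hand from the launcher summary above: rank -> exitcode.
exit_codes = {0: -15, 1: -15, 2: -15, 3: -6, 4: -15, 5: -15}


def signal_name(exitcode: int) -> str:
    """Hypothetical helper: name the signal behind a negative exit code."""
    if exitcode >= 0:
        return f"exit({exitcode})"
    return signal.Signals(-exitcode).name


decoded = {rank: signal_name(code) for rank, code in exit_codes.items()}

# Ranks that were not simply terminated by the launcher are the first to inspect.
suspects = [rank for rank, name in decoded.items() if name != "SIGTERM"]

print(decoded)
print("inspect first:", suspects)
```

This agrees with the "Root Cause (first observed failure)" entry, which names rank 3 with Signal 6 (SIGABRT). The other ranks received SIGTERM from the launcher, even though ranks 2, 4, and 5 printed the same RuntimeError.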

Fix Action

Fixed

PR fix notes

PR #37670: fix: set device for prepare_inputs_event to avoid device mismatch

Description (problem / solution / changelog)

In the async scheduling path, prepare_inputs_event was created without specifying a device. When the event is later recorded on a non-default stream or device, the event's device index and the recording stream's device index can disagree, which raises "Event device index does not match recording stream's device index". This is most likely in multi-GPU or tensor-parallel (TP) setups.
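The mismatch described above can be illustrated with a pure-Python toy model. This is a sketch, not PyTorch or vLLM code: `ToyStream`, `ToyEvent`, and `make_event` are hypothetical names, and the fallback to device index 0 when no device is given is an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToyStream:
    """Stand-in for a device stream; only the device index matters here."""
    device_index: int


@dataclass(frozen=True)
class ToyEvent:
    """Stand-in for a device event bound to one device index."""
    device_index: int

    def record(self, stream: ToyStream) -> None:
        # The invariant named in the error message: the event and the
        # recording stream must belong to the same device.
        if self.device_index != stream.device_index:
            raise RuntimeError(
                f"Event device index {self.device_index} does not match "
                f"recording stream's device index {stream.device_index}."
            )


def make_event(worker_device: Optional[int] = None) -> ToyEvent:
    # Assumption for this toy model: no explicit device means index 0.
    return ToyEvent(device_index=0 if worker_device is None else worker_device)


local_rank = 3  # a worker whose stream lives on a non-default device
stream = ToyStream(device_index=local_rank)

# Event created without a device: recording on the worker's stream fails.
try:
    make_event().record(stream)
    mismatch = None
except RuntimeError as exc:
    mismatch = str(exc)

# Event created for the worker's own device: recording succeeds.
make_event(worker_device=local_rank).record(stream)

print(mismatch)
```

In this model the failure disappears once the event is created for the same device as the stream it is recorded on, which is the idea the PR title describes.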

Fix for issue #37659.

Changed files

  • vllm/v1/worker/gpu_model_runner.py (modified, +1/-1)

Code Example

export MEGATRON_LM_PATH="ms-swift/swift/megatron/Megatron-LM"
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5
export NPROC_PER_NODE=6
export PYTORCH_CUDA_ALLOC_CONF='expandable_segments:True'
export WANDB_MODE=disabled
export MASTER_PORT=29510 
megatron rlhf \
    --rlhf_type grpo \
    --model /data/model/Qwen/Qwen3-Omni-30B-A3B-Instruct \
    --dataset data_shuffle_train_grpo.jsonl \
    --output_dir megatron_Qwen-Omni \
    --num_train_epochs 1 \
    --global_batch_size 6 \
    --micro_batch_size 1 \
    --steps_per_generation 1 \
    --num_generations 2 \
    --reward_funcs accuracy format \
    --use_vllm true \
    --vllm_mode colocate \
    --vllm_tensor_parallel_size 2 \
    --vllm_gpu_memory_utilization 0.60 \
    --vllm_max_model_len 1024 \
    --max_length 1024 \
    --max_completion_length 512 \
    --tensor_model_parallel_size 2 \
    --pipeline_model_parallel_size 3 \
    --context_parallel_size 1 \
    --expert_model_parallel_size 1 \
    --tuner_type lora \
    --lr 5e-5 \
    --bf16 true \
    --beta 0.00 \
    --importance_sampling_level sequence \
    --epsilon 3e-4 \
    --epsilon_high 4e-4 \
    --dynamic_sample false \
    --overlong_filter true \
    --loss_type grpo \
    --sleep_level 1 \
    --offload_model true \
    --offload_bridge true \
    --offload_optimizer true \
    --logging_steps 1 \
    --recompute_granularity selective \
    --finetune \
    --dataloader_num_workers 4 \
    --dataset_num_proc 4 \
    --no_save_optim \
    --no_save_rng \
    --attention_backend flash \
    --temperature 1.0 \
    --padding_free false \
    --sequence_parallel true \
    --log_completions true

---

[rank3]: Traceback (most recent call last):
[rank3]:   File "/home/appuser/lhd/ms-swift/swift/cli/_megatron/rlhf.py", line 7, in <module>
[rank3]:     megatron_rlhf_main()
[rank3]:   File "/home/appuser/lhd/ms-swift/swift/megatron/pipelines/train/rlhf.py", line 73, in megatron_rlhf_main
[rank3]:     return MegatronRLHF(args).main()
[rank3]:            ^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/ms-swift/swift/pipelines/base.py", line 52, in main
[rank3]:     result = self.run()
[rank3]:              ^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/ms-swift/swift/megatron/pipelines/train/sft.py", line 65, in run
[rank3]:     trainer = self.prepare_trainer()
[rank3]:               ^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/ms-swift/swift/megatron/pipelines/train/rlhf.py", line 34, in prepare_trainer
[rank3]:     return trainer_cls(args, self.template, **kwargs)
[rank3]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/ms-swift/swift/megatron/trainers/grpo_trainer.py", line 54, in __init__
[rank3]:     self._init_rollout_engine()
[rank3]:   File "/home/appuser/lhd/ms-swift/swift/megatron/trainers/grpo_trainer.py", line 122, in _init_rollout_engine
[rank3]:     super()._init_rollout_engine()
[rank3]:   File "/home/appuser/lhd/ms-swift/swift/megatron/trainers/rollout_mixin.py", line 222, in _init_rollout_engine
[rank3]:     self.engine = self._prepare_vllm_engine()
[rank3]:                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/ms-swift/swift/megatron/trainers/rollout_mixin.py", line 248, in _prepare_vllm_engine
[rank3]:     engine = GRPOVllmEngine(
[rank3]:              ^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/ms-swift/swift/infer_engine/vllm_engine.py", line 161, in __init__
[rank3]:     self._prepare_engine()
[rank3]:   File "/home/appuser/lhd/ms-swift/swift/infer_engine/vllm_engine.py", line 183, in _prepare_engine
[rank3]:     engine = llm_engine_cls.from_engine_args(self.engine_args)
[rank3]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/llm_engine.py", line 177, in from_engine_args
[rank3]:     return cls(
[rank3]:            ^^^^
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/llm_engine.py", line 111, in __init__
[rank3]:     self.engine_core = EngineCoreClient.make_client(
[rank3]:                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 100, in make_client
[rank3]:     return InprocClient(vllm_config, executor_class, log_stats)
[rank3]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 282, in __init__
[rank3]:     self.engine_core = EngineCore(*args, **kwargs)
[rank3]:                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 120, in __init__
[rank3]:     num_gpu_blocks, num_cpu_blocks, kv_cache_config = self._initialize_kv_caches(
[rank3]:                                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
[rank3]:     return func(*args, **kwargs)
[rank3]:            ^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 279, in _initialize_kv_caches
[rank3]:     self.model_executor.initialize_from_config(kv_cache_configs)
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 118, in initialize_from_config
[rank3]:     compilation_times: list[float] = self.collective_rpc("compile_or_warm_up_model")
[rank3]:                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/executor/uniproc_executor.py", line 76, in collective_rpc
[rank3]:     result = run_method(self.driver_worker, method, args, kwargs)
[rank3]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/serial_utils.py", line 459, in run_method
[rank3]:     return func(*args, **kwargs)
[rank3]:            ^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
[rank3]:     return func(*args, **kwargs)
[rank3]:            ^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 526, in compile_or_warm_up_model
[rank3]:     kernel_warmup(self)
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/model_executor/warmup/kernel_warmup.py", line 46, in kernel_warmup
[rank3]:     flashinfer_autotune(worker.model_runner)
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/model_executor/warmup/kernel_warmup.py", line 103, in flashinfer_autotune
[rank3]:     runner._dummy_run(
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
[rank3]:     return func(*args, **kwargs)
[rank3]:            ^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 4902, in _dummy_run
[rank3]:     with self.synchronize_input_prep():
[rank3]:          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/contextlib.py", line 144, in __exit__
[rank3]:     next(self.gen)
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 3126, in synchronize_input_prep
[rank3]:     self.prepare_inputs_event.record()
[rank3]: RuntimeError: Event device index  does not match recording stream's device index .
[rank4]: Traceback (most recent call last):
[rank4]:   File "/home/appuser/lhd/ms-swift/swift/cli/_megatron/rlhf.py", line 7, in <module>
[rank4]:     megatron_rlhf_main()
[rank4]:   File "/home/appuser/lhd/ms-swift/swift/megatron/pipelines/train/rlhf.py", line 73, in megatron_rlhf_main
[rank4]:     return MegatronRLHF(args).main()
[rank4]:            ^^^^^^^^^^^^^^^^^^^^^^^^^
[rank4]:   File "/home/appuser/lhd/ms-swift/swift/pipelines/base.py", line 52, in main
[rank4]:     result = self.run()
[rank4]:              ^^^^^^^^^^
[rank4]:   File "/home/appuser/lhd/ms-swift/swift/megatron/pipelines/train/sft.py", line 65, in run
[rank4]:     trainer = self.prepare_trainer()
[rank4]:               ^^^^^^^^^^^^^^^^^^^^^^
[rank4]:   File "/home/appuser/lhd/ms-swift/swift/megatron/pipelines/train/rlhf.py", line 34, in prepare_trainer
[rank4]:     return trainer_cls(args, self.template, **kwargs)
[rank4]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank4]:   File "/home/appuser/lhd/ms-swift/swift/megatron/trainers/grpo_trainer.py", line 54, in __init__
[rank4]:     self._init_rollout_engine()
[rank4]:   File "/home/appuser/lhd/ms-swift/swift/megatron/trainers/grpo_trainer.py", line 122, in _init_rollout_engine
[rank4]:     super()._init_rollout_engine()
[rank4]:   File "/home/appuser/lhd/ms-swift/swift/megatron/trainers/rollout_mixin.py", line 222, in _init_rollout_engine
[rank4]:     self.engine = self._prepare_vllm_engine()
[rank4]:                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank4]:   File "/home/appuser/lhd/ms-swift/swift/megatron/trainers/rollout_mixin.py", line 248, in _prepare_vllm_engine
[rank4]:     engine = GRPOVllmEngine(
[rank4]:              ^^^^^^^^^^^^^^^
[rank4]:   File "/home/appuser/lhd/ms-swift/swift/infer_engine/vllm_engine.py", line 161, in __init__
[rank4]:     self._prepare_engine()
[rank4]:   File "/home/appuser/lhd/ms-swift/swift/infer_engine/vllm_engine.py", line 183, in _prepare_engine
[rank4]:     engine = llm_engine_cls.from_engine_args(self.engine_args)
[rank4]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank4]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/llm_engine.py", line 177, in from_engine_args
[rank4]:     return cls(
[rank4]:            ^^^^
[rank4]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/llm_engine.py", line 111, in __init__
[rank4]:     self.engine_core = EngineCoreClient.make_client(
[rank4]:                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank4]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 100, in make_client
[rank4]:     return InprocClient(vllm_config, executor_class, log_stats)
[rank4]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank4]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 282, in __init__
[rank4]:     self.engine_core = EngineCore(*args, **kwargs)
[rank4]:                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank4]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 120, in __init__
[rank4]:     num_gpu_blocks, num_cpu_blocks, kv_cache_config = self._initialize_kv_caches(
[rank4]:                                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank4]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
[rank4]:     return func(*args, **kwargs)
[rank4]:            ^^^^^^^^^^^^^^^^^^^^^
[rank4]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 279, in _initialize_kv_caches
[rank4]:     self.model_executor.initialize_from_config(kv_cache_configs)
[rank4]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 118, in initialize_from_config
[rank4]:     compilation_times: list[float] = self.collective_rpc("compile_or_warm_up_model")
[rank4]:                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank4]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/executor/uniproc_executor.py", line 76, in collective_rpc
[rank4]:     result = run_method(self.driver_worker, method, args, kwargs)
[rank4]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank4]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/serial_utils.py", line 459, in run_method
[rank4]:     return func(*args, **kwargs)
[rank4]:            ^^^^^^^^^^^^^^^^^^^^^
[rank4]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
[rank4]:     return func(*args, **kwargs)
[rank4]:            ^^^^^^^^^^^^^^^^^^^^^
[rank4]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 526, in compile_or_warm_up_model
[rank4]:     kernel_warmup(self)
[rank4]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/model_executor/warmup/kernel_warmup.py", line 46, in kernel_warmup
[rank4]:     flashinfer_autotune(worker.model_runner)
[rank4]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/model_executor/warmup/kernel_warmup.py", line 103, in flashinfer_autotune
[rank4]:     runner._dummy_run(
[rank4]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
[rank4]:     return func(*args, **kwargs)
[rank4]:            ^^^^^^^^^^^^^^^^^^^^^
[rank4]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 4902, in _dummy_run
[rank4]:     with self.synchronize_input_prep():
[rank4]:          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank4]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/contextlib.py", line 144, in __exit__
[rank4]:     next(self.gen)
[rank4]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 3126, in synchronize_input_prep
[rank4]:     self.prepare_inputs_event.record()
[rank4]: RuntimeError: Event device index  does not match recording stream's device index 
[rank5]: Traceback (most recent call last):
[rank5]:   File "/home/appuser/lhd/ms-swift/swift/cli/_megatron/rlhf.py", line 7, in <module>
[rank5]:     megatron_rlhf_main()
[rank5]:   File "/home/appuser/lhd/ms-swift/swift/megatron/pipelines/train/rlhf.py", line 73, in megatron_rlhf_main
[rank5]:     return MegatronRLHF(args).main()
[rank5]:            ^^^^^^^^^^^^^^^^^^^^^^^^^
[rank5]:   File "/home/appuser/lhd/ms-swift/swift/pipelines/base.py", line 52, in main
[rank5]:     result = self.run()
[rank5]:              ^^^^^^^^^^
[rank5]:   File "/home/appuser/lhd/ms-swift/swift/megatron/pipelines/train/sft.py", line 65, in run
[rank5]:     trainer = self.prepare_trainer()
[rank5]:               ^^^^^^^^^^^^^^^^^^^^^^
[rank5]:   File "/home/appuser/lhd/ms-swift/swift/megatron/pipelines/train/rlhf.py", line 34, in prepare_trainer
[rank5]:     return trainer_cls(args, self.template, **kwargs)
[rank5]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank5]:   File "/home/appuser/lhd/ms-swift/swift/megatron/trainers/grpo_trainer.py", line 54, in __init__
[rank5]:     self._init_rollout_engine()
[rank5]:   File "/home/appuser/lhd/ms-swift/swift/megatron/trainers/grpo_trainer.py", line 122, in _init_rollout_engine
[rank5]:     super()._init_rollout_engine()
[rank5]:   File "/home/appuser/lhd/ms-swift/swift/megatron/trainers/rollout_mixin.py", line 222, in _init_rollout_engine
[rank5]:     self.engine = self._prepare_vllm_engine()
[rank5]:                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank5]:   File "/home/appuser/lhd/ms-swift/swift/megatron/trainers/rollout_mixin.py", line 248, in _prepare_vllm_engine
[rank5]:     engine = GRPOVllmEngine(
[rank5]:              ^^^^^^^^^^^^^^^
[rank5]:   File "/home/appuser/lhd/ms-swift/swift/infer_engine/vllm_engine.py", line 161, in __init__
[rank5]:     self._prepare_engine()
[rank5]:   File "/home/appuser/lhd/ms-swift/swift/infer_engine/vllm_engine.py", line 183, in _prepare_engine
[rank5]:     engine = llm_engine_cls.from_engine_args(self.engine_args)
[rank5]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank5]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/llm_engine.py", line 177, in from_engine_args
[rank5]:     return cls(
[rank5]:            ^^^^
[rank5]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/llm_engine.py", line 111, in __init__
[rank5]:     self.engine_core = EngineCoreClient.make_client(
[rank5]:                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank5]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 100, in make_client
[rank5]:     return InprocClient(vllm_config, executor_class, log_stats)
[rank5]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank5]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 282, in __init__
[rank5]:     self.engine_core = EngineCore(*args, **kwargs)
[rank5]:                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank5]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 120, in __init__
[rank5]:     num_gpu_blocks, num_cpu_blocks, kv_cache_config = self._initialize_kv_caches(
[rank5]:                                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank5]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
[rank5]:     return func(*args, **kwargs)
[rank5]:            ^^^^^^^^^^^^^^^^^^^^^
[rank5]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 279, in _initialize_kv_caches
[rank5]:     self.model_executor.initialize_from_config(kv_cache_configs)
[rank5]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 118, in initialize_from_config
[rank5]:     compilation_times: list[float] = self.collective_rpc("compile_or_warm_up_model")
[rank5]:                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank5]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/executor/uniproc_executor.py", line 76, in collective_rpc
[rank5]:     result = run_method(self.driver_worker, method, args, kwargs)
[rank5]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank5]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/serial_utils.py", line 459, in run_method
[rank5]:     return func(*args, **kwargs)
[rank5]:            ^^^^^^^^^^^^^^^^^^^^^
[rank5]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
[rank5]:     return func(*args, **kwargs)
[rank5]:            ^^^^^^^^^^^^^^^^^^^^^
[rank5]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 526, in compile_or_warm_up_model
[rank5]:     kernel_warmup(self)
[rank5]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/model_executor/warmup/kernel_warmup.py", line 46, in kernel_warmup
[rank5]:     flashinfer_autotune(worker.model_runner)
[rank5]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/model_executor/warmup/kernel_warmup.py", line 103, in flashinfer_autotune
[rank5]:     runner._dummy_run(
[rank5]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
[rank5]:     return func(*args, **kwargs)
[rank5]:            ^^^^^^^^^^^^^^^^^^^^^
[rank5]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 4902, in _dummy_run
[rank5]:     with self.synchronize_input_prep():
[rank5]:          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank5]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/contextlib.py", line 144, in __exit__
[rank5]:     next(self.gen)
[rank5]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 3126, in synchronize_input_prep
[rank5]:     self.prepare_inputs_event.record()
[rank5]: RuntimeError: Event device index  does not match recording stream's device index .
[rank2]: Traceback (most recent call last):
[rank2]:   File "/home/appuser/lhd/ms-swift/swift/cli/_megatron/rlhf.py", line 7, in <module>
[rank2]:     megatron_rlhf_main()
[rank2]:   File "/home/appuser/lhd/ms-swift/swift/megatron/pipelines/train/rlhf.py", line 73, in megatron_rlhf_main
[rank2]:     return MegatronRLHF(args).main()
[rank2]:            ^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/home/appuser/lhd/ms-swift/swift/pipelines/base.py", line 52, in main
[rank2]:     result = self.run()
[rank2]:              ^^^^^^^^^^
[rank2]:   File "/home/appuser/lhd/ms-swift/swift/megatron/pipelines/train/sft.py", line 65, in run
[rank2]:     trainer = self.prepare_trainer()
[rank2]:               ^^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/home/appuser/lhd/ms-swift/swift/megatron/pipelines/train/rlhf.py", line 34, in prepare_trainer
[rank2]:     return trainer_cls(args, self.template, **kwargs)
[rank2]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/home/appuser/lhd/ms-swift/swift/megatron/trainers/grpo_trainer.py", line 54, in __init__
[rank2]:     self._init_rollout_engine()
[rank2]:   File "/home/appuser/lhd/ms-swift/swift/megatron/trainers/grpo_trainer.py", line 122, in _init_rollout_engine
[rank2]:     super()._init_rollout_engine()
[rank2]:   File "/home/appuser/lhd/ms-swift/swift/megatron/trainers/rollout_mixin.py", line 222, in _init_rollout_engine
[rank2]:     self.engine = self._prepare_vllm_engine()
[rank2]:                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/home/appuser/lhd/ms-swift/swift/megatron/trainers/rollout_mixin.py", line 248, in _prepare_vllm_engine
[rank2]:     engine = GRPOVllmEngine(
[rank2]:              ^^^^^^^^^^^^^^^
[rank2]:   File "/home/appuser/lhd/ms-swift/swift/infer_engine/vllm_engine.py", line 161, in __init__
[rank2]:     self._prepare_engine()
[rank2]:   File "/home/appuser/lhd/ms-swift/swift/infer_engine/vllm_engine.py", line 183, in _prepare_engine
[rank2]:     engine = llm_engine_cls.from_engine_args(self.engine_args)
[rank2]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/llm_engine.py", line 177, in from_engine_args
[rank2]:     return cls(
[rank2]:            ^^^^
[rank2]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/llm_engine.py", line 111, in __init__
[rank2]:     self.engine_core = EngineCoreClient.make_client(
[rank2]:                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 100, in make_client
[rank2]:     return InprocClient(vllm_config, executor_class, log_stats)
[rank2]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 282, in __init__
[rank2]:     self.engine_core = EngineCore(*args, **kwargs)
[rank2]:                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 120, in __init__
[rank2]:     num_gpu_blocks, num_cpu_blocks, kv_cache_config = self._initialize_kv_caches(
[rank2]:                                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
[rank2]:     return func(*args, **kwargs)
[rank2]:            ^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 279, in _initialize_kv_caches
[rank2]:     self.model_executor.initialize_from_config(kv_cache_configs)
[rank2]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 118, in initialize_from_config
[rank2]:     compilation_times: list[float] = self.collective_rpc("compile_or_warm_up_model")
[rank2]:                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/executor/uniproc_executor.py", line 76, in collective_rpc
[rank2]:     result = run_method(self.driver_worker, method, args, kwargs)
[rank2]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/serial_utils.py", line 459, in run_method
[rank2]:     return func(*args, **kwargs)
[rank2]:            ^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
[rank2]:     return func(*args, **kwargs)
[rank2]:            ^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 526, in compile_or_warm_up_model
[rank2]:     kernel_warmup(self)
[rank2]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/model_executor/warmup/kernel_warmup.py", line 46, in kernel_warmup
[rank2]:     flashinfer_autotune(worker.model_runner)
[rank2]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/model_executor/warmup/kernel_warmup.py", line 103, in flashinfer_autotune
[rank2]:     runner._dummy_run(
[rank2]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
[rank2]:     return func(*args, **kwargs)
[rank2]:            ^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 4902, in _dummy_run
[rank2]:     with self.synchronize_input_prep():
[rank2]:          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/contextlib.py", line 144, in __exit__
[rank2]:     next(self.gen)
[rank2]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 3126, in synchronize_input_prep
[rank2]:     self.prepare_inputs_event.record()
[rank2]: RuntimeError: Event device index  does not match recording stream's device index 
INFO 03-20 16:30:09 [gpu_model_runner.py:5386] Graph capturing finished in 3 secs, took 0.19 GiB
INFO 03-20 16:30:09 [core.py:282] init engine (profile, create kv cache, warmup model) took 22.46 seconds
INFO 03-20 16:30:10 [block_pool.py:472] Successfully reset prefix cache
INFO 03-20 16:30:10 [block_pool.py:472] Successfully reset prefix cache
[rank3]:[W320 16:30:11.295658005 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
[rank4]:[W320 16:30:11.526257707 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
[rank2]:[W320 16:30:11.630842358 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
[rank5]:[W320 16:30:11.736837613 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
pure virtual method called
terminate called without an active exception
W0320 16:30:13.410000 4123455 site-packages/torch/distributed/elastic/multiprocessing/api.py:1010] Sending process 4123531 closing signal SIGTERM
W0320 16:30:13.412000 4123455 site-packages/torch/distributed/elastic/multiprocessing/api.py:1010] Sending process 4123532 closing signal SIGTERM
W0320 16:30:13.414000 4123455 site-packages/torch/distributed/elastic/multiprocessing/api.py:1010] Sending process 4123533 closing signal SIGTERM
W0320 16:30:13.416000 4123455 site-packages/torch/distributed/elastic/multiprocessing/api.py:1010] Sending process 4123535 closing signal SIGTERM
W0320 16:30:13.416000 4123455 site-packages/torch/distributed/elastic/multiprocessing/api.py:1010] Sending process 4123536 closing signal SIGTERM
/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
E0320 16:30:16.899000 4123455 site-packages/torch/distributed/elastic/multiprocessing/api.py:984] failed (exitcode: -6) local_rank: 3 (pid: 4123534) of binary: /home/appuser/lhd/miniconda/envs/metron312/bin/python3.12
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/torch/distributed/run.py", line 995, in <module>
    main()
  File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 362, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/torch/distributed/run.py", line 991, in main
    run(args)
  File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/torch/distributed/run.py", line 982, in run
    elastic_launch(
  File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/torch/distributed/launcher/api.py", line 170, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/torch/distributed/launcher/api.py", line 317, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
/home/appuser/lhd/ms-swift/swift/cli/_megatron/rlhf.py FAILED
------------------------------------------------------------
Failures:
[1]:
  time      : 2026-03-20_16:30:16
  host      : domainagent-ai8
  rank      : 0 (local_rank: 0)
  exitcode  : -15 (pid: 4123531)
  error_file: <N/A>
  traceback : Signal 15 (SIGTERM) received by PID 4123531
[2]:
  time      : 2026-03-20_16:30:16
  host      : domainagent-ai8
  rank      : 1 (local_rank: 1)
  exitcode  : -15 (pid: 4123532)
  error_file: <N/A>
  traceback : Signal 15 (SIGTERM) received by PID 4123532
[3]:
  time      : 2026-03-20_16:30:16
  host      : domainagent-ai8
  rank      : 2 (local_rank: 2)
  exitcode  : -15 (pid: 4123533)
  error_file: <N/A>
  traceback : Signal 15 (SIGTERM) received by PID 4123533
[4]:
  time      : 2026-03-20_16:30:16
  host      : domainagent-ai8
  rank      : 4 (local_rank: 4)
  exitcode  : -15 (pid: 4123535)
  error_file: <N/A>
  traceback : Signal 15 (SIGTERM) received by PID 4123535
[5]:
  time      : 2026-03-20_16:30:16
  host      : domainagent-ai8
  rank      : 5 (local_rank: 5)
  exitcode  : -15 (pid: 4123536)
  error_file: <N/A>
  traceback : Signal 15 (SIGTERM) received by PID 4123536
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2026-03-20_16:30:13
  host      : domainagent-ai8
  rank      : 3 (local_rank: 3)
  exitcode  : -6 (pid: 4123534)
  error_file: <N/A>
  traceback : Signal 6 (SIGABRT) received by PID 4123534
============================================================
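
The negative exit codes in the summary above are signal numbers, so they can be decoded with Python's standard `signal` module. This is a minimal sketch; the helper name `describe_exitcode` is hypothetical, and the (rank, exitcode) pairs are copied by hand from the summary.

```python
import signal


def describe_exitcode(exitcode):
    """Translate a child exit code from the summary into a readable label."""
    if exitcode < 0:
        # A negative code means the process was killed by signal number -exitcode.
        return signal.Signals(-exitcode).name
    return f"exit status {exitcode}"


# (rank, exitcode) pairs copied from the failure summary above.
failures = [(0, -15), (1, -15), (2, -15), (3, -6), (4, -15), (5, -15)]

for rank, code in failures:
    print(rank, describe_exitcode(code))

# Ranks that were not simply terminated by the launcher.
suspects = [rank for rank, code in failures if describe_exitcode(code) != "SIGTERM"]
print(suspects)
```

Only rank 3 aborted on its own (SIGABRT); the launcher then terminated the other ranks with SIGTERM. The summary marks rank 3 as the first observed failure, but the log shows ranks 2, 4 and 5 raising the same RuntimeError.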

Your current environment

torch 2.10.0 vllm 0.17.1 flash-attn 2.8.3 cuda 12.9

<details> <summary> When running Megatron GRPO on the Qwen3-Omni model, this error occurs. What is the cause?

export MEGATRON_LM_PATH="ms-swift/swift/megatron/Megatron-LM"
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5
export NPROC_PER_NODE=6
export PYTORCH_CUDA_ALLOC_CONF='expandable_segments:True'
export WANDB_MODE=disabled
export MASTER_PORT=29510 
megatron rlhf \
    --rlhf_type grpo \
    --model /data/model/Qwen/Qwen3-Omni-30B-A3B-Instruct \
    --dataset data_shuffle_train_grpo.jsonl \
    --output_dir megatron_Qwen-Omni \
    --num_train_epochs 1 \
    --global_batch_size 6 \
    --micro_batch_size 1 \
    --steps_per_generation 1 \
    --num_generations 2 \
    --reward_funcs accuracy format \
    --use_vllm true \
    --vllm_mode colocate \
    --vllm_tensor_parallel_size 2 \
    --vllm_gpu_memory_utilization 0.60 \
    --vllm_max_model_len 1024 \
    --max_length 1024 \
    --max_completion_length 512 \
    --tensor_model_parallel_size 2 \
    --pipeline_model_parallel_size 3 \
    --context_parallel_size 1 \
    --expert_model_parallel_size 1 \
    --tuner_type lora \
    --lr 5e-5 \
    --bf16 true \
    --beta 0.00 \
    --importance_sampling_level sequence \
    --epsilon 3e-4 \
    --epsilon_high 4e-4 \
    --dynamic_sample false \
    --overlong_filter true \
    --loss_type grpo \
    --sleep_level 1 \
    --offload_model true \
    --offload_bridge true \
    --offload_optimizer true \
    --logging_steps 1 \
    --recompute_granularity selective \
    --finetune \
    --dataloader_num_workers 4 \
    --dataset_num_proc 4 \
    --no_save_optim \
    --no_save_rng \
    --attention_backend flash \
    --temperature 1.0 \
    --padding_free false \
    --sequence_parallel true \
    --log_completions true
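
The parallel sizes in the command above can be sanity-checked with a little arithmetic. This is a rough sketch under two assumptions that the issue does not confirm: that the world size factors as tensor × pipeline × context × data parallel (the usual Megatron convention), and that colocate mode splits the ranks into groups of `vllm_tensor_parallel_size`. The function name `parallel_layout` is hypothetical.

```python
def parallel_layout(world_size, tp, pp, cp, vllm_tp):
    """Derive group counts from the parallel sizes (assumed relations, see above)."""
    model_parallel = tp * pp * cp
    if world_size % model_parallel:
        raise ValueError("world_size must be divisible by tp * pp * cp")
    if world_size % vllm_tp:
        raise ValueError("world_size must be divisible by vllm_tp")
    return {
        "data_parallel_size": world_size // model_parallel,
        "vllm_groups": world_size // vllm_tp,
    }


# Values taken from the command: NPROC_PER_NODE=6, TP=2, PP=3, CP=1, vLLM TP=2.
layout = parallel_layout(world_size=6, tp=2, pp=3, cp=1, vllm_tp=2)
print(layout)
```

Under these assumptions the six ranks form a single data-parallel replica and three vLLM groups of two GPUs each, so every rank takes part in both the training layout and a rollout engine.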

🐛 Describe the bug

When running Megatron GRPO on the Qwen3-Omni model, this error occurs. What is the cause?

[rank3]: Traceback (most recent call last):
[rank3]:   File "/home/appuser/lhd/ms-swift/swift/cli/_megatron/rlhf.py", line 7, in <module>
[rank3]:     megatron_rlhf_main()
[rank3]:   File "/home/appuser/lhd/ms-swift/swift/megatron/pipelines/train/rlhf.py", line 73, in megatron_rlhf_main
[rank3]:     return MegatronRLHF(args).main()
[rank3]:            ^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/ms-swift/swift/pipelines/base.py", line 52, in main
[rank3]:     result = self.run()
[rank3]:              ^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/ms-swift/swift/megatron/pipelines/train/sft.py", line 65, in run
[rank3]:     trainer = self.prepare_trainer()
[rank3]:               ^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/ms-swift/swift/megatron/pipelines/train/rlhf.py", line 34, in prepare_trainer
[rank3]:     return trainer_cls(args, self.template, **kwargs)
[rank3]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/ms-swift/swift/megatron/trainers/grpo_trainer.py", line 54, in __init__
[rank3]:     self._init_rollout_engine()
[rank3]:   File "/home/appuser/lhd/ms-swift/swift/megatron/trainers/grpo_trainer.py", line 122, in _init_rollout_engine
[rank3]:     super()._init_rollout_engine()
[rank3]:   File "/home/appuser/lhd/ms-swift/swift/megatron/trainers/rollout_mixin.py", line 222, in _init_rollout_engine
[rank3]:     self.engine = self._prepare_vllm_engine()
[rank3]:                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/ms-swift/swift/megatron/trainers/rollout_mixin.py", line 248, in _prepare_vllm_engine
[rank3]:     engine = GRPOVllmEngine(
[rank3]:              ^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/ms-swift/swift/infer_engine/vllm_engine.py", line 161, in __init__
[rank3]:     self._prepare_engine()
[rank3]:   File "/home/appuser/lhd/ms-swift/swift/infer_engine/vllm_engine.py", line 183, in _prepare_engine
[rank3]:     engine = llm_engine_cls.from_engine_args(self.engine_args)
[rank3]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/llm_engine.py", line 177, in from_engine_args
[rank3]:     return cls(
[rank3]:            ^^^^
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/llm_engine.py", line 111, in __init__
[rank3]:     self.engine_core = EngineCoreClient.make_client(
[rank3]:                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 100, in make_client
[rank3]:     return InprocClient(vllm_config, executor_class, log_stats)
[rank3]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 282, in __init__
[rank3]:     self.engine_core = EngineCore(*args, **kwargs)
[rank3]:                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 120, in __init__
[rank3]:     num_gpu_blocks, num_cpu_blocks, kv_cache_config = self._initialize_kv_caches(
[rank3]:                                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
[rank3]:     return func(*args, **kwargs)
[rank3]:            ^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 279, in _initialize_kv_caches
[rank3]:     self.model_executor.initialize_from_config(kv_cache_configs)
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 118, in initialize_from_config
[rank3]:     compilation_times: list[float] = self.collective_rpc("compile_or_warm_up_model")
[rank3]:                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/executor/uniproc_executor.py", line 76, in collective_rpc
[rank3]:     result = run_method(self.driver_worker, method, args, kwargs)
[rank3]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/serial_utils.py", line 459, in run_method
[rank3]:     return func(*args, **kwargs)
[rank3]:            ^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
[rank3]:     return func(*args, **kwargs)
[rank3]:            ^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 526, in compile_or_warm_up_model
[rank3]:     kernel_warmup(self)
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/model_executor/warmup/kernel_warmup.py", line 46, in kernel_warmup
[rank3]:     flashinfer_autotune(worker.model_runner)
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/model_executor/warmup/kernel_warmup.py", line 103, in flashinfer_autotune
[rank3]:     runner._dummy_run(
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
[rank3]:     return func(*args, **kwargs)
[rank3]:            ^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 4902, in _dummy_run
[rank3]:     with self.synchronize_input_prep():
[rank3]:          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/contextlib.py", line 144, in __exit__
[rank3]:     next(self.gen)
[rank3]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 3126, in synchronize_input_prep
[rank3]:     self.prepare_inputs_event.record()
[rank3]: RuntimeError: Event device index  does not match recording stream's device index .
[rank4]: Traceback (most recent call last):
[rank4]:   File "/home/appuser/lhd/ms-swift/swift/cli/_megatron/rlhf.py", line 7, in <module>
[rank4]:     megatron_rlhf_main()
[rank4]:   File "/home/appuser/lhd/ms-swift/swift/megatron/pipelines/train/rlhf.py", line 73, in megatron_rlhf_main
[rank4]:     return MegatronRLHF(args).main()
[rank4]:            ^^^^^^^^^^^^^^^^^^^^^^^^^
[rank4]:   File "/home/appuser/lhd/ms-swift/swift/pipelines/base.py", line 52, in main
[rank4]:     result = self.run()
[rank4]:              ^^^^^^^^^^
[rank4]:   File "/home/appuser/lhd/ms-swift/swift/megatron/pipelines/train/sft.py", line 65, in run
[rank4]:     trainer = self.prepare_trainer()
[rank4]:               ^^^^^^^^^^^^^^^^^^^^^^
[rank4]:   File "/home/appuser/lhd/ms-swift/swift/megatron/pipelines/train/rlhf.py", line 34, in prepare_trainer
[rank4]:     return trainer_cls(args, self.template, **kwargs)
[rank4]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank4]:   File "/home/appuser/lhd/ms-swift/swift/megatron/trainers/grpo_trainer.py", line 54, in __init__
[rank4]:     self._init_rollout_engine()
[rank4]:   File "/home/appuser/lhd/ms-swift/swift/megatron/trainers/grpo_trainer.py", line 122, in _init_rollout_engine
[rank4]:     super()._init_rollout_engine()
[rank4]:   File "/home/appuser/lhd/ms-swift/swift/megatron/trainers/rollout_mixin.py", line 222, in _init_rollout_engine
[rank4]:     self.engine = self._prepare_vllm_engine()
[rank4]:                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank4]:   File "/home/appuser/lhd/ms-swift/swift/megatron/trainers/rollout_mixin.py", line 248, in _prepare_vllm_engine
[rank4]:     engine = GRPOVllmEngine(
[rank4]:              ^^^^^^^^^^^^^^^
[rank4]:   File "/home/appuser/lhd/ms-swift/swift/infer_engine/vllm_engine.py", line 161, in __init__
[rank4]:     self._prepare_engine()
[rank4]:   File "/home/appuser/lhd/ms-swift/swift/infer_engine/vllm_engine.py", line 183, in _prepare_engine
[rank4]:     engine = llm_engine_cls.from_engine_args(self.engine_args)
[rank4]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank4]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/llm_engine.py", line 177, in from_engine_args
[rank4]:     return cls(
[rank4]:            ^^^^
[rank4]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/llm_engine.py", line 111, in __init__
[rank4]:     self.engine_core = EngineCoreClient.make_client(
[rank4]:                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank4]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 100, in make_client
[rank4]:     return InprocClient(vllm_config, executor_class, log_stats)
[rank4]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank4]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 282, in __init__
[rank4]:     self.engine_core = EngineCore(*args, **kwargs)
[rank4]:                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank4]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 120, in __init__
[rank4]:     num_gpu_blocks, num_cpu_blocks, kv_cache_config = self._initialize_kv_caches(
[rank4]:                                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank4]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
[rank4]:     return func(*args, **kwargs)
[rank4]:            ^^^^^^^^^^^^^^^^^^^^^
[rank4]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 279, in _initialize_kv_caches
[rank4]:     self.model_executor.initialize_from_config(kv_cache_configs)
[rank4]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 118, in initialize_from_config
[rank4]:     compilation_times: list[float] = self.collective_rpc("compile_or_warm_up_model")
[rank4]:                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank4]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/executor/uniproc_executor.py", line 76, in collective_rpc
[rank4]:     result = run_method(self.driver_worker, method, args, kwargs)
[rank4]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank4]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/serial_utils.py", line 459, in run_method
[rank4]:     return func(*args, **kwargs)
[rank4]:            ^^^^^^^^^^^^^^^^^^^^^
[rank4]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
[rank4]:     return func(*args, **kwargs)
[rank4]:            ^^^^^^^^^^^^^^^^^^^^^
[rank4]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 526, in compile_or_warm_up_model
[rank4]:     kernel_warmup(self)
[rank4]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/model_executor/warmup/kernel_warmup.py", line 46, in kernel_warmup
[rank4]:     flashinfer_autotune(worker.model_runner)
[rank4]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/model_executor/warmup/kernel_warmup.py", line 103, in flashinfer_autotune
[rank4]:     runner._dummy_run(
[rank4]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
[rank4]:     return func(*args, **kwargs)
[rank4]:            ^^^^^^^^^^^^^^^^^^^^^
[rank4]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 4902, in _dummy_run
[rank4]:     with self.synchronize_input_prep():
[rank4]:          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank4]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/contextlib.py", line 144, in __exit__
[rank4]:     next(self.gen)
[rank4]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 3126, in synchronize_input_prep
[rank4]:     self.prepare_inputs_event.record()
[rank4]: RuntimeError: Event device index  does not match recording stream's device index 
[rank5]: Traceback (most recent call last):
[rank5]:   File "/home/appuser/lhd/ms-swift/swift/cli/_megatron/rlhf.py", line 7, in <module>
[rank5]:     megatron_rlhf_main()
[rank5]:   File "/home/appuser/lhd/ms-swift/swift/megatron/pipelines/train/rlhf.py", line 73, in megatron_rlhf_main
[rank5]:     return MegatronRLHF(args).main()
[rank5]:            ^^^^^^^^^^^^^^^^^^^^^^^^^
[rank5]:   File "/home/appuser/lhd/ms-swift/swift/pipelines/base.py", line 52, in main
[rank5]:     result = self.run()
[rank5]:              ^^^^^^^^^^
[rank5]:   File "/home/appuser/lhd/ms-swift/swift/megatron/pipelines/train/sft.py", line 65, in run
[rank5]:     trainer = self.prepare_trainer()
[rank5]:               ^^^^^^^^^^^^^^^^^^^^^^
[rank5]:   File "/home/appuser/lhd/ms-swift/swift/megatron/pipelines/train/rlhf.py", line 34, in prepare_trainer
[rank5]:     return trainer_cls(args, self.template, **kwargs)
[rank5]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank5]:   File "/home/appuser/lhd/ms-swift/swift/megatron/trainers/grpo_trainer.py", line 54, in __init__
[rank5]:     self._init_rollout_engine()
[rank5]:   File "/home/appuser/lhd/ms-swift/swift/megatron/trainers/grpo_trainer.py", line 122, in _init_rollout_engine
[rank5]:     super()._init_rollout_engine()
[rank5]:   File "/home/appuser/lhd/ms-swift/swift/megatron/trainers/rollout_mixin.py", line 222, in _init_rollout_engine
[rank5]:     self.engine = self._prepare_vllm_engine()
[rank5]:                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank5]:   File "/home/appuser/lhd/ms-swift/swift/megatron/trainers/rollout_mixin.py", line 248, in _prepare_vllm_engine
[rank5]:     engine = GRPOVllmEngine(
[rank5]:              ^^^^^^^^^^^^^^^
[rank5]:   File "/home/appuser/lhd/ms-swift/swift/infer_engine/vllm_engine.py", line 161, in __init__
[rank5]:     self._prepare_engine()
[rank5]:   File "/home/appuser/lhd/ms-swift/swift/infer_engine/vllm_engine.py", line 183, in _prepare_engine
[rank5]:     engine = llm_engine_cls.from_engine_args(self.engine_args)
[rank5]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank5]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/llm_engine.py", line 177, in from_engine_args
[rank5]:     return cls(
[rank5]:            ^^^^
[rank5]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/llm_engine.py", line 111, in __init__
[rank5]:     self.engine_core = EngineCoreClient.make_client(
[rank5]:                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank5]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 100, in make_client
[rank5]:     return InprocClient(vllm_config, executor_class, log_stats)
[rank5]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank5]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 282, in __init__
[rank5]:     self.engine_core = EngineCore(*args, **kwargs)
[rank5]:                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank5]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 120, in __init__
[rank5]:     num_gpu_blocks, num_cpu_blocks, kv_cache_config = self._initialize_kv_caches(
[rank5]:                                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank5]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
[rank5]:     return func(*args, **kwargs)
[rank5]:            ^^^^^^^^^^^^^^^^^^^^^
[rank5]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 279, in _initialize_kv_caches
[rank5]:     self.model_executor.initialize_from_config(kv_cache_configs)
[rank5]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 118, in initialize_from_config
[rank5]:     compilation_times: list[float] = self.collective_rpc("compile_or_warm_up_model")
[rank5]:                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank5]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/executor/uniproc_executor.py", line 76, in collective_rpc
[rank5]:     result = run_method(self.driver_worker, method, args, kwargs)
[rank5]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank5]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/serial_utils.py", line 459, in run_method
[rank5]:     return func(*args, **kwargs)
[rank5]:            ^^^^^^^^^^^^^^^^^^^^^
[rank5]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
[rank5]:     return func(*args, **kwargs)
[rank5]:            ^^^^^^^^^^^^^^^^^^^^^
[rank5]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 526, in compile_or_warm_up_model
[rank5]:     kernel_warmup(self)
[rank5]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/model_executor/warmup/kernel_warmup.py", line 46, in kernel_warmup
[rank5]:     flashinfer_autotune(worker.model_runner)
[rank5]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/model_executor/warmup/kernel_warmup.py", line 103, in flashinfer_autotune
[rank5]:     runner._dummy_run(
[rank5]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
[rank5]:     return func(*args, **kwargs)
[rank5]:            ^^^^^^^^^^^^^^^^^^^^^
[rank5]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 4902, in _dummy_run
[rank5]:     with self.synchronize_input_prep():
[rank5]:          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank5]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/contextlib.py", line 144, in __exit__
[rank5]:     next(self.gen)
[rank5]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 3126, in synchronize_input_prep
[rank5]:     self.prepare_inputs_event.record()
[rank5]: RuntimeError: Event device index  does not match recording stream's device index .
[rank2]: Traceback (most recent call last):
[rank2]:   File "/home/appuser/lhd/ms-swift/swift/cli/_megatron/rlhf.py", line 7, in <module>
[rank2]:     megatron_rlhf_main()
[rank2]:   File "/home/appuser/lhd/ms-swift/swift/megatron/pipelines/train/rlhf.py", line 73, in megatron_rlhf_main
[rank2]:     return MegatronRLHF(args).main()
[rank2]:            ^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/home/appuser/lhd/ms-swift/swift/pipelines/base.py", line 52, in main
[rank2]:     result = self.run()
[rank2]:              ^^^^^^^^^^
[rank2]:   File "/home/appuser/lhd/ms-swift/swift/megatron/pipelines/train/sft.py", line 65, in run
[rank2]:     trainer = self.prepare_trainer()
[rank2]:               ^^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/home/appuser/lhd/ms-swift/swift/megatron/pipelines/train/rlhf.py", line 34, in prepare_trainer
[rank2]:     return trainer_cls(args, self.template, **kwargs)
[rank2]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/home/appuser/lhd/ms-swift/swift/megatron/trainers/grpo_trainer.py", line 54, in __init__
[rank2]:     self._init_rollout_engine()
[rank2]:   File "/home/appuser/lhd/ms-swift/swift/megatron/trainers/grpo_trainer.py", line 122, in _init_rollout_engine
[rank2]:     super()._init_rollout_engine()
[rank2]:   File "/home/appuser/lhd/ms-swift/swift/megatron/trainers/rollout_mixin.py", line 222, in _init_rollout_engine
[rank2]:     self.engine = self._prepare_vllm_engine()
[rank2]:                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/home/appuser/lhd/ms-swift/swift/megatron/trainers/rollout_mixin.py", line 248, in _prepare_vllm_engine
[rank2]:     engine = GRPOVllmEngine(
[rank2]:              ^^^^^^^^^^^^^^^
[rank2]:   File "/home/appuser/lhd/ms-swift/swift/infer_engine/vllm_engine.py", line 161, in __init__
[rank2]:     self._prepare_engine()
[rank2]:   File "/home/appuser/lhd/ms-swift/swift/infer_engine/vllm_engine.py", line 183, in _prepare_engine
[rank2]:     engine = llm_engine_cls.from_engine_args(self.engine_args)
[rank2]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/llm_engine.py", line 177, in from_engine_args
[rank2]:     return cls(
[rank2]:            ^^^^
[rank2]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/llm_engine.py", line 111, in __init__
[rank2]:     self.engine_core = EngineCoreClient.make_client(
[rank2]:                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 100, in make_client
[rank2]:     return InprocClient(vllm_config, executor_class, log_stats)
[rank2]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 282, in __init__
[rank2]:     self.engine_core = EngineCore(*args, **kwargs)
[rank2]:                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 120, in __init__
[rank2]:     num_gpu_blocks, num_cpu_blocks, kv_cache_config = self._initialize_kv_caches(
[rank2]:                                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
[rank2]:     return func(*args, **kwargs)
[rank2]:            ^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 279, in _initialize_kv_caches
[rank2]:     self.model_executor.initialize_from_config(kv_cache_configs)
[rank2]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 118, in initialize_from_config
[rank2]:     compilation_times: list[float] = self.collective_rpc("compile_or_warm_up_model")
[rank2]:                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/executor/uniproc_executor.py", line 76, in collective_rpc
[rank2]:     result = run_method(self.driver_worker, method, args, kwargs)
[rank2]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/serial_utils.py", line 459, in run_method
[rank2]:     return func(*args, **kwargs)
[rank2]:            ^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
[rank2]:     return func(*args, **kwargs)
[rank2]:            ^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 526, in compile_or_warm_up_model
[rank2]:     kernel_warmup(self)
[rank2]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/model_executor/warmup/kernel_warmup.py", line 46, in kernel_warmup
[rank2]:     flashinfer_autotune(worker.model_runner)
[rank2]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/model_executor/warmup/kernel_warmup.py", line 103, in flashinfer_autotune
[rank2]:     runner._dummy_run(
[rank2]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
[rank2]:     return func(*args, **kwargs)
[rank2]:            ^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 4902, in _dummy_run
[rank2]:     with self.synchronize_input_prep():
[rank2]:          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/contextlib.py", line 144, in __exit__
[rank2]:     next(self.gen)
[rank2]:   File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 3126, in synchronize_input_prep
[rank2]:     self.prepare_inputs_event.record()
[rank2]: RuntimeError: Event device index  does not match recording stream's device index 
INFO 03-20 16:30:09 [gpu_model_runner.py:5386] Graph capturing finished in 3 secs, took 0.19 GiB
INFO 03-20 16:30:09 [core.py:282] init engine (profile, create kv cache, warmup model) took 22.46 seconds
INFO 03-20 16:30:10 [block_pool.py:472] Successfully reset prefix cache
INFO 03-20 16:30:10 [block_pool.py:472] Successfully reset prefix cache
[rank3]:[W320 16:30:11.295658005 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
[rank4]:[W320 16:30:11.526257707 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
[rank2]:[W320 16:30:11.630842358 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
[rank5]:[W320 16:30:11.736837613 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
pure virtual method called
terminate called without an active exception
W0320 16:30:13.410000 4123455 site-packages/torch/distributed/elastic/multiprocessing/api.py:1010] Sending process 4123531 closing signal SIGTERM
W0320 16:30:13.412000 4123455 site-packages/torch/distributed/elastic/multiprocessing/api.py:1010] Sending process 4123532 closing signal SIGTERM
W0320 16:30:13.414000 4123455 site-packages/torch/distributed/elastic/multiprocessing/api.py:1010] Sending process 4123533 closing signal SIGTERM
W0320 16:30:13.416000 4123455 site-packages/torch/distributed/elastic/multiprocessing/api.py:1010] Sending process 4123535 closing signal SIGTERM
W0320 16:30:13.416000 4123455 site-packages/torch/distributed/elastic/multiprocessing/api.py:1010] Sending process 4123536 closing signal SIGTERM
/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
E0320 16:30:16.899000 4123455 site-packages/torch/distributed/elastic/multiprocessing/api.py:984] failed (exitcode: -6) local_rank: 3 (pid: 4123534) of binary: /home/appuser/lhd/miniconda/envs/metron312/bin/python3.12
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/torch/distributed/run.py", line 995, in <module>
    main()
  File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 362, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/torch/distributed/run.py", line 991, in main
    run(args)
  File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/torch/distributed/run.py", line 982, in run
    elastic_launch(
  File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/torch/distributed/launcher/api.py", line 170, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/appuser/lhd/miniconda/envs/metron312/lib/python3.12/site-packages/torch/distributed/launcher/api.py", line 317, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
/home/appuser/lhd/ms-swift/swift/cli/_megatron/rlhf.py FAILED
------------------------------------------------------------
Failures:
[1]:
  time      : 2026-03-20_16:30:16
  host      : domainagent-ai8
  rank      : 0 (local_rank: 0)
  exitcode  : -15 (pid: 4123531)
  error_file: <N/A>
  traceback : Signal 15 (SIGTERM) received by PID 4123531
[2]:
  time      : 2026-03-20_16:30:16
  host      : domainagent-ai8
  rank      : 1 (local_rank: 1)
  exitcode  : -15 (pid: 4123532)
  error_file: <N/A>
  traceback : Signal 15 (SIGTERM) received by PID 4123532
[3]:
  time      : 2026-03-20_16:30:16
  host      : domainagent-ai8
  rank      : 2 (local_rank: 2)
  exitcode  : -15 (pid: 4123533)
  error_file: <N/A>
  traceback : Signal 15 (SIGTERM) received by PID 4123533
[4]:
  time      : 2026-03-20_16:30:16
  host      : domainagent-ai8
  rank      : 4 (local_rank: 4)
  exitcode  : -15 (pid: 4123535)
  error_file: <N/A>
  traceback : Signal 15 (SIGTERM) received by PID 4123535
[5]:
  time      : 2026-03-20_16:30:16
  host      : domainagent-ai8
  rank      : 5 (local_rank: 5)
  exitcode  : -15 (pid: 4123536)
  error_file: <N/A>
  traceback : Signal 15 (SIGTERM) received by PID 4123536
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2026-03-20_16:30:13
  host      : domainagent-ai8
  rank      : 3 (local_rank: 3)
  exitcode  : -6 (pid: 4123534)
  error_file: <N/A>
  traceback : Signal 6 (SIGABRT) received by PID 4123534
============================================================

Before submitting a new issue...

  • #37660

extent analysis

### Fix Plan

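The error message means that a CUDA event already bound to one device was recorded on a stream that belongs to a different device. The traceback shows it is raised on rank 3 while the vLLM rollout engine is being created (`_prepare_vllm_engine`), so it is probably related to the multi-GPU device setup, and the flash-attn library may be involved. Try the following steps: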

*   **Disable flash-attn**: Switch the attention backend from `flash` to `torch` by passing `--attention-backend torch`. Confirm that your installed version accepts this value before relying on it.
*   **Set `CUDA_VISIBLE_DEVICES`**: Ensure that `CUDA_VISIBLE_DEVICES` lists every GPU you want to use, with one entry per local rank:
    ```bash
    export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5
    ```
*   **Check GPU memory**: Use `nvidia-smi` to verify that each GPU has enough free memory for the model, so that an out-of-memory condition can be ruled out.
*   **Update vllm and flash-attn**: Make sure you are running the latest versions of vllm and flash-attn. You can upgrade both with `pip install --upgrade vllm flash-attn`. An upgrade can also pull in a different PyTorch build, so recheck compatibility with the rest of the training stack afterwards.
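The `CUDA_VISIBLE_DEVICES` step can be checked before launch with a small script. This is a minimal sketch, assuming each process selects the visible device at index `LOCAL_RANK` (a common convention for torchrun-launched scripts, not something the issue confirms). The helper name `check_local_rank_visible` is hypothetical and is not part of vllm or ms-swift.

```python
import os


def check_local_rank_visible(env=None):
    """Return (ok, message) for the LOCAL_RANK / CUDA_VISIBLE_DEVICES pairing.

    Assumption: the process uses the visible device at index LOCAL_RANK.
    """
    env = os.environ if env is None else env
    visible = env.get("CUDA_VISIBLE_DEVICES")
    local_rank = int(env.get("LOCAL_RANK", "0"))

    if visible is None:
        # With the variable unset, CUDA exposes every GPU on the host.
        return True, "CUDA_VISIBLE_DEVICES is unset; all GPUs are visible"

    devices = [d.strip() for d in visible.split(",") if d.strip()]
    if local_rank >= len(devices):
        return False, (
            f"LOCAL_RANK={local_rank} but only {len(devices)} device(s) are visible"
        )
    return True, f"LOCAL_RANK={local_rank} indexes visible device '{devices[local_rank]}'"


if __name__ == "__main__":
    ok, message = check_local_rank_visible()
    print("OK" if ok else "MISMATCH", message)
```

Running it under the same launcher as the training job prints one line per rank, which makes a rank that points past the end of the visible device list easy to spot.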


### Code Changes
No code changes are required for this issue. You may, however, need to edit your launch script so that it passes the flags and environment variables listed above.

### Verification
To verify the fix, rerun your script with the modified flags and environment variables and confirm that the rollout engine initializes on every rank without the `RuntimeError`. If the issue persists, debug further by adding print statements or attaching a debugger to inspect, on each rank, the event's device index and the recording stream's device index.
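One way to gather that per-rank information is to log the device state just before the rollout engine is created. The sketch below is illustrative only: `describe_cuda_binding` is a hypothetical helper, and where to call it in your own script is an assumption. It uses only `torch.cuda.is_available`, `torch.cuda.device_count`, and `torch.cuda.current_device`, and falls back to environment variables when PyTorch is not installed.

```python
import os


def describe_cuda_binding():
    """Collect the device-related state of the current process as a dict."""
    info = {
        "rank": os.environ.get("RANK"),
        "local_rank": os.environ.get("LOCAL_RANK"),
        "cuda_visible_devices": os.environ.get("CUDA_VISIBLE_DEVICES"),
        "device_count": None,
        "current_device": None,
    }
    try:
        import torch
    except ImportError:
        # PyTorch is not installed; report the environment only.
        return info

    if torch.cuda.is_available():
        info["device_count"] = torch.cuda.device_count()
        info["current_device"] = torch.cuda.current_device()
    return info


if __name__ == "__main__":
    print(describe_cuda_binding())
```

If a rank reports a `current_device` that differs from the device you expect for its `local_rank`, that rank is a reasonable place to start looking for the mismatch.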

### Extra Tips
*   Ensure that your GPUs are properly installed and configured.
*   Check the documentation for `vllm` and `flash-attn` for any specific requirements or recommendations for running on multiple GPUs.
*   If you are using a cluster or distributed environment, ensure that the GPUs are properly configured and accessible across all nodes.
