vllm - How to fix [Bug]: gcc: internal compiler error: Segmentation fault signal terminated program cc1

GitHub issue: vllm-project/vllm#37367 (fetched 2026-04-08 00:53:14)
Comments: 0, Participants: 1, Timeline events: 1 (labeled ×1), Reactions: 0

Error Message

gcc: internal compiler error: Segmentation fault signal terminated program cc1
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] EngineCore failed to start.
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] Traceback (most recent call last):
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1090, in run_engine_core
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return func(*args, **kwargs)
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 834, in __init__
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] super().__init__(
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 120, in __init__
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] num_gpu_blocks, num_cpu_blocks, kv_cache_config = self._initialize_kv_caches(
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return func(*args, **kwargs)
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 252, in _initialize_kv_caches
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] available_gpu_memory = self.model_executor.determine_available_memory()
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 136, in determine_available_memory
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return self.collective_rpc("determine_available_memory")
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/uniproc_executor.py", line 76, in collective_rpc
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] result = run_method(self.driver_worker, method, args, kwargs)
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/serial_utils.py", line 459, in run_method
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return func(*args, **kwargs)
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 124, in decorate_context

Root Cause

Error log:
(APIServer pid=7) INFO 03-17 11:10:14 [utils.py:238] non-default args: {'model_tag': './PaddleOCR-VL', 'host': '0.0.0.0', 'port': 18023, 'model': './PaddleOCR-VL', 'trust_remote_code': True, 'served_model_name': ['PaddleOCR-VL-0.9B'], 'gpu_memory_utilization': 0.3, 'enable_prefix_caching': False, 'mm_processor_cache_gb': 0.0}
(APIServer pid=7) WARNING 03-17 11:10:14 [envs.py:1710] Unknown vLLM environment variable detected: VLLM_DISABLE_TRITON
(APIServer pid=7) The argument trust_remote_code is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=7) The argument trust_remote_code is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=7) INFO 03-17 11:10:14 [model.py:531] Resolved architecture: PaddleOCRVLForConditionalGeneration
(APIServer pid=7) INFO 03-17 11:10:14 [model.py:1554] Using max model len 131072
(APIServer pid=7) INFO 03-17 11:10:14 [scheduler.py:231] Chunked prefill is enabled with max_num_batched_tokens=2048.
(APIServer pid=7) INFO 03-17 11:10:14 [vllm.py:747] Asynchronous scheduling is enabled.
(APIServer pid=7) Using a slow image processor as use_fast is unset and a slow processor was saved with this model. use_fast=True will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with use_fast=False.
(EngineCore_DP0 pid=275) INFO 03-17 11:10:25 [core.py:101] Initializing a V1 LLM engine (v0.17.1) with config: model='./PaddleOCR-VL', speculative_config=None, tokenizer='./PaddleOCR-VL', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=131072, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, enable_return_routed_experts=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser='', reasoning_parser_plugin='', enable_in_reasoning=False), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, kv_cache_metrics=False, kv_cache_metrics_sample=0.01, cudagraph_metrics=False, enable_layerwise_nvtx_tracing=False, enable_mfu_metrics=False, enable_mm_processor_stats=False, enable_logging_iteration_details=False), seed=0, served_model_name=PaddleOCR-VL-0.9B, enable_prefix_caching=False, enable_chunked_prefill=True, pooler_config=None, compilation_config={'level': None, 'mode': <CompilationMode.VLLM_COMPILE: 3>, 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': ['none'], 'splitting_ops': ['vllm::unified_attention', 'vllm::unified_attention_with_output', 'vllm::unified_mla_attention', 'vllm::unified_mla_attention_with_output', 'vllm::mamba_mixer2', 'vllm::mamba_mixer', 'vllm::short_conv', 'vllm::linear_attention', 'vllm::plamo2_mamba_mixer', 'vllm::gdn_attention_core', 'vllm::kda_attention', 'vllm::sparse_attn_indexer', 'vllm::rocm_aiter_sparse_attn_indexer', 'vllm::unified_kv_cache_update', 'vllm::unified_mla_kv_cache_update'], 'compile_mm_encoder': False, 'compile_sizes': [], 'compile_ranges_split_points': [2048], 'inductor_compile_config': {'enable_auto_functionalized_v2': False, 'combo_kernels': True, 'benchmark_combo_kernel': True}, 'inductor_passes': {}, 'cudagraph_mode': <CUDAGraphMode.FULL_AND_PIECEWISE: (2, 1)>, 'cudagraph_num_of_warmups': 1, 'cudagraph_capture_sizes': [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248, 256, 272, 288, 304, 320, 336, 352, 368, 384, 400, 416, 432, 448, 464, 480, 496, 512], 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {'fuse_norm_quant': False, 'fuse_act_quant': False, 'fuse_attn_quant': False, 'enable_sp': False, 'fuse_gemm_comms': False, 'fuse_allreduce_rms': False}, 'max_cudagraph_capture_size': 512, 'dynamic_shapes_config': {'type': <DynamicShapesType.BACKED: 'backed'>, 'evaluate_guards': False, 'assume_32_bit_indexing': False}, 'local_cache_dir': None, 'fast_moe_cold_start': True, 'static_all_moe_layers': []}
(EngineCore_DP0 pid=275) INFO 03-17 11:10:26 [parallel_state.py:1393] world_size=1 rank=0 local_rank=0 distributed_init_method=tcp://192.168.51.51:42717 backend=nccl
[W317 11:10:36.233467487 socket.cpp:207] [c10d] The hostname of the client socket cannot be retrieved. err=-3
[rank0]:[W317 11:10:46.245054413 ProcessGroupGloo.cpp:511] Warning: Unable to resolve hostname to a (local) address. Using the loopback address as fallback. Manually set the network interface to bind to with GLOO_SOCKET_IFNAME. (function operator())
(EngineCore_DP0 pid=275) INFO 03-17 11:11:36 [parallel_state.py:1715] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, PCP rank 0, TP rank 0, EP rank N/A, EPLB rank N/A
(EngineCore_DP0 pid=275) Using a slow image processor as use_fast is unset and a slow processor was saved with this model. use_fast=True will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with use_fast=False.
(EngineCore_DP0 pid=275) INFO 03-17 11:11:41 [base.py:106] Offloader set to NoopOffloader
(EngineCore_DP0 pid=275) INFO 03-17 11:11:41 [gpu_model_runner.py:4281] Starting to load model ./PaddleOCR-VL...
(EngineCore_DP0 pid=275) INFO 03-17 11:11:41 [cuda.py:453] Using backend AttentionBackendEnum.FLASH_ATTN for vit attention
(EngineCore_DP0 pid=275) INFO 03-17 11:11:41 [mm_encoder_attention.py:215] Using AttentionBackendEnum.FLASH_ATTN for MMEncoderAttention.
(EngineCore_DP0 pid=275) INFO 03-17 11:11:41 [cuda.py:405] Using FLASH_ATTN attention backend out of potential backends: ['FLASH_ATTN', 'FLASHINFER', 'TRITON_ATTN', 'FLEX_ATTENTION'].
(EngineCore_DP0 pid=275) INFO 03-17 11:11:41 [flash_attn.py:587] Using FlashAttention version 2
(EngineCore_DP0 pid=275) <frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead.
(EngineCore_DP0 pid=275) <frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead.
Loading safetensors checkpoint shards: 0% Completed | 0/1 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00, 3.14it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00, 3.14it/s]
(EngineCore_DP0 pid=275)
(EngineCore_DP0 pid=275) INFO 03-17 11:11:42 [default_loader.py:293] Loading weights took 0.34 seconds
(EngineCore_DP0 pid=275) INFO 03-17 11:11:42 [gpu_model_runner.py:4364] Model loading took 1.89 GiB memory and 0.770836 seconds
(EngineCore_DP0 pid=275) INFO 03-17 11:11:42 [gpu_model_runner.py:5280] Encoder cache will be initialized with a budget of 3600 tokens, and profiled with 1 image items of the maximum feature size.
gcc: internal compiler error: Segmentation fault signal terminated program cc1
Please submit a full bug report, with preprocessed source if appropriate.
See file:///usr/share/doc/gcc-11/README.Bugs for instructions.
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] EngineCore failed to start.
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] Traceback (most recent call last):
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1090, in run_engine_core
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return func(*args, **kwargs)
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 834, in __init__
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] super().__init__(
(EngineCore_DP0 pid=275) Process EngineCore_DP0:
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 120, in __init__
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] num_gpu_blocks, num_cpu_blocks, kv_cache_config = self._initialize_kv_caches(
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return func(*args, **kwargs)
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 252, in _initialize_kv_caches
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] available_gpu_memory = self.model_executor.determine_available_memory()
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 136, in determine_available_memory
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return self.collective_rpc("determine_available_memory")
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/uniproc_executor.py", line 76, in collective_rpc
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] result = run_method(self.driver_worker, method, args, kwargs)
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/serial_utils.py", line 459, in run_method
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return func(*args, **kwargs)
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return func(*args, **kwargs)
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 390, in determine_available_memory
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] self.model_runner.profile_run()
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 5296, in profile_run
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] dummy_encoder_outputs = self.model.embed_multimodal(
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/paddleocr_vl.py", line 1242, in embed_multimodal
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] image_embeds = self._process_image_input(image_input)
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/paddleocr_vl.py", line 1229, in _process_image_input
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] vision_outputs = tuple(
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/paddleocr_vl.py", line 1230, in <genexpr>
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] self.encode_image(pixel, grid).squeeze(0)
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/paddleocr_vl.py", line 1215, in encode_image
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] vision_outputs = self.visual(
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/paddleocr_vl.py", line 914, in forward
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return self.vision_model(
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/paddleocr_vl.py", line 866, in forward
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] last_hidden_state = self.encoder(
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/paddleocr_vl.py", line 821, in forward
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] hidden_states = encoder_layer(
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/paddleocr_vl.py", line 712, in forward
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] hidden_states = self.self_attn(
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/paddleocr_vl.py", line 634, in forward
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] qk_rotated = self.apply_rotary_emb(
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/custom_op.py", line 129, in forward
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return self._forward_method(*args, **kwargs)
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/rotary_embedding/common.py", line 244, in forward_cuda
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] output = apply_rotary_emb(x, cos, sin, interleaved)
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/vllm_flash_attn/layers/rotary.py", line 124, in apply_rotary_emb
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return ApplyRotaryEmb.apply(
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/torch/autograd/function.py", line 583, in apply
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return super().apply(*args, **kwargs) # type: ignore[misc]
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/vllm_flash_attn/layers/rotary.py", line 50, in forward
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] out = apply_rotary(
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/vllm_flash_attn/ops/triton/rotary.py", line 203, in apply_rotary
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] rotary_kernel[grid](
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 370, in <lambda>
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 700, in run
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] device = driver.active.get_current_device()
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/triton/runtime/driver.py", line 28, in active
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] self._active = self.default
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/triton/runtime/driver.py", line 22, in default
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] self._default = _create_driver()
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/triton/runtime/driver.py", line 10, in _create_driver
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return active_drivers[0]()
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/triton/backends/nvidia/driver.py", line 720, in __init__
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] self.utils = CudaUtils() # TODO: make static
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/triton/backends/nvidia/driver.py", line 62, in __init__
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] mod = compile_module_from_src(
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/triton/runtime/build.py", line 93, in compile_module_from_src
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] so = _build(name, src_path, tmpdir, library_dirs or [], include_dirs or [], libraries or [], ccflags or [])
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/triton/runtime/build.py", line 48, in _build
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] subprocess.check_call(cc_cmd, stdout=subprocess.DEVNULL)
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/lib/python3.12/subprocess.py", line 413, in check_call
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] raise CalledProcessError(retcode, cmd)
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] subprocess.CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmpbm3nyzhf/cuda_utils.c', '-O3', '-shared', '-fPIC', '-Wno-psabi', '-o', '/tmp/tmpbm3nyzhf/cuda_utils.cpython-312-x86_64-linux-gnu.so', '-l:libcuda.so.1', '-L/usr/local/lib/python3.12/dist-packages/triton/backends/nvidia/lib', '-L/usr/lib/x86_64-linux-gnu', '-I/usr/local/lib/python3.12/dist-packages/triton/backends/nvidia/include', '-I/tmp/tmpbm3nyzhf', '-I/usr/include/python3.12']' returned non-zero exit status 4.
(EngineCore_DP0 pid=275) Traceback (most recent call last):
(EngineCore_DP0 pid=275) File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=275) self.run()
(EngineCore_DP0 pid=275) File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=275) self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1104, in run_engine_core
(EngineCore_DP0 pid=275) raise e
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1090, in run_engine_core
(EngineCore_DP0 pid=275) engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore_DP0 pid=275) return func(*args, **kwargs)
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 834, in __init__
(EngineCore_DP0 pid=275) super().__init__(
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 120, in __init__
(EngineCore_DP0 pid=275) num_gpu_blocks, num_cpu_blocks, kv_cache_config = self._initialize_kv_caches(
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore_DP0 pid=275) return func(*args, **kwargs)
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 252, in _initialize_kv_caches
(EngineCore_DP0 pid=275) available_gpu_memory = self.model_executor.determine_available_memory()
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 136, in determine_available_memory
(EngineCore_DP0 pid=275) return self.collective_rpc("determine_available_memory")
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/uniproc_executor.py", line 76, in collective_rpc
(EngineCore_DP0 pid=275) result = run_method(self.driver_worker, method, args, kwargs)
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/serial_utils.py", line 459, in run_method
(EngineCore_DP0 pid=275) return func(*args, **kwargs)
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(EngineCore_DP0 pid=275) return func(*args, **kwargs)
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 390, in determine_available_memory
(EngineCore_DP0 pid=275) self.model_runner.profile_run()
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 5296, in profile_run
(EngineCore_DP0 pid=275) dummy_encoder_outputs = self.model.embed_multimodal(
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/paddleocr_vl.py", line 1242, in embed_multimodal
(EngineCore_DP0 pid=275) image_embeds = self._process_image_input(image_input)
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/paddleocr_vl.py", line 1229, in _process_image_input
(EngineCore_DP0 pid=275) vision_outputs = tuple(
(EngineCore_DP0 pid=275) ^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/paddleocr_vl.py", line 1230, in <genexpr>
(EngineCore_DP0 pid=275) self.encode_image(pixel, grid).squeeze(0)
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/paddleocr_vl.py", line 1215, in encode_image
(EngineCore_DP0 pid=275) vision_outputs = self.visual(
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
(EngineCore_DP0 pid=275) return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl
(EngineCore_DP0 pid=275) return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/paddleocr_vl.py", line 914, in forward
(EngineCore_DP0 pid=275) return self.vision_model(
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
(EngineCore_DP0 pid=275) return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl
(EngineCore_DP0 pid=275) return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/paddleocr_vl.py", line 866, in forward
(EngineCore_DP0 pid=275) last_hidden_state = self.encoder(
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
(EngineCore_DP0 pid=275) return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl
(EngineCore_DP0 pid=275) return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/paddleocr_vl.py", line 821, in forward
(EngineCore_DP0 pid=275) hidden_states = encoder_layer(
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
(EngineCore_DP0 pid=275) return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File
"/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl (EngineCore_DP0 pid=275) return forward_call(*args, **kwargs) (EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/paddleocr_vl.py", line 712, in forward (EngineCore_DP0 pid=275) hidden_states = self.self_attn( (EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl (EngineCore_DP0 pid=275) return self._call_impl(*args, **kwargs) (EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl (EngineCore_DP0 pid=275) return forward_call(*args, **kwargs) (EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/paddleocr_vl.py", line 634, in forward (EngineCore_DP0 pid=275) qk_rotated = self.apply_rotary_emb( (EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl (EngineCore_DP0 pid=275) return self._call_impl(*args, **kwargs) (EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl (EngineCore_DP0 pid=275) return forward_call(*args, **kwargs) (EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/custom_op.py", line 129, in forward (EngineCore_DP0 pid=275) return self._forward_method(*args, **kwargs) (EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) File 
"/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/rotary_embedding/common.py", line 244, in forward_cuda (EngineCore_DP0 pid=275) output = apply_rotary_emb(x, cos, sin, interleaved) (EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/vllm_flash_attn/layers/rotary.py", line 124, in apply_rotary_emb (EngineCore_DP0 pid=275) return ApplyRotaryEmb.apply( (EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/torch/autograd/function.py", line 583, in apply (EngineCore_DP0 pid=275) return super().apply(*args, **kwargs) # type: ignore[misc] (EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/vllm_flash_attn/layers/rotary.py", line 50, in forward (EngineCore_DP0 pid=275) out = apply_rotary( (EngineCore_DP0 pid=275) ^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/vllm_flash_attn/ops/triton/rotary.py", line 203, in apply_rotary (EngineCore_DP0 pid=275) rotary_kernel[grid]( (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 370, in <lambda> (EngineCore_DP0 pid=275) return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs) (EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 700, in run (EngineCore_DP0 pid=275) device = driver.active.get_current_device() (EngineCore_DP0 pid=275) ^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/triton/runtime/driver.py", line 28, in active (EngineCore_DP0 pid=275) self._active = self.default (EngineCore_DP0 pid=275) ^^^^^^^^^^^^ (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/triton/runtime/driver.py", 
line 22, in default (EngineCore_DP0 pid=275) self._default = _create_driver() (EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/triton/runtime/driver.py", line 10, in _create_driver (EngineCore_DP0 pid=275) return active_drivers[0]() (EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/triton/backends/nvidia/driver.py", line 720, in __init__ (EngineCore_DP0 pid=275) self.utils = CudaUtils() # TODO: make static (EngineCore_DP0 pid=275) ^^^^^^^^^^^ (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/triton/backends/nvidia/driver.py", line 62, in __init__ (EngineCore_DP0 pid=275) mod = compile_module_from_src( (EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/triton/runtime/build.py", line 93, in compile_module_from_src (EngineCore_DP0 pid=275) so = _build(name, src_path, tmpdir, library_dirs or [], include_dirs or [], libraries or [], ccflags or []) (EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/triton/runtime/build.py", line 48, in _build (EngineCore_DP0 pid=275) subprocess.check_call(cc_cmd, stdout=subprocess.DEVNULL) (EngineCore_DP0 pid=275) File "/usr/lib/python3.12/subprocess.py", line 413, in check_call (EngineCore_DP0 pid=275) raise CalledProcessError(retcode, cmd) (EngineCore_DP0 pid=275) subprocess.CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmpbm3nyzhf/cuda_utils.c', '-O3', '-shared', '-fPIC', '-Wno-psabi', '-o', '/tmp/tmpbm3nyzhf/cuda_utils.cpython-312-x86_64-linux-gnu.so', '-l:libcuda.so.1', '-L/usr/local/lib/python3.12/dist-packages/triton/backends/nvidia/lib', '-L/usr/lib/x86_64-linux-gnu', '-I/usr/local/lib/python3.12/dist-packages/triton/backends/nvidia/include', '-I/tmp/tmpbm3nyzhf', '-I/usr/include/python3.12']' returned non-zero exit status 4.
[rank0]:[W317 11:11:43.231246627 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
(APIServer pid=7) Traceback (most recent call last): (APIServer pid=7) File "/usr/local/bin/vllm", line 10, in <module> (APIServer pid=7) sys.exit(main()) (APIServer pid=7) ^^^^^^ (APIServer pid=7) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 73, in main (APIServer pid=7) args.dispatch_function(args) (APIServer pid=7) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 112, in cmd (APIServer pid=7) uvloop.run(run_server(args)) (APIServer pid=7) File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 96, in run (APIServer pid=7) return __asyncio.run( (APIServer pid=7) ^^^^^^^^^^^^^^ (APIServer pid=7) File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run (APIServer pid=7) return runner.run(main) (APIServer pid=7) ^^^^^^^^^^^^^^^^ (APIServer pid=7) File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run (APIServer pid=7) return self._loop.run_until_complete(task) (APIServer pid=7) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=7) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete (APIServer pid=7) File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 48, in wrapper (APIServer pid=7) return await main (APIServer pid=7) ^^^^^^^^^^ (APIServer pid=7) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 471, in run_server (APIServer pid=7) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs) (APIServer pid=7) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 490, in run_server_worker (APIServer pid=7) async with build_async_engine_client( (APIServer pid=7) ^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=7) File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__ (APIServer pid=7) return await anext(self.gen) (APIServer pid=7) ^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=7) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 96, in build_async_engine_client (APIServer pid=7) async with build_async_engine_client_from_engine_args( (APIServer pid=7) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=7) File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__ (APIServer pid=7) return await anext(self.gen) (APIServer pid=7) ^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=7) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 137, in build_async_engine_client_from_engine_args (APIServer pid=7) async_llm = AsyncLLM.from_vllm_config( (APIServer pid=7) ^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=7) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 225, in from_vllm_config (APIServer pid=7) return cls( (APIServer pid=7) ^^^^ (APIServer pid=7) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 154, in __init__ (APIServer pid=7) self.engine_core = EngineCoreClient.make_async_mp_client( (APIServer pid=7) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=7) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (APIServer pid=7) return func(*args, **kwargs) (APIServer pid=7) ^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=7) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 127, in make_async_mp_client (APIServer pid=7) return AsyncMPClient(*client_args) (APIServer pid=7) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=7) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (APIServer pid=7) return func(*args, **kwargs) (APIServer pid=7) ^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=7) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 911, in __init__ (APIServer pid=7) super().__init__( (APIServer pid=7) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 569, in __init__ (APIServer pid=7) with launch_core_engines( (APIServer pid=7) ^^^^^^^^^^^^^^^^^^^^ (APIServer pid=7) File "/usr/lib/python3.12/contextlib.py", line 144, in __exit__ (APIServer pid=7) next(self.gen) (APIServer pid=7) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 951, in launch_core_engines (APIServer pid=7) wait_for_engine_startup( (APIServer pid=7) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 1010, in wait_for_engine_startup (APIServer pid=7) raise RuntimeError( (APIServer pid=7) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
[root@4car bin]# docker logs -f --tail 100 maas-table
(EngineCore_DP0 pid=437) File "/usr/local/lib/python3.12/dist-packages/vllm/vllm_flash_attn/layers/rotary.py", line 50, in forward (EngineCore_DP0 pid=437) out = apply_rotary( (EngineCore_DP0 pid=437) ^^^^^^^^^^^^^ (EngineCore_DP0 pid=437) File "/usr/local/lib/python3.12/dist-packages/vllm/vllm_flash_attn/ops/triton/rotary.py", line 203, in apply_rotary (EngineCore_DP0 pid=437) rotary_kernel[grid]( (EngineCore_DP0 pid=437) File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 370, in <lambda> (EngineCore_DP0 pid=437) return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs) (EngineCore_DP0 pid=437) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=437) File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 700, in run (EngineCore_DP0 pid=437) device = driver.active.get_current_device() (EngineCore_DP0 pid=437) ^^^^^^^^^^^^^ (EngineCore_DP0 pid=437) File "/usr/local/lib/python3.12/dist-packages/triton/runtime/driver.py", line 28, in active (EngineCore_DP0 pid=437) self._active = self.default (EngineCore_DP0 pid=437) ^^^^^^^^^^^^ (EngineCore_DP0 pid=437) File "/usr/local/lib/python3.12/dist-packages/triton/runtime/driver.py", line 22, in default (EngineCore_DP0 pid=437) self._default = _create_driver() (EngineCore_DP0 pid=437) ^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=437) File "/usr/local/lib/python3.12/dist-packages/triton/runtime/driver.py", line 10, in _create_driver (EngineCore_DP0 pid=437) return active_drivers[0]() (EngineCore_DP0 pid=437) ^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=437) File "/usr/local/lib/python3.12/dist-packages/triton/backends/nvidia/driver.py", line 720, in __init__ (EngineCore_DP0 pid=437) self.utils = CudaUtils() # TODO: make static (EngineCore_DP0 pid=437) ^^^^^^^^^^^ (EngineCore_DP0 pid=437) File "/usr/local/lib/python3.12/dist-packages/triton/backends/nvidia/driver.py", line 62, in __init__ (EngineCore_DP0 pid=437) mod = compile_module_from_src( (EngineCore_DP0 pid=437) ^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=437) File "/usr/local/lib/python3.12/dist-packages/triton/runtime/build.py", line 93, in compile_module_from_src (EngineCore_DP0 pid=437) so = _build(name, src_path, tmpdir, library_dirs or [], include_dirs or [], libraries or [], ccflags or []) (EngineCore_DP0 pid=437) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=437) File "/usr/local/lib/python3.12/dist-packages/triton/runtime/build.py", line 48, in _build (EngineCore_DP0 pid=437) subprocess.check_call(cc_cmd, stdout=subprocess.DEVNULL) (EngineCore_DP0 pid=437) File "/usr/lib/python3.12/subprocess.py", line 413, in check_call (EngineCore_DP0 pid=437) raise CalledProcessError(retcode, cmd) (EngineCore_DP0 pid=437) subprocess.CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmpk8vzb95c/cuda_utils.c', '-O3', '-shared', '-fPIC', '-Wno-psabi', '-o', '/tmp/tmpk8vzb95c/cuda_utils.cpython-312-x86_64-linux-gnu.so', '-l:libcuda.so.1', '-L/usr/local/lib/python3.12/dist-packages/triton/backends/nvidia/lib', '-L/usr/lib/x86_64-linux-gnu', '-I/usr/local/lib/python3.12/dist-packages/triton/backends/nvidia/include', '-I/tmp/tmpk8vzb95c', '-I/usr/include/python3.12']' returned non-zero exit status 4.
[rank0]:[W317 11:20:49.325394522 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
(APIServer pid=6) Traceback (most recent call last): (APIServer pid=6) File "/usr/local/bin/vllm", line 10, in <module> (APIServer pid=6) sys.exit(main()) (APIServer pid=6) ^^^^^^ (APIServer pid=6) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 73, in main (APIServer pid=6) args.dispatch_function(args) (APIServer pid=6) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 112, in cmd (APIServer pid=6) uvloop.run(run_server(args)) (APIServer pid=6) File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 96, in run (APIServer pid=6) return __asyncio.run( (APIServer pid=6) ^^^^^^^^^^^^^^ (APIServer pid=6) File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run (APIServer pid=6) return runner.run(main) (APIServer pid=6) ^^^^^^^^^^^^^^^^ (APIServer pid=6) File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run (APIServer pid=6) return self._loop.run_until_complete(task) (APIServer pid=6) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=6) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete (APIServer pid=6) File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 48, in wrapper (APIServer pid=6) return await main (APIServer pid=6) ^^^^^^^^^^ (APIServer pid=6) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 471, in run_server (APIServer pid=6) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs) (APIServer pid=6) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 490, in run_server_worker (APIServer pid=6) async with build_async_engine_client( (APIServer pid=6) ^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=6) File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__ (APIServer pid=6) return await anext(self.gen) (APIServer pid=6) ^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=6) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 96, in build_async_engine_client (APIServer pid=6) async with build_async_engine_client_from_engine_args( (APIServer pid=6) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=6) File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__ (APIServer pid=6) return await anext(self.gen) (APIServer pid=6) ^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=6) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 137, in build_async_engine_client_from_engine_args (APIServer pid=6) async_llm = AsyncLLM.from_vllm_config( (APIServer pid=6) ^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=6) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 225, in from_vllm_config (APIServer pid=6) return cls( (APIServer pid=6) ^^^^ (APIServer pid=6) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 154, in __init__ (APIServer pid=6) self.engine_core = EngineCoreClient.make_async_mp_client( (APIServer pid=6) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=6) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (APIServer pid=6) return func(*args, **kwargs) (APIServer pid=6) ^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=6) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 127, in make_async_mp_client (APIServer pid=6) return AsyncMPClient(*client_args) (APIServer pid=6) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=6) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (APIServer pid=6) return func(*args, **kwargs) (APIServer pid=6) ^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=6) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 911, in __init__ (APIServer pid=6) super().__init__( (APIServer pid=6) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 569, in __init__ (APIServer pid=6) with launch_core_engines( (APIServer pid=6) ^^^^^^^^^^^^^^^^^^^^ (APIServer pid=6) File "/usr/lib/python3.12/contextlib.py", line 144, in __exit__ (APIServer pid=6) next(self.gen) (APIServer pid=6) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 951, in launch_core_engines (APIServer pid=6) wait_for_engine_startup( (APIServer pid=6) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 1010, in wait_for_engine_startup (APIServer pid=6) raise RuntimeError( (APIServer pid=6) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
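Reading the traceback, the crash does not happen inside the PaddleOCR-VL model code itself. On the first Triton kernel launch (`rotary_kernel`), Triton's NVIDIA driver builds a small helper module, `cuda_utils.c`, by invoking `/usr/bin/gcc`, and gcc's `cc1` segfaults (exit status 4, the status gcc returns for an internal compiler error). One way to check whether the compiler inside the container is healthy, independently of vLLM, is to compile a trivial file with the same core flags. The script below is a hypothetical sketch, not part of vLLM or Triton: `probe_add`, `find_compiler`, `build_command`, `classify`, and `probe` are illustrative names, the CUDA and Python `-l`/`-L`/`-I` flags are deliberately dropped, and the `CC`, then `gcc`, then `clang` lookup order is an assumption about how recent Triton versions pick a compiler.

```python
# Hypothetical diagnostic (not part of vLLM or Triton): compile a trivial C file
# with the same core flags Triton used for cuda_utils.c in the log above.
import os
import shutil
import subprocess
import tempfile
from typing import List, Optional, Tuple

GCC_ICE_EXIT_CODE = 4  # exit status gcc uses for an internal compiler error

PROBE_SRC = "int probe_add(int a, int b) { return a + b; }\n"


def find_compiler() -> Optional[str]:
    """Assumed lookup order: $CC first, then gcc, then clang on PATH."""
    cc = os.environ.get("CC")
    if cc:
        return cc
    return shutil.which("gcc") or shutil.which("clang")


def build_command(cc: str, src: str, out: str) -> List[str]:
    """Flags copied from the failing command, minus the CUDA/Python -l/-L/-I flags."""
    return [cc, src, "-O3", "-shared", "-fPIC", "-Wno-psabi", "-o", out]


def classify(returncode: int) -> str:
    """Map a compiler exit status to a human-readable verdict."""
    if returncode == 0:
        return "ok"
    if returncode == GCC_ICE_EXIT_CODE:
        return "internal compiler error"
    if returncode < 0:
        # subprocess reports death by signal N as a return code of -N
        return f"killed by signal {-returncode}"
    return f"compile failed (exit {returncode})"


def probe() -> Tuple[str, str]:
    """Compile PROBE_SRC in a temp dir; return (verdict, compiler stderr)."""
    cc = find_compiler()
    if cc is None:
        return "no compiler found", ""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "probe.c")
        with open(src, "w") as f:
            f.write(PROBE_SRC)
        cmd = build_command(cc, src, os.path.join(tmp, "probe.so"))
        try:
            proc = subprocess.run(cmd, capture_output=True, text=True)
        except FileNotFoundError:
            return f"compiler not found: {cc}", ""
        return classify(proc.returncode), proc.stderr


if __name__ == "__main__":
    verdict, stderr = probe()
    print(verdict)
    if stderr:
        print(stderr)
```

Run it inside the same container as vLLM (for example as `python3 probe_cc.py`; the file name is hypothetical). If even this one-line file reports an internal compiler error, the fault most likely lies with the gcc installation or the container environment rather than with vLLM. Pointing `CC` at a different compiler before starting vLLM may then be worth trying, assuming your Triton version reads `CC`.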

Fix / Workaround

Error log:
(APIServer pid=7) INFO 03-17 11:10:14 [utils.py:238] non-default args: {'model_tag': './PaddleOCR-VL', 'host': '0.0.0.0', 'port': 18023, 'model': './PaddleOCR-VL', 'trust_remote_code': True, 'served_model_name': ['PaddleOCR-VL-0.9B'], 'gpu_memory_utilization': 0.3, 'enable_prefix_caching': False, 'mm_processor_cache_gb': 0.0} (APIServer pid=7) WARNING 03-17 11:10:14 [envs.py:1710] Unknown vLLM environment variable detected: VLLM_DISABLE_TRITON (APIServer pid=7) The argument trust_remote_code is to be used with Auto classes. It has no effect here and is ignored. (APIServer pid=7) The argument trust_remote_code is to be used with Auto classes. It has no effect here and is ignored. (APIServer pid=7) INFO 03-17 11:10:14 [model.py:531] Resolved architecture: PaddleOCRVLForConditionalGeneration (APIServer pid=7) INFO 03-17 11:10:14 [model.py:1554] Using max model len 131072 (APIServer pid=7) INFO 03-17 11:10:14 [scheduler.py:231] Chunked prefill is enabled with max_num_batched_tokens=2048. (APIServer pid=7) INFO 03-17 11:10:14 [vllm.py:747] Asynchronous scheduling is enabled. (APIServer pid=7) Using a slow image processor as use_fast is unset and a slow processor was saved with this model. use_fast=True will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with use_fast=False. 
(EngineCore_DP0 pid=275) INFO 03-17 11:10:25 [core.py:101] Initializing a V1 LLM engine (v0.17.1) with config: model='./PaddleOCR-VL', speculative_config=None, tokenizer='./PaddleOCR-VL', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=131072, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, enable_return_routed_experts=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser='', reasoning_parser_plugin='', enable_in_reasoning=False), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, kv_cache_metrics=False, kv_cache_metrics_sample=0.01, cudagraph_metrics=False, enable_layerwise_nvtx_tracing=False, enable_mfu_metrics=False, enable_mm_processor_stats=False, enable_logging_iteration_details=False), seed=0, served_model_name=PaddleOCR-VL-0.9B, enable_prefix_caching=False, enable_chunked_prefill=True, pooler_config=None, compilation_config={'level': None, 'mode': <CompilationMode.VLLM_COMPILE: 3>, 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': ['none'], 'splitting_ops': ['vllm::unified_attention', 'vllm::unified_attention_with_output', 'vllm::unified_mla_attention', 'vllm::unified_mla_attention_with_output', 'vllm::mamba_mixer2', 'vllm::mamba_mixer', 'vllm::short_conv', 'vllm::linear_attention', 'vllm::plamo2_mamba_mixer', 'vllm::gdn_attention_core', 'vllm::kda_attention', 'vllm::sparse_attn_indexer', 'vllm::rocm_aiter_sparse_attn_indexer', 'vllm::unified_kv_cache_update', 'vllm::unified_mla_kv_cache_update'], 
'compile_mm_encoder': False, 'compile_sizes': [], 'compile_ranges_split_points': [2048], 'inductor_compile_config': {'enable_auto_functionalized_v2': False, 'combo_kernels': True, 'benchmark_combo_kernel': True}, 'inductor_passes': {}, 'cudagraph_mode': <CUDAGraphMode.FULL_AND_PIECEWISE: (2, 1)>, 'cudagraph_num_of_warmups': 1, 'cudagraph_capture_sizes': [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248, 256, 272, 288, 304, 320, 336, 352, 368, 384, 400, 416, 432, 448, 464, 480, 496, 512], 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {'fuse_norm_quant': False, 'fuse_act_quant': False, 'fuse_attn_quant': False, 'enable_sp': False, 'fuse_gemm_comms': False, 'fuse_allreduce_rms': False}, 'max_cudagraph_capture_size': 512, 'dynamic_shapes_config': {'type': <DynamicShapesType.BACKED: 'backed'>, 'evaluate_guards': False, 'assume_32_bit_indexing': False}, 'local_cache_dir': None, 'fast_moe_cold_start': True, 'static_all_moe_layers': []} (EngineCore_DP0 pid=275) INFO 03-17 11:10:26 [parallel_state.py:1393] world_size=1 rank=0 local_rank=0 distributed_init_method=tcp://192.168.51.51:42717 backend=nccl [W317 11:10:36.233467487 socket.cpp:207] [c10d] The hostname of the client socket cannot be retrieved. err=-3 [rank0]:[W317 11:10:46.245054413 ProcessGroupGloo.cpp:511] Warning: Unable to resolve hostname to a (local) address. Using the loopback address as fallback. Manually set the network interface to bind to with GLOO_SOCKET_IFNAME. (function operator()) (EngineCore_DP0 pid=275) INFO 03-17 11:11:36 [parallel_state.py:1715] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, PCP rank 0, TP rank 0, EP rank N/A, EPLB rank N/A (EngineCore_DP0 pid=275) Using a slow image processor as use_fast is unset and a slow processor was saved with this model. 
use_fast=True will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with use_fast=False. (EngineCore_DP0 pid=275) INFO 03-17 11:11:41 [base.py:106] Offloader set to NoopOffloader (EngineCore_DP0 pid=275) INFO 03-17 11:11:41 [gpu_model_runner.py:4281] Starting to load model ./PaddleOCR-VL... (EngineCore_DP0 pid=275) INFO 03-17 11:11:41 [cuda.py:453] Using backend AttentionBackendEnum.FLASH_ATTN for vit attention (EngineCore_DP0 pid=275) INFO 03-17 11:11:41 [mm_encoder_attention.py:215] Using AttentionBackendEnum.FLASH_ATTN for MMEncoderAttention. (EngineCore_DP0 pid=275) INFO 03-17 11:11:41 [cuda.py:405] Using FLASH_ATTN attention backend out of potential backends: ['FLASH_ATTN', 'FLASHINFER', 'TRITON_ATTN', 'FLEX_ATTENTION']. (EngineCore_DP0 pid=275) INFO 03-17 11:11:41 [flash_attn.py:587] Using FlashAttention version 2 (EngineCore_DP0 pid=275) <frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead. (EngineCore_DP0 pid=275) <frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead. 
Loading safetensors checkpoint shards: 0% Completed | 0/1 [00:00<?, ?it/s] Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00, 3.14it/s] Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00, 3.14it/s] (EngineCore_DP0 pid=275) (EngineCore_DP0 pid=275) INFO 03-17 11:11:42 [default_loader.py:293] Loading weights took 0.34 seconds (EngineCore_DP0 pid=275) INFO 03-17 11:11:42 [gpu_model_runner.py:4364] Model loading took 1.89 GiB memory and 0.770836 seconds (EngineCore_DP0 pid=275) INFO 03-17 11:11:42 [gpu_model_runner.py:5280] Encoder cache will be initialized with a budget of 3600 tokens, and profiled with 1 image items of the maximum feature size.
gcc: internal compiler error: Segmentation fault signal terminated program cc1
Please submit a full bug report, with preprocessed source if appropriate.
See file:///usr/share/doc/gcc-11/README.Bugs for instructions.
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] EngineCore failed to start. (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] Traceback (most recent call last): (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1090, in run_engine_core (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs) (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return func(*args, **kwargs) (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 834, in __init__ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] super().__init__( (EngineCore_DP0 pid=275) Process EngineCore_DP0: (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 120, in __init__ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] num_gpu_blocks, num_cpu_blocks, kv_cache_config = self._initialize_kv_caches( (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return func(*args, **kwargs) (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 252, in _initialize_kv_caches (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] available_gpu_memory = self.model_executor.determine_available_memory() (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 136, in determine_available_memory (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return self.collective_rpc("determine_available_memory") (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/uniproc_executor.py", line 76, in collective_rpc (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] result = run_method(self.driver_worker, method, args, kwargs) (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/serial_utils.py", line 459, in run_method (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return func(*args, **kwargs) (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 124, in decorate_context (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return func(*args, **kwargs) (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 390, in determine_available_memory (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] self.model_runner.profile_run() (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 5296, in profile_run (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] dummy_encoder_outputs = self.model.embed_multimodal( (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/paddleocr_vl.py", line 1242, in embed_multimodal (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] image_embeds = self._process_image_input(image_input) (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/paddleocr_vl.py", line 1229, in _process_image_input (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 
[core.py:1100] vision_outputs = tuple( (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/paddleocr_vl.py", line 1230, in <genexpr> (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] self.encode_image(pixel, grid).squeeze(0) (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/paddleocr_vl.py", line 1215, in encode_image (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] vision_outputs = self.visual( (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return self._call_impl(*args, **kwargs) (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return forward_call(*args, **kwargs) (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/paddleocr_vl.py", line 914, in forward (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return self.vision_model( (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File 
"/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return self._call_impl(*args, **kwargs) (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return forward_call(*args, **kwargs) (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/paddleocr_vl.py", line 866, in forward (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] last_hidden_state = self.encoder( (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return self._call_impl(*args, **kwargs) (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return forward_call(*args, **kwargs) (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/paddleocr_vl.py", line 821, in forward (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] hidden_states = encoder_layer( (EngineCore_DP0 pid=275) 
ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return self._call_impl(*args, **kwargs) (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return forward_call(*args, **kwargs) (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/paddleocr_vl.py", line 712, in forward (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] hidden_states = self.self_attn( (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return self._call_impl(*args, **kwargs) (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return forward_call(*args, **kwargs) (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/paddleocr_vl.py", line 634, in forward 
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] qk_rotated = self.apply_rotary_emb(
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/custom_op.py", line 129, in forward
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return self._forward_method(*args, **kwargs)
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/rotary_embedding/common.py", line 244, in forward_cuda
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] output = apply_rotary_emb(x, cos, sin, interleaved)
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/vllm_flash_attn/layers/rotary.py", line 124, in apply_rotary_emb
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return ApplyRotaryEmb.apply(
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/torch/autograd/function.py", line 583, in apply
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return super().apply(*args, **kwargs) # type: ignore[misc]
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/vllm_flash_attn/layers/rotary.py", line 50, in forward
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] out = apply_rotary(
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/vllm_flash_attn/ops/triton/rotary.py", line 203, in apply_rotary
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] rotary_kernel[grid](
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 370, in <lambda>
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 700, in run
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] device = driver.active.get_current_device()
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/triton/runtime/driver.py", line 28, in active
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] self._active = self.default
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/triton/runtime/driver.py", line 22, in default
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] self._default = _create_driver()
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/triton/runtime/driver.py", line 10, in _create_driver
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return active_drivers0
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/triton/backends/nvidia/driver.py", line 720, in __init__
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] self.utils = CudaUtils() # TODO: make static
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/triton/backends/nvidia/driver.py", line 62, in __init__
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] mod = compile_module_from_src(
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/triton/runtime/build.py", line 93, in compile_module_from_src
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] so = _build(name, src_path, tmpdir, library_dirs or [], include_dirs or [], libraries or [], ccflags or [])
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/triton/runtime/build.py", line 48, in _build
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] subprocess.check_call(cc_cmd, stdout=subprocess.DEVNULL)
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/lib/python3.12/subprocess.py", line 413, in check_call
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] raise CalledProcessError(retcode, cmd)
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] subprocess.CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmpbm3nyzhf/cuda_utils.c', '-O3', '-shared', '-fPIC', '-Wno-psabi', '-o', '/tmp/tmpbm3nyzhf/cuda_utils.cpython-312-x86_64-linux-gnu.so', '-l:libcuda.so.1', '-L/usr/local/lib/python3.12/dist-packages/triton/backends/nvidia/lib', '-L/usr/lib/x86_64-linux-gnu', '-I/usr/local/lib/python3.12/dist-packages/triton/backends/nvidia/include', '-I/tmp/tmpbm3nyzhf', '-I/usr/include/python3.12']' returned non-zero exit status 4.
(EngineCore_DP0 pid=275) Traceback (most recent call last):
(EngineCore_DP0 pid=275) File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=275) self.run()
(EngineCore_DP0 pid=275) File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=275) self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1104, in run_engine_core
(EngineCore_DP0 pid=275) raise e
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1090, in run_engine_core
(EngineCore_DP0 pid=275) engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore_DP0 pid=275) return func(*args, **kwargs)
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 834, in __init__
(EngineCore_DP0 pid=275) super().__init__(
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 120, in __init__
(EngineCore_DP0 pid=275) num_gpu_blocks, num_cpu_blocks, kv_cache_config = self._initialize_kv_caches(
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore_DP0 pid=275) return func(*args, **kwargs)
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 252, in _initialize_kv_caches
(EngineCore_DP0 pid=275) available_gpu_memory = self.model_executor.determine_available_memory()
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 136, in determine_available_memory
(EngineCore_DP0 pid=275) return self.collective_rpc("determine_available_memory")
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/uniproc_executor.py", line 76, in collective_rpc
(EngineCore_DP0 pid=275) result = run_method(self.driver_worker, method, args, kwargs)
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/serial_utils.py", line 459, in run_method
(EngineCore_DP0 pid=275) return func(*args, **kwargs)
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(EngineCore_DP0 pid=275) return func(*args, **kwargs)
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 390, in determine_available_memory
(EngineCore_DP0 pid=275) self.model_runner.profile_run()
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 5296, in profile_run
(EngineCore_DP0 pid=275) dummy_encoder_outputs = self.model.embed_multimodal(
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/paddleocr_vl.py", line 1242, in embed_multimodal
(EngineCore_DP0 pid=275) image_embeds = self._process_image_input(image_input)
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/paddleocr_vl.py", line 1229, in _process_image_input
(EngineCore_DP0 pid=275) vision_outputs = tuple(
(EngineCore_DP0 pid=275) ^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/paddleocr_vl.py", line 1230, in <genexpr>
(EngineCore_DP0 pid=275) self.encode_image(pixel, grid).squeeze(0)
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/paddleocr_vl.py", line 1215, in encode_image
(EngineCore_DP0 pid=275) vision_outputs = self.visual(
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
(EngineCore_DP0 pid=275) return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl
(EngineCore_DP0 pid=275) return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/paddleocr_vl.py", line 914, in forward
(EngineCore_DP0 pid=275) return self.vision_model(
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
(EngineCore_DP0 pid=275) return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl
(EngineCore_DP0 pid=275) return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/paddleocr_vl.py", line 866, in forward
(EngineCore_DP0 pid=275) last_hidden_state = self.encoder(
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
(EngineCore_DP0 pid=275) return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl
(EngineCore_DP0 pid=275) return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/paddleocr_vl.py", line 821, in forward
(EngineCore_DP0 pid=275) hidden_states = encoder_layer(
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
(EngineCore_DP0 pid=275) return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl
(EngineCore_DP0 pid=275) return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/paddleocr_vl.py", line 712, in forward
(EngineCore_DP0 pid=275) hidden_states = self.self_attn(
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
(EngineCore_DP0 pid=275) return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl
(EngineCore_DP0 pid=275) return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/paddleocr_vl.py", line 634, in forward
(EngineCore_DP0 pid=275) qk_rotated = self.apply_rotary_emb(
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
(EngineCore_DP0 pid=275) return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl
(EngineCore_DP0 pid=275) return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/custom_op.py", line 129, in forward
(EngineCore_DP0 pid=275) return self._forward_method(*args, **kwargs)
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/rotary_embedding/common.py", line 244, in forward_cuda
(EngineCore_DP0 pid=275) output = apply_rotary_emb(x, cos, sin, interleaved)
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/vllm_flash_attn/layers/rotary.py", line 124, in apply_rotary_emb
(EngineCore_DP0 pid=275) return ApplyRotaryEmb.apply(
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/torch/autograd/function.py", line 583, in apply
(EngineCore_DP0 pid=275) return super().apply(*args, **kwargs) # type: ignore[misc]
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/vllm_flash_attn/layers/rotary.py", line 50, in forward
(EngineCore_DP0 pid=275) out = apply_rotary(
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/vllm_flash_attn/ops/triton/rotary.py", line 203, in apply_rotary
(EngineCore_DP0 pid=275) rotary_kernel[grid](
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 370, in <lambda>
(EngineCore_DP0 pid=275) return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 700, in run
(EngineCore_DP0 pid=275) device = driver.active.get_current_device()
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/triton/runtime/driver.py", line 28, in active
(EngineCore_DP0 pid=275) self._active = self.default
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/triton/runtime/driver.py", line 22, in default
(EngineCore_DP0 pid=275) self._default = _create_driver()
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/triton/runtime/driver.py", line 10, in _create_driver
(EngineCore_DP0 pid=275) return active_drivers0
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/triton/backends/nvidia/driver.py", line 720, in __init__
(EngineCore_DP0 pid=275) self.utils = CudaUtils() # TODO: make static
(EngineCore_DP0 pid=275) ^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/triton/backends/nvidia/driver.py", line 62, in __init__
(EngineCore_DP0 pid=275) mod = compile_module_from_src(
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/triton/runtime/build.py", line 93, in compile_module_from_src
(EngineCore_DP0 pid=275) so = _build(name, src_path, tmpdir, library_dirs or [], include_dirs or [], libraries or [], ccflags or [])
(EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/triton/runtime/build.py", line 48, in _build
(EngineCore_DP0 pid=275) subprocess.check_call(cc_cmd, stdout=subprocess.DEVNULL)
(EngineCore_DP0 pid=275) File "/usr/lib/python3.12/subprocess.py", line 413, in check_call
(EngineCore_DP0 pid=275) raise CalledProcessError(retcode, cmd)
(EngineCore_DP0 pid=275) subprocess.CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmpbm3nyzhf/cuda_utils.c', '-O3', '-shared', '-fPIC', '-Wno-psabi', '-o', '/tmp/tmpbm3nyzhf/cuda_utils.cpython-312-x86_64-linux-gnu.so', '-l:libcuda.so.1', '-L/usr/local/lib/python3.12/dist-packages/triton/backends/nvidia/lib', '-L/usr/lib/x86_64-linux-gnu', '-I/usr/local/lib/python3.12/dist-packages/triton/backends/nvidia/include', '-I/tmp/tmpbm3nyzhf', '-I/usr/include/python3.12']' returned non-zero exit status 4.
[rank0]:[W317 11:11:43.231246627 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
(APIServer pid=7) Traceback (most recent call last):
(APIServer pid=7) File "/usr/local/bin/vllm", line 10, in <module>
(APIServer pid=7) sys.exit(main())
(APIServer pid=7) ^^^^^^
(APIServer pid=7) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 73, in main
(APIServer pid=7) args.dispatch_function(args)
(APIServer pid=7) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 112, in cmd
(APIServer pid=7) uvloop.run(run_server(args))
(APIServer pid=7) File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 96, in run
(APIServer pid=7) return __asyncio.run(
(APIServer pid=7) ^^^^^^^^^^^^^^
(APIServer pid=7) File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
(APIServer pid=7) return runner.run(main)
(APIServer pid=7) ^^^^^^^^^^^^^^^^
(APIServer pid=7) File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=7) return self._loop.run_until_complete(task)
(APIServer pid=7) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=7) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=7) File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=7) return await main
(APIServer pid=7) ^^^^^^^^^^
(APIServer pid=7) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 471, in run_server
(APIServer pid=7) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=7) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 490, in run_server_worker
(APIServer pid=7) async with build_async_engine_client(
(APIServer pid=7) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=7) File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=7) return await anext(self.gen)
(APIServer pid=7) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=7) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 96, in build_async_engine_client
(APIServer pid=7) async with build_async_engine_client_from_engine_args(
(APIServer pid=7) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=7) File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=7) return await anext(self.gen)
(APIServer pid=7) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=7) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 137, in build_async_engine_client_from_engine_args
(APIServer pid=7) async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=7) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=7) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 225, in from_vllm_config
(APIServer pid=7) return cls(
(APIServer pid=7) ^^^^
(APIServer pid=7) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 154, in __init__
(APIServer pid=7) self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=7) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=7) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(APIServer pid=7) return func(*args, **kwargs)
(APIServer pid=7) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=7) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 127, in make_async_mp_client
(APIServer pid=7) return AsyncMPClient(*client_args)
(APIServer pid=7) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=7) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(APIServer pid=7) return func(*args, **kwargs)
(APIServer pid=7) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=7) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 911, in __init__
(APIServer pid=7) super().__init__(
(APIServer pid=7) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 569, in __init__
(APIServer pid=7) with launch_core_engines(
(APIServer pid=7) ^^^^^^^^^^^^^^^^^^^^
(APIServer pid=7) File "/usr/lib/python3.12/contextlib.py", line 144, in __exit__
(APIServer pid=7) next(self.gen)
(APIServer pid=7) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 951, in launch_core_engines
(APIServer pid=7) wait_for_engine_startup(
(APIServer pid=7) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 1010, in wait_for_engine_startup
(APIServer pid=7) raise RuntimeError(
(APIServer pid=7) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
[root@4car bin]# docker logs -f --tail 100 maas-table
(EngineCore_DP0 pid=437) File "/usr/local/lib/python3.12/dist-packages/vllm/vllm_flash_attn/layers/rotary.py", line 50, in forward
(EngineCore_DP0 pid=437) out = apply_rotary(
(EngineCore_DP0 pid=437) ^^^^^^^^^^^^^
(EngineCore_DP0 pid=437) File "/usr/local/lib/python3.12/dist-packages/vllm/vllm_flash_attn/ops/triton/rotary.py", line 203, in apply_rotary
(EngineCore_DP0 pid=437) rotary_kernel[grid](
(EngineCore_DP0 pid=437) File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 370, in <lambda>
(EngineCore_DP0 pid=437) return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
(EngineCore_DP0 pid=437) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=437) File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 700, in run
(EngineCore_DP0 pid=437) device = driver.active.get_current_device()
(EngineCore_DP0 pid=437) ^^^^^^^^^^^^^
(EngineCore_DP0 pid=437) File "/usr/local/lib/python3.12/dist-packages/triton/runtime/driver.py", line 28, in active
(EngineCore_DP0 pid=437) self._active = self.default
(EngineCore_DP0 pid=437) ^^^^^^^^^^^^
(EngineCore_DP0 pid=437) File "/usr/local/lib/python3.12/dist-packages/triton/runtime/driver.py", line 22, in default
(EngineCore_DP0 pid=437) self._default = _create_driver()
(EngineCore_DP0 pid=437) ^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=437) File "/usr/local/lib/python3.12/dist-packages/triton/runtime/driver.py", line 10, in _create_driver
(EngineCore_DP0 pid=437) return active_drivers0
(EngineCore_DP0 pid=437) ^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=437) File "/usr/local/lib/python3.12/dist-packages/triton/backends/nvidia/driver.py", line 720, in __init__
(EngineCore_DP0 pid=437) self.utils = CudaUtils() # TODO: make static
(EngineCore_DP0 pid=437) ^^^^^^^^^^^
(EngineCore_DP0 pid=437) File "/usr/local/lib/python3.12/dist-packages/triton/backends/nvidia/driver.py", line 62, in __init__
(EngineCore_DP0 pid=437) mod = compile_module_from_src(
(EngineCore_DP0 pid=437) ^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=437) File "/usr/local/lib/python3.12/dist-packages/triton/runtime/build.py", line 93, in compile_module_from_src
(EngineCore_DP0 pid=437) so = _build(name, src_path, tmpdir, library_dirs or [], include_dirs or [], libraries or [], ccflags or [])
(EngineCore_DP0 pid=437) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=437) File "/usr/local/lib/python3.12/dist-packages/triton/runtime/build.py", line 48, in _build
(EngineCore_DP0 pid=437) subprocess.check_call(cc_cmd, stdout=subprocess.DEVNULL)
(EngineCore_DP0 pid=437) File "/usr/lib/python3.12/subprocess.py", line 413, in check_call
(EngineCore_DP0 pid=437) raise CalledProcessError(retcode, cmd)
(EngineCore_DP0 pid=437) subprocess.CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmpk8vzb95c/cuda_utils.c', '-O3', '-shared', '-fPIC', '-Wno-psabi', '-o', '/tmp/tmpk8vzb95c/cuda_utils.cpython-312-x86_64-linux-gnu.so', '-l:libcuda.so.1', '-L/usr/local/lib/python3.12/dist-packages/triton/backends/nvidia/lib', '-L/usr/lib/x86_64-linux-gnu', '-I/usr/local/lib/python3.12/dist-packages/triton/backends/nvidia/include', '-I/tmp/tmpk8vzb95c', '-I/usr/include/python3.12']' returned non-zero exit status 4.
[rank0]:[W317 11:20:49.325394522 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
(APIServer pid=6) Traceback (most recent call last):
(APIServer pid=6) File "/usr/local/bin/vllm", line 10, in <module>
(APIServer pid=6) sys.exit(main())
(APIServer pid=6) ^^^^^^
(APIServer pid=6) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 73, in main
(APIServer pid=6) args.dispatch_function(args)
(APIServer pid=6) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 112, in cmd
(APIServer pid=6) uvloop.run(run_server(args))
(APIServer pid=6) File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 96, in run
(APIServer pid=6) return __asyncio.run(
(APIServer pid=6) ^^^^^^^^^^^^^^
(APIServer pid=6) File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
(APIServer pid=6) return runner.run(main)
(APIServer pid=6) ^^^^^^^^^^^^^^^^
(APIServer pid=6) File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=6) return self._loop.run_until_complete(task)
(APIServer pid=6) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=6) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=6) File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=6) return await main
(APIServer pid=6) ^^^^^^^^^^
(APIServer pid=6) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 471, in run_server
(APIServer pid=6) await 
run_server_worker(listen_address, sock, args, **uvicorn_kwargs) (APIServer pid=6) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 490, in run_server_worker (APIServer pid=6) async with build_async_engine_client( (APIServer pid=6) ^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=6) File "/usr/lib/python3.12/contextlib.py", line 210, in aenter (APIServer pid=6) return await anext(self.gen) (APIServer pid=6) ^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=6) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 96, in build_async_engine_client (APIServer pid=6) async with build_async_engine_client_from_engine_args( (APIServer pid=6) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=6) File "/usr/lib/python3.12/contextlib.py", line 210, in aenter (APIServer pid=6) return await anext(self.gen) (APIServer pid=6) ^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=6) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 137, in build_async_engine_client_from_engine_args (APIServer pid=6) async_llm = AsyncLLM.from_vllm_config( (APIServer pid=6) ^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=6) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 225, in from_vllm_config (APIServer pid=6) return cls( (APIServer pid=6) ^^^^ (APIServer pid=6) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 154, in init (APIServer pid=6) self.engine_core = EngineCoreClient.make_async_mp_client( (APIServer pid=6) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=6) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (APIServer pid=6) return func(*args, **kwargs) (APIServer pid=6) ^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=6) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 127, in make_async_mp_client (APIServer pid=6) return AsyncMPClient(*client_args) (APIServer pid=6) 
^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=6) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (APIServer pid=6) return func(*args, **kwargs) (APIServer pid=6) ^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=6) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 911, in init (APIServer pid=6) super().init( (APIServer pid=6) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 569, in init (APIServer pid=6) with launch_core_engines( (APIServer pid=6) ^^^^^^^^^^^^^^^^^^^^ (APIServer pid=6) File "/usr/lib/python3.12/contextlib.py", line 144, in exit (APIServer pid=6) next(self.gen) (APIServer pid=6) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 951, in launch_core_engines (APIServer pid=6) wait_for_engine_startup( (APIServer pid=6) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 1010, in wait_for_engine_startup (APIServer pid=6) raise RuntimeError( (APIServer pid=6) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
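The traceback above ends inside Triton's runtime build step (triton/runtime/build.py). That step shells out to /usr/bin/gcc to compile a small C helper (cuda_utils.c) with -O3 -shared -fPIC. The gcc driver reported that cc1 segfaulted and returned exit status 4, the status GCC uses for internal compiler errors. Below is a minimal, hypothetical sketch for checking whether the container's gcc can build any shared object at all, independent of vLLM and Triton. The function name try_shared_build and the probe source are illustrative assumptions. The probe also leaves out the CUDA and Python headers that the real helper includes, so a clean result here does not rule out a crash that depends on those headers.

```python
import os
import shutil
import subprocess
import tempfile

# Hypothetical probe source: a trivial function, with none of the CUDA/Python
# headers that Triton's real cuda_utils.c pulls in.
PROBE_SRC = "int add_one(int x) { return x + 1; }\n"


def try_shared_build(cc="/usr/bin/gcc", flags=("-O3", "-shared", "-fPIC", "-Wno-psabi")):
    """Compile PROBE_SRC into a shared object, mirroring the flags in the log.

    Returns (returncode, stderr). returncode is None if the compiler is missing.
    A returncode of 4 together with "internal compiler error" in stderr
    points at the compiler itself rather than at vLLM or Triton.
    """
    if shutil.which(cc) is None:
        return None, f"compiler not found: {cc}"
    with tempfile.TemporaryDirectory() as tmpdir:
        src = os.path.join(tmpdir, "probe.c")
        out = os.path.join(tmpdir, "probe.so")
        with open(src, "w") as f:
            f.write(PROBE_SRC)
        # Same argument order as the failing command: source, flags, -o output.
        proc = subprocess.run([cc, src, *flags, "-o", out], capture_output=True, text=True)
        return proc.returncode, proc.stderr


if __name__ == "__main__":
    code, err = try_shared_build()
    print("exit status:", code)
    if err:
        print(err)
```

Run it inside the same container (for example via docker exec). If it also exits with status 4, that suggests the problem lies with the compiler in the image or with the host environment rather than with vLLM.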


Your current environment

- GPU: 2 × RTX 3090
- CUDA version: 12.9
- Driver version: 575.51.03
- OS: CentOS Linux 8
- Docker image: vllm/vllm-openai:v0.17.1

🐛 Describe the bug

The container is started with:

docker run --name $SERVICE_NAME -d --gpus all --shm-size 30.24gb --network host \
    -v $WORKSPACE_PATH:/vllm-workspace vllm/vllm-openai:v0.17.1

Inside the container, the startup script is:

#!/bin/bash
export CUDA_VISIBLE_DEVICES=0
vllm serve ./PaddleOCR-VL --config llm_config.yaml

Error log:

(APIServer pid=7) INFO 03-17 11:10:14 [utils.py:238] non-default args: {'model_tag': './PaddleOCR-VL', 'host': '0.0.0.0', 'port': 18023, 'model': './PaddleOCR-VL', 'trust_remote_code': True, 'served_model_name': ['PaddleOCR-VL-0.9B'], 'gpu_memory_utilization': 0.3, 'enable_prefix_caching': False, 'mm_processor_cache_gb': 0.0}
(APIServer pid=7) WARNING 03-17 11:10:14 [envs.py:1710] Unknown vLLM environment variable detected: VLLM_DISABLE_TRITON
(APIServer pid=7) The argument trust_remote_code is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=7) The argument trust_remote_code is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=7) INFO 03-17 11:10:14 [model.py:531] Resolved architecture: PaddleOCRVLForConditionalGeneration
(APIServer pid=7) INFO 03-17 11:10:14 [model.py:1554] Using max model len 131072
(APIServer pid=7) INFO 03-17 11:10:14 [scheduler.py:231] Chunked prefill is enabled with max_num_batched_tokens=2048.
(APIServer pid=7) INFO 03-17 11:10:14 [vllm.py:747] Asynchronous scheduling is enabled.
(APIServer pid=7) Using a slow image processor as use_fast is unset and a slow processor was saved with this model. use_fast=True will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with use_fast=False.
(EngineCore_DP0 pid=275) INFO 03-17 11:10:25 [core.py:101] Initializing a V1 LLM engine (v0.17.1) with config: model='./PaddleOCR-VL', speculative_config=None, tokenizer='./PaddleOCR-VL', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=131072, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, enable_return_routed_experts=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser='', reasoning_parser_plugin='', enable_in_reasoning=False), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, kv_cache_metrics=False, kv_cache_metrics_sample=0.01, cudagraph_metrics=False, enable_layerwise_nvtx_tracing=False, enable_mfu_metrics=False, enable_mm_processor_stats=False, enable_logging_iteration_details=False), seed=0, served_model_name=PaddleOCR-VL-0.9B, enable_prefix_caching=False, enable_chunked_prefill=True, pooler_config=None, compilation_config={'level': None, 'mode': <CompilationMode.VLLM_COMPILE: 3>, 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': ['none'], 'splitting_ops': ['vllm::unified_attention', 'vllm::unified_attention_with_output', 'vllm::unified_mla_attention', 'vllm::unified_mla_attention_with_output', 'vllm::mamba_mixer2', 'vllm::mamba_mixer', 'vllm::short_conv', 'vllm::linear_attention', 'vllm::plamo2_mamba_mixer', 'vllm::gdn_attention_core', 'vllm::kda_attention', 'vllm::sparse_attn_indexer', 'vllm::rocm_aiter_sparse_attn_indexer', 'vllm::unified_kv_cache_update', 'vllm::unified_mla_kv_cache_update'], 
'compile_mm_encoder': False, 'compile_sizes': [], 'compile_ranges_split_points': [2048], 'inductor_compile_config': {'enable_auto_functionalized_v2': False, 'combo_kernels': True, 'benchmark_combo_kernel': True}, 'inductor_passes': {}, 'cudagraph_mode': <CUDAGraphMode.FULL_AND_PIECEWISE: (2, 1)>, 'cudagraph_num_of_warmups': 1, 'cudagraph_capture_sizes': [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248, 256, 272, 288, 304, 320, 336, 352, 368, 384, 400, 416, 432, 448, 464, 480, 496, 512], 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {'fuse_norm_quant': False, 'fuse_act_quant': False, 'fuse_attn_quant': False, 'enable_sp': False, 'fuse_gemm_comms': False, 'fuse_allreduce_rms': False}, 'max_cudagraph_capture_size': 512, 'dynamic_shapes_config': {'type': <DynamicShapesType.BACKED: 'backed'>, 'evaluate_guards': False, 'assume_32_bit_indexing': False}, 'local_cache_dir': None, 'fast_moe_cold_start': True, 'static_all_moe_layers': []} (EngineCore_DP0 pid=275) INFO 03-17 11:10:26 [parallel_state.py:1393] world_size=1 rank=0 local_rank=0 distributed_init_method=tcp://192.168.51.51:42717 backend=nccl [W317 11:10:36.233467487 socket.cpp:207] [c10d] The hostname of the client socket cannot be retrieved. err=-3 [rank0]:[W317 11:10:46.245054413 ProcessGroupGloo.cpp:511] Warning: Unable to resolve hostname to a (local) address. Using the loopback address as fallback. Manually set the network interface to bind to with GLOO_SOCKET_IFNAME. (function operator()) (EngineCore_DP0 pid=275) INFO 03-17 11:11:36 [parallel_state.py:1715] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, PCP rank 0, TP rank 0, EP rank N/A, EPLB rank N/A (EngineCore_DP0 pid=275) Using a slow image processor as use_fast is unset and a slow processor was saved with this model. 
use_fast=True will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with use_fast=False. (EngineCore_DP0 pid=275) INFO 03-17 11:11:41 [base.py:106] Offloader set to NoopOffloader (EngineCore_DP0 pid=275) INFO 03-17 11:11:41 [gpu_model_runner.py:4281] Starting to load model ./PaddleOCR-VL... (EngineCore_DP0 pid=275) INFO 03-17 11:11:41 [cuda.py:453] Using backend AttentionBackendEnum.FLASH_ATTN for vit attention (EngineCore_DP0 pid=275) INFO 03-17 11:11:41 [mm_encoder_attention.py:215] Using AttentionBackendEnum.FLASH_ATTN for MMEncoderAttention. (EngineCore_DP0 pid=275) INFO 03-17 11:11:41 [cuda.py:405] Using FLASH_ATTN attention backend out of potential backends: ['FLASH_ATTN', 'FLASHINFER', 'TRITON_ATTN', 'FLEX_ATTENTION']. (EngineCore_DP0 pid=275) INFO 03-17 11:11:41 [flash_attn.py:587] Using FlashAttention version 2 (EngineCore_DP0 pid=275) <frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead. (EngineCore_DP0 pid=275) <frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead. 
Loading safetensors checkpoint shards: 0% Completed | 0/1 [00:00<?, ?it/s] Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00, 3.14it/s] Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00, 3.14it/s] (EngineCore_DP0 pid=275) (EngineCore_DP0 pid=275) INFO 03-17 11:11:42 [default_loader.py:293] Loading weights took 0.34 seconds (EngineCore_DP0 pid=275) INFO 03-17 11:11:42 [gpu_model_runner.py:4364] Model loading took 1.89 GiB memory and 0.770836 seconds (EngineCore_DP0 pid=275) INFO 03-17 11:11:42 [gpu_model_runner.py:5280] Encoder cache will be initialized with a budget of 3600 tokens, and profiled with 1 image items of the maximum feature size. gcc: internal compiler error: Segmentation fault signal terminated program cc1 Please submit a full bug report, with preprocessed source if appropriate. See file:///usr/share/doc/gcc-11/README.Bugs for instructions. (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] EngineCore failed to start. (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] Traceback (most recent call last): (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1090, in run_engine_core (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs) (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return func(*args, **kwargs) (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 834, in init (EngineCore_DP0 
pid=275) ERROR 03-17 11:11:43 [core.py:1100] super().init( (EngineCore_DP0 pid=275) Process EngineCore_DP0: (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 120, in init (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] num_gpu_blocks, num_cpu_blocks, kv_cache_config = self._initialize_kv_caches( (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return func(*args, **kwargs) (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 252, in _initialize_kv_caches (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] available_gpu_memory = self.model_executor.determine_available_memory() (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 136, in determine_available_memory (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return self.collective_rpc("determine_available_memory") (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/uniproc_executor.py", line 76, in collective_rpc (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] result = run_method(self.driver_worker, method, args, kwargs) (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/serial_utils.py", line 459, in run_method (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return func(*args, **kwargs) (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 124, in decorate_context (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return func(*args, **kwargs) (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 390, in determine_available_memory (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] self.model_runner.profile_run() (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 5296, in profile_run (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] dummy_encoder_outputs = self.model.embed_multimodal( (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/paddleocr_vl.py", line 1242, in embed_multimodal (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] image_embeds = self._process_image_input(image_input) (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/paddleocr_vl.py", line 1229, in _process_image_input (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 
[core.py:1100] vision_outputs = tuple( (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/paddleocr_vl.py", line 1230, in <genexpr> (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] self.encode_image(pixel, grid).squeeze(0) (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/paddleocr_vl.py", line 1215, in encode_image (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] vision_outputs = self.visual( (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return self._call_impl(*args, **kwargs) (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return forward_call(*args, **kwargs) (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/paddleocr_vl.py", line 914, in forward (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return self.vision_model( (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File 
"/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return self._call_impl(*args, **kwargs) (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return forward_call(*args, **kwargs) (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/paddleocr_vl.py", line 866, in forward (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] last_hidden_state = self.encoder( (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return self._call_impl(*args, **kwargs) (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return forward_call(*args, **kwargs) (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/paddleocr_vl.py", line 821, in forward (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] hidden_states = encoder_layer( (EngineCore_DP0 pid=275) 
ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return self._call_impl(*args, **kwargs) (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return forward_call(*args, **kwargs) (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/paddleocr_vl.py", line 712, in forward (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] hidden_states = self.self_attn( (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return self._call_impl(*args, **kwargs) (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return forward_call(*args, **kwargs) (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/paddleocr_vl.py", line 634, in forward 
(EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] qk_rotated = self.apply_rotary_emb( (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return self._call_impl(*args, **kwargs) (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return forward_call(*args, **kwargs) (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/custom_op.py", line 129, in forward (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return self._forward_method(*args, **kwargs) (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/rotary_embedding/common.py", line 244, in forward_cuda (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] output = apply_rotary_emb(x, cos, sin, interleaved) (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/vllm_flash_attn/layers/rotary.py", line 124, in apply_rotary_emb (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return ApplyRotaryEmb.apply( (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] 
^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/torch/autograd/function.py", line 583, in apply (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return super().apply(*args, **kwargs) # type: ignore[misc] (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/vllm_flash_attn/layers/rotary.py", line 50, in forward (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] out = apply_rotary( (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/vllm_flash_attn/ops/triton/rotary.py", line 203, in apply_rotary (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] rotary_kernel[grid]( (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 370, in <lambda> (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs) (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 700, in run (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] device = driver.active.get_current_device() (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/triton/runtime/driver.py", line 28, in active (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] self._active = self.default (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 
[core.py:1100] ^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/triton/runtime/driver.py", line 22, in default (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] self._default = _create_driver() (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/triton/runtime/driver.py", line 10, in _create_driver (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] return active_drivers0 (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/triton/backends/nvidia/driver.py", line 720, in init (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] self.utils = CudaUtils() # TODO: make static (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/triton/backends/nvidia/driver.py", line 62, in init (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] mod = compile_module_from_src( (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/triton/runtime/build.py", line 93, in compile_module_from_src (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] so = _build(name, src_path, tmpdir, library_dirs or [], include_dirs or [], libraries or [], ccflags or []) (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/triton/runtime/build.py", line 48, 
in _build (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] subprocess.check_call(cc_cmd, stdout=subprocess.DEVNULL) (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] File "/usr/lib/python3.12/subprocess.py", line 413, in check_call (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] raise CalledProcessError(retcode, cmd) (EngineCore_DP0 pid=275) ERROR 03-17 11:11:43 [core.py:1100] subprocess.CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmpbm3nyzhf/cuda_utils.c', '-O3', '-shared', '-fPIC', '-Wno-psabi', '-o', '/tmp/tmpbm3nyzhf/cuda_utils.cpython-312-x86_64-linux-gnu.so', '-l:libcuda.so.1', '-L/usr/local/lib/python3.12/dist-packages/triton/backends/nvidia/lib', '-L/usr/lib/x86_64-linux-gnu', '-I/usr/local/lib/python3.12/dist-packages/triton/backends/nvidia/include', '-I/tmp/tmpbm3nyzhf', '-I/usr/include/python3.12']' returned non-zero exit status 4. (EngineCore_DP0 pid=275) Traceback (most recent call last): (EngineCore_DP0 pid=275) File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap (EngineCore_DP0 pid=275) self.run() (EngineCore_DP0 pid=275) File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run (EngineCore_DP0 pid=275) self._target(*self._args, **self._kwargs) (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1104, in run_engine_core (EngineCore_DP0 pid=275) raise e (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1090, in run_engine_core (EngineCore_DP0 pid=275) engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs) (EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (EngineCore_DP0 pid=275) return func(*args, **kwargs) (EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) File 
"/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 834, in init (EngineCore_DP0 pid=275) super().init( (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 120, in init (EngineCore_DP0 pid=275) num_gpu_blocks, num_cpu_blocks, kv_cache_config = self._initialize_kv_caches( (EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (EngineCore_DP0 pid=275) return func(*args, **kwargs) (EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 252, in _initialize_kv_caches (EngineCore_DP0 pid=275) available_gpu_memory = self.model_executor.determine_available_memory() (EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 136, in determine_available_memory (EngineCore_DP0 pid=275) return self.collective_rpc("determine_available_memory") (EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/uniproc_executor.py", line 76, in collective_rpc (EngineCore_DP0 pid=275) result = run_method(self.driver_worker, method, args, kwargs) (EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/serial_utils.py", line 459, in run_method (EngineCore_DP0 pid=275) return func(*args, **kwargs) (EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 124, in decorate_context (EngineCore_DP0 pid=275) return func(*args, **kwargs) (EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) File 
"/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 390, in determine_available_memory (EngineCore_DP0 pid=275) self.model_runner.profile_run() (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 5296, in profile_run (EngineCore_DP0 pid=275) dummy_encoder_outputs = self.model.embed_multimodal( (EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/paddleocr_vl.py", line 1242, in embed_multimodal (EngineCore_DP0 pid=275) image_embeds = self._process_image_input(image_input) (EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/paddleocr_vl.py", line 1229, in _process_image_input (EngineCore_DP0 pid=275) vision_outputs = tuple( (EngineCore_DP0 pid=275) ^^^^^^ (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/paddleocr_vl.py", line 1230, in <genexpr> (EngineCore_DP0 pid=275) self.encode_image(pixel, grid).squeeze(0) (EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/paddleocr_vl.py", line 1215, in encode_image (EngineCore_DP0 pid=275) vision_outputs = self.visual( (EngineCore_DP0 pid=275) ^^^^^^^^^^^^ (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl (EngineCore_DP0 pid=275) return self._call_impl(*args, **kwargs) (EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl (EngineCore_DP0 pid=275) return forward_call(*args, **kwargs) (EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) File 
"/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/paddleocr_vl.py", line 914, in forward (EngineCore_DP0 pid=275) return self.vision_model( (EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl (EngineCore_DP0 pid=275) return self._call_impl(*args, **kwargs) (EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl (EngineCore_DP0 pid=275) return forward_call(*args, **kwargs) (EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/paddleocr_vl.py", line 866, in forward (EngineCore_DP0 pid=275) last_hidden_state = self.encoder( (EngineCore_DP0 pid=275) ^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl (EngineCore_DP0 pid=275) return self._call_impl(*args, **kwargs) (EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl (EngineCore_DP0 pid=275) return forward_call(*args, **kwargs) (EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/paddleocr_vl.py", line 821, in forward (EngineCore_DP0 pid=275) hidden_states = encoder_layer( (EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl (EngineCore_DP0 pid=275) return self._call_impl(*args, **kwargs) (EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) File 
"/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl (EngineCore_DP0 pid=275) return forward_call(*args, **kwargs) (EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/paddleocr_vl.py", line 712, in forward (EngineCore_DP0 pid=275) hidden_states = self.self_attn( (EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl (EngineCore_DP0 pid=275) return self._call_impl(*args, **kwargs) (EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl (EngineCore_DP0 pid=275) return forward_call(*args, **kwargs) (EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/paddleocr_vl.py", line 634, in forward (EngineCore_DP0 pid=275) qk_rotated = self.apply_rotary_emb( (EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl (EngineCore_DP0 pid=275) return self._call_impl(*args, **kwargs) (EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl (EngineCore_DP0 pid=275) return forward_call(*args, **kwargs) (EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/custom_op.py", line 129, in forward (EngineCore_DP0 pid=275) return self._forward_method(*args, **kwargs) (EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) File 
"/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/rotary_embedding/common.py", line 244, in forward_cuda (EngineCore_DP0 pid=275) output = apply_rotary_emb(x, cos, sin, interleaved) (EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/vllm_flash_attn/layers/rotary.py", line 124, in apply_rotary_emb (EngineCore_DP0 pid=275) return ApplyRotaryEmb.apply( (EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/torch/autograd/function.py", line 583, in apply (EngineCore_DP0 pid=275) return super().apply(*args, **kwargs) # type: ignore[misc] (EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/vllm_flash_attn/layers/rotary.py", line 50, in forward (EngineCore_DP0 pid=275) out = apply_rotary( (EngineCore_DP0 pid=275) ^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/vllm/vllm_flash_attn/ops/triton/rotary.py", line 203, in apply_rotary (EngineCore_DP0 pid=275) rotary_kernel[grid]( (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 370, in <lambda> (EngineCore_DP0 pid=275) return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs) (EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 700, in run (EngineCore_DP0 pid=275) device = driver.active.get_current_device() (EngineCore_DP0 pid=275) ^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/triton/runtime/driver.py", line 28, in active (EngineCore_DP0 pid=275) self._active = self.default (EngineCore_DP0 pid=275) ^^^^^^^^^^^^ (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/triton/runtime/driver.py", 
line 22, in default (EngineCore_DP0 pid=275) self._default = _create_driver() (EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/triton/runtime/driver.py", line 10, in _create_driver (EngineCore_DP0 pid=275) return active_drivers0 (EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/triton/backends/nvidia/driver.py", line 720, in init (EngineCore_DP0 pid=275) self.utils = CudaUtils() # TODO: make static (EngineCore_DP0 pid=275) ^^^^^^^^^^^ (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/triton/backends/nvidia/driver.py", line 62, in init (EngineCore_DP0 pid=275) mod = compile_module_from_src( (EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/triton/runtime/build.py", line 93, in compile_module_from_src (EngineCore_DP0 pid=275) so = _build(name, src_path, tmpdir, library_dirs or [], include_dirs or [], libraries or [], ccflags or []) (EngineCore_DP0 pid=275) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=275) File "/usr/local/lib/python3.12/dist-packages/triton/runtime/build.py", line 48, in _build (EngineCore_DP0 pid=275) subprocess.check_call(cc_cmd, stdout=subprocess.DEVNULL) (EngineCore_DP0 pid=275) File "/usr/lib/python3.12/subprocess.py", line 413, in check_call (EngineCore_DP0 pid=275) raise CalledProcessError(retcode, cmd) (EngineCore_DP0 pid=275) subprocess.CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmpbm3nyzhf/cuda_utils.c', '-O3', '-shared', '-fPIC', '-Wno-psabi', '-o', '/tmp/tmpbm3nyzhf/cuda_utils.cpython-312-x86_64-linux-gnu.so', '-l:libcuda.so.1', '-L/usr/local/lib/python3.12/dist-packages/triton/backends/nvidia/lib', '-L/usr/lib/x86_64-linux-gnu', '-I/usr/local/lib/python3.12/dist-packages/triton/backends/nvidia/include', '-I/tmp/tmpbm3nyzhf', 
'-I/usr/include/python3.12']' returned non-zero exit status 4. [rank0]:[W317 11:11:43.231246627 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) (APIServer pid=7) Traceback (most recent call last): (APIServer pid=7) File "/usr/local/bin/vllm", line 10, in <module> (APIServer pid=7) sys.exit(main()) (APIServer pid=7) ^^^^^^ (APIServer pid=7) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 73, in main (APIServer pid=7) args.dispatch_function(args) (APIServer pid=7) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 112, in cmd (APIServer pid=7) uvloop.run(run_server(args)) (APIServer pid=7) File "/usr/local/lib/python3.12/dist-packages/uvloop/init.py", line 96, in run (APIServer pid=7) return __asyncio.run( (APIServer pid=7) ^^^^^^^^^^^^^^ (APIServer pid=7) File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run (APIServer pid=7) return runner.run(main) (APIServer pid=7) ^^^^^^^^^^^^^^^^ (APIServer pid=7) File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run (APIServer pid=7) return self._loop.run_until_complete(task) (APIServer pid=7) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=7) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete (APIServer pid=7) File "/usr/local/lib/python3.12/dist-packages/uvloop/init.py", line 48, in wrapper (APIServer pid=7) return await main (APIServer pid=7) ^^^^^^^^^^ (APIServer pid=7) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 471, in run_server (APIServer pid=7) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs) (APIServer pid=7) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 490, in run_server_worker (APIServer pid=7) async with 
build_async_engine_client( (APIServer pid=7) ^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=7) File "/usr/lib/python3.12/contextlib.py", line 210, in aenter (APIServer pid=7) return await anext(self.gen) (APIServer pid=7) ^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=7) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 96, in build_async_engine_client (APIServer pid=7) async with build_async_engine_client_from_engine_args( (APIServer pid=7) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=7) File "/usr/lib/python3.12/contextlib.py", line 210, in aenter (APIServer pid=7) return await anext(self.gen) (APIServer pid=7) ^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=7) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 137, in build_async_engine_client_from_engine_args (APIServer pid=7) async_llm = AsyncLLM.from_vllm_config( (APIServer pid=7) ^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=7) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 225, in from_vllm_config (APIServer pid=7) return cls( (APIServer pid=7) ^^^^ (APIServer pid=7) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 154, in init (APIServer pid=7) self.engine_core = EngineCoreClient.make_async_mp_client( (APIServer pid=7) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=7) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (APIServer pid=7) return func(*args, **kwargs) (APIServer pid=7) ^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=7) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 127, in make_async_mp_client (APIServer pid=7) return AsyncMPClient(*client_args) (APIServer pid=7) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=7) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (APIServer pid=7) return func(*args, **kwargs) (APIServer pid=7) ^^^^^^^^^^^^^^^^^^^^^ 
(APIServer pid=7) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 911, in init (APIServer pid=7) super().init( (APIServer pid=7) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 569, in init (APIServer pid=7) with launch_core_engines( (APIServer pid=7) ^^^^^^^^^^^^^^^^^^^^ (APIServer pid=7) File "/usr/lib/python3.12/contextlib.py", line 144, in exit (APIServer pid=7) next(self.gen) (APIServer pid=7) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 951, in launch_core_engines (APIServer pid=7) wait_for_engine_startup( (APIServer pid=7) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 1010, in wait_for_engine_startup (APIServer pid=7) raise RuntimeError( (APIServer pid=7) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {} [root@4car bin]# docker logs -f --tail 100 maas-table (EngineCore_DP0 pid=437) File "/usr/local/lib/python3.12/dist-packages/vllm/vllm_flash_attn/layers/rotary.py", line 50, in forward (EngineCore_DP0 pid=437) out = apply_rotary( (EngineCore_DP0 pid=437) ^^^^^^^^^^^^^ (EngineCore_DP0 pid=437) File "/usr/local/lib/python3.12/dist-packages/vllm/vllm_flash_attn/ops/triton/rotary.py", line 203, in apply_rotary (EngineCore_DP0 pid=437) rotary_kernel[grid]( (EngineCore_DP0 pid=437) File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 370, in <lambda> (EngineCore_DP0 pid=437) return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs) (EngineCore_DP0 pid=437) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=437) File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 700, in run (EngineCore_DP0 pid=437) device = driver.active.get_current_device() (EngineCore_DP0 pid=437) ^^^^^^^^^^^^^ (EngineCore_DP0 pid=437) File "/usr/local/lib/python3.12/dist-packages/triton/runtime/driver.py", line 28, in active 
(EngineCore_DP0 pid=437) self._active = self.default (EngineCore_DP0 pid=437) ^^^^^^^^^^^^ (EngineCore_DP0 pid=437) File "/usr/local/lib/python3.12/dist-packages/triton/runtime/driver.py", line 22, in default (EngineCore_DP0 pid=437) self._default = _create_driver() (EngineCore_DP0 pid=437) ^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=437) File "/usr/local/lib/python3.12/dist-packages/triton/runtime/driver.py", line 10, in _create_driver (EngineCore_DP0 pid=437) return active_drivers0 (EngineCore_DP0 pid=437) ^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=437) File "/usr/local/lib/python3.12/dist-packages/triton/backends/nvidia/driver.py", line 720, in init (EngineCore_DP0 pid=437) self.utils = CudaUtils() # TODO: make static (EngineCore_DP0 pid=437) ^^^^^^^^^^^ (EngineCore_DP0 pid=437) File "/usr/local/lib/python3.12/dist-packages/triton/backends/nvidia/driver.py", line 62, in init (EngineCore_DP0 pid=437) mod = compile_module_from_src( (EngineCore_DP0 pid=437) ^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=437) File "/usr/local/lib/python3.12/dist-packages/triton/runtime/build.py", line 93, in compile_module_from_src (EngineCore_DP0 pid=437) so = _build(name, src_path, tmpdir, library_dirs or [], include_dirs or [], libraries or [], ccflags or []) (EngineCore_DP0 pid=437) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=437) File "/usr/local/lib/python3.12/dist-packages/triton/runtime/build.py", line 48, in _build (EngineCore_DP0 pid=437) subprocess.check_call(cc_cmd, stdout=subprocess.DEVNULL) (EngineCore_DP0 pid=437) File "/usr/lib/python3.12/subprocess.py", line 413, in check_call (EngineCore_DP0 pid=437) raise CalledProcessError(retcode, cmd) (EngineCore_DP0 pid=437) subprocess.CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmpk8vzb95c/cuda_utils.c', '-O3', '-shared', '-fPIC', '-Wno-psabi', '-o', '/tmp/tmpk8vzb95c/cuda_utils.cpython-312-x86_64-linux-gnu.so', '-l:libcuda.so.1', 
'-L/usr/local/lib/python3.12/dist-packages/triton/backends/nvidia/lib', '-L/usr/lib/x86_64-linux-gnu', '-I/usr/local/lib/python3.12/dist-packages/triton/backends/nvidia/include', '-I/tmp/tmpk8vzb95c', '-I/usr/include/python3.12']' returned non-zero exit status 4. [rank0]:[W317 11:20:49.325394522 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) (APIServer pid=6) Traceback (most recent call last): (APIServer pid=6) File "/usr/local/bin/vllm", line 10, in <module> (APIServer pid=6) sys.exit(main()) (APIServer pid=6) ^^^^^^ (APIServer pid=6) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 73, in main (APIServer pid=6) args.dispatch_function(args) (APIServer pid=6) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 112, in cmd (APIServer pid=6) uvloop.run(run_server(args)) (APIServer pid=6) File "/usr/local/lib/python3.12/dist-packages/uvloop/init.py", line 96, in run (APIServer pid=6) return __asyncio.run( (APIServer pid=6) ^^^^^^^^^^^^^^ (APIServer pid=6) File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run (APIServer pid=6) return runner.run(main) (APIServer pid=6) ^^^^^^^^^^^^^^^^ (APIServer pid=6) File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run (APIServer pid=6) return self._loop.run_until_complete(task) (APIServer pid=6) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=6) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete (APIServer pid=6) File "/usr/local/lib/python3.12/dist-packages/uvloop/init.py", line 48, in wrapper (APIServer pid=6) return await main (APIServer pid=6) ^^^^^^^^^^ (APIServer pid=6) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 471, in run_server (APIServer pid=6) await 
run_server_worker(listen_address, sock, args, **uvicorn_kwargs) (APIServer pid=6) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 490, in run_server_worker (APIServer pid=6) async with build_async_engine_client( (APIServer pid=6) ^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=6) File "/usr/lib/python3.12/contextlib.py", line 210, in aenter (APIServer pid=6) return await anext(self.gen) (APIServer pid=6) ^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=6) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 96, in build_async_engine_client (APIServer pid=6) async with build_async_engine_client_from_engine_args( (APIServer pid=6) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=6) File "/usr/lib/python3.12/contextlib.py", line 210, in aenter (APIServer pid=6) return await anext(self.gen) (APIServer pid=6) ^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=6) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 137, in build_async_engine_client_from_engine_args (APIServer pid=6) async_llm = AsyncLLM.from_vllm_config( (APIServer pid=6) ^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=6) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 225, in from_vllm_config (APIServer pid=6) return cls( (APIServer pid=6) ^^^^ (APIServer pid=6) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 154, in init (APIServer pid=6) self.engine_core = EngineCoreClient.make_async_mp_client( (APIServer pid=6) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=6) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (APIServer pid=6) return func(*args, **kwargs) (APIServer pid=6) ^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=6) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 127, in make_async_mp_client (APIServer pid=6) return AsyncMPClient(*client_args) (APIServer pid=6) 
^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=6) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (APIServer pid=6) return func(*args, **kwargs) (APIServer pid=6) ^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=6) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 911, in init (APIServer pid=6) super().init( (APIServer pid=6) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 569, in init (APIServer pid=6) with launch_core_engines( (APIServer pid=6) ^^^^^^^^^^^^^^^^^^^^ (APIServer pid=6) File "/usr/lib/python3.12/contextlib.py", line 144, in exit (APIServer pid=6) next(self.gen) (APIServer pid=6) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 951, in launch_core_engines (APIServer pid=6) wait_for_engine_startup( (APIServer pid=6) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 1010, in wait_for_engine_startup (APIServer pid=6) raise RuntimeError( (APIServer pid=6) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}


Fix Plan

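The root cause is at the end of the traceback. When vLLM's profile run launches the Triton rotary-embedding kernel, Triton's NVIDIA backend first builds a small C helper (cuda_utils.c) with /usr/bin/gcc, and gcc's cc1 process crashes with a segmentation fault. The command fails with exit status 4, which gcc uses to report an internal compiler error: a crash inside the compiler itself rather than an error in the source. To work around it, you can try the following steps:

  • Reinstall or update GCC: an internal compiler error points at the gcc installation in the image, so check gcc --version and try reinstalling the compiler package or switching to a newer image.
  • Check the NVIDIA driver library: the helper links against libcuda.so.1, so confirm that the driver library is visible inside the container. A missing library breaks the link step but is unlikely to crash cc1 itself.
  • Check include and library paths: the command passes -I and -L flags for Triton's bundled NVIDIA headers and for /usr/include/python3.12; confirm those directories exist and that Python.h is present.
  • Use a different C compiler: disabling Triton's JIT is not a practical workaround here, because the rotary-embedding kernel in the traceback is itself a Triton kernel and needs this helper to launch. Triton does, however, honor the CC environment variable when choosing the compiler.

Here's an example of selecting another compiler from Python. It must run before the first Triton kernel launch, and /usr/bin/clang is an assumed path:

import os

# Triton reads CC when it builds its cuda_utils helper.
# /usr/bin/clang is an assumption; use any working C compiler in the container.
os.environ["CC"] = "/usr/bin/clang"

Alternatively, set CC in the environment that starts vLLM, for example in the container's shell or with docker run -e CC=/usr/bin/clang:

export CC=/usr/bin/clang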
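Triton's NVIDIA backend needs a host C compiler to build cuda_utils.c, and the failing command in the log used /usr/bin/gcc. To check which compiler Triton is likely to pick inside the container, the sketch below approximates its lookup. The order (CC if set, then gcc, then clang on PATH) is an assumption based on recent Triton releases and may differ in yours; pick_c_compiler is a hypothetical helper, not a Triton API.

```python
import os
import shutil


def pick_c_compiler(env=None, which=shutil.which):
    """Approximate how recent Triton releases choose the compiler for cuda_utils.c.

    Assumption: a non-empty CC wins; otherwise gcc, then clang, found on PATH.
    """
    env = os.environ if env is None else env
    cc = env.get("CC")
    if cc:
        return cc
    for name in ("gcc", "clang"):
        path = which(name)
        if path is not None:
            return path
    raise RuntimeError("no C compiler found; set CC")


if __name__ == "__main__":
    print(pick_c_compiler())
```

If this prints /usr/bin/gcc inside the container, the crashing compiler is the one Triton will keep using until CC is set or gcc is repaired.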

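If none of these steps resolve the issue, collect the output of gcc --version, the container image tag, and the vLLM and Triton versions before asking for further help; an internal compiler error cannot be diagnosed from the Python traceback alone.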

Verification

To verify the fix, restart the server and confirm that the engine gets past profile_run without the CalledProcessError raised from triton/runtime/build.py. To test the compiler in isolation, compile a small C file with the same flags as the failing command (-O3 -shared -fPIC); if gcc still exits with status 4 outside vLLM, the problem lies in the toolchain or the machine rather than in vLLM.
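To test the compiler outside vLLM, the sketch below compiles a stand-in C file with the flags visible in the failing command from the log (-O3 -shared -fPIC -Wno-psabi) and reports how the compiler exited. The source is a placeholder, not Triton's actual cuda_utils.c, and the helper names build_cmd, classify, and probe are hypothetical.

```python
import os
import subprocess
import sys
import sysconfig
import tempfile

# In the log, gcc reported the cc1 crash ("internal compiler error") as exit status 4.
ICE_EXIT_CODE = 4

# Placeholder source: it only includes Python.h, as Triton's helper does.
SOURCE = r"""
#include <Python.h>
int probe_value(void) { return 42; }
"""


def build_cmd(cc, src_path, out_path, include_dir):
    """Build a command shaped like the failing one in the log."""
    return [cc, src_path, "-O3", "-shared", "-fPIC", "-Wno-psabi",
            "-o", out_path, f"-I{include_dir}"]


def classify(returncode):
    """Translate a compiler exit status into a short description."""
    if returncode == 0:
        return "ok"
    if returncode == ICE_EXIT_CODE:
        return "internal compiler error"
    if returncode < 0:
        # subprocess reports death-by-signal as a negative return code.
        return f"killed by signal {-returncode}"
    return f"failed with status {returncode}"


def probe(cc="gcc"):
    """Compile the placeholder source in a temp dir and classify the result."""
    include_dir = sysconfig.get_paths()["include"]
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "probe.c")
        with open(src, "w") as f:
            f.write(SOURCE)
        cmd = build_cmd(cc, src, os.path.join(tmp, "probe.so"), include_dir)
        try:
            result = subprocess.run(cmd, capture_output=True, text=True)
        except FileNotFoundError:
            return "compiler not found", ""
        return classify(result.returncode), result.stderr


if __name__ == "__main__":
    status, stderr = probe(os.environ.get("CC", "gcc"))
    print(status)
    if stderr:
        print(stderr, file=sys.stderr)
```

Running it once with the default gcc and once with CC pointing at another compiler helps separate a broken gcc from a broader problem with the container or host.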

Extra Tips

  • Check the vLLM installation documentation for the requirements of the image or wheel you are running.
  • If you're running in a Docker container, make sure the image ships a working C compiler and Python development headers, and that the NVIDIA driver library is available inside the container. A segmentation fault in cc1 can also point to a corrupted compiler installation or faulty hardware (for example bad RAM) rather than a genuine gcc bug.
  • If the crash persists, report it to the vLLM or Triton maintainers; they may know of a specific image or version combination that triggers it.
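To check from inside the container that the pieces the failing gcc command relies on are present, here is a small sketch. It only tests for a compiler, Python.h in the interpreter's include directory, and a findable libcuda; check_build_deps is a hypothetical helper, and ctypes.util.find_library may miss a driver library that is only reachable through unusual paths.

```python
import ctypes.util
import os
import shutil
import sysconfig


def check_build_deps():
    """Return a name -> bool map of what the failing gcc command relies on."""
    include_dir = sysconfig.get_paths()["include"]
    return {
        "C compiler (CC or gcc on PATH)": bool(os.environ.get("CC") or shutil.which("gcc")),
        "Python.h": os.path.isfile(os.path.join(include_dir, "Python.h")),
        # find_library searches the dynamic linker cache plus a few fallbacks.
        "libcuda": ctypes.util.find_library("cuda") is not None,
    }


if __name__ == "__main__":
    for name, present in check_build_deps().items():
        print(f"{name}: {'found' if present else 'MISSING'}")
```

All three reporting "found" does not rule out a crashing compiler, but a MISSING entry narrows down which part of the image to fix.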
