vllm - ✅(Solved) Fix [Bug]: AMD's minimax mxfp4 trust_remote_code bug [1 pull request, 5 comments, 4 participants]

GitHub stats
vllm-project/vllm#38307 (fetched 2026-04-08 01:36:38)
Comments: 5
Participants: 4
Timeline: 28
Reactions: 0
Timeline (top): mentioned ×7, subscribed ×7, commented ×5, project_v2_item_status_changed ×3

Error Message

  • vllm serve amd/MiniMax-M2.5-MXFP4 --port 8888 --tensor-parallel-size=2 --gpu-memory-utilization 0.95 --max-model-len 2248 --block-size=32 --trust-remote-code
    ...
    (APIServer pid=1169773) pydantic_core._pydantic_core.ValidationError: 1 validation error for VllmConfig
    (APIServer pid=1169773)   Value error, The repository amd/MiniMax-M2.5-MXFP4 contains custom code which must be executed to correctly load the model. You can inspect the repository content at https://hf.co/amd/MiniMax-M2.5-MXFP4 .
    (APIServer pid=1169773) Please pass the argument `trust_remote_code=True` to allow custom code to be run. [type=value_error, input_value=ArgsKwargs((), {'model_co...transfer_config': None}), input_type=ArgsKwargs]
    (APIServer pid=1169773)     For further information visit https://errors.pydantic.dev/2.12/v/value_error

Fix Action

Fix / Workaround

+ vllm serve amd/MiniMax-M2.5-MXFP4 --port 8888 --tensor-parallel-size=2 --gpu-memory-utilization 0.95 --max-model-len 2248 --block-size=32 --trust-remote-code
WARNING 03-20 02:24:48 [gpt_oss_triton_kernels_moe.py:56] Using legacy triton_kernels on ROCm
/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/chat_completion/protocol.py:346: SyntaxWarning: invalid escape sequence '\e'
  "(e.g. 'abcdabcdabcd...' or '\emoji \emoji \emoji ...'). This feature "
/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/completion/protocol.py:176: SyntaxWarning: invalid escape sequence '\e'
  "(e.g. 'abcdabcdabcd...' or '\emoji \emoji \emoji ...'). This feature "
(APIServer pid=1169773) INFO 03-20 02:24:49 [utils.py:302] 
(APIServer pid=1169773) INFO 03-20 02:24:49 [utils.py:302]        █     █     █▄   ▄█
(APIServer pid=1169773) INFO 03-20 02:24:49 [utils.py:302]  ▄▄ ▄█ █     █     █ ▀▄▀ █  version 0.17.1
(APIServer pid=1169773) INFO 03-20 02:24:49 [utils.py:302]   █▄█▀ █     █     █     █  model   amd/MiniMax-M2.5-MXFP4
(APIServer pid=1169773) INFO 03-20 02:24:49 [utils.py:302]    ▀▀  ▀▀▀▀▀ ▀▀▀▀▀ ▀     ▀
(APIServer pid=1169773) INFO 03-20 02:24:49 [utils.py:302] 
(APIServer pid=1169773) INFO 03-20 02:24:49 [utils.py:238] non-default args: {'model_tag': 'amd/MiniMax-M2.5-MXFP4', 'port': 8888, 'model': 'amd/MiniMax-M2.5-MXFP4', 'trust_remote_code': True, 'max_model_len': 2248, 'tensor_parallel_size': 2, 'block_size': 32, 'gpu_memory_utilization': 0.95}
(APIServer pid=1169773) The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=1169773) [2026-03-20 02:24:49] WARNING configuration_utils.py:697: The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=1169773) The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=1169773) [2026-03-20 02:24:49] WARNING configuration_utils.py:697: The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=1169773) INFO 03-20 02:24:56 [model.py:531] Resolved architecture: MiniMaxM2ForCausalLM
(APIServer pid=1169773) INFO 03-20 02:24:56 [model.py:1554] Using max model len 2248
(APIServer pid=1169773) [aiter] start build [module_aiter_enum] under /usr/local/lib/python3.12/dist-packages/aiter/jit/build/module_aiter_enum
(APIServer pid=1169773) [2026-03-20 02:24:56] INFO core.py:549: start build [module_aiter_enum] under /usr/local/lib/python3.12/dist-packages/aiter/jit/build/module_aiter_enum
(APIServer pid=1169773) [aiter] finish build [module_aiter_enum], cost 7.8s 
(APIServer pid=1169773) [2026-03-20 02:25:04] INFO core.py:699: finish build [module_aiter_enum], cost 7.8s 
(APIServer pid=1169773) [aiter] import [module_aiter_enum] under /usr/local/lib/python3.12/dist-packages/aiter/jit/module_aiter_enum.so
(APIServer pid=1169773) [2026-03-20 02:25:04] INFO core.py:501: import [module_aiter_enum] under /usr/local/lib/python3.12/dist-packages/aiter/jit/module_aiter_enum.so
(APIServer pid=1169773) INFO 03-20 02:25:04 [scheduler.py:231] Chunked prefill is enabled with max_num_batched_tokens=8192.
(APIServer pid=1169773) Traceback (most recent call last):
(APIServer pid=1169773)   File "/usr/local/bin/vllm", line 10, in <module>
(APIServer pid=1169773)     sys.exit(main())
(APIServer pid=1169773)              ^^^^^^
(APIServer pid=1169773)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 73, in main
(APIServer pid=1169773)     args.dispatch_function(args)
(APIServer pid=1169773)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 112, in cmd
(APIServer pid=1169773)     uvloop.run(run_server(args))
(APIServer pid=1169773)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 96, in run
(APIServer pid=1169773)     return __asyncio.run(
(APIServer pid=1169773)            ^^^^^^^^^^^^^^
(APIServer pid=1169773)   File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
(APIServer pid=1169773)     return runner.run(main)
(APIServer pid=1169773)            ^^^^^^^^^^^^^^^^
(APIServer pid=1169773)   File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=1169773)     return self._loop.run_until_complete(task)
(APIServer pid=1169773)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1169773)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=1169773)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=1169773)     return await main
(APIServer pid=1169773)            ^^^^^^^^^^
(APIServer pid=1169773)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 471, in run_server
(APIServer pid=1169773)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=1169773)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 490, in run_server_worker
(APIServer pid=1169773)     async with build_async_engine_client(
(APIServer pid=1169773)                ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1169773)   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=1169773)     return await anext(self.gen)
(APIServer pid=1169773)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1169773)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 96, in build_async_engine_client
(APIServer pid=1169773)     async with build_async_engine_client_from_engine_args(
(APIServer pid=1169773)                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1169773)   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=1169773)     return await anext(self.gen)
(APIServer pid=1169773)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1169773)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 122, in build_async_engine_client_from_engine_args
(APIServer pid=1169773)     vllm_config = engine_args.create_engine_config(usage_context=usage_context)
(APIServer pid=1169773)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1169773)   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 1890, in create_engine_config
(APIServer pid=1169773)     config = VllmConfig(
(APIServer pid=1169773)              ^^^^^^^^^^^
(APIServer pid=1169773)   File "/usr/local/lib/python3.12/dist-packages/pydantic/_internal/_dataclasses.py", line 121, in __init__
(APIServer pid=1169773)     s.__pydantic_validator__.validate_python(ArgsKwargs(args, kwargs), self_instance=s)
(APIServer pid=1169773) pydantic_core._pydantic_core.ValidationError: 1 validation error for VllmConfig
(APIServer pid=1169773)   Value error, The repository amd/MiniMax-M2.5-MXFP4 contains custom code which must be executed to correctly load the model. You can inspect the repository content at https://hf.co/amd/MiniMax-M2.5-MXFP4 .
(APIServer pid=1169773)  You can inspect the repository content at https://hf.co/amd/MiniMax-M2.5-MXFP4.
(APIServer pid=1169773) Please pass the argument `trust_remote_code=True` to allow custom code to be run. [type=value_error, input_value=ArgsKwargs((), {'model_co...transfer_config': None}), input_type=ArgsKwargs]
(APIServer pid=1169773)     For further information visit https://errors.pydantic.dev/2.12/v/value_error

PR fix notes

PR #37698: [ROCm][Bugfix] fix exception related to trust_remote_code for MiniMax-M2.1-MXFP4

Description (problem / solution / changelog)

Purpose

Fix: https://github.com/vllm-project/vllm/issues/38307

Bug Fix: QuarkConfig.maybe_update_config

Problem: The original code called get_config() with a hard-coded trust_remote_code=False for every Quark model. This caused:

  1. Exceptions for models such as amd/MiniMax-M2.1-MXFP4 that require trust_remote_code=True. For example:
     Value error, The repository amd/MiniMax-M2.5-MXFP4 contains custom code which must be executed to correctly load the model.
  2. Wasteful HF Hub access for non-DeepSeek AMD Quark models, to which the logic does not even apply.
  3. No way for the user to override trust_remote_code, because the value was hard-coded.
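
The failure mode described above can be reproduced in miniature. This is only a sketch: fake_get_config, buggy_maybe_update_config, and try_load are hypothetical stand-ins written for illustration, not vLLM or Hugging Face functions.

```python
# Illustrative stand-ins only; none of these names are vLLM or Hugging Face APIs.
REPOS_WITH_CUSTOM_CODE = {"amd/MiniMax-M2.5-MXFP4"}


def fake_get_config(repo: str, trust_remote_code: bool) -> dict:
    """Stand-in loader that refuses repos with custom code unless trusted."""
    if repo in REPOS_WITH_CUSTOM_CODE and not trust_remote_code:
        raise ValueError(
            f"The repository {repo} contains custom code which must be "
            "executed to correctly load the model."
        )
    return {"repo": repo}


def buggy_maybe_update_config(repo: str, user_trust_remote_code: bool) -> dict:
    # The user's choice is accepted but never forwarded: the loader is
    # always called with trust_remote_code=False.
    return fake_get_config(repo, trust_remote_code=False)


def try_load(repo: str, user_trust_remote_code: bool) -> str:
    """Return 'ok' on success, or a 'failed: ...' message on ValueError."""
    try:
        buggy_maybe_update_config(repo, user_trust_remote_code)
        return "ok"
    except ValueError as exc:
        return f"failed: {exc}"


if __name__ == "__main__":
    # Fails even though the user asked for trust_remote_code=True.
    print(try_load("amd/MiniMax-M2.5-MXFP4", user_trust_remote_code=True))
```

Because the flag is dropped inside the helper, passing --trust-remote-code on the command line cannot change the outcome, which matches the behavior reported in the issue.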

File Changes

vllm/model_executor/layers/quantization/quark/quark.py:

Replaced the get_config() call with the pre-loaded hf_config from ModelConfig, so the config no longer has to be fetched again from the HF Hub. This also lets the user override trust_remote_code from the command line.

Added early return for non-deepseek_v3 model types via _DEEPSEEK_V3_FAMILY_MODEL_TYPES frozenset.
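
A minimal sketch of that shape, under stated assumptions: the class SketchQuarkConfig and the attribute family_specific_flag are hypothetical, the contents of _DEEPSEEK_V3_FAMILY_MODEL_TYPES are guessed (only the name comes from the PR notes), and the model type strings are placeholders. The real quark.py may differ.

```python
from types import SimpleNamespace
from typing import Any, Optional

# Name taken from the PR notes; the contents here are an assumption.
_DEEPSEEK_V3_FAMILY_MODEL_TYPES = frozenset({"deepseek_v3"})


class SketchQuarkConfig:
    """Hypothetical stand-in, not the real QuarkConfig."""

    def __init__(self) -> None:
        # Placeholder for whatever the real method toggles for DeepSeek models.
        self.family_specific_flag = False

    def maybe_update_config(
        self, model_name: str, hf_config: Optional[Any] = None
    ) -> None:
        # No pre-loaded config: nothing to inspect, and no hub access needed.
        if hf_config is None:
            return
        # Early return for every model type outside the DeepSeek-V3 family.
        model_type = getattr(hf_config, "model_type", None)
        if model_type not in _DEEPSEEK_V3_FAMILY_MODEL_TYPES:
            return
        self.family_specific_flag = True


if __name__ == "__main__":
    cfg = SketchQuarkConfig()
    cfg.maybe_update_config("placeholder/model", SimpleNamespace(model_type="deepseek_v3"))
    print(cfg.family_specific_flag)
```

Since the method only reads an object that was already loaded, it never calls a loader itself and so has no trust_remote_code value of its own to get wrong.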

vllm/model_executor/layers/quantization/base_config.py: Extended the base maybe_update_config signature to accept revision and **kwargs.

vllm/config/vllm.py: Passes hf_config, revision, and trust_remote_code from ModelConfig to maybe_update_config, which allows the user to specify trust_remote_code.

The other quantization configs (awq.py, awq_marlin.py, cpu_wna16.py, gptq.py, gptq_marlin.py) were updated to align with the signature change.
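
The signature change can be pictured roughly as follows. This is a sketch, not the actual vLLM code: SketchBaseConfig, SketchRecordingConfig, and apply_quant_update are hypothetical names, and the model_config object is a plain stand-in for ModelConfig with only the four attributes used here.

```python
from types import SimpleNamespace
from typing import Any, Optional


class SketchBaseConfig:
    """Hypothetical base class; the default implementation does nothing."""

    def maybe_update_config(
        self,
        model_name: str,
        hf_config: Optional[Any] = None,
        revision: Optional[str] = None,
        **kwargs: Any,
    ) -> None:
        # **kwargs lets callers pass extra options (such as
        # trust_remote_code) without breaking subclasses that ignore them.
        return None


class SketchRecordingConfig(SketchBaseConfig):
    """Hypothetical subclass that records what the caller forwarded."""

    def __init__(self) -> None:
        self.seen: dict = {}

    def maybe_update_config(self, model_name, hf_config=None, revision=None, **kwargs):
        self.seen = {
            "model_name": model_name,
            "revision": revision,
            "trust_remote_code": kwargs.get("trust_remote_code", False),
        }


def apply_quant_update(quant_config: SketchBaseConfig, model_config: Any) -> None:
    # Forward the values the user already supplied instead of re-deriving them.
    quant_config.maybe_update_config(
        model_config.model,
        hf_config=model_config.hf_config,
        revision=model_config.revision,
        trust_remote_code=model_config.trust_remote_code,
    )


if __name__ == "__main__":
    model_config = SimpleNamespace(
        model="amd/MiniMax-M2.5-MXFP4",
        hf_config=SimpleNamespace(model_type="placeholder"),
        revision=None,
        trust_remote_code=True,
    )
    quant = SketchRecordingConfig()
    apply_quant_update(quant, model_config)
    print(quant.seen)
```

Accepting **kwargs in the base method is one way to let every subclass keep a compatible signature while only the subclasses that care read the extra values.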

Added new Test

tests/quantization/test_quark_maybe_update_config.py: 3 tests using real HF configs — verifies amd/MiniMax-M2.1-MXFP4 stays False, amd/DeepSeek-R1-MXFP4-ASQ enables True, and missing hf_config doesn't crash

Test Result

root@node:/home/vllm/tests/quantization# pytest test_quark_maybe_update_config.py
==================================================== test session starts ====================================================
platform linux -- Python 3.12.12, pytest-9.0.2, pluggy-1.6.0
rootdir: /dockerx/vllm
configfile: pyproject.toml
plugins: asyncio-1.3.0, anyio-4.12.1
asyncio: mode=Mode.STRICT, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 3 items

test_quark_maybe_update_config.py ... [100%]

=============================================== 3 passed, 2 warnings in 4.72s ===============================================
sys:1: DeprecationWarning: builtin type swigvarlink has no module attribute



Changed files

  • tests/quantization/test_quark_maybe_update_config.py (added, +63/-0)
  • vllm/config/vllm.py (modified, +4/-1)
  • vllm/model_executor/layers/quantization/awq.py (modified, +7/-1)
  • vllm/model_executor/layers/quantization/awq_marlin.py (modified, +7/-1)
  • vllm/model_executor/layers/quantization/base_config.py (modified, +15/-1)
  • vllm/model_executor/layers/quantization/cpu_wna16.py (modified, +7/-1)
  • vllm/model_executor/layers/quantization/gptq.py (modified, +7/-1)
  • vllm/model_executor/layers/quantization/gptq_marlin.py (modified, +7/-1)
  • vllm/model_executor/layers/quantization/quark/quark.py (modified, +25/-12)


Your current environment

image: vllm/vllm-openai-rocm:v0.17.1

🐛 Describe the bug

Already filed via Slack last Friday, but filing here as well to track it.

This is a blocker for merging https://github.com/SemiAnalysisAI/InferenceX/pull/827

Even when trust_remote_code=true is passed, MiniMax MXFP4 doesn't use it, which leads to this bug.

It seems @hongxiayang is already working on a fix: https://github.com/vllm-project/vllm/pull/37698

https://github.com/SemiAnalysisAI/InferenceX/actions/runs/23326389246/job/67848378566?pr=827

+ vllm serve amd/MiniMax-M2.5-MXFP4 --port 8888 --tensor-parallel-size=2 --gpu-memory-utilization 0.95 --max-model-len 2248 --block-size=32 --trust-remote-code
WARNING 03-20 02:24:48 [gpt_oss_triton_kernels_moe.py:56] Using legacy triton_kernels on ROCm
/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/chat_completion/protocol.py:346: SyntaxWarning: invalid escape sequence '\e'
  "(e.g. 'abcdabcdabcd...' or '\emoji \emoji \emoji ...'). This feature "
/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/completion/protocol.py:176: SyntaxWarning: invalid escape sequence '\e'
  "(e.g. 'abcdabcdabcd...' or '\emoji \emoji \emoji ...'). This feature "
(APIServer pid=1169773) INFO 03-20 02:24:49 [utils.py:302] 
(APIServer pid=1169773) INFO 03-20 02:24:49 [utils.py:302]        █     █     █▄   ▄█
(APIServer pid=1169773) INFO 03-20 02:24:49 [utils.py:302]  ▄▄ ▄█ █     █     █ ▀▄▀ █  version 0.17.1
(APIServer pid=1169773) INFO 03-20 02:24:49 [utils.py:302]   █▄█▀ █     █     █     █  model   amd/MiniMax-M2.5-MXFP4
(APIServer pid=1169773) INFO 03-20 02:24:49 [utils.py:302]    ▀▀  ▀▀▀▀▀ ▀▀▀▀▀ ▀     ▀
(APIServer pid=1169773) INFO 03-20 02:24:49 [utils.py:302] 
(APIServer pid=1169773) INFO 03-20 02:24:49 [utils.py:238] non-default args: {'model_tag': 'amd/MiniMax-M2.5-MXFP4', 'port': 8888, 'model': 'amd/MiniMax-M2.5-MXFP4', 'trust_remote_code': True, 'max_model_len': 2248, 'tensor_parallel_size': 2, 'block_size': 32, 'gpu_memory_utilization': 0.95}
(APIServer pid=1169773) The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=1169773) [2026-03-20 02:24:49] WARNING configuration_utils.py:697: The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=1169773) The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=1169773) [2026-03-20 02:24:49] WARNING configuration_utils.py:697: The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=1169773) INFO 03-20 02:24:56 [model.py:531] Resolved architecture: MiniMaxM2ForCausalLM
(APIServer pid=1169773) INFO 03-20 02:24:56 [model.py:1554] Using max model len 2248
(APIServer pid=1169773) [aiter] start build [module_aiter_enum] under /usr/local/lib/python3.12/dist-packages/aiter/jit/build/module_aiter_enum
(APIServer pid=1169773) [2026-03-20 02:24:56] INFO core.py:549: start build [module_aiter_enum] under /usr/local/lib/python3.12/dist-packages/aiter/jit/build/module_aiter_enum
(APIServer pid=1169773) [aiter] finish build [module_aiter_enum], cost 7.8s 
(APIServer pid=1169773) [2026-03-20 02:25:04] INFO core.py:699: finish build [module_aiter_enum], cost 7.8s 
(APIServer pid=1169773) [aiter] import [module_aiter_enum] under /usr/local/lib/python3.12/dist-packages/aiter/jit/module_aiter_enum.so
(APIServer pid=1169773) [2026-03-20 02:25:04] INFO core.py:501: import [module_aiter_enum] under /usr/local/lib/python3.12/dist-packages/aiter/jit/module_aiter_enum.so
(APIServer pid=1169773) INFO 03-20 02:25:04 [scheduler.py:231] Chunked prefill is enabled with max_num_batched_tokens=8192.
(APIServer pid=1169773) Traceback (most recent call last):
(APIServer pid=1169773)   File "/usr/local/bin/vllm", line 10, in <module>
(APIServer pid=1169773)     sys.exit(main())
(APIServer pid=1169773)              ^^^^^^
(APIServer pid=1169773)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 73, in main
(APIServer pid=1169773)     args.dispatch_function(args)
(APIServer pid=1169773)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 112, in cmd
(APIServer pid=1169773)     uvloop.run(run_server(args))
(APIServer pid=1169773)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 96, in run
(APIServer pid=1169773)     return __asyncio.run(
(APIServer pid=1169773)            ^^^^^^^^^^^^^^
(APIServer pid=1169773)   File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
(APIServer pid=1169773)     return runner.run(main)
(APIServer pid=1169773)            ^^^^^^^^^^^^^^^^
(APIServer pid=1169773)   File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=1169773)     return self._loop.run_until_complete(task)
(APIServer pid=1169773)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1169773)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=1169773)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=1169773)     return await main
(APIServer pid=1169773)            ^^^^^^^^^^
(APIServer pid=1169773)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 471, in run_server
(APIServer pid=1169773)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=1169773)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 490, in run_server_worker
(APIServer pid=1169773)     async with build_async_engine_client(
(APIServer pid=1169773)                ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1169773)   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=1169773)     return await anext(self.gen)
(APIServer pid=1169773)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1169773)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 96, in build_async_engine_client
(APIServer pid=1169773)     async with build_async_engine_client_from_engine_args(
(APIServer pid=1169773)                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1169773)   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=1169773)     return await anext(self.gen)
(APIServer pid=1169773)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1169773)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 122, in build_async_engine_client_from_engine_args
(APIServer pid=1169773)     vllm_config = engine_args.create_engine_config(usage_context=usage_context)
(APIServer pid=1169773)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1169773)   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 1890, in create_engine_config
(APIServer pid=1169773)     config = VllmConfig(
(APIServer pid=1169773)              ^^^^^^^^^^^
(APIServer pid=1169773)   File "/usr/local/lib/python3.12/dist-packages/pydantic/_internal/_dataclasses.py", line 121, in __init__
(APIServer pid=1169773)     s.__pydantic_validator__.validate_python(ArgsKwargs(args, kwargs), self_instance=s)
(APIServer pid=1169773) pydantic_core._pydantic_core.ValidationError: 1 validation error for VllmConfig
(APIServer pid=1169773)   Value error, The repository amd/MiniMax-M2.5-MXFP4 contains custom code which must be executed to correctly load the model. You can inspect the repository content at https://hf.co/amd/MiniMax-M2.5-MXFP4 .
(APIServer pid=1169773)  You can inspect the repository content at https://hf.co/amd/MiniMax-M2.5-MXFP4.
(APIServer pid=1169773) Please pass the argument `trust_remote_code=True` to allow custom code to be run. [type=value_error, input_value=ArgsKwargs((), {'model_co...transfer_config': None}), input_type=ArgsKwargs]
(APIServer pid=1169773)     For further information visit https://errors.pydantic.dev/2.12/v/value_error


extent analysis

Fix Plan

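The server log contains three signals that contradict one another. vLLM lists 'trust_remote_code': True among its non-default args, so the --trust-remote-code flag was parsed. A warning logged from configuration_utils.py then states that "The argument trust_remote_code is to be used with Auto classes. It has no effect here and is ignored." Finally, VllmConfig validation fails with a value error asking for trust_remote_code=True, because the repository amd/MiniMax-M2.5-MXFP4 contains custom code that must be executed to load the model. Taken together, this suggests that the flag is accepted on the command line but does not reach the code path that loads the repository's custom code.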
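The combination of signals described above can be checked mechanically. The following is a minimal sketch, not part of vLLM: the function names `diagnose` and `is_this_bug` are made up for illustration, and the matched substrings are copied from the log in this issue, so they may need adjusting for other versions.

```python
import re

# Substrings copied from the log in this issue; other versions may word them differently.
IGNORED_WARNING = "The argument trust_remote_code is to be used with Auto classes"
DEMAND_ERROR = "Please pass the argument `trust_remote_code=True`"
# The "non-default args" line is printed on a single line, so `.*` is sufficient.
FLAG_SEEN = re.compile(r"non-default args: \{.*'trust_remote_code': True")


def diagnose(log_text: str) -> dict:
    """Report which of the three trust_remote_code signals appear in a log."""
    return {
        "flag_received": bool(FLAG_SEEN.search(log_text)),
        "flag_ignored_warning": IGNORED_WARNING in log_text,
        "flag_demanded_error": DEMAND_ERROR in log_text,
    }


def is_this_bug(log_text: str) -> bool:
    """True when the flag was parsed but the loader still demands it."""
    signals = diagnose(log_text)
    return signals["flag_received"] and signals["flag_demanded_error"]
```

Calling `is_this_bug(open("server.log").read())` on a captured log (the file name is only an example) gives a quick yes/no answer before digging into the traceback.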

A plausible fix, inferred only from the traceback, is to make sure the flag is honored where the engine configuration is built:

  • Review the create_engine_config function in vllm/engine/arg_utils.py (the frame that constructs VllmConfig) so that it handles the trust_remote_code argument correctly.
  • Check whether the model repository contains custom code and whether trust_remote_code is True. If both conditions are met, allow the custom code to run.

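Example code (illustrative pseudocode, not the actual vLLM source):

# vllm/engine/arg_utils.py
def create_engine_config(self, usage_context):
    # ... existing code ...
    # NOTE: repo_contains_custom_code and load_config_with_custom_code are
    # hypothetical helpers named for illustration only; they do not exist in vLLM.
    if repo_contains_custom_code(self.model) and self.trust_remote_code:
        # Allow the repository's custom code to run while loading the config
        load_config_with_custom_code(self.model)
    # ... existing code ...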

Verification

To verify the fix, rerun the original vllm serve command with the --trust-remote-code flag and confirm that the model loads without the VllmConfig validation error.

vllm serve amd/MiniMax-M2.5-MXFP4 --port 8888 --tensor-parallel-size=2 --gpu-memory-utilization 0.95 --max-model-len 2248 --block-size=32 --trust-remote-code
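Once the process stays up, a small probe can confirm that the server actually started. This is a sketch under two assumptions: the server listens on port 8888 as in the command above, and it exposes the `/health` route of vLLM's OpenAI-compatible server. The helper name `server_is_healthy` and its `opener` parameter are invented here for testability.

```python
import urllib.error
import urllib.request


def server_is_healthy(
    base_url: str = "http://localhost:8888",
    timeout: float = 5.0,
    opener=urllib.request.urlopen,
) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200."""
    try:
        with opener(f"{base_url}/health", timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status: not healthy yet.
        return False
```

Because loading a large model can take several minutes, the probe is best called in a retry loop rather than once.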
