vllm: [Bug]: input_audio content with uuid is parsed incorrectly

Q: Expected behavior

vLLM should pass `part["input_audio"]` to the downstream audio parser and should not fail with:

```text
Expected code to be unreachable, but got: None
```

If the audio payload itself is invalid or unsupported, the request should continue into the normal audio loading path and return the appropriate audio validation error.

Your current environment

<details> <summary>The output of <code>python collect_env.py</code></summary>

Collecting environment information...

  System Info
  OS                           : Ubuntu 22.04.5 LTS (x86_64)
  GCC version                  : 11.4.0
  Libc version                 : glibc-2.35

  PyTorch Info
  PyTorch version              : 2.11.0+cu130
  CUDA used to build PyTorch   : 13.0

  Python Environment
  Python version               : 3.12.13
  Python platform              : Linux-6.8.0-107-generic-x86_64-with-glibc2.35

  CUDA / GPU Info
  Is CUDA available            : True
  GPU models and configuration :
  GPU 0-7                      : NVIDIA A100-SXM4-80GB
  Nvidia driver version        : 580.126.20

  CPU Info
  CPU(s)                       : 176
  Model name                   : Intel(R) Xeon(R) Platinum 8458P
  Socket(s)                    : 2
  Core(s) per socket           : 44
  Thread(s) per core           : 2

  Versions of relevant libraries
  flashinfer-python            : 0.6.8.post1
  numpy                        : 2.2.6
  torch                        : 2.11.0+cu130
  torchaudio                   : 2.11.0+cu130
  torchvision                  : 0.26.0+cu130
  transformers                 : 5.8.1
  triton                       : 3.6.0

  vLLM Info
  vLLM Version                 : 0.21.0
  CUDA Archs                   : 7.5 8.0 8.6 8.9 9.0 10.0 12.0+PTX
  ROCm                         : Disabled
  XPU                          : Disabled

  Environment Variables
  NVIDIA_VISIBLE_DEVICES       : all
  CUDA_VERSION                 : 13.0.2
  VLLM_USAGE_SOURCE            : production-docker-image
  VLLM_ENABLE_CUDA_COMPATIBILITY: 0
  LD_LIBRARY_PATH              : /usr/local/nvidia/lib64:/usr/local/cuda/lib64:...
  TORCHINDUCTOR_CACHE_DIR      : /tmp/torchinductor_root

</details>

🐛 Describe the bug

When a request to the OpenAI-compatible `/v1/chat/completions` endpoint contains an `input_audio` content part with a `uuid`, vLLM returns HTTP 500:

AssertionError: Expected code to be unreachable, but got: None

This happens because a content part that carries a `uuid` is routed through the multimodal compatibility parsing path. In that path, vLLM passes the whole content part (including `type` and `uuid`) to the audio parser instead of the nested `input_audio` object.

Current behavior:

input_audio_params = cast(dict[str, str], part)

Expected behavior:

input_audio_params = cast(InputAudio, part["input_audio"])

The non-uuid `input_audio` path already returns `part["input_audio"]`, so the uuid path should use the same payload shape.
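
The shape mismatch can be illustrated with a minimal standalone sketch. This is not vLLM's parser: `InputAudio` is a local stand-in `TypedDict`, and `extract_buggy` / `extract_fixed` are hypothetical names used only to contrast the two casts quoted above.

```python
from typing import TypedDict, cast


class InputAudio(TypedDict):
    # Local stand-in for the nested audio payload (assumed shape: data + format).
    data: str
    format: str


def extract_buggy(part: dict) -> dict:
    # Mirrors the reported behavior: the whole content part is forwarded.
    # cast() is a no-op at runtime, so the dict shape is unchanged.
    return cast(dict, part)


def extract_fixed(part: dict) -> InputAudio:
    # Mirrors the proposed behavior: only the nested object is forwarded.
    return cast(InputAudio, part["input_audio"])


part = {
    "type": "input_audio",
    "input_audio": {"data": "<base64_audio>", "format": "wav"},
    "uuid": "audio-smoke-uuid-001",
}

buggy = extract_buggy(part)
fixed = extract_fixed(part)

# A consumer that reads "data" and "format" at the top level finds
# nothing in the buggy shape, because they live one level deeper.
print(buggy.get("data"), buggy.get("format"))  # None None
print(fixed.get("data"), fixed.get("format"))  # <base64_audio> wav
```

The missing top-level keys are consistent with the `got: None` in the assertion message, although the report does not show the exact line where the assertion fires.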

Minimal Reproducible Example

Send this request to /v1/chat/completions:

{
  "model": "gemma-4-E2B-it",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Please describe this audio in one sentence."
        },
        {
          "type": "input_audio",
          "input_audio": {
            "data": "<base64_audio>",
            "format": "wav"
          },
          "uuid": "audio-smoke-uuid-001"
        }
      ]
    }
  ],
  "max_tokens": 16,
  "temperature": 0
}

The key part is that the same input_audio content part carries a uuid:

{
  "type": "input_audio",
  "input_audio": {
    "data": "<base64_audio>",
    "format": "wav"
  },
  "uuid": "audio-smoke-uuid-001"
}
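
For anyone reproducing this without an audio file at hand, the following sketch builds the same request body with a short silent WAV generated from the Python standard library. The helper names `make_silent_wav_b64` and `build_request` are hypothetical, and the 0.1 s, 16 kHz mono clip is an arbitrary assumption; whether the model accepts such a clip is not confirmed by the report.

```python
import base64
import io
import json
import wave
from typing import Optional


def make_silent_wav_b64(seconds: float = 0.1, rate: int = 16000) -> str:
    """Return a base64-encoded mono 16-bit PCM WAV containing silence."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return base64.b64encode(buf.getvalue()).decode("ascii")


def build_request(audio_b64: str, uuid: Optional[str]) -> dict:
    """Build the chat completion body; attach uuid only when one is given."""
    audio_part = {
        "type": "input_audio",
        "input_audio": {"data": audio_b64, "format": "wav"},
    }
    if uuid is not None:
        audio_part["uuid"] = uuid
    return {
        "model": "gemma-4-E2B-it",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Please describe this audio in one sentence.",
                    },
                    audio_part,
                ],
            }
        ],
        "max_tokens": 16,
        "temperature": 0,
    }


if __name__ == "__main__":
    # Print the body with a placeholder so the output stays readable.
    print(json.dumps(build_request("<base64_audio>", "audio-smoke-uuid-001"), indent=2))
```

Posting `build_request(make_silent_wav_b64(), "audio-smoke-uuid-001")` to `/v1/chat/completions` and then repeating with `uuid=None` isolates the uuid path, since the non-uuid path is reported to forward `part["input_audio"]` already.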

Expected behavior

vLLM should pass part["input_audio"] to the downstream audio parser and should not fail with:

Expected code to be unreachable, but got: None

If the audio payload itself is invalid or unsupported, the request should continue into the normal audio loading path and return the appropriate audio validation error.

Actual behavior

The request returns HTTP 500:

{
  "message": "Expected code to be unreachable, but got: None",
  "type": "InternalServerError",
  "param": null,
  "code": 500
}
