vllm - ✅(Solved) Fix [Bug]: Responses API `text.format.type="json_schema"` leaks `schema_` in non-stream responses and breaks streaming [2 pull requests, 1 participant]

GitHub stats
vllm-project/vllm#38245 (fetched 2026-04-08 01:37:02)
Comments: 0
Participants: 1
Timeline: 4
Reactions: 0
Timeline (top): cross-referenced ×2, closed ×1, labeled ×1

Error Message

pydantic_core._pydantic_core.ValidationError: 3 validation errors for ResponseCreatedEvent
response.text.format.ResponseFormatText.type
  Input should be 'text' [type=literal_error, input_value='json_schema', input_type=str]
response.text.format.ResponseFormatTextJSONSchemaConfig.schema
  Field required [type=missing, input_value={'name': 'tool_calling_re...alling', 'strict': True}, input_type=dict]
response.text.format.ResponseFormatJSONObject.type
  Input should be 'json_object' [type=literal_error, input_value='json_schema', input_type=str]

Fix Action

PR fix notes

PR #38262: [frontend] dump openai responses type by alias

Description (problem / solution / changelog)

Purpose

Some openai types (e.g. openai.types.responses.response_format_text_json_schema_config.ResponseFormatTextJSONSchemaConfig) use aliases for their fields. Because openai dumps these types by alias, we do the same in the Responses endpoint.

Fix #38245.
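
To make the alias idea concrete, here is a dependency-free sketch of alias-aware dumping. It is an analogy only: the class `JsonSchemaFormat`, the helper `dump`, and the `alias` metadata key are hypothetical names invented for this example, not the openai or vLLM code, which rely on pydantic for this behavior.

```python
from dataclasses import dataclass, field, fields
from typing import Any


@dataclass
class JsonSchemaFormat:
    """Hypothetical stand-in for a type whose attribute name differs from its wire name."""

    name: str
    # The attribute is schema_, but the public wire name is "schema".
    schema_: dict[str, Any] = field(metadata={"alias": "schema"})
    type: str = "json_schema"


def dump(obj: Any, by_alias: bool = False) -> dict[str, Any]:
    """Dump a dataclass to a dict, optionally keyed by each field's alias."""
    out: dict[str, Any] = {}
    for f in fields(obj):
        key = f.metadata.get("alias", f.name) if by_alias else f.name
        out[key] = getattr(obj, f.name)
    return out


fmt = JsonSchemaFormat(name="tool_calling_response_format", schema_={"type": "object"})
internal = dump(fmt)             # keys follow attribute names, so schema_ leaks
wire = dump(fmt, by_alias=True)  # keys follow wire names, so schema is emitted
```

The point of the sketch is that the data model stays the same and only the dump call changes, which is consistent with the small +2/-2 diff listed for this PR.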


Changed files

  • vllm/entrypoints/openai/responses/protocol.py (modified, +2/-2)

PR #38519: Fix Responses JSON schema alias serialization

Description (problem / solution / changelog)

Purpose

Fix Responses API JSON Schema alias serialization so streamed response.created and JSON responses emit the public schema field instead of internal schema_, which was breaking non-Harmony /v1/responses tool-calling with tool_choice="required".

Fixes #38245, which #38262 did not completely fix.


Changed files

  • tests/entrypoints/openai/responses/test_harmony.py (modified, +14/-16)
  • tests/entrypoints/openai/responses/test_mcp_tools.py (modified, +6/-6)
  • tests/entrypoints/openai/responses/test_serving_responses.py (modified, +71/-1)
  • vllm/entrypoints/openai/responses/api_router.py (modified, +11/-7)
  • vllm/entrypoints/openai/responses/protocol.py (modified, +1/-1)
  • vllm/entrypoints/openai/responses/serving.py (modified, +1/-1)


Your current environment

<details> <summary>The output of <code>python collect_env.py</code></summary>
==============================
        System Info
==============================
OS                           : Ubuntu 22.04.4 LTS (x86_64)
GCC version                  : (Ubuntu 11.4.0-1ubuntu1~22.04.3) 11.4.0
Clang version                : 14.0.0-1ubuntu1.1
CMake version                : version 3.22.1
Libc version                 : glibc-2.35

==============================
       PyTorch Info
==============================
PyTorch version              : 2.10.0+cu128
Is debug build               : False
CUDA used to build PyTorch   : 12.8
ROCM used to build PyTorch   : N/A

==============================
      Python Environment
==============================
Python version               : 3.12.5 (main, Aug 14 2024, 05:08:31) [Clang 18.1.8 ] (64-bit runtime)
Python platform              : Linux-6.8.0-59-generic-x86_64-with-glibc2.35

==============================
       CUDA / GPU Info
==============================
Is CUDA available            : True
CUDA runtime version         : 12.1.105
CUDA_MODULE_LOADING set to   :
GPU models and configuration :
GPU 0: NVIDIA A800 80GB PCIe
GPU 1: NVIDIA A800 80GB PCIe
GPU 2: NVIDIA A800 80GB PCIe
GPU 3: NVIDIA A800 80GB PCIe
GPU 4: NVIDIA A800 80GB PCIe
GPU 5: NVIDIA A800 80GB PCIe
GPU 6: NVIDIA A800 80GB PCIe
GPU 7: NVIDIA A800 80GB PCIe

Nvidia driver version        : 550.90.07
cuDNN version                : Could not collect
HIP runtime version          : N/A
MIOpen runtime version       : N/A
Is XNNPACK available         : True

==============================
          CPU Info
==============================
Architecture:                         x86_64
CPU op-mode(s):                       32-bit, 64-bit
Address sizes:                        43 bits physical, 48 bits virtual
Byte Order:                           Little Endian
CPU(s):                               256
On-line CPU(s) list:                  0-254
Off-line CPU(s) list:                 255
Vendor ID:                            AuthenticAMD
Model name:                           AMD EPYC 7H12 64-Core Processor
CPU family:                           23
Model:                                49
Thread(s) per core:                   2
Core(s) per socket:                   64
Socket(s):                            2
Stepping:                             0
Frequency boost:                      enabled
CPU max MHz:                          2600.0000
CPU min MHz:                          0.0000
BogoMIPS:                             5199.94
Flags:                                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sev sev_es
Virtualization:                       AMD-V
L1d cache:                            4 MiB (128 instances)
L1i cache:                            4 MiB (128 instances)
L2 cache:                             64 MiB (128 instances)
L3 cache:                             512 MiB (32 instances)
NUMA node(s):                         2
NUMA node0 CPU(s):                    0-63,128-191
NUMA node1 CPU(s):                    64-127,192-254
Vulnerability Gather data sampling:   Not affected
Vulnerability Itlb multihit:          Not affected
Vulnerability L1tf:                   Not affected
Vulnerability Mds:                    Not affected
Vulnerability Meltdown:               Not affected
Vulnerability Mmio stale data:        Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed:               Mitigation; untrained return thunk; SMT enabled with STIBP protection
Vulnerability Spec rstack overflow:   Mitigation; Safe RET
Vulnerability Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:             Mitigation; Retpolines; IBPB conditional; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Vulnerability Srbds:                  Not affected
Vulnerability Tsx async abort:        Not affected

==============================
Versions of relevant libraries
==============================
[pip3] flashinfer-python==0.6.6
[pip3] mypy-extensions==1.1.0
[pip3] numpy==2.2.6
[pip3] nvidia-cublas-cu12==12.8.4.1
[pip3] nvidia-cuda-cupti-cu12==12.8.90
[pip3] nvidia-cuda-nvrtc-cu12==12.8.93
[pip3] nvidia-cuda-runtime-cu12==12.8.90
[pip3] nvidia-cudnn-cu12==9.10.2.21
[pip3] nvidia-cudnn-frontend==1.18.0
[pip3] nvidia-cufft-cu12==11.3.3.83
[pip3] nvidia-cufile-cu12==1.13.1.3
[pip3] nvidia-curand-cu12==10.3.9.90
[pip3] nvidia-cusolver-cu12==11.7.3.90
[pip3] nvidia-cusparse-cu12==12.5.8.93
[pip3] nvidia-cusparselt-cu12==0.7.1
[pip3] nvidia-cutlass-dsl==4.4.2
[pip3] nvidia-cutlass-dsl-libs-base==4.4.2
[pip3] nvidia-ml-py==13.595.45
[pip3] nvidia-nccl-cu12==2.27.5
[pip3] nvidia-nvjitlink-cu12==12.8.93
[pip3] nvidia-nvshmem-cu12==3.4.5
[pip3] nvidia-nvtx-cu12==12.8.90
[pip3] open-clip-torch==2.32.0
[pip3] pytorch-lightning==2.6.1
[pip3] pyzmq==27.1.0
[pip3] segmentation-models-pytorch==0.5.0
[pip3] sentence-transformers==5.3.0
[pip3] terratorch==1.2.5
[pip3] torch==2.10.0
[pip3] torch-c-dlpack-ext==0.1.5
[pip3] torchaudio==2.10.0
[pip3] torchgeo==0.9.0
[pip3] torchmetrics==1.9.0
[pip3] torchvision==0.25.0
[pip3] transformers==4.57.5
[pip3] transformers-stream-generator==0.0.5
[pip3] triton==3.6.0
[pip3] tritonclient==2.66.0
[pip3] vector-quantize-pytorch==1.28.0
[conda] Could not collect

==============================
         vLLM Info
==============================
ROCM Version                 : Could not collect
vLLM Version                 : 0.16.0rc2.dev1529+g721a8d151 (git sha: 721a8d151)
vLLM Build Flags:
  CUDA Archs: Not Set; ROCm: Disabled
GPU Topology:
        GPU0    GPU1    GPU2    GPU3    GPU4    GPU5    GPU6    GPU7    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      PXB     PXB     PXB     SYS     SYS     SYS     SYS     0-63,128-191    0               N/A
GPU1    PXB      X      PXB     PXB     SYS     SYS     SYS     SYS     0-63,128-191    0               N/A
GPU2    PXB     PXB      X      PIX     SYS     SYS     SYS     SYS     0-63,128-191    0               N/A
GPU3    PXB     PXB     PIX      X      SYS     SYS     SYS     SYS     0-63,128-191    0               N/A
GPU4    SYS     SYS     SYS     SYS      X      PXB     PXB     PXB     64-127,192-254  1               N/A
GPU5    SYS     SYS     SYS     SYS     PXB      X      PXB     PXB     64-127,192-254  1               N/A
GPU6    SYS     SYS     SYS     SYS     PXB     PXB      X      PIX     64-127,192-254  1               N/A
GPU7    SYS     SYS     SYS     SYS     PXB     PXB     PIX      X      64-127,192-254  1               N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

==============================
     Environment Variables
==============================
LD_LIBRARY_PATH=/usr/local/cuda-12.1/lib64
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1
TORCHINDUCTOR_CACHE_DIR=/tmp/torchinductor_akk
</details>

🐛 Describe the bug

The /v1/responses endpoint appears to mishandle text.format.type="json_schema".

Tested with vLLM v0.18.0 and the model Qwen/Qwen3.5-9B.

Observed behavior:

  • text.format.type="json_object" works in both non-stream and stream mode.
  • text.format.type="json_schema":
    • non-stream returns 200 OK, but the response body uses text.format.schema_ instead of text.format.schema
    • stream starts with 200 OK, then the connection closes before a complete SSE body is sent

This looks like a Responses serialization bug rather than a model-output issue.

Minimal repro

Server launch:

vllm serve Qwen/Qwen3.5-9B \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder \
  --reasoning-parser qwen3 \
  --host 0.0.0.0 \
  --port 9876 \
  --enable-prefix-caching

Case: json_schema, non-stream

curl -sS -D - http://192.168.80.2:9876/v1/responses \
  -H 'Content-Type: application/json' \
  --data-binary @- <<'JSON'
{"model":"Qwen/Qwen3.5-9B","input":"return object with x=1","stream":false,"text":{"format":{"type":"json_schema","name":"tool_calling_response_format","schema":{"type":"object","properties":{"x":{"type":"integer"}},"required":["x"],"additionalProperties":false},"strict":true}}}
JSON

Observed result:

  • HTTP 200 OK
  • response body contains text.format.schema_ instead of text.format.schema

Relevant excerpt:

"text": {
  "format": {
    "name": "tool_calling_response_format",
    "schema_": {
      "type": "object",
      "properties": {
        "x": {
          "type": "integer"
        }
      },
      "required": [
        "x"
      ],
      "additionalProperties": false
    },
    "type": "json_schema",
    "description": null,
    "strict": true
  },
  "verbosity": null
}
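
For clients that must talk to an affected build, one possible stopgap is to rename the leaked key after parsing the response. This is a hypothetical client-side workaround and not part of either PR; `normalize_text_format` is a name invented for this sketch, and it assumes the body has the `text.format` shape shown in the excerpt above.

```python
import copy
import json
from typing import Any


def normalize_text_format(body: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of body with text.format.schema_ renamed to text.format.schema."""
    fixed = copy.deepcopy(body)
    text = fixed.get("text")
    fmt = text.get("format") if isinstance(text, dict) else None
    # Only rename when the leaked key is present and the public key is absent.
    if isinstance(fmt, dict) and "schema_" in fmt and "schema" not in fmt:
        fmt["schema"] = fmt.pop("schema_")
    return fixed


raw = json.loads(
    '{"text": {"format": {"name": "tool_calling_response_format",'
    ' "schema_": {"type": "object"}, "type": "json_schema", "strict": true},'
    ' "verbosity": null}}'
)
fixed = normalize_text_format(raw)
```

The helper works on a deep copy so the original parsed body stays available for logging or bug reports.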

Case: json_schema, stream

curl -N -sS -D - http://192.168.80.2:9876/v1/responses \
  -H 'Content-Type: application/json' \
  -H 'Accept: text/event-stream' \
  --data-binary @- <<'JSON'
{"model":"Qwen/Qwen3.5-9B","input":"return object with x=1","stream":true,"text":{"format":{"type":"json_schema","name":"tool_calling_response_format","schema":{"type":"object","properties":{"x":{"type":"integer"}},"required":["x"],"additionalProperties":false},"strict":true}}}
JSON

Observed result:

  • HTTP 200 OK headers
  • then the connection aborts
  • client-side error:
curl: (18) transfer closed with outstanding read data remaining

I also previously observed the corresponding server-side traceback:

pydantic_core._pydantic_core.ValidationError: 3 validation errors for ResponseCreatedEvent
response.text.format.ResponseFormatText.type
  Input should be 'text' [type=literal_error, input_value='json_schema', input_type=str]
response.text.format.ResponseFormatTextJSONSchemaConfig.schema
  Field required [type=missing, input_value={'name': 'tool_calling_re...alling', 'strict': True}, input_type=dict]
response.text.format.ResponseFormatJSONObject.type
  Input should be 'json_object' [type=literal_error, input_value='json_schema', input_type=str]

Expected behavior

For text.format.type="json_schema":

  • non-stream responses should serialize the public wire field schema, not the internal field name schema_
  • stream mode should emit a valid SSE lifecycle and complete normally
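
The streaming expectation can be turned into a small check on a captured stream. The sketch below parses SSE text that has already been saved (for example from `curl -N`) and reports whether it opens with `response.created` and ends with `response.completed`. Both function names are hypothetical, and the assumption that a healthy stream ends with a `response.completed` event comes from the OpenAI Responses streaming event names, not from this issue.

```python
import json


def sse_event_types(raw: str) -> list[str]:
    """Collect the "type" of every JSON data payload in a captured SSE stream."""
    types: list[str] = []
    for block in raw.replace("\r\n", "\n").split("\n\n"):
        data_lines = [
            line[len("data:"):].strip()
            for line in block.split("\n")
            if line.startswith("data:")
        ]
        if not data_lines:
            continue
        payload = "\n".join(data_lines)
        if payload == "[DONE]":
            continue
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            # A truncated final block is what an aborted stream looks like.
            types.append("<unparseable>")
            continue
        if isinstance(event, dict) and "type" in event:
            types.append(str(event["type"]))
    return types


def stream_completed_normally(raw: str) -> bool:
    """True if the stream starts with response.created and ends with response.completed."""
    types = sse_event_types(raw)
    return (
        len(types) >= 2
        and types[0] == "response.created"
        and types[-1] == "response.completed"
    )


good = (
    'event: response.created\ndata: {"type": "response.created"}\n\n'
    'event: response.completed\ndata: {"type": "response.completed"}\n\n'
)
aborted = 'event: response.created\ndata: {"type": "respon'
```

An empty capture, which is what a connection closed right after the headers produces, is also reported as not completed.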

extent analysis

Fix Plan

To address the serialization bug in the /v1/responses endpoint when text.format.type="json_schema", the approach taken in the linked PRs is to change how the models are dumped, not the models themselves:

  • Leave ResponseFormatTextJSONSchemaConfig unchanged. The openai type already maps its internal attribute schema_ to the public wire name schema through a field alias.
  • Dump Responses API objects by alias so that the wire field schema is emitted instead of schema_. PR #38262 changes vllm/entrypoints/openai/responses/protocol.py; PR #38519 also touches api_router.py and serving.py to cover the streamed response.created event and JSON responses.
  • Confirm that stream mode no longer fails validation when building ResponseCreatedEvent, and that it emits a valid SSE lifecycle and completes normally.

Illustrative sketch (a simplified stand-in model, not the actual vLLM or openai code):

from typing import Any, Optional

from pydantic import BaseModel, Field, ValidationError


class ResponseFormatTextJSONSchemaConfig(BaseModel):
    # Simplified stand-in for the openai type of the same name.
    name: str
    type: str = "json_schema"
    # Internal attribute schema_, public wire name "schema" via the alias.
    schema_: dict[str, Any] = Field(alias="schema")
    strict: Optional[bool] = None


fmt = ResponseFormatTextJSONSchemaConfig(
    name="tool_calling_response_format",
    schema={"type": "object"},
    strict=True,
)

# A default dump uses attribute names and leaks schema_.
leaky = fmt.model_dump()
assert "schema_" in leaky

# Dumping by alias emits the public wire field.
wire = fmt.model_dump(by_alias=True)
assert "schema" in wire and "schema_" not in wire

# Re-validating the leaky dump fails with "Field required" for schema,
# which matches the ResponseCreatedEvent traceback above.
try:
    ResponseFormatTextJSONSchemaConfig.model_validate(leaky)
except ValidationError as exc:
    print(exc.errors()[0]["type"])  # missing

Verification

To verify the fix, test the /v1/responses endpoint with text.format.type="json_schema" in both non-stream and stream modes. The response should contain the correct schema field, and the stream mode should emit a valid SSE lifecycle and complete normally.

Example test commands: re-run the two curl commands from the minimal repro above (Case: json_schema, non-stream and Case: json_schema, stream) against the patched server.
