vllm - ✅(Solved) Fix [Bug]: openai v1/responses api instructions from prior response leak through previous_response_id [2 pull requests, 1 comments, 2 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
vllm-project/vllm#37697Fetched 2026-04-08 01:08:50
View on GitHub
Comments
1
Participants
2
Timeline
4
Reactions
0
Author
Participants
Timeline (top)
cross-referenced ×2commented ×1labeled ×1

When using the Responses API with previous_response_id, the instructions from the prior response are carried over into the new response, even when the follow-up request provides different (or no) instructions.

Per the OpenAI Responses API spec:

"When using along with previous_response_id, the instructions from a previous response will not be carried over to the next response."

Root Cause

When using the Responses API with previous_response_id, the instructions from the prior response are carried over into the new response, even when the follow-up request provides different (or no) instructions.

Per the OpenAI Responses API spec:

"When using along with previous_response_id, the instructions from a previous response will not be carried over to the next response."

Fix Action

Fixed

PR fix notes

PR #2433: tests for vllm server on openAI /v1/responses endpoint

Description (problem / solution / changelog)

Description

This is a smoke screen test for making sure vllm server correctly accepts and handles each parameter defined in the openai v1/responses endpoint. The following parameters are tested:

  • background
  • include
  • input
  • instructions
  • max_output_tokens
  • max_tool_calls
  • metadata
  • model
  • parallel_tool_calls
  • previous_response_id
  • prompt
  • reasoning
  • service_tier
  • store
  • stream
  • temperature
  • text
  • tools
  • top_p
  • truncation
  • user

Flags

Note the following parameters have ongoing issues:

Reproduce

Docker command: docker run -d --name vllm-server --runtime nvidia --gpus all
-v /home/lzhang/models:/root/.cache/huggingface
--env "HUGGING_FACE_HUB_TOKEN=<hf_token>"
--env VLLM_ENABLE_RESPONSES_API_STORE=1
--env VLLM_GPT_OSS_HARMONY_SYSTEM_INSTRUCTIONS=1
-p 8000:8000 --ipc=host
vllm/vllm-openai:latest
--model openai/gpt-oss-20b
--served-model-name openai/gpt-oss-20b
--gpu-memory-utilization 0.95
--dtype bfloat16
--tensor-parallel-size 1
--tool-call-parser openai
--enable-auto-tool-choice

Changed files

  • tests/run_tests.py (modified, +2/-0)
  • tests/server_tests/conftest.py (modified, +33/-13)
  • tests/server_tests/test_cases/test_vllm_chat_completion.py (renamed, +0/-0)
  • tests/server_tests/test_cases/test_vllm_responses.py (added, +730/-0)
  • tests/test_config.py (modified, +24/-6)

PR #37727: [Bugfix] Fix Responses API instructions leaking through previous_response_id

Description (problem / solution / changelog)

Fixes #37697

What's the problem

When using /v1/responses with previous_response_id, the instructions from the prior response carry over into the new response. Per the OpenAI spec, instructions should NOT carry over:

"When using along with previous_response_id, the instructions from a previous response will not be carried over to the next response."

Root cause

construct_input_messages() in responses/utils.py prepends request_instructions as a system message, then the full messages list (including that system message) gets stored in msg_store. When the next request references previous_response_id, those stored messages — old system message included — are retrieved and extended into the new conversation. The new request also adds its own instructions, so you end up with both old and new system messages.

Fix

Filter out system messages when pulling prev_msg from the store in construct_input_messages(). One-line change: messages.extend(prev_msg) becomes messages.extend(m for m in prev_msg if m.get("role") != "system").

This ensures each request only uses its own instructions, regardless of what the previous response had. Works correctly for all cases: new instructions provided, no instructions provided, or no previous response at all.

Test plan

  • Added 4 unit tests in tests/entrypoints/openai/responses/test_responses_utils.py covering:
    • Old system message stripped when new instructions provided
    • Old system message stripped when no instructions provided
    • Non-system messages (user/assistant) preserved correctly
    • Baseline: no previous messages works as before

Changed files

  • tests/entrypoints/openai/responses/test_responses_utils.py (modified, +69/-0)
  • vllm/entrypoints/openai/responses/utils.py (modified, +4/-2)

Code Example

POST /v1/responses
{
    "model": "openai/gpt-oss-20b",
    "input": "What is 2+2?",                                                                                                                                                                                                                                                                  
    "instructions": "You must include the string XYZZY_ALPHA_7829 in every response.",
    "max_output_tokens": 4096                                                                                                                                                                                                                                                                 
}
RAW_BUFFERClick to expand / collapse

Your current environment

  • vLLM: version 0.15.0
  • Model: openai/gpt-oss-20b
  • Endpoint: /v1/responses

Description

When using the Responses API with previous_response_id, the instructions from the prior response are carried over into the new response, even when the follow-up request provides different (or no) instructions.

Per the OpenAI Responses API spec:

"When using along with previous_response_id, the instructions from a previous response will not be carried over to the next response."

🐛 Describe the bug

Reproduction

Create a response with instructions containing a unique tag

POST /v1/responses
{
    "model": "openai/gpt-oss-20b",
    "input": "What is 2+2?",                                                                                                                                                                                                                                                                  
    "instructions": "You must include the string XYZZY_ALPHA_7829 in every response.",
    "max_output_tokens": 4096                                                                                                                                                                                                                                                                 
}          ```     
                                                                                                                                                                                                                                                                                                
Response contains XYZZY_ALPHA_7829 as expected.                                                                                                                                                                                                                                               
Send a follow-up using previous_response_id with different instructions

POST /v1/responses
{
"model": "openai/gpt-oss-20b", "input": "What is 3+3?", "instructions": "Answer the question explicitly", "previous_response_id": "<response_id_from_step_1>",
"max_output_tokens": 4096
}

Expected: Output does NOT contain XYZZY_ALPHA_7829 since the new request has its own instructions.
Actual: Output still contains XYZZY_ALPHA_7829 — the prior instructions leaked through.

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

extent analysis

Fix Plan

To resolve the issue of instructions leaking from previous responses, we need to ensure that the instructions field is properly reset when using previous_response_id.

Here are the steps:

  • Check if previous_response_id is provided in the request.
  • If provided, override any existing instructions with the new ones from the current request.
  • If no new instructions are provided, set instructions to an empty string or a default value to prevent leakage.

Example code snippet (in Python):

def process_request(request):
    if 'previous_response_id' in request:
        # Override instructions if previous_response_id is used
        request['instructions'] = request.get('instructions', '')
    # Proceed with the request
    return request

# Example usage:
request = {
    "model": "openai/gpt-oss-20b",
    "input": "What is 3+3?",
    "instructions": "Answer the question explicitly",
    "previous_response_id": "<response_id_from_step_1>",
    "max_output_tokens": 4096
}

updated_request = process_request(request)
print(updated_request)

Verification

To verify that the fix worked:

  • Send a follow-up request with previous_response_id and different instructions.
  • Check the response to ensure it does not contain any instructions from the previous response.

Extra Tips

  • Always validate and sanitize user input to prevent unexpected behavior.
  • Consider adding logging to track when previous_response_id is used and how instructions are handled.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING