litellm - [Bug]: Errors mid-stream not re-raised in Responses API with OpenAI models

litellm 2026-05-28 22:26:15

Check for existing issues

I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

Errors do not seem to be re-raised mid-stream when using the Responses API with OpenAI models. I observe this behaviour with:

Missing API key
Context window exceeded

Instead of being re-raised, the error is only yielded as a stream event (an error event followed by a response.failed event, as shown in the log below), for example:

type=<ResponsesAPIStreamEvents.RESPONSE_FAILED: 'response.failed'> response=ResponsesAPIResponse(id='resp_bGl0ZWxsbTpjdXN0b21fbGxtX3Byb3ZpZGVyOm9wZW5haTttb2RlbF9pZDpOb25lO3Jlc3BvbnNlX2lkOnJlc3BfMDNlYmFhMzk3N2I0NmY2ZjAwNmExODhjOGIyZDQ4ODE5ZmI2NzVjN2QzMzE3ZTIyNWE=', created_at=1779993739, error={'code': 'context_length_exceeded', 'message': 'Your input exceeds the context window of this model. Please adjust your input and try again.'}, incomplete_details=None, instructions=None, metadata={}, model='gpt-5.4-mini-2026-03-17', object='response', output=[], parallel_tool_calls=True, temperature=1.0, tool_choice='auto', tools=[], top_p=0.98, max_output_tokens=None, previous_response_id=None, reasoning={'context': 'current_turn', 'effort': 'none', 'summary': None}, status='failed', text={'format': {'type': 'text'}, 'verbosity': 'medium'}, truncation='disabled', usage=None, user=None, store=True, background=False, completed_at=None, frequency_penalty=0.0, max_tool_calls=None, moderation=None, presence_penalty=0.0, prompt_cache_key=None, prompt_cache_retention='in_memory', safety_identifier=None, service_tier='auto', top_logprobs=0) sequence_number=3

Steps to Reproduce

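import asyncio

import litellm
from litellm import aresponses


async def main():
    litellm._turn_on_debug()

    # ~400k repeated words, far beyond the model's context window
    long_input = "Hello " * 400_000

    response = await aresponses(
        input=long_input,
        model="gpt-5.4-mini",
        stream=True,
    )

    async for chunk in response:
        print("chunk", chunk)  # the error arrives here as an event instead of being raised


asyncio.run(main())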

Relevant log output

20:42:27 - LiteLLM:DEBUG: token_counter.py:399 - messages in token_counter: None, text in token_counter:
20:42:27 - LiteLLM:DEBUG: litellm_logging.py:1564 - response_cost: 0.30000075
20:42:27 - LiteLLM:DEBUG: transformation.py:296 - Raw OpenAI Chunk={'type': 'response.created', 'response': {'id': 'resp_03ebaa3977b46f6f006a188c8b2d48819fb675c7d3317e225a', 'object': 'response', 'created_at': 1779993739, 'status': 'in_progress', 'background': False, 'completed_at': None, 'error': None, 'frequency_penalty': 0.0, 'incomplete_details': None, 'instructions': None, 'max_output_tokens': None, 'max_tool_calls': None, 'model': 'gpt-5.4-mini-2026-03-17', 'moderation': None, 'output': [], 'parallel_tool_calls': True, 'presence_penalty': 0.0, 'previous_response_id': None, 'prompt_cache_key': None, 'prompt_cache_retention': 'in_memory', 'reasoning': {'context': 'current_turn', 'effort': 'none', 'summary': None}, 'safety_identifier': None, 'service_tier': 'auto', 'store': True, 'temperature': 1.0, 'text': {'format': {'type': 'text'}, 'verbosity': 'medium'}, 'tool_choice': 'auto', 'tools': [], 'top_logprobs': 0, 'top_p': 0.98, 'truncation': 'disabled', 'usage': None, 'user': None, 'metadata': {}}, 'sequence_number': 0}
chunk type=<ResponsesAPIStreamEvents.RESPONSE_CREATED: 'response.created'> response=ResponsesAPIResponse(id='resp_bGl0ZWxsbTpjdXN0b21fbGxtX3Byb3ZpZGVyOm9wZW5haTttb2RlbF9pZDpOb25lO3Jlc3BvbnNlX2lkOnJlc3BfMDNlYmFhMzk3N2I0NmY2ZjAwNmExODhjOGIyZDQ4ODE5ZmI2NzVjN2QzMzE3ZTIyNWE=', created_at=1779993739, error=None, incomplete_details=None, instructions=None, metadata={}, model='gpt-5.4-mini-2026-03-17', object='response', output=[], parallel_tool_calls=True, temperature=1.0, tool_choice='auto', tools=[], top_p=0.98, max_output_tokens=None, previous_response_id=None, reasoning={'context': 'current_turn', 'effort': 'none', 'summary': None}, status='in_progress', text={'format': {'type': 'text'}, 'verbosity': 'medium'}, truncation='disabled', usage=None, user=None, store=True, background=False, completed_at=None, frequency_penalty=0.0, max_tool_calls=None, moderation=None, presence_penalty=0.0, prompt_cache_key=None, prompt_cache_retention='in_memory', safety_identifier=None, service_tier='auto', top_logprobs=0) sequence_number=0
20:42:27 - LiteLLM:DEBUG: transformation.py:296 - Raw OpenAI Chunk={'type': 'response.in_progress', 'response': {'id': 'resp_03ebaa3977b46f6f006a188c8b2d48819fb675c7d3317e225a', 'object': 'response', 'created_at': 1779993739, 'status': 'in_progress', 'background': False, 'completed_at': None, 'error': None, 'frequency_penalty': 0.0, 'incomplete_details': None, 'instructions': None, 'max_output_tokens': None, 'max_tool_calls': None, 'model': 'gpt-5.4-mini-2026-03-17', 'moderation': None, 'output': [], 'parallel_tool_calls': True, 'presence_penalty': 0.0, 'previous_response_id': None, 'prompt_cache_key': None, 'prompt_cache_retention': 'in_memory', 'reasoning': {'context': 'current_turn', 'effort': 'none', 'summary': None}, 'safety_identifier': None, 'service_tier': 'auto', 'store': True, 'temperature': 1.0, 'text': {'format': {'type': 'text'}, 'verbosity': 'medium'}, 'tool_choice': 'auto', 'tools': [], 'top_logprobs': 0, 'top_p': 0.98, 'truncation': 'disabled', 'usage': None, 'user': None, 'metadata': {}}, 'sequence_number': 1}
chunk type=<ResponsesAPIStreamEvents.RESPONSE_IN_PROGRESS: 'response.in_progress'> response=ResponsesAPIResponse(id='resp_bGl0ZWxsbTpjdXN0b21fbGxtX3Byb3ZpZGVyOm9wZW5haTttb2RlbF9pZDpOb25lO3Jlc3BvbnNlX2lkOnJlc3BfMDNlYmFhMzk3N2I0NmY2ZjAwNmExODhjOGIyZDQ4ODE5ZmI2NzVjN2QzMzE3ZTIyNWE=', created_at=1779993739, error=None, incomplete_details=None, instructions=None, metadata={}, model='gpt-5.4-mini-2026-03-17', object='response', output=[], parallel_tool_calls=True, temperature=1.0, tool_choice='auto', tools=[], top_p=0.98, max_output_tokens=None, previous_response_id=None, reasoning={'context': 'current_turn', 'effort': 'none', 'summary': None}, status='in_progress', text={'format': {'type': 'text'}, 'verbosity': 'medium'}, truncation='disabled', usage=None, user=None, store=True, background=False, completed_at=None, frequency_penalty=0.0, max_tool_calls=None, moderation=None, presence_penalty=0.0, prompt_cache_key=None, prompt_cache_retention='in_memory', safety_identifier=None, service_tier='auto', top_logprobs=0) sequence_number=1
20:42:27 - LiteLLM:DEBUG: transformation.py:296 - Raw OpenAI Chunk={'type': 'error', 'error': {'type': 'invalid_request_error', 'code': 'context_length_exceeded', 'message': 'Your input exceeds the context window of this model. Please adjust your input and try again.', 'param': 'input'}, 'sequence_number': 2}
chunk type=<ResponsesAPIStreamEvents.ERROR: 'error'> sequence_number=2 error=ErrorEventError(type='invalid_request_error', code='context_length_exceeded', message='Your input exceeds the context window of this model. Please adjust your input and try again.', param='input')
20:42:27 - LiteLLM:DEBUG: transformation.py:296 - Raw OpenAI Chunk={'type': 'response.failed', 'response': {'id': 'resp_03ebaa3977b46f6f006a188c8b2d48819fb675c7d3317e225a', 'object': 'response', 'created_at': 1779993739, 'status': 'failed', 'background': False, 'completed_at': None, 'error': {'code': 'context_length_exceeded', 'message': 'Your input exceeds the context window of this model. Please adjust your input and try again.'}, 'frequency_penalty': 0.0, 'incomplete_details': None, 'instructions': None, 'max_output_tokens': None, 'max_tool_calls': None, 'model': 'gpt-5.4-mini-2026-03-17', 'moderation': None, 'output': [], 'parallel_tool_calls': True, 'presence_penalty': 0.0, 'previous_response_id': None, 'prompt_cache_key': None, 'prompt_cache_retention': 'in_memory', 'reasoning': {'context': 'current_turn', 'effort': 'none', 'summary': None}, 'safety_identifier': None, 'service_tier': 'auto', 'store': True, 'temperature': 1.0, 'text': {'format': {'type': 'text'}, 'verbosity': 'medium'}, 'tool_choice': 'auto', 'tools': [], 'top_logprobs': 0, 'top_p': 0.98, 'truncation': 'disabled', 'usage': None, 'user': None, 'metadata': {}}, 'sequence_number': 3}
20:42:27 - LiteLLM:DEBUG: litellm_logging.py:2931 - Logging Details LiteLLM-Failure Call: []
chunk type=<ResponsesAPIStreamEvents.RESPONSE_FAILED: 'response.failed'> response=ResponsesAPIResponse(id='resp_bGl0ZWxsbTpjdXN0b21fbGxtX3Byb3ZpZGVyOm9wZW5haTttb2RlbF9pZDpOb25lO3Jlc3BvbnNlX2lkOnJlc3BfMDNlYmFhMzk3N2I0NmY2ZjAwNmExODhjOGIyZDQ4ODE5ZmI2NzVjN2QzMzE3ZTIyNWE=', created_at=1779993739, error={'code': 'context_length_exceeded', 'message': 'Your input exceeds the context window of this model. Please adjust your input and try again.'}, incomplete_details=None, instructions=None, metadata={}, model='gpt-5.4-mini-2026-03-17', object='response', output=[], parallel_tool_calls=True, temperature=1.0, tool_choice='auto', tools=[], top_p=0.98, max_output_tokens=None, previous_response_id=None, reasoning={'context': 'current_turn', 'effort': 'none', 'summary': None}, status='failed', text={'format': {'type': 'text'}, 'verbosity': 'medium'}, truncation='disabled', usage=None, user=None, store=True, background=False, completed_at=None, frequency_penalty=0.0, max_tool_calls=None, moderation=None, presence_penalty=0.0, prompt_cache_key=None, prompt_cache_retention='in_memory', safety_identifier=None, service_tier='auto', top_logprobs=0) sequence_number=3
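The log shows the raw event order: response.created (0), response.in_progress (1), error (2), response.failed (3), so the same failure is reported twice. A minimal sketch, assuming raw chunks shaped like the "Raw OpenAI Chunk=" dicts above, that reduces such a sequence to its first failure; first_stream_failure is a hypothetical helper, and the sample events are abridged from the log with most response fields omitted.

```python
from typing import Iterable, Optional


def first_stream_failure(events: Iterable[dict]) -> Optional[dict]:
    """Return details of the first 'error' or 'response.failed' event, else None."""
    for event in events:
        kind = event.get("type")
        if kind == "error":
            err = event.get("error") or {}
        elif kind == "response.failed":
            err = (event.get("response") or {}).get("error") or {}
        else:
            continue  # response.created, response.in_progress, deltas, ...
        return {
            "event": kind,
            "sequence_number": event.get("sequence_number"),
            "code": err.get("code"),
            "message": err.get("message"),
        }
    return None


MSG = "Your input exceeds the context window of this model. Please adjust your input and try again."

# Raw chunks abridged from the log above.
raw_events = [
    {"type": "response.created", "response": {"status": "in_progress", "error": None}, "sequence_number": 0},
    {"type": "response.in_progress", "response": {"status": "in_progress", "error": None}, "sequence_number": 1},
    {"type": "error", "error": {"type": "invalid_request_error", "code": "context_length_exceeded",
                                "message": MSG, "param": "input"}, "sequence_number": 2},
    {"type": "response.failed", "response": {"status": "failed",
                                             "error": {"code": "context_length_exceeded", "message": MSG}},
     "sequence_number": 3},
]

failure = first_stream_failure(raw_events)
print(failure["event"], failure["sequence_number"], failure["code"])
```

Because the error event precedes response.failed, a consumer that stops at the first failure never needs to inspect the final response object.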

What part of LiteLLM is this about?

SDK (litellm Python package)

What LiteLLM version are you on ?

Tested with v1.81.15, v1.85.1 and v1.86.2

Twitter / LinkedIn details

No response
