llamaIndex - ✅(Solved) Fix [Bug]: to_openai_responses_message_dict serializes input_file using nested "file" object, which does not match current OpenAI Responses API schema [1 pull request, 4 comments, 4 participants]

Q: Expected behavior

For Responses API serialization, `DocumentBlock` should be converted to:

```python
content.append(
    {
        "type": "input_file",
        "filename": block.title,
        "file_data": f"data:{mimetype};base64,{b64_string}",
    }
)
```

Potentially also supporting:

```python
{
    "type": "input_file",
    "file_id": "...",
}
```

or

```python
{
    "type": "input_file",
    "file_url": "...",
}
```

when appropriate.
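The three variants differ only in how the file is supplied. Below is a minimal sketch of a builder that always emits the flat shape, assuming that supplying more than one of `file_data`, `file_id`, or `file_url` in a single item should be treated as a caller error. The helper name `make_input_file` is hypothetical; it is not a LlamaIndex or OpenAI API.

```python
from typing import Any, Dict, Optional


def make_input_file(
    *,
    filename: Optional[str] = None,
    file_data: Optional[str] = None,
    file_id: Optional[str] = None,
    file_url: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a flat Responses API `input_file` content item (hypothetical helper)."""
    sources = {"file_data": file_data, "file_id": file_id, "file_url": file_url}
    provided = {key: value for key, value in sources.items() if value is not None}
    # Assumption: exactly one way of supplying the file per item.
    if len(provided) != 1:
        raise ValueError(
            f"expected exactly one of {sorted(sources)}, got {sorted(provided)}"
        )
    item: Dict[str, Any] = {"type": "input_file", **provided}
    if filename is not None:
        item["filename"] = filename
    # No nested "file" wrapper: every field sits directly on the item.
    return item
```

For example, `make_input_file(file_id="file-abc123")` returns `{"type": "input_file", "file_id": "file-abc123"}` (the ID is a placeholder).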

llamaIndex · 2026-03-25 14:21:39

GitHub stats

run-llama/llama_index#21146 · Fetched 2026-04-08 01:31:26

Timeline: commented ×4, labeled ×2, closed ×1, cross-referenced ×1

PR fix notes

PR #21172: Fix input_file serialization in Responses API message dict

Repository: run-llama/llama_index
Author: joaquinhuigomez
State: closed | merged: True
Link: https://github.com/run-llama/llama_index/pull/21172

Description (problem / solution / changelog)

Summary

`to_openai_responses_message_dict` wraps `filename` and `file_data` inside a nested `"file"` key when serializing `DocumentBlock` as `input_file`. The OpenAI Responses API expects these fields flat on the `input_file` object; there is no `file` wrapper in the schema.

The nested structure was carried over from the Chat Completions serializer (`to_openai_message_dicts`), which correctly uses `{"type": "file", "file": {...}}`. When the Responses API path was added, `type` was changed to `"input_file"` but the nesting was not removed.
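To make the difference concrete, here is a minimal sketch that builds both shapes from the same filename and data URL. The function names are hypothetical; the Chat Completions shape follows the `{"type": "file", "file": {...}}` pattern described above, and the Responses shape is the flat one the PR switches to.

```python
from typing import Any, Dict


def chat_completions_file_part(filename: str, data_url: str) -> Dict[str, Any]:
    # Chat Completions: fields are nested under a "file" object.
    return {"type": "file", "file": {"filename": filename, "file_data": data_url}}


def responses_input_file(filename: str, data_url: str) -> Dict[str, Any]:
    # Responses API: the same fields, placed directly on the input_file item.
    return {"type": "input_file", "filename": filename, "file_data": data_url}
```

The bug amounted to pairing the Responses `"type": "input_file"` with the Chat Completions nesting from the first function.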

Before

```python
{
    "type": "input_file",
    "file": {
        "filename": block.title,
        "file_data": f"data:{mimetype};base64,{b64_string}",
    },
}
```

After

```python
{
    "type": "input_file",
    "filename": block.title,
    "file_data": f"data:{mimetype};base64,{b64_string}",
}
```

Verified against the openai-python SDK ResponseInputFile model.
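The `file_data` value in both payloads is a base64 data URL. A minimal standard-library sketch of producing one from raw bytes, assuming the MIME type can be guessed from the filename extension with a generic fallback. This stands in for LlamaIndex's internal `_get_b64_string` / `_guess_mimetype` helpers and is not their implementation; `to_data_url` is a hypothetical name.

```python
import base64
import mimetypes


def to_data_url(data: bytes, filename: str) -> str:
    # Guess the MIME type from the extension; fall back to a generic binary type.
    mimetype, _ = mimetypes.guess_type(filename)
    if mimetype is None:
        mimetype = "application/octet-stream"
    # Base64-encode the raw bytes and decode to str so the value is JSON-safe.
    b64_string = base64.b64encode(data).decode("ascii")
    return f"data:{mimetype};base64,{b64_string}"
```

For example, `to_data_url(b"hi", "notes.pdf")` returns `"data:application/pdf;base64,aGk="`.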

Fixes #21146

Changed files

- `llama-index-integrations/llms/llama-index-llms-openai/llama_index/llms/openai/utils.py` (modified, +2/-4)
- `llama-index-integrations/llms/llama-index-llms-openai/pyproject.toml` (modified, +1/-1)

### Bug Description

`to_openai_responses_message_dict()` appears to serialize `DocumentBlock` into this shape:

```python
{
    "type": "input_file",
    "file": {
        "filename": block.title,
        "file_data": f"data:{mimetype};base64,{b64_string}",
    },
}
```

However, according to the current OpenAI Responses API schema, `input_file` fields should be placed directly on the `input_file` item itself, not nested under a `file` object.

Expected shape:

```python
{
    "type": "input_file",
    "filename": block.title,
    "file_data": f"data:{mimetype};base64,{b64_string}",
}
```

The current implementation seems to mix the old Chat Completions-style `"type": "file", "file": {...}` pattern with the newer Responses API `"type": "input_file"` format.

### Why this seems incorrect

OpenAI's current Responses API docs define `input_file` with these fields directly on the object:

* `type: "input_file"`
* `file_data`
* `file_id`
* `file_url`
* `filename`

There is no nested `file` object in the documented `input_file` schema.
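A minimal validation sketch for the flat shape, assuming the allowed key set is exactly the five fields listed above. `check_input_file` is a hypothetical helper; OpenAI's schema remains authoritative, so treat this as a lint for tests rather than a specification.

```python
from typing import Any, Dict, List

# Field names taken from the list above; assumed to be the complete set.
ALLOWED_INPUT_FILE_KEYS = {"type", "file_data", "file_id", "file_url", "filename"}


def check_input_file(item: Dict[str, Any]) -> List[str]:
    """Return a list of problems with an input_file item (empty if none found)."""
    problems: List[str] = []
    if item.get("type") != "input_file":
        problems.append(f"type is {item.get('type')!r}, expected 'input_file'")
    if "file" in item:
        problems.append("nested 'file' object is not part of the input_file schema")
    # Report other unknown keys separately from the nested "file" case.
    unknown = sorted(set(item) - ALLOWED_INPUT_FILE_KEYS - {"file"})
    if unknown:
        problems.append(f"unexpected keys: {unknown}")
    return problems
```

The nested payload from the bug report yields exactly one problem (the `file` wrapper), while the flat payload yields none.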

So this:

```python
{
    "type": "input_file",
    "file": {...}
}
```

does not appear to match the current Responses API spec.

### Current LlamaIndex code

```python
def to_openai_responses_message_dict(
    message: ChatMessage,
    drop_none: bool = False,
    model: Optional[str] = None,
    store: bool = False,
) -> Union[str, Dict[str, Any], List[Dict[str, Any]]]:
    content = []
    content_txt = ""
    tool_calls = []
    reasoning = []

    for block in message.blocks:
        if isinstance(block, TextBlock):
            if message.role.value == "user":
                content.append({"type": "input_text", "text": block.text})
            else:
                content.append({"type": "output_text", "text": block.text})
            content_txt += block.text
        elif isinstance(block, DocumentBlock):
            if not block.data:
                file_buffer = block.resolve_document()
                b64_string = block._get_b64_string(file_buffer)
                mimetype = block._guess_mimetype()
            else:
                b64_string = block.data.decode("utf-8")
                mimetype = block._guess_mimetype()
            # Bug: filename/file_data are nested under "file" instead of sitting
            # directly on the input_file item.
            content.append(
                {
                    "type": "input_file",
                    "file": {
                        "filename": block.title,
                        "file_data": f"data:{mimetype};base64,{b64_string}",
                    },
                }
            )
        # ... (the rest of the function is omitted in the issue)
```
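For environments that cannot pick up the merged fix yet, one possible workaround is to post-process serialized content items and lift the nested fields onto the item. A minimal sketch, assuming items have exactly the nested shape shown above; `flatten_input_file` is hypothetical and not part of LlamaIndex.

```python
import copy
from typing import Any, Dict


def flatten_input_file(item: Dict[str, Any]) -> Dict[str, Any]:
    """Return an input_file item with any nested "file" fields lifted onto it."""
    if item.get("type") != "input_file" or not isinstance(item.get("file"), dict):
        return item  # Not the buggy shape; leave it untouched.
    flat = copy.deepcopy(item)  # Avoid mutating the caller's payload.
    nested = flat.pop("file")
    # Lift filename/file_data (and any other nested keys) onto the item itself,
    # without overwriting keys that are already present at the top level.
    for key, value in nested.items():
        flat.setdefault(key, value)
    return flat
```

When a message's `content` is a list, the workaround would be applied per item, e.g. `[flatten_input_file(part) for part in content]`.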

### Expected behavior

For Responses API serialization, `DocumentBlock` should be converted to:

```python
content.append(
    {
        "type": "input_file",
        "filename": block.title,
        "file_data": f"data:{mimetype};base64,{b64_string}",
    }
)
```

Potentially also supporting:

```python
{
    "type": "input_file",
    "file_id": "...",
}
```

or

```python
{
    "type": "input_file",
    "file_url": "...",
}
```

when appropriate.

### Version

0.14.18

### Steps to Reproduce

1. Use `DocumentBlock` in a message passed through the Responses API path.
2. Inspect the serialized payload from `to_openai_responses_message_dict()`.
3. Observe that it generates:

```python
{
    "type": "input_file",
    "file": {
        "filename": ...,
        "file_data": ...
    }
}
```

instead of the flat `input_file` object expected by the Responses API.

### Relevant Logs/Tracebacks

Extended analysis

Fix Plan

To fix the issue, modify the `to_openai_responses_message_dict` function so that it serializes `DocumentBlock` into the expected shape.

Update the `elif isinstance(block, DocumentBlock):` branch to append the `input_file` fields directly, without nesting them under a `file` object:

```python
elif isinstance(block, DocumentBlock):
    if not block.data:
        file_buffer = block.resolve_document()
        b64_string = block._get_b64_string(file_buffer)
        mimetype = block._guess_mimetype()
    else:
        b64_string = block.data.decode("utf-8")
        mimetype = block._guess_mimetype()
    content.append(
        {
            "type": "input_file",
            "filename": block.title,
            "file_data": f"data:{mimetype};base64,{b64_string}",
        }
    )
```

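Optionally, add support for `file_id` and `file_url` when appropriate. `DocumentBlock` may not define `file_id` or `file_url` attributes, so the sketch below treats both as hypothetical fields and reads them with `getattr` (a pydantic model raises `AttributeError` for unknown attributes). It also resolves and base64-encodes the document only when neither reference is present, instead of always reading the file first:

```python
elif isinstance(block, DocumentBlock):
    # Hypothetical attributes: DocumentBlock may not define these fields,
    # so fall back to None instead of raising AttributeError.
    file_id = getattr(block, "file_id", None)
    file_url = getattr(block, "file_url", None)
    if file_id:
        # Reference a file that was previously uploaded to OpenAI.
        content.append({"type": "input_file", "file_id": file_id})
    elif file_url:
        # Let the API fetch the file from a URL.
        content.append({"type": "input_file", "file_url": file_url})
    else:
        # Inline the document as a base64 data URL, as in the fix above.
        if not block.data:
            file_buffer = block.resolve_document()
            b64_string = block._get_b64_string(file_buffer)
        else:
            b64_string = block.data.decode("utf-8")
        mimetype = block._guess_mimetype()
        content.append(
            {
                "type": "input_file",
                "filename": block.title,
                "file_data": f"data:{mimetype};base64,{b64_string}",
            }
        )
```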

Verification

To verify the fix, inspect the serialized payload from `to_openai_responses_message_dict()` and ensure it generates the expected flat `input_file` object.
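One way to automate that inspection is to walk the serialized output and collect any `input_file` items that still carry a nested `file` object. A minimal sketch, assuming the output is a string, a dict, or a list of dicts whose `content` is either a string or a list of content items, matching the `Union[str, Dict[str, Any], List[Dict[str, Any]]]` return annotation of the function. `find_nested_input_files` is a hypothetical test helper.

```python
from typing import Any, Dict, List, Union

Serialized = Union[str, Dict[str, Any], List[Dict[str, Any]]]


def find_nested_input_files(serialized: Serialized) -> List[Dict[str, Any]]:
    """Collect input_file items that still use the nested "file" wrapper."""
    if isinstance(serialized, str):
        return []  # Plain-text messages carry no content items.
    messages = serialized if isinstance(serialized, list) else [serialized]
    offenders: List[Dict[str, Any]] = []
    for message in messages:
        content = message.get("content")
        if not isinstance(content, list):
            continue  # String content has no structured items to check.
        for part in content:
            if (
                isinstance(part, dict)
                and part.get("type") == "input_file"
                and "file" in part
            ):
                offenders.append(part)
    return offenders
```

In a regression test this could be used as `assert not find_nested_input_files(to_openai_responses_message_dict(message))`.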

Extra Tips

- Test the updated function with different kinds of `DocumentBlock` instances (for example, with and without inline `data`) to confirm correct serialization.
- Consider adding error handling or logging for `DocumentBlock` instances that are missing required fields.
