LlamaIndex - ✅ (Solved) [Bug]: to_openai_responses_message_dict serializes input_file using nested "file" object, which does not match current OpenAI Responses API schema [1 pull request, 4 comments, 4 participants]

GitHub stats
run-llama/llama_index#21146 · Fetched 2026-04-08 01:31:26
Comments: 4 · Participants: 4 · Timeline: 9 · Reactions: 0
Timeline (top): commented ×4, labeled ×2, closed ×1, cross-referenced ×1

PR fix notes

PR #21172: Fix input_file serialization in Responses API message dict

Description (problem / solution / changelog)

Summary

to_openai_responses_message_dict wraps filename and file_data inside a nested "file" key when serializing DocumentBlock as input_file. The OpenAI Responses API expects these fields flat on the input_file object — there is no file wrapper in the schema.

The nested structure was carried over from the Chat Completions serializer (to_openai_message_dicts), which correctly uses {"type": "file", "file": {...}}. When the Responses API path was added, type was changed to "input_file" but the nesting was not removed.

Before

{
    "type": "input_file",
    "file": {
        "filename": block.title,
        "file_data": f"data:{mimetype};base64,{b64_string}",
    },
}

After

{
    "type": "input_file",
    "filename": block.title,
    "file_data": f"data:{mimetype};base64,{b64_string}",
}

Verified against the openai-python SDK ResponseInputFile model.
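
To make the difference concrete, here is a minimal sketch of the two shapes side by side. The helper names (`chat_completions_file_part`, `responses_input_file_part`, `_data_url`) and the sample bytes are illustrative assumptions, not LlamaIndex or OpenAI functions; only the dict layouts are taken from the description above.

```python
import base64
from typing import Any, Dict


def _data_url(raw: bytes, mimetype: str) -> str:
    """Encode raw bytes as a base64 data URL."""
    b64_string = base64.b64encode(raw).decode("utf-8")
    return f"data:{mimetype};base64,{b64_string}"


def chat_completions_file_part(raw: bytes, mimetype: str, filename: str) -> Dict[str, Any]:
    """Chat Completions shape: fields nested under a "file" key."""
    return {
        "type": "file",
        "file": {"filename": filename, "file_data": _data_url(raw, mimetype)},
    }


def responses_input_file_part(raw: bytes, mimetype: str, filename: str) -> Dict[str, Any]:
    """Responses API shape: fields sit flat on the input_file item."""
    return {
        "type": "input_file",
        "filename": filename,
        "file_data": _data_url(raw, mimetype),
    }


if __name__ == "__main__":
    sample = b"%PDF-1.4 sample"  # placeholder bytes, not a real PDF
    print(chat_completions_file_part(sample, "application/pdf", "report.pdf"))
    print(responses_input_file_part(sample, "application/pdf", "report.pdf"))
```

The two builders differ only in the `type` value and in whether `filename` and `file_data` are wrapped, which is the change the PR makes in the Responses path.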

Fixes #21146

Changed files

  • llama-index-integrations/llms/llama-index-llms-openai/llama_index/llms/openai/utils.py (modified, +2/-4)
  • llama-index-integrations/llms/llama-index-llms-openai/pyproject.toml (modified, +1/-1)

Bug Description

to_openai_responses_message_dict() appears to serialize DocumentBlock into this shape:

{
    "type": "input_file",
    "file": {
        "filename": block.title,
        "file_data": f"data:{mimetype};base64,{b64_string}",
    },
}

However, according to the current OpenAI Responses API schema, input_file fields should be placed directly on the input_file item itself, not nested under a file object.

Expected shape:

{
    "type": "input_file",
    "filename": block.title,
    "file_data": f"data:{mimetype};base64,{b64_string}",
}

The current implementation seems to mix the old Chat Completions-style "type": "file", "file": {...} pattern with the newer Responses API "type": "input_file" format.

Why this seems incorrect

OpenAI's current Responses API docs define input_file with these fields directly on the object:

  • type: "input_file"
  • file_data
  • file_id
  • file_url
  • filename

There is no nested file object in the documented input_file schema.

So this:

{
    "type": "input_file",
    "file": {...}
}

does not appear to match the current Responses API spec.
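
For anyone pinned to an affected version, one possible stopgap is to post-process serialized content items before they are sent. The sketch below is a hypothetical helper (`flatten_input_file` is not part of LlamaIndex) and assumes the nested "file" dict holds only fields that belong on the flat input_file item.

```python
from typing import Any, Dict


def flatten_input_file(item: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of item with a nested "file" dict hoisted to the top level.

    Items that are not input_file, or that are already flat, come back as an
    unchanged shallow copy. The input dict is never mutated.
    """
    flat = dict(item)
    if flat.get("type") != "input_file":
        return flat
    nested = flat.pop("file", None)
    if isinstance(nested, dict):
        # Keys already present on the item take precedence over nested ones.
        for key, value in nested.items():
            flat.setdefault(key, value)
    elif nested is not None:
        # Not the nested shape described above; restore it untouched.
        flat["file"] = nested
    return flat
```

Upgrading to a release that includes the fix is the cleaner route; a helper like this only papers over the payload shape.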

Current LlamaIndex code

def to_openai_responses_message_dict(
    message: ChatMessage,
    drop_none: bool = False,
    model: Optional[str] = None,
    store: bool = False,
) -> Union[str, Dict[str, Any], List[Dict[str, Any]]]:
    content = []
    content_txt = ""
    tool_calls = []
    reasoning = []

    for block in message.blocks:
        if isinstance(block, TextBlock):
            if message.role.value == "user":
                content.append({"type": "input_text", "text": block.text})
            else:
                content.append({"type": "output_text", "text": block.text})
            content_txt += block.text
        elif isinstance(block, DocumentBlock):
            if not block.data:
                file_buffer = block.resolve_document()
                b64_string = block._get_b64_string(file_buffer)
                mimetype = block._guess_mimetype()
            else:
                b64_string = block.data.decode("utf-8")
                mimetype = block._guess_mimetype()
            content.append(
                {
                    "type": "input_file",
                    "file": {
                        "filename": block.title,
                        "file_data": f"data:{mimetype};base64,{b64_string}",
                    },
                }
            )

Expected behavior

For Responses API serialization, DocumentBlock should be converted to:

content.append(
    {
        "type": "input_file",
        "filename": block.title,
        "file_data": f"data:{mimetype};base64,{b64_string}",
    }
)

Potentially also supporting:

{
    "type": "input_file",
    "file_id": "...",
}

or

{
    "type": "input_file",
    "file_url": "...",
}

when appropriate.

Version

0.14.18

Steps to Reproduce

  1. Use DocumentBlock in a message passed through the Responses API path.
  2. Inspect the serialized payload from to_openai_responses_message_dict().
  3. Observe that it generates:
{
    "type": "input_file",
    "file": {
        "filename": ...,
        "file_data": ...
    }
}

instead of the flat input_file object expected by the Responses API.

Relevant Logs/Tracebacks

extent analysis

Fix Plan

To fix the issue, we need to modify the to_openai_responses_message_dict function to correctly serialize DocumentBlock into the expected shape.

  • Update the elif isinstance(block, DocumentBlock): block to directly append the input_file fields without nesting them under a file object.
elif isinstance(block, DocumentBlock):
    if not block.data:
        file_buffer = block.resolve_document()
        b64_string = block._get_b64_string(file_buffer)
        mimetype = block._guess_mimetype()
    else:
        b64_string = block.data.decode("utf-8")
        mimetype = block._guess_mimetype()
    content.append(
        {
            "type": "input_file",
            "filename": block.title,
            "file_data": f"data:{mimetype};base64,{b64_string}",
        }
    )
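  • Optionally, add support for file_id and file_url when appropriate. The sketch below assumes DocumentBlock exposes file_id and file_url attributes, which the issue does not confirm, so they are read defensively with getattr. It also resolves and encodes the document only when it is actually inlined as file_data.
elif isinstance(block, DocumentBlock):
    # Hypothetical attributes: neither is confirmed to exist on DocumentBlock.
    file_id = getattr(block, "file_id", None)
    file_url = getattr(block, "file_url", None)
    if file_id:
        content.append({"type": "input_file", "file_id": file_id})
    elif file_url:
        content.append({"type": "input_file", "file_url": file_url})
    else:
        # Fall back to inlining the document as a base64 data URL.
        if not block.data:
            file_buffer = block.resolve_document()
            b64_string = block._get_b64_string(file_buffer)
        else:
            b64_string = block.data.decode("utf-8")
        mimetype = block._guess_mimetype()
        content.append(
            {
                "type": "input_file",
                "filename": block.title,
                "file_data": f"data:{mimetype};base64,{b64_string}",
            }
        )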

Verification

To verify the fix, inspect the serialized payload from to_openai_responses_message_dict() and ensure it generates the expected flat input_file object.
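
That inspection could be automated with a small check along these lines. This is a sketch only: `check_flat_input_file` is a hypothetical helper, and the accepted source fields are the ones listed in the issue's description of the input_file schema.

```python
from typing import Any, Dict

# Field names taken from the input_file schema quoted in the issue.
_SOURCE_FIELDS = ("file_data", "file_id", "file_url")


def check_flat_input_file(item: Dict[str, Any]) -> None:
    """Raise AssertionError if item is not a flat input_file payload."""
    assert item.get("type") == "input_file", "type must be 'input_file'"
    assert "file" not in item, "fields must not be nested under 'file'"
    assert any(item.get(name) for name in _SOURCE_FIELDS), (
        "expected one of file_data, file_id, or file_url"
    )
```

In a real test, the check would be applied to each input_file item found in the output of to_openai_responses_message_dict() for a message that contains a DocumentBlock.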

Extra Tips

  • Make sure to test the updated function with different types of DocumentBlock instances to ensure correct serialization.
  • Consider adding additional error handling or logging to handle cases where the DocumentBlock instance is missing required fields.
