llamaIndex - ✅(Solved) Fix [Bug]: OpenAILike / FunctionAgent: Kimi-K2.5 sometimes returns final answer in reasoning_content [1 pull requests, 2 comments, 2 participants]

liebki · 2026-04-08T12:54:42Z



GitHub stats

run-llama/llama_index#21337 • Fetched 2026-04-09 07:51:07

Author: liebki
Participants: dosubot[bot], liebki
Timeline (top): commented ×2, labeled ×2, mentioned ×2, subscribed ×2

Root Cause

Because of this, the agent can sometimes finish with an empty result, even though the model actually produced a valid final answer.

Fix Action

Fix / Workaround

My workaround

I added a small custom OpenAILike wrapper:

- if the final stream returns no chunks, I fall back to a non-streaming call
- if all of these are true:
  - role = assistant
  - finish_reason = stop
  - no tool calls are present
  - content is empty
  - reasoning_content exists

then I copy reasoning_content into content.

PR fix notes

PR #21345: fix(FunctionAgent): fall back to ThinkingBlock content when response content is empty

Repository: run-llama/llama_index
Author: octo-patch
State: open | merged: False
Link: https://github.com/run-llama/llama_index/pull/21345

Description (problem / solution / changelog)

Fixes #21337

Problem

Some OpenAI-compatible models (e.g. Kimi-K2.5) occasionally return the final answer in reasoning_content instead of content. Because FunctionAgent doesn't validate that the response content is non-empty (unlike ReActAgent, which raises ValueError("Got empty message")), it silently returns an empty answer even though the model produced a valid response.

The streaming code in the OpenAI LLM already captures reasoning_content and stores it in a ThinkingBlock inside ChatMessage.blocks. However, ChatMessage.content only aggregates TextBlock text, so the ThinkingBlock content is invisible to the caller.
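To make the visibility problem concrete, here is a minimal sketch using simplified stand-in classes. They are assumptions written for illustration and only mimic the behaviour described above; they are not the real LlamaIndex `ChatMessage`, `TextBlock`, or `ThinkingBlock` definitions.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class TextBlock:
    """Stand-in for a text block (hypothetical, simplified)."""
    text: str


@dataclass
class ThinkingBlock:
    """Stand-in for a thinking block (hypothetical, simplified)."""
    content: Optional[str] = None


@dataclass
class ChatMessage:
    """Stand-in message whose `content` only sees TextBlock text."""
    role: str
    blocks: List[Union[TextBlock, ThinkingBlock]] = field(default_factory=list)

    @property
    def content(self) -> Optional[str]:
        # Only TextBlock text is aggregated; ThinkingBlock content is skipped.
        texts = [b.text for b in self.blocks if isinstance(b, TextBlock)]
        return "\n".join(texts) if texts else None


msg = ChatMessage(role="assistant", blocks=[ThinkingBlock(content="The answer is 42.")])
print(repr(msg.content))  # None
```

A caller that reads only `msg.content` therefore sees nothing, although the answer text is present in `msg.blocks`.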

Solution

In FunctionAgent.take_step, after extracting tool calls, add a fallback: when there are no tool calls and content is empty, scan the response blocks for a ThinkingBlock with non-empty content and, if found, reconstruct the ChatResponse using that content as the text response.

This handles both streaming and non-streaming paths (the check is in take_step, which is called for both).

Conditions for the fallback to trigger — all must be true:

- No tool calls in the response
- message.content is empty / None
- At least one ThinkingBlock in message.blocks has non-empty content

This does not change behaviour for models that behave correctly.
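The three conditions can be sketched as a small predicate. This is not the code from the PR: `Block`, `Message`, and `thinking_fallback` are hypothetical names, and the block types are reduced to a `kind` string so the example stays self-contained.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Block:
    kind: str        # "text" or "thinking" (stand-in for TextBlock / ThinkingBlock)
    value: str = ""


@dataclass
class Message:
    blocks: List[Block] = field(default_factory=list)

    @property
    def content(self) -> Optional[str]:
        # Mirrors the described behaviour: only text blocks contribute.
        texts = [b.value for b in self.blocks if b.kind == "text"]
        return "\n".join(texts) or None


def thinking_fallback(message: Message, tool_calls: list) -> Optional[str]:
    """Return thinking text to promote, or None when the fallback must not fire."""
    if tool_calls:                # condition 1: no tool calls
        return None
    if message.content:           # condition 2: content is empty / None
        return None
    for block in message.blocks:  # condition 3: a non-empty thinking block
        if block.kind == "thinking" and block.value.strip():
            return block.value
    return None


only_thinking = Message(blocks=[Block("thinking", "Paris")])
print(thinking_fallback(only_thinking, tool_calls=[]))  # Paris
```

Returning `None` in every other case keeps responses from well-behaved models untouched, which matches the stated intent of the fix.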

Testing

The fix can be validated by mocking an LLM that returns a ChatMessage with only a ThinkingBlock (no TextBlock content) and verifying that FunctionAgent returns the thinking content as its response rather than an empty string.

Existing tests in tests/agent/workflow/test_single_agent_workflow.py continue to pass unchanged.

Changed files

llama-index-core/llama_index/core/agent/workflow/function_agent.py (modified, +26/-1)


Bug Description

Hello again 😄

I think I may have found a small compatibility issue with Kimi-K2.5 on an OpenAI-compatible endpoint (behind LiteLLM 1.78.5 and Kong 3.9.1).

Problem

In some tool-calling runs with FunctionAgent, the final answer does not come back in the normal content field.

Instead, I sometimes see this behavior:

- content is empty / None
- no final stream chunks arrive
- but the real final answer is present in reasoning_content

Because of this, the agent can sometimes finish with an empty result, even though the model actually produced a valid final answer.

Important note

I have only seen this issue with FunctionAgent.

I did not have this problem before when using ReActAgent, so maybe this detail helps narrow it down.

My workaround

I added a small custom OpenAILike wrapper:

- if the final stream returns no chunks, I fall back to a non-streaming call
- if all of these are true:
  - role = assistant
  - finish_reason = stop
  - no tool calls are present
  - content is empty
  - reasoning_content exists

then I copy reasoning_content into content.
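The first half of the workaround (falling back to a non-streaming call when the stream yields nothing) might look roughly like the sketch below. The wrapper itself was not posted in the issue, so `collect_with_fallback` and its two callables are hypothetical stand-ins for whatever streaming and non-streaming calls the client exposes.

```python
from typing import Callable, Iterable, List


def collect_with_fallback(
    stream: Callable[[], Iterable[str]],
    complete: Callable[[], str],
) -> str:
    """Join streamed text chunks; if none arrive, make one non-streaming call."""
    # Empty strings are dropped so a stream of blank deltas counts as "no chunks".
    chunks: List[str] = [c for c in stream() if c]
    if chunks:
        return "".join(chunks)
    return complete()


def empty_stream() -> Iterable[str]:
    # Simulates the reported case where no final stream chunks arrive.
    return iter(())


print(collect_with_fallback(empty_stream, lambda: "final answer"))  # final answer
```

Passing callables rather than an already-started stream keeps the non-streaming request lazy, so it is only issued when the stream was actually empty.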

This fixes the problem for me and allows the final answer to be shown correctly (it's not reasoning text, it's the answer of the model that never got out of reasoning_content).

Question

Would this be something that could be handled by:

- an official wrapper for this provider/model
- or a small normalization option in OpenAILike

Thank you very much!

Version

llama-index==0.12.52

Steps to Reproduce

With Kimi-K2.5 on an OpenAI-compatible endpoint and tool calling enabled, the final answer step sometimes (roughly every second or third answer) returns the final answer in reasoning_content instead of content, or the stream returns no final chunks. Kimi’s tool-calling flow is supposed to continue after tool results until the model can answer normally, and K2.5 also has a documented reasoning_content field in thinking-mode style responses.

Relevant Logs/Tracebacks

extent analysis

TL;DR

Implement a custom wrapper or normalization option in OpenAILike to handle cases where the final answer is returned in reasoning_content instead of content.

Guidance

- Verify that the issue only occurs with FunctionAgent and not with other agents like ReActAgent to confirm the scope of the problem.
- Consider implementing a fallback mechanism to non-streaming calls when the final stream returns no chunks, as described in the user's workaround.
- Investigate adding a normalization option in OpenAILike to handle cases where content is empty and reasoning_content exists, and copy reasoning_content into content when certain conditions are met (e.g., role = assistant, finish_reason = stop, no tool calls present).
- Test the custom wrapper or normalization option with different scenarios to ensure it fixes the issue and does not introduce new problems.

Example

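class CustomOpenAILike:
    """Illustrative helper only; not an actual LlamaIndex class."""

    def get_final_answer(self, response: dict) -> str:
        # `response` is assumed to be a plain dict shaped like an OpenAI-style message.
        if not response.get("content"):
            # Promote reasoning_content only when all workaround conditions hold.
            if (
                response.get("reasoning_content")
                and response.get("role") == "assistant"
                and response.get("finish_reason") == "stop"
                and not response.get("tool_calls")
            ):
                response["content"] = response["reasoning_content"]
        return response.get("content") or ""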

Notes

The issue seems to be specific to FunctionAgent and Kimi-K2.5 with an OpenAI-compatible endpoint, and the user's workaround provides a potential solution. However, further testing and verification are needed to ensure the custom wrapper or normalization option works correctly in all scenarios.

Recommendation

Apply the workaround by implementing a custom wrapper or normalization option in OpenAILike, as it provides a targeted solution to the issue and does not require upgrading to a different version.
