llamaIndex - ✅(Solved) Fix [Bug]: CondensePlusContextChatEngine.astream_chat silently aborts generation and yields 'Empty Response' when Retriever returns 0 nodes [7 pull requests, 8 comments, 4 participants]

Official PRs (…)
ON THIS PAGE

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
run-llama/llama_index#20894Fetched 2026-04-08 00:30:23
View on GitHub
Comments
8
Participants
4
Timeline
25
Reactions
0
Timeline (top)
commented ×8cross-referenced ×5referenced ×4mentioned ×3

Fix Action

Fix / Workaround

print(f"Final Response: '{full_text}'") # Expected output: "Hello! Yes, I can help you..." # Actual output: "Empty Response" (Instantly, no LLM call dispatched)

Execution HALTS in less than 1.2s without dispatching any POST request to /api/chat

PR fix notes

PR #20919: fix: call LLM directly when retriever returns 0 nodes instead of Empty Response

Description (problem / solution / changelog)

Description

Fixes #20894.

When CondensePlusContextChatEngine's retriever returns 0 nodes (e.g., empty vector store, strict metadata filters), BaseSynthesizer.synthesize() / asynthesize() short-circuits with a hardcoded "Empty Response" string without ever calling the LLM. This is problematic for chat engines where the LLM should still respond conversationally using its base knowledge and the system prompt.

This is especially painful in multi-tenant RAG systems where a new user with an empty vector space expects the AI to still interact conversationally.

Fix

When no context nodes are retrieved, CondensePlusContextChatEngine now bypasses the synthesizer and calls the LLM directly with the system prompt, chat history, and user message. This is handled consistently in all 4 chat methods:

  • chat() — calls self._llm.chat(messages)
  • stream_chat() — calls self._llm.stream_chat(messages)
  • achat() — calls self._llm.achat(messages)
  • astream_chat() — calls self._llm.astream_chat(messages)

A shared _build_llm_messages() helper constructs the message list (system prompt + chat history + user message) for all cases.

Why this approach

  • Targeted: Only changes the chat engine, not BaseSynthesizer (which may intentionally skip LLM calls for other use cases like pure QA)
  • Consistent with MultiModalCondensePlusContextChatEngine: That engine already calls the LLM directly and never goes through BaseSynthesizer, so it never had this bug
  • No new dependencies or parameters: No configuration needed — a chat engine should always produce a conversational response
  • Memory is still updated: The user and assistant messages are written to memory in all code paths

Before / After

ScenarioBeforeAfter
Retriever returns 0 nodes (sync)Returns "Empty Response" instantly, no LLM callLLM responds using base knowledge + system prompt
Retriever returns 0 nodes (streaming)Yields "Empty Response" token instantlyStreams LLM response normally
Retriever returns nodesNormal behavior (unchanged)Normal behavior (unchanged)

AI Disclosure: This PR was authored by Claude (AI), directed by @MaxwellCalkin.

Changed files

  • llama-index-core/llama_index/core/chat_engine/condense_plus_context.py (modified, +104/-6)

PR #20942: fix: : CondensePlusContextChatEngine.astream_chat silently aborts

Description (problem / solution / changelog)

Closes #20894

What: Pass an empty-text placeholder node to the synthesizer when the retriever returns 0 nodes, instead of letting it short-circuit with a hardcoded "Empty Response".

Why: Previously, CompactAndRefine.synthesize/asynthesize received an empty node list and skipped the LLM call entirely. Now, a NodeWithScore(node=TextNode(text="")) placeholder ensures the synthesizer still invokes the LLM with the full prompt template.

How: In condense_plus_context.py, all four chat paths (chat, stream_chat, achat, astream_chat) now gate context_nodes with synth_nodes = context_nodes or [NodeWithScore(node=TextNode(text=""))] before passing to the synthesizer. This is a minimal, localized fix — the retriever, prompt construction, and memory logic are untouched.

Added an EmptyRetriever helper and an empty_chat_engine fixture in test_condense_plus_context.py, with four new tests (test_chat_empty_nodes, test_stream_chat_empty_nodes, test_achat_empty_nodes, test_astream_chat_empty_nodes) that assert the response is not "Empty Response" and that chat history is correctly updated.

Description

See above.

New Package?

No.

Version Bump?

No — bug fix only, no API change.

Type of Change

Bug fix (non-breaking).

How Has This Been Tested?

Four new unit tests covering sync, streaming, async, and async-streaming paths with an EmptyRetriever that returns []. All assert the LLM is actually called and produces real output.

Suggested Checklist:

  • I have performed a self-review of my own code
  • I have added tests that prove my fix is effective
  • New and existing unit tests pass locally with my changes
  • I have made corresponding changes to the documentation — N/A, no docs needed

Changed files

  • llama-index-core/llama_index/core/chat_engine/condense_plus_context.py (modified, +9/-5)
  • llama-index-core/tests/chat_engine/test_condense_plus_context.py (modified, +60/-1)

PR #20967: fix: call LLM with empty context instead of returning 'Empty Response' when no nodes retrieved

Description (problem / solution / changelog)

Summary

  • Fixes CondensePlusContextChatEngine silently returning "Empty Response" when the retriever returns 0 nodes, instead of calling the LLM
  • When no nodes are retrieved (e.g., empty vector store, metadata filters yielding no results), a single node with empty text is passed to the response synthesizer so the LLM is still invoked and can answer using its system prompt and training knowledge
  • The original empty node list is preserved in source_nodes so callers can still detect that no documents were retrieved

Fixes #20894

Test plan

  • Create a CondensePlusContextChatEngine with an empty VectorStoreIndex and verify chat() returns a real LLM response instead of "Empty Response"
  • Verify achat(), stream_chat(), and astream_chat() also return real LLM responses with 0 retrieved nodes
  • Verify that source_nodes in the response is an empty list (not containing the placeholder node)
  • Verify normal behavior (nodes retrieved) is unchanged
  • Verify context_source.raw_output still contains the original empty node list

🤖 Generated with Claude Code

Changed files

  • llama-index-core/llama_index/core/chat_engine/condense_plus_context.py (modified, +35/-13)

PR #20970: fix: CondensePlusContextChatEngine returns Empty Response when retriever yields 0 nodes

Description (problem / solution / changelog)

Description

When CondensePlusContextChatEngine is used with a retriever that returns 0 nodes (e.g. empty index, strict metadata filters in multi-tenant setups), BaseSynthesizer.synthesize/asynthesize short-circuits with a hardcoded "Empty Response" string without ever calling the LLM. This silently breaks conversational UX — the user gets an instant static string instead of a real LLM response.

Fixes #20894

What

In _run_c3 and _arun_c3, when context_nodes is empty, substitute a placeholder NodeWithScore(node=TextNode(text=""), score=0.0) so the synthesizer still invokes the LLM with the full prompt template, system prompt, and chat history. The context_source and externally-visible source_nodes remain truthful (empty list), while only the internal synth_nodes passed to the synthesizer gets the placeholder.

This is a minimal, localized fix — the retriever, prompt construction, memory logic, and BaseSynthesizer are all untouched.

New Package?

  • No

Version Bump?

  • No — bug fix only, no API change.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

Added EmptyRetriever helper and empty_chat_engine fixture in test_condense_plus_context.py, with four new tests covering all chat paths:

  • test_chat_empty_nodes
  • test_stream_chat_empty_nodes
  • test_achat_empty_nodes
  • test_astream_chat_empty_nodes

All assert the response is not "Empty Response" and that chat history is correctly updated with 2 entries (user + assistant).

Suggested Checklist:

  • I have performed a self-review of my own code
  • I have added tests that prove my fix is effective
  • New and existing unit tests pass locally with my changes
  • I have made corresponding changes to the documentation — N/A

Changed files

  • llama-index-core/llama_index/core/chat_engine/condense_plus_context.py (modified, +35/-21)
  • llama-index-core/tests/chat_engine/test_condense_plus_context.py (modified, +108/-1)

PR #21040: fix(chat_engine): call LLM directly when retriever returns 0 nodes in CondensePlusContextChatEngine

Description (problem / solution / changelog)

Summary

Fixes #20894

Problem

When CondensePlusContextChatEngine's retriever returns 0 nodes (e.g., empty vector store, metadata filters that match nothing), the CompactAndRefine synthesizer immediately returns a hardcoded "Empty Response" without ever calling the LLM. This silently breaks production RAG systems — especially multi-tenant architectures — where the LLM should still respond using its baseline knowledge and the system prompt.

Root Cause

BaseSynthesizer.synthesize() and BaseSynthesizer.asynthesize() have an early return when len(nodes) == 0, returning "Empty Response" without dispatching any LLM API call. The chat engine passes context nodes directly to the synthesizer without checking for this edge case.

Fix

Added a fallback path in all four chat methods (chat, stream_chat, achat, astream_chat): when context_nodes is empty, the engine now calls the LLM directly with the system prompt, chat history, and user message via self._llm.chat() / self._llm.stream_chat() / self._llm.achat() / self._llm.astream_chat().

This preserves the existing behavior when context nodes are found, and only activates the fallback when the retriever returns nothing.

Changes

  • Added _build_fallback_messages() helper that constructs the message list from system prompt + chat history + user message
  • Modified chat() — calls self._llm.chat() directly when no context nodes
  • Modified stream_chat() — calls self._llm.stream_chat() directly when no context nodes
  • Modified achat() — calls self._llm.achat() directly when no context nodes
  • Modified astream_chat() — calls self._llm.astream_chat() directly when no context nodes

Before / After

Before: Retriever returns 0 nodes → synthesizer returns "Empty Response" in <1 second, no LLM call
After: Retriever returns 0 nodes → LLM is called with system prompt + chat history → real response generated

Changed files

  • llama-index-core/llama_index/core/chat_engine/condense_plus_context.py (modified, +126/-44)

PR #21047: fix: call LLM with empty context when retriever returns 0 nodes (#20894)

Description (problem / solution / changelog)

Summary

Fixes #20894.

Root cause: BaseSynthesizer.synthesize/asynthesize short-circuited when len(nodes) == 0, returning hardcoded Empty Response without invoking the LLM.

Fix: Add use_llm_when_empty parameter (default True) to BaseSynthesizer; when nodes is empty, pass text_chunks=[''] to get_response so the LLM is called with empty context.

Changes

  • base.py: Add use_llm_when_empty param; call get_response with [""] when nodes empty
  • test_condense_plus_context.py: Add regression tests for chat and astream_chat with empty index

Testing

  • Added 2 tests covering chat and astream_chat with empty retriever
  • All existing tests pass

Made with Cursor

Changed files

  • llama-index-core/llama_index/core/response_synthesizers/base.py (modified, +25/-17)
  • llama-index-core/tests/chat_engine/test_condense_plus_context.py (modified, +31/-1)

PR #21206: fix: add opt-in fallback_to_llm param for empty retrieval in CondensePlusContextChatEngine

Description (problem / solution / changelog)

Summary

  • Adds an opt-in fallback_to_llm parameter (default False) to CondensePlusContextChatEngine
  • When enabled and the retriever returns 0 nodes, the engine calls the LLM directly with the system prompt, chat history, and user message instead of short-circuiting with "Empty Response"
  • All four chat methods are covered: chat, stream_chat, achat, astream_chat
  • Default behavior is unchanged -- existing users are not affected

Problem

In multi-tenant RAG systems with metadata filters, queries often return 0 nodes (e.g., new users with empty vector spaces). The current behavior silently returns a hardcoded "Empty Response" instead of letting the LLM attempt an answer using its baseline knowledge and the system prompt. See #20894.

Why this approach

Per maintainer feedback, the fix must be opt-in, not a default behavior change. This PR:

  • Scopes the change to CondensePlusContextChatEngine only (does not modify BaseSynthesizer)
  • Defaults to False so query engines that rely on the empty-response behavior are unaffected
  • Provides a simple API: CondensePlusContextChatEngine.from_defaults(..., fallback_to_llm=True)

Usage

engine = CondensePlusContextChatEngine.from_defaults(
    retriever=retriever,
    llm=llm,
    system_prompt="You are a helpful AI.",
    fallback_to_llm=True,  # <-- opt in
)

Test plan

  • Query with 0 retrieval nodes and fallback_to_llm=True gets an LLM response
  • Query with 0 retrieval nodes and fallback_to_llm=False (default) still returns "Empty Response"
  • Query with matching nodes works normally regardless of fallback_to_llm setting
  • Streaming (astream_chat, stream_chat) works correctly with 0 nodes and fallback enabled

Fixes #20894

Changed files

  • llama-index-core/llama_index/core/chat_engine/condense_plus_context.py (modified, +125/-44)

Code Example

INFO:src.agent:📂 Initializing local search tool (RAG)...
INFO:src.agent:RAG tool initialized successfully
INFO:src.engine_builder:⚙️  Configuring Hybrid Search (Vector + BM25)...
INFO:src.engine_builder:   🔍 Tenant Filter applied: Jeferson
WARNING:src.engine_builder:   ⚠️  No documents found in ChromaDB for BM25.
INFO:src.engine_builder:   🧠 Engine initialized via OLLAMA with model qwen2.5:0.5b

[STARTING ASTREAM_CHAT TRACE]

INFO:llama_index.core.chat_engine.condense_plus_context:Condensed question: Hello Sovereign! This is a test being sent directly from N8N via Webhook in the Cybrid network.

# Execution HALTS in less than 1.2s without dispatching any POST request to /api/chat

==== DEBUG RESPONSE ====
TYPE: <class 'llama_index.core.chat_engine.types.StreamingAgentChatResponse'>
FULL TEXT FINAL: 'Empty Response'
chat_stream.response: 'Empty Response'
RAW_BUFFERClick to expand / collapse

Bug Description

When using CondensePlusContextChatEngine with an asynchronous stream (astream_chat), if the provided Retriever (e.g., QueryFusionRetriever or VectorIndexRetriever) returns 0 nodes (which can happen frequently with valid metadata filters like tenant IDs), the Chat Engine silently aborts the LLM generation process.

Instead of passing the system prompt and the user query to the LLM with an empty context, the synthesizer (called via _arun_c3 -> asynthesize) evaluates to an empty node list and completely skips the LLM API call. It instantly returns a hardcoded "Empty Response" string wrapped in an AsyncStreamingResponse.

This is highly problematic for production RAG systems (like multi-tenant architectures), where a user might ask a general question or have an empty vector space on their first day. Instead of the LLM answering naturally leveraging its baseline knowledge and the System Prompt, the application receives a silent "Empty Response" in less than 1 second, with no exceptions raised, masking the behavior.

llamaindex_bug_report.md

Version

Python 3.12.x LlamaIndex v0.10.x LLM Provider: Agnostic (Tested with Ollama, but reproducible via OpenAI due to synthesizer logic)

Steps to Reproduce

import asyncio from llama_index.core import Document, VectorStoreIndex from llama_index.core.chat_engine import CondensePlusContextChatEngine from llama_index.llms.ollama import Ollama # or OpenAI

async def main(): # 1. Create an empty index (or one where filters will yield 0 nodes) index = VectorStoreIndex.from_documents([]) retriever = index.as_retriever(similarity_top_k=2)

# 2. Setup LLM
llm = Ollama(model="qwen2.5:0.5b") # Ensure model is running locally

# 3. Build CondensePlusContextChatEngine
engine = CondensePlusContextChatEngine.from_defaults(
    retriever=retriever,
    llm=llm,
    system_prompt="You are a helpful AI."
)

# 4. Attempt to trigger Async Stream
print("Sending query...")
chat_stream = await engine.astream_chat("Hello, can you help me?")

full_text = ""
async for token in chat_stream.async_response_gen():
    full_text += token
    
print(f"Final Response: '{full_text}'") 
# Expected output: "Hello! Yes, I can help you..."
# Actual output: "Empty Response" (Instantly, no LLM call dispatched)

if name == "main": asyncio.run(main())

Relevant Logs/Tracbacks

INFO:src.agent:📂 Initializing local search tool (RAG)...
INFO:src.agent:   ✓ RAG tool initialized successfully
INFO:src.engine_builder:⚙️  Configuring Hybrid Search (Vector + BM25)...
INFO:src.engine_builder:   🔍 Tenant Filter applied: Jeferson
WARNING:src.engine_builder:   ⚠️  No documents found in ChromaDB for BM25.
INFO:src.engine_builder:   🧠 Engine initialized via OLLAMA with model qwen2.5:0.5b

[STARTING ASTREAM_CHAT TRACE]

INFO:llama_index.core.chat_engine.condense_plus_context:Condensed question: Hello Sovereign! This is a test being sent directly from N8N via Webhook in the Cybrid network.

# Execution HALTS in less than 1.2s without dispatching any POST request to /api/chat

==== DEBUG RESPONSE ====
TYPE: <class 'llama_index.core.chat_engine.types.StreamingAgentChatResponse'>
FULL TEXT FINAL: 'Empty Response'
chat_stream.response: 'Empty Response'

extent analysis

Fix Plan

1. Update CondensePlusContextChatEngine to handle empty node lists

Code Change:

# llama_index/core/chat_engine.py
class CondensePlusContextChatEngine:
    # ...

    async def _arun_c3(self, *args, **kwargs):
        # ...

        # Check if node list is empty
        if not nodes:
            # Instead of returning an empty response, dispatch the LLM API call with an empty context
            return await self._dispatch_llm_api_call(context="")

    async def _dispatch_llm_api_call(self, context):
        # Implement LLM API call dispatching logic here
        # ...

2. Update astream_chat to handle empty responses from the LLM API

Code Change:

# llama_index/core/chat_engine.py
class CondensePlusContextChatEngine:
    # ...

    async def astream_chat(self, query):
        # ...

        async for token in chat_stream.async_response_gen():
            full_text += token

        # Check if response is empty
        if full_text == "Empty Response":
            # Dispatch the LLM API call again with the original context
            await self._dispatch_llm_api_call(context=self._get_context(query))
            # Append the new response to the chat stream
            async for token in chat_stream.async_response_gen():
                full_text += token

3. Update VectorStoreIndex to handle empty filters

Code Change:

# llama_index/core/index.py
class VectorStoreIndex:
    # ...

    def as_retriever(self, similarity_top_k=2):
        # ...

        # Check if filter yields 0 nodes
        if not self._filter_nodes:
            # Return an empty retriever
            return EmptyRetriever()

EmptyRetriever class:

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING