llamaIndex - ✅(Solved) Fix [Bug]: `SubQuestionQueryEngine` partial-failure handling breaks on non-ValueError exceptions [3 pull requests, 2 comments, 2 participants]

GitHub stats

run-llama/llama_index#20904 (fetched 2026-04-08 00:30:18)

Comments: 2, Participants: 2, Timeline events: 10, Reactions: 0

Timeline (top): cross-referenced ×3, commented ×2, labeled ×2, referenced ×2

Error Message

RuntimeError: API rate limit exceeded (raised by a sub-question's query engine and not caught by SubQuestionQueryEngine)


PR fix notes

PR #20905: fix(core): partial-failure handling in SubQuestionQueryEngine

Description

Fixes #20904

Broadens the handler to except Exception and logs with exc_info=True, so a failed sub-question is logged (with its traceback) and skipped while the remaining results are still synthesized. Also adds tests covering all three execution paths (sync, sync with use_async=True, and fully async) to verify that a failing sub-question is skipped and does not crash the overall query.
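A minimal, stdlib-only sketch of the pattern this PR describes, assuming a plain dict of callables in place of the real query engine tools. The names run_sub_question, engines and rate_limited are hypothetical and are not the llama_index API:

```python
import logging
from typing import Callable, Dict, Optional

logger = logging.getLogger(__name__)


def run_sub_question(
    engines: Dict[str, Callable[[str], str]], tool_name: str, question: str
) -> Optional[str]:
    """Hypothetical stand-in for _query_subq: returns None instead of raising."""
    try:
        engine = engines[tool_name]  # KeyError for an unknown tool name
        return engine(question)
    except Exception:
        # exc_info=True attaches the full traceback to the log record
        logger.warning("[%s] Failed to run %s", tool_name, question, exc_info=True)
        return None


def rate_limited(question: str) -> str:
    # Simulates a provider failure, as in the reproduction
    raise RuntimeError("API rate limit exceeded")


engines = {"france_docs": lambda q: "Paris", "germany_docs": rate_limited}
answers = [
    run_sub_question(engines, "france_docs", "What is the capital of France?"),
    run_sub_question(engines, "germany_docs", "What is the capital of Germany?"),
    run_sub_question(engines, "spain_docs", "What is the capital of Spain?"),
]
```

Both the RuntimeError and the KeyError from the unknown "spain_docs" tool end up as None in answers, with a logged traceback, while the successful answer is kept.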


Type of Change: Bug fix (non-breaking change which fixes an issue)


Changed files

  • llama-index-core/llama_index/core/query_engine/sub_question_query_engine.py (modified, +10/-4)
  • llama-index-core/tests/query_engine/test_sub_question_query_engine.py (added, +106/-0)

PR #20913: fix: catch all exceptions in SubQuestionQueryEngine sub-question execution

Description

_query_subq and _aquery_subq in SubQuestionQueryEngine only caught ValueError, but sub-question execution can raise various exceptions (provider API errors, transport errors, timeouts, KeyError for invalid tool names, etc.). These uncaught exceptions caused the entire query to fail instead of gracefully skipping the failed sub-question — defeating the partial-failure tolerance that the class is designed for via filter(None, qa_pairs_all).

Widen the catch to Exception and include error details in the warning log message.
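One way to put the error details into the warning, sketched under the assumption that the exception type and message are what you want to see. The exact message format used in PR #20913 is not shown here, and describe_failure and run_with_details are hypothetical helpers:

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def describe_failure(tool_name: str, question: str, exc: BaseException) -> str:
    """Build a warning message that names the exception type and its message."""
    return f"[{tool_name}] Failed to run {question}: {type(exc).__name__}: {exc}"


def run_with_details(
    tool_name: str, question: str, engine: Callable[[str], str]
) -> Optional[str]:
    """Hypothetical sub-question runner that logs the error details and skips."""
    try:
        return engine(question)
    except Exception as e:
        logger.warning(describe_failure(tool_name, question, e))
        return None
```

Compared with exc_info=True, this keeps each warning on one line, at the cost of dropping the traceback.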

Fixes #20904


Changed files

  • llama-index-core/llama_index/core/query_engine/sub_question_query_engine.py (modified, +4/-4)

PR #20921: fix(query_engine): broaden exception handling in SubQuestionQueryEngine to catch all runtime errors

Summary

SubQuestionQueryEngine is designed to tolerate partial sub-question failures — failed sub-questions return None and are filtered via filter(None, qa_pairs_all) before response synthesis. However, both _query_subq and _aquery_subq only caught ValueError, so any other exception caused the entire query to fail.
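To make the partial-failure contract concrete, here is a minimal stdlib sketch of what filter(None, qa_pairs_all) does with a failed (None) result. QAPair and synthesize are hypothetical stand-ins for the real answer pair type and response synthesizer:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class QAPair:
    """Hypothetical stand-in for a sub-question/answer pair."""
    question: str
    answer: str


def synthesize(pairs: List[QAPair]) -> str:
    # Placeholder for the response synthesizer: join the surviving answers
    return "; ".join(f"{p.question} -> {p.answer}" for p in pairs)


qa_pairs_all: List[Optional[QAPair]] = [
    QAPair("What is the capital of France?", "Paris"),
    None,  # a failed sub-question, returned as None by the handler
]

# filter(None, ...) drops every falsy item; QAPair instances are always truthy
qa_pairs = list(filter(None, qa_pairs_all))
print(synthesize(qa_pairs))  # → What is the capital of France? -> Paris
```

This only helps if failures are turned into None before they reach the list; an exception that escapes the handler never gets this far, which is exactly the bug in this issue.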

Root Cause

# Before (both methods)
except ValueError:
    logger.warning(f"[{sub_q.tool_name}] Failed to run {question}")
    return None

Real-world sub-engine failures are rarely ValueError:

  • Provider API errors such as rate limits → provider/SDK exceptions (simulated as RuntimeError in the reproduction)
  • Network failures → ConnectionError / TimeoutError
  • Invalid tool name → KeyError

All of these escaped the narrow except ValueError clause.
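A quick, stdlib-only way to check which of the failure types listed above get past each handler. The names escapes and FAILURES are illustrative, and the exception messages are made up for the demo:

```python
from typing import List, Tuple, Type

FAILURES: List[BaseException] = [
    RuntimeError("API rate limit exceeded"),
    ConnectionError("connection reset"),
    TimeoutError("request timed out"),
    KeyError("unknown_tool"),
]


def escapes(handler: Tuple[Type[BaseException], ...], exc: BaseException) -> bool:
    """Return True if `exc` is NOT caught by an `except handler:` clause."""
    try:
        raise exc
    except handler:
        return False
    except BaseException:
        return True


narrow = [type(e).__name__ for e in FAILURES if escapes((ValueError,), e)]
broad = [type(e).__name__ for e in FAILURES if escapes((Exception,), e)]
print("escape except ValueError:", narrow)
print("escape except Exception:", broad)
```

Catching Exception rather than using a bare except or BaseException still lets KeyboardInterrupt, SystemExit and (since Python 3.8) asyncio.CancelledError propagate, so Ctrl-C and task cancellation keep working.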

Fix

# After (both methods)
except Exception:
    logger.warning(f"[{sub_q.tool_name}] Failed to run {question}")
    return None

The constructor fallback that catches ValueError when trying to instantiate the OpenAI question generator (line 116) is deliberately left unchanged: that is intentional degradation, not an error.

Tests

Added llama-index-core/tests/query_engine/test_sub_question_query_engine.py:

  • test_query_subq_tolerates_non_value_error — parametrized over RuntimeError, KeyError, TimeoutError, ConnectionError
  • test_aquery_subq_tolerates_non_value_error — async variant
  • test_batch_query_skips_failed_sub_question — integration-style test showing the full query succeeds with one failing sub-engine
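A stdlib-only sketch of how such a parametrized check can look without pytest, using unittest's subTest over the same four exception types for a sync and an async runner. query_subq and aquery_subq here are hypothetical stand-ins, not the llama_index methods, and this is not the test file from the PR:

```python
import unittest
from typing import Awaitable, Callable, Optional


def query_subq(engine: Callable[[str], str], question: str) -> Optional[str]:
    """Hypothetical sync runner mirroring the fixed _query_subq."""
    try:
        return engine(question)
    except Exception:
        return None


async def aquery_subq(
    engine: Callable[[str], Awaitable[str]], question: str
) -> Optional[str]:
    """Hypothetical async runner mirroring the fixed _aquery_subq."""
    try:
        return await engine(question)
    except Exception:
        return None


ERRORS = [RuntimeError, KeyError, TimeoutError, ConnectionError]


class ToleratesNonValueError(unittest.IsolatedAsyncioTestCase):
    def test_sync(self) -> None:
        for exc_type in ERRORS:
            with self.subTest(exc_type=exc_type.__name__):
                def failing(q: str) -> str:
                    raise exc_type("boom")
                self.assertIsNone(query_subq(failing, "q"))

    async def test_async(self) -> None:
        for exc_type in ERRORS:
            with self.subTest(exc_type=exc_type.__name__):
                async def failing(q: str) -> str:
                    raise exc_type("boom")
                self.assertIsNone(await aquery_subq(failing, "q"))
```

Run it with python -m unittest followed by the module name; each exception type is reported as its own subtest if it fails.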

Fixes #20904

Changed files

  • llama-index-core/llama_index/core/query_engine/sub_question_query_engine.py (modified, +2/-2)
  • llama-index-core/tests/query_engine/test_sub_question_query_engine.py (added, +151/-0)


Bug Description

Both _query_subq and _aquery_subq in SubQuestionQueryEngine only catch ValueError, even though the class is explicitly designed to tolerate partial sub-question failures via filter(None, qa_pairs_all). As a result, common runtime exceptions from sub-query execution, such as provider API errors, transport errors, timeouts, or a KeyError from an invalid tool name, escape uncaught and cause the entire query to fail instead of skipping the failed sub-question and continuing with the remaining results.

Version

0.14.15

Steps to Reproduce


  from unittest.mock import MagicMock
  from llama_index.core import VectorStoreIndex, Settings
  from llama_index.core.base.base_query_engine import BaseQueryEngine
  from llama_index.core.base.response.schema import RESPONSE_TYPE
  from llama_index.core.callbacks import CallbackManager
  from llama_index.core.question_gen.types import SubQuestion
  from llama_index.core.query_engine.sub_question_query_engine import SubQuestionQueryEngine
  from llama_index.core.response_synthesizers import get_response_synthesizer
  from llama_index.core.schema import Document, QueryBundle
  from llama_index.core.tools import QueryEngineTool, ToolMetadata
  from llama_index.llms.openai import OpenAI
  from llama_index.embeddings.openai import OpenAIEmbedding

  Settings.llm = OpenAI(model="gpt-5")
  Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")


  class RateLimitedQueryEngine(BaseQueryEngine):
      def __init__(self):
          super().__init__(callback_manager=CallbackManager([]))
      def _query(self, query_bundle: QueryBundle) -> RESPONSE_TYPE:
          raise RuntimeError("API rate limit exceeded")
      async def _aquery(self, query_bundle: QueryBundle) -> RESPONSE_TYPE:
          raise RuntimeError("API rate limit exceeded")
      def _get_prompt_modules(self):
          return {}

  index = VectorStoreIndex.from_documents([Document(text="Paris is the capital of France.")])

  tools = [
      QueryEngineTool(
          query_engine=index.as_query_engine(),
          metadata=ToolMetadata(name="france_docs", description="Facts about France"),
      ),
      QueryEngineTool(
          query_engine=RateLimitedQueryEngine(),
          metadata=ToolMetadata(name="germany_docs", description="Facts about Germany"),
      ),
  ]

  question_gen = MagicMock()
  question_gen.generate.return_value = [
      SubQuestion(sub_question="What is the capital of France?", tool_name="france_docs"),
      SubQuestion(sub_question="What is the capital of Germany?", tool_name="germany_docs"),
  ]

  engine = SubQuestionQueryEngine(
      question_gen=question_gen,
      response_synthesizer=get_response_synthesizer(),
      query_engine_tools=tools,
      use_async=False,
  )
  response = engine.query("What are the capitals of France and Germany?")
  print(response)

Relevant Logs/Tracebacks

Generated 2 sub questions.
[france_docs] Q: What is the capital of France?
[france_docs] A: Paris
[germany_docs] Q: What is the capital of Germany?
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_1122/1314053154.py in <cell line: 0>()
     51     use_async=False,
     52 )
---> 53 response = engine.query("What are the capitals of France and Germany?")
     54 print(response)

/usr/local/lib/python3.12/dist-packages/llama_index_instrumentation/dispatcher.py in wrapper(func, instance, args, kwargs)
    411 
    412             try:
--> 413                 result = func(*args, **kwargs)
    414                 if isinstance(result, asyncio.Future):
    415                     # If the result is a Future, wrap it

/usr/local/lib/python3.12/dist-packages/llama_index/core/base/base_query_engine.py in query(self, str_or_query_bundle)
     42             if isinstance(str_or_query_bundle, str):
     43                 str_or_query_bundle = QueryBundle(str_or_query_bundle)
---> 44             query_result = self._query(str_or_query_bundle)
     45         dispatcher.event(
     46             QueryEndEvent(query=str_or_query_bundle, response=query_result)

/usr/local/lib/python3.12/dist-packages/llama_index_instrumentation/dispatcher.py in wrapper(func, instance, args, kwargs)
    411 
    412             try:
--> 413                 result = func(*args, **kwargs)
    414                 if isinstance(result, asyncio.Future):
    415                     # If the result is a Future, wrap it

/usr/local/lib/python3.12/dist-packages/llama_index/core/query_engine/sub_question_query_engine.py in _query(self, query_bundle)
    153             else:
    154                 qa_pairs_all = [
--> 155                     self._query_subq(sub_q, color=colors[str(ind)])
    156                     for ind, sub_q in enumerate(sub_questions)
    157                 ]

/usr/local/lib/python3.12/dist-packages/llama_index/core/query_engine/sub_question_query_engine.py in _query_subq(self, sub_q, color)
    261                     print_text(f"[{sub_q.tool_name}] Q: {question}\n", color=color)
    262 
--> 263                 response = query_engine.query(question)
    264                 response_text = str(response)
    265 

/usr/local/lib/python3.12/dist-packages/llama_index_instrumentation/dispatcher.py in wrapper(func, instance, args, kwargs)
    411 
    412             try:
--> 413                 result = func(*args, **kwargs)
    414                 if isinstance(result, asyncio.Future):
    415                     # If the result is a Future, wrap it

/usr/local/lib/python3.12/dist-packages/llama_index/core/base/base_query_engine.py in query(self, str_or_query_bundle)
     42             if isinstance(str_or_query_bundle, str):
     43                 str_or_query_bundle = QueryBundle(str_or_query_bundle)
---> 44             query_result = self._query(str_or_query_bundle)
     45         dispatcher.event(
     46             QueryEndEvent(query=str_or_query_bundle, response=query_result)

/usr/local/lib/python3.12/dist-packages/llama_index_instrumentation/dispatcher.py in wrapper(func, instance, args, kwargs)
    411 
    412             try:
--> 413                 result = func(*args, **kwargs)
    414                 if isinstance(result, asyncio.Future):
    415                     # If the result is a Future, wrap it

/tmp/ipykernel_1122/1314053154.py in _query(self, query_bundle)
     20         super().__init__(callback_manager=CallbackManager([]))
     21     def _query(self, query_bundle: QueryBundle) -> RESPONSE_TYPE:
---> 22         raise RuntimeError("API rate limit exceeded")
     23     async def _aquery(self, query_bundle: QueryBundle) -> RESPONSE_TYPE:
     24         raise RuntimeError("API rate limit exceeded")

RuntimeError: API rate limit exceeded


Fix Plan

1. Catch Specific Exceptions

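Update the _query_subq and _aquery_subq methods of SubQuestionQueryEngine so they catch common runtime failures in addition to ValueError, and log the failure instead of printing it. (The PRs above go further and catch Exception.)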

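import logging

logger = logging.getLogger(__name__)
# Hypothetical tuple name; keeps ValueError so the original behavior is preserved
SUBQ_FAILURES = (ValueError, RuntimeError, KeyError, TimeoutError, ConnectionError)


class SubQuestionQueryEngine:
    # ...

    def _query_subq(self, sub_q, color=None):
        question = sub_q.sub_question  # bound before try so the log line can use it
        try:
            # assumes a tool_name -> engine dict; KeyError if the name is unknown
            query_engine = self._query_engines[sub_q.tool_name]
            response = query_engine.query(question)
            # ...
        except SUBQ_FAILURES:
            # Log with traceback and skip the failed sub-question
            logger.warning(f"[{sub_q.tool_name}] Failed to run {question}", exc_info=True)
            return None

    async def _aquery_subq(self, sub_q, color=None):
        question = sub_q.sub_question
        try:
            query_engine = self._query_engines[sub_q.tool_name]
            # use the public aquery() rather than the private _aquery()
            response = await query_engine.aquery(question)
            # ...
        except SUBQ_FAILURES:
            logger.warning(f"[{sub_q.tool_name}] Failed to run {question}", exc_info=True)
            return None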
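On the async path the same idea matters even more. This sketch assumes the sub-question coroutines are awaited together with asyncio.gather; sub_engine, guarded and run_all are hypothetical names. Catching inside each coroutine means one failing sub-engine yields None instead of aborting the whole batch:

```python
import asyncio
from typing import List, Optional


async def sub_engine(name: str) -> str:
    # Hypothetical sub-engine: "germany_docs" simulates a provider failure
    if name == "germany_docs":
        raise RuntimeError("API rate limit exceeded")
    await asyncio.sleep(0)
    return f"answer from {name}"


async def guarded(name: str) -> Optional[str]:
    # Catch inside each coroutine so one failure cannot abort the whole gather
    try:
        return await sub_engine(name)
    except Exception:
        return None


async def run_all(names: List[str]) -> List[Optional[str]]:
    return await asyncio.gather(*(guarded(n) for n in names))


results = asyncio.run(run_all(["france_docs", "germany_docs"]))
print(results)  # → ['answer from france_docs', None]
```

Without the per-coroutine try/except, gather would propagate the first RuntimeError to the caller and the successful results would be lost (the other tasks are not cancelled). Passing return_exceptions=True to gather is an alternative, but then the caller has to filter exception objects rather than None.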

2. Filter Out Failed Sub-Questions

Keep filtering failed sub-questions (which come back as None) in the _query method, and call _query_subq only once per sub-question: repeating the call in a comprehension condition would run every sub-query, and its API calls, twice.

class SubQuestionQueryEngine:
    # ...

    def _query(self, query_bundle):
        # ...
        qa_pairs_all = [
            self._query_subq(sub_q, color=colors[str(ind)])
            for ind, sub_q in enumerate(sub_questions)
        ]
        # Drop failed sub-questions (None) before response synthesis
        qa_pairs = list(filter(None, qa_pairs_all))
        # ...

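3. Update Test Code

Add a test that runs the reproduction with one failing sub-engine and expects the overall query to succeed.

def test_sub_question_query_engine():
    # ... (build `tools` and a MagicMock `question_gen` as in the reproduction)
    question_gen.generate.return_value = [
        SubQuestion(sub_question="What is the capital of France?", tool_name="france_docs"),
        SubQuestion(sub_question="What is the capital of Germany?", tool_name="germany_docs"),
    ]
    engine = SubQuestionQueryEngine(
        question_gen=question_gen,
        response_synthesizer=get_response_synthesizer(),
        query_engine_tools=tools,
        use_async=False,
    )
    # With the fix, the failing germany_docs sub-question is skipped instead of raising
    response = engine.query("What are the capitals of France and Germany?")
    assert response is not None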
