LlamaIndex - ✅ (Solved) [Feature Request]: General Support for Multimodal Synthesis [1 pull request, 1 participant]

GitHub stats
run-llama/llama_index#21373 · Fetched 2026-04-15 06:20:03
Comments: 0 · Participants: 1 · Timeline: 3 · Reactions: 0
Timeline (top): labeled ×2, cross-referenced ×1

Fix Action

Fixed

PR fix notes

PR #21374: Multimodal synthesis

Description (problem / solution / changelog)

Description

Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.

First of two or three PRs to broadly support multimodal synthesis. This PR:

  1. Adds a BaseMultimodalSynthesizer class to address existing semantic issues with variable naming and logical issues with conversion of nodes to multimodal content.
  2. Creates multimodal prompts for relevant synthesizers
  3. Adds multimodal synthesizers for ContextOnly, Generation, NoText, SimpleSummarize (relatively basic synthesizers), as well as Refine and CompactAndRefine.
  4. Adds support for streaming StructuredResponse objects to the Refine synthesizer.
  5. Adds previously missing test coverage for response synthesizers.

Fixes #21373

Although this PR touches a large number of lines, the logical changes are modest and fairly repetitive across the basic multimodal synthesizers. Because the Refine synthesizer contained more complicated updates, I will follow up with a second PR for the remaining synthesizers so that review can focus on the updates there. Many of the added lines are tests, because the synthesizer classes previously had little to no test coverage. PR comments include some suggestions for reducing the total bloat. Because of some logical and semantic issues with the BaseSynthesizer class, it seemed better to create a new multimodal synthesizer class than to introduce breaking changes or overly complicated logic and function signatures in BaseSynthesizer.
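To illustrate the design choice described above, here is a minimal sketch of why a separate base class can be simpler than widening the existing signature. Every class and function name below is hypothetical and chosen for illustration only; none of it is the PR's actual code or the llama_index API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class NodeSketch:
    """Hypothetical retrieved node; not a llama_index class."""
    text: str = ""
    image_url: Optional[str] = None


class TextSynthesizerSketch(ABC):
    """Text-first shape: synthesize takes nodes, get_response takes text chunks."""

    def synthesize(self, query: str, nodes: Sequence[NodeSketch]) -> str:
        # Nodes are reduced to plain strings here, so image references are lost.
        text_chunks = [n.text for n in nodes if n.text]
        return self.get_response(query, text_chunks)

    @abstractmethod
    def get_response(self, query: str, text_chunks: Sequence[str]) -> str:
        ...


class MultimodalSynthesizerSketch(ABC):
    """Separate base class: nodes are passed through to get_response unchanged."""

    def synthesize(self, query: str, nodes: Sequence[NodeSketch]) -> str:
        return self.get_response(query, nodes)

    @abstractmethod
    def get_response(self, query: str, nodes: Sequence[NodeSketch]) -> str:
        ...


class CountingTextSynth(TextSynthesizerSketch):
    def get_response(self, query: str, text_chunks: Sequence[str]) -> str:
        # Only text is visible at this level.
        return f"{len(text_chunks)} text chunk(s), 0 image(s)"


class CountingMultimodalSynth(MultimodalSynthesizerSketch):
    def get_response(self, query: str, nodes: Sequence[NodeSketch]) -> str:
        texts = sum(1 for n in nodes if n.text)
        images = sum(1 for n in nodes if n.image_url)
        return f"{texts} text chunk(s), {images} image(s)"


nodes = [
    NodeSketch(text="A chart caption."),
    NodeSketch(image_url="https://example.com/chart.png"),
]
old_style = CountingTextSynth().synthesize("What does the chart show?", nodes)
new_style = CountingMultimodalSynth().synthesize("What does the chart show?", nodes)
```

In this sketch the text-first class never sees the image node, while the separate class does, and the existing get_response signature stays untouched for current subclasses.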

New Package?

Did I fill in the tool.llamahub section in the pyproject.toml and provide a detailed README.md for my new integration or package?

  • Yes
  • No

Version Bump?

Did I bump the version in the pyproject.toml file of the package I am updating? (Except for the llama-index-core package)

  • Yes
  • No

Type of Change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)

How Has This Been Tested?

Your pull-request will likely not be merged unless it is covered by some form of impactful unit testing.

  • I added new unit tests to cover this change
  • I believe this change is already covered by existing unit tests

Suggested Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added Google Colab support for the newly added notebooks.
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I ran uv run make format; uv run make lint to appease the lint gods

Changed files

  • llama-index-core/llama_index/core/indices/prompt_helper.py (modified, +4/-21)
  • llama-index-core/llama_index/core/instrumentation/events/synthesis.py (modified, +32/-0)
  • llama-index-core/llama_index/core/llms/mock.py (modified, +94/-18)
  • llama-index-core/llama_index/core/postprocessor/llm_rerank.py (modified, +2/-2)
  • llama-index-core/llama_index/core/program/llm_program.py (modified, +94/-1)
  • llama-index-core/llama_index/core/program/utils.py (modified, +7/-5)
  • llama-index-core/llama_index/core/prompts/base.py (modified, +1/-11)
  • llama-index-core/llama_index/core/prompts/chat_prompts.py (modified, +108/-1)
  • llama-index-core/llama_index/core/prompts/prompt_utils.py (modified, +3/-3)
  • llama-index-core/llama_index/core/response_synthesizers/base.py (modified, +203/-2)
  • llama-index-core/llama_index/core/response_synthesizers/compact_and_refine.py (modified, +58/-4)
  • llama-index-core/llama_index/core/response_synthesizers/context_only.py (modified, +43/-1)
  • llama-index-core/llama_index/core/response_synthesizers/generation.py (modified, +176/-2)
  • llama-index-core/llama_index/core/response_synthesizers/no_text.py (modified, +31/-1)
  • llama-index-core/llama_index/core/response_synthesizers/refine.py (modified, +626/-272)
  • llama-index-core/llama_index/core/response_synthesizers/simple_summarize.py (modified, +119/-2)
  • llama-index-core/llama_index/core/settings.py (modified, +44/-1)
  • llama-index-core/tests/chat_engine/test_condense_plus_context.py (modified, +7/-11)
  • llama-index-core/tests/chat_engine/test_context.py (modified, +9/-13)
  • llama-index-core/tests/indices/response/test_response_builder.py (modified, +2/-1)
  • llama-index-core/tests/response_synthesizers/test_compact_and_refine.py (added, +391/-0)
  • llama-index-core/tests/response_synthesizers/test_generate.py (modified, +172/-28)
  • llama-index-core/tests/response_synthesizers/test_refine.py (modified, +1199/-119)
  • llama-index-core/tests/response_synthesizers/test_simple_summarize.py (added, +220/-0)

Feature Description

  1. Support multimodal inputs with chat prompts for synthesis prompts.
  2. Support multimodal synthesis with a MultimodalSynthesizer class. The existing class takes nodes for synthesize but text_chunks for get_response. Although get_content_blocks now supports transforming nodes into multimodal content blocks, the get_response API is semantically incompatible with multimodal processing, as is the generic logic in the synthesize function.
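The incompatibility described in the second point can be pictured with a small sketch. It assumes a simplified node with optional text and an optional image URL, and represents content blocks as plain dictionaries; SketchNode, nodes_to_text_chunks, and nodes_to_blocks are hypothetical names, not llama_index functions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SketchNode:
    """Hypothetical stand-in for a retrieved node; not a llama_index class."""
    text: str = ""
    image_url: Optional[str] = None


def nodes_to_text_chunks(nodes: List[SketchNode]) -> List[str]:
    """Text-only flattening: any image reference on a node is dropped."""
    return [n.text for n in nodes if n.text]


def nodes_to_blocks(nodes: List[SketchNode]) -> List[Dict[str, str]]:
    """Multimodal flattening: each node yields text and/or image blocks."""
    blocks: List[Dict[str, str]] = []
    for n in nodes:
        if n.text:
            blocks.append({"type": "text", "text": n.text})
        if n.image_url:
            blocks.append({"type": "image", "url": n.image_url})
    return blocks


nodes = [
    SketchNode(text="Figure 2 shows quarterly revenue."),
    SketchNode(image_url="https://example.com/figure2.png"),
]
text_chunks = nodes_to_text_chunks(nodes)  # the image node contributes nothing
blocks = nodes_to_blocks(nodes)            # the image survives as its own block
```

A method that accepts only strings can receive text_chunks but has no way to receive blocks, which is the gap the feature request points at.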

Reason

No response

Value of Feature

No response

extent analysis

TL;DR

Make the get_response API and the synthesize function semantically compatible with multimodal content blocks, either by updating them in place or, as the linked PR #21374 does, by adding a separate BaseMultimodalSynthesizer class.

Guidance

  • Review the existing synthesizer class (BaseSynthesizer, per the linked PR) and its methods (synthesize and get_response) to identify where the API and logic need to be adapted for multimodal inputs and synthesis.
  • Consider modifying the get_response method to accept nodes instead of text_chunks to align with the synthesize method and support multimodal content blocks.
  • Investigate how the get_content_blocks method can be utilized or extended to transform nodes into multimodal content blocks that can be processed by the updated get_response and synthesize methods.
  • Evaluate the generic logic in the synthesize function to ensure it can handle multimodal inputs and synthesis without requiring significant rework.

Example

No code snippet is provided due to the lack of specific implementation details in the issue.
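As a purely illustrative stand-in, the following sketch shows how a refine-style loop might operate on per-node content blocks rather than text chunks. It is not taken from the issue or the PR: refine_over_blocks, fake_llm, and the dictionary block format are all assumptions, and the "LLM" is a stub that only counts the blocks it receives.

```python
from typing import Callable, Dict, List, Optional, Sequence

Block = Dict[str, str]  # hypothetical block format: {"type": "text" | "image", ...}


def refine_over_blocks(
    query: str,
    node_blocks: Sequence[List[Block]],
    llm: Callable[[List[Block]], str],
) -> str:
    """Answer from the first node's blocks, then refine with each later node."""
    answer: Optional[str] = None
    for blocks in node_blocks:
        if answer is None:
            instruction = f"Question: {query}"
        else:
            instruction = (
                f"Question: {query}\n"
                f"Existing answer: {answer}\n"
                "Refine the answer using the new context."
            )
        # The instruction is a text block; the node's own blocks follow it as-is.
        prompt = [{"type": "text", "text": instruction}] + list(blocks)
        answer = llm(prompt)
    return answer if answer is not None else ""


def fake_llm(prompt: List[Block]) -> str:
    """Stub model: reports how many text and image blocks it was given."""
    n_text = sum(1 for b in prompt if b["type"] == "text")
    n_image = sum(1 for b in prompt if b["type"] == "image")
    return f"seen {n_text} text / {n_image} image block(s)"


result = refine_over_blocks(
    "What does the chart show?",
    [
        [{"type": "text", "text": "Revenue grew in Q3."}],
        [{"type": "image", "url": "https://example.com/chart.png"}],
    ],
    fake_llm,
)
```

Because the loop passes blocks through untouched, an image-only node still reaches the model during the refine step instead of being dropped when nodes are converted to strings.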

Notes

The solution may require significant changes to the existing synthesizer class and its methods; the linked PR instead adds a separate multimodal synthesizer class to avoid breaking changes. Either way, the updated implementation should be thoroughly reviewed and tested for compatibility and correctness.

Recommendation

Apply workaround: Update the get_response API and synthesize function to support multimodal processing, as this seems to be the core issue preventing multimodal synthesis.
