openclaw - 💡(How to fix) Fix Hybrid search BM25 component penalizes multimodal (image/audio) results [1 participants]

openclaw • 2026-03-13 02:12:44

openclaw/openclaw#44540•Fetched 2026-04-08 00:45:30

Author

markclawbot

Participants

markclawbot

When memorySearch.multimodal.enabled = true with Gemini embedding 2, image and audio files are properly indexed with valid embeddings. However, they never surface in memory_search results under the default hybrid search configuration.

Root Cause

Hybrid search computes: finalScore = vectorWeight × vectorScore + textWeight × textScore

Image/audio chunks have minimal text content (e.g., "Image file: generated/images/photo.png"), so their BM25 (text) score is near-zero for any natural language query. With default weights (0.7/0.3), the BM25 penalty is enough to push image results below text-only chunks that match both signals.
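To make the penalty concrete, here is a minimal sketch that applies the formula above with the default weights to two candidates. The paths and all score values are hypothetical, chosen only for illustration; they are not measurements from the issue.

```javascript
// Default hybrid weights as reported in the issue.
const vectorWeight = 0.7;
const textWeight = 0.3;

// The hybrid formula quoted above.
function hybridScore(vectorScore, textScore) {
  return vectorWeight * vectorScore + textWeight * textScore;
}

// Hypothetical candidates: the image is the better semantic match,
// but its placeholder text gives it no BM25 score.
const candidates = [
  { path: "generated/images/photo.png", modality: "image", vectorScore: 0.8, textScore: 0.0 },
  { path: "notes/cartoon-ideas.md", modality: "text", vectorScore: 0.65, textScore: 0.5 },
];

const ranked = candidates
  .map((c) => ({ ...c, finalScore: hybridScore(c.vectorScore, c.textScore) }))
  .sort((a, b) => b.finalScore - a.finalScore);

for (const c of ranked) {
  console.log(c.path, c.finalScore.toFixed(3));
}
```

With these made-up numbers the markdown chunk scores about 0.605 and the image about 0.560, so the image ranks second despite having the higher vector score.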

Fix Action

Workaround

Setting vectorWeight: 0.9, textWeight: 0.1 allows image results to surface (tested — images jump to #1 and #2 in results).
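One way to see why the reweighting helps: if an image chunk's text score is treated as exactly 0 (a simplifying assumption), it outranks a text chunk only when its vector score leads by more than textWeight × textScore / vectorWeight. The sketch below computes that required lead for both weight settings. The helper name requiredVectorLead and the competing text score of 0.5 are hypothetical.

```javascript
// Minimum vector-score lead an image chunk needs over a competing text chunk,
// assuming the image chunk's own text score is exactly 0 (a simplification).
function requiredVectorLead(vectorWeight, textWeight, competingTextScore) {
  return (textWeight * competingTextScore) / vectorWeight;
}

const competingTextScore = 0.5; // hypothetical text score of the competing chunk

const defaultLead = requiredVectorLead(0.7, 0.3, competingTextScore);
const workaroundLead = requiredVectorLead(0.9, 0.1, competingTextScore);

console.log("default weights:", defaultLead.toFixed(3));
console.log("workaround weights:", workaroundLead.toFixed(3));
```

Under these assumptions the required lead drops from roughly 0.21 to roughly 0.06. The cost is that BM25 also contributes less when ranking text chunks against each other.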

Code Example

if (chunk.modality === 'image' || chunk.modality === 'audio') {
  finalScore = vectorScore;  // BM25 is meaningless for binary content
} else {
  finalScore = vectorWeight * vectorScore + textWeight * textScore;
}

Reproduction

Configure multimodal memory search with Gemini embedding 2
Index image files via extraPaths
Run memory_search with a query describing image content (e.g., "lobster and dolphin underwater cartoon")
Observe: only markdown text results returned, zero images

Suggested Fix

The hybrid merge function should detect when a chunk's source modality is non-text (image/audio) and skip the BM25 component for those chunks, using vector score only. Something like:

if (chunk.modality === 'image' || chunk.modality === 'audio') {
  finalScore = vectorScore;  // BM25 is meaningless for binary content
} else {
  finalScore = vectorWeight * vectorScore + textWeight * textScore;
}

This would let multimodal and text results compete fairly without requiring users to weaken BM25 for text-on-text queries.
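As a rough illustration of how the suggested branch might behave across a whole result list, the sketch below scores and sorts a few candidates. The names scoreChunk and rankChunks, the chunk shape, and all score values are assumptions made for this example, not OpenClaw's actual API.

```javascript
// Modalities for which the text (BM25) component is skipped.
const NON_TEXT_MODALITIES = new Set(["image", "audio"]);

// Vector score only for non-text chunks, weighted blend otherwise.
function scoreChunk(chunk, vectorWeight, textWeight) {
  if (NON_TEXT_MODALITIES.has(chunk.modality)) {
    return chunk.vectorScore;
  }
  return vectorWeight * chunk.vectorScore + textWeight * chunk.textScore;
}

// Returns a new array sorted by descending final score.
function rankChunks(chunks, vectorWeight = 0.7, textWeight = 0.3) {
  return chunks
    .map((chunk) => ({ ...chunk, finalScore: scoreChunk(chunk, vectorWeight, textWeight) }))
    .sort((a, b) => b.finalScore - a.finalScore);
}

// Hypothetical candidates for a query describing image content.
const results = rankChunks([
  { path: "notes/cartoon-ideas.md", modality: "text", vectorScore: 0.65, textScore: 0.5 },
  { path: "generated/images/photo.png", modality: "image", vectorScore: 0.8, textScore: 0.0 },
  { path: "recordings/memo.mp3", modality: "audio", vectorScore: 0.5, textScore: 0.0 },
]);

console.log(results.map((r) => `${r.path} ${r.finalScore.toFixed(3)}`));
```

In this sketch a non-text chunk keeps its full vector score while a text chunk's score is a weighted blend, so the two are not compared on exactly the same footing. Whether that trade-off is acceptable is a judgment call for the maintainers.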

Environment

OpenClaw 2026.3.11
Provider: gemini, model: gemini-embedding-2-preview
296 indexed files (169 .md, 100 images, 27 audio)
Default hybrid config (no custom weights) reproduces the issue

extent analysis

Fix Plan

To fix the issue, we need to modify the hybrid merge function to handle non-text chunks (image/audio) differently. Here are the steps:

Update the hybridMerge function to check the chunk's modality
If the modality is 'image' or 'audio', use only the vector score
Otherwise, use the default hybrid scoring formula

Example code:

function hybridMerge(chunk, vectorScore, textScore, vectorWeight, textWeight) {
  if (chunk.modality === 'image' || chunk.modality === 'audio') {
    return vectorScore;  // BM25 is meaningless for binary content
  } else {
    return vectorWeight * vectorScore + textWeight * textScore;
  }
}

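Replace the existing hybrid merge logic with the updated version (the name hybridMerge is used here for illustration; the actual function name in the codebase may differ)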
No changes are needed to the indexing or search query code

Verification

To verify the fix, follow these steps:

Index image and audio files using extraPaths
Run a memory_search query that describes image content (e.g., "lobster and dolphin underwater cartoon")
Check that image results are returned and ranked correctly
Test with different queries and modalities to ensure the fix is working as expected

Extra Tips

Make sure to update the hybridMerge function in the correct location, depending on your project's architecture
Consider adding logging or debugging statements to verify that the updated function is being called correctly
If you're using a version control system, create a new branch for the fix and test it thoroughly before merging it into the main branch.
