langchain - 💡(How to fix) Fix Add One-to-Many Batch Similarity Search Against Custom Entity Lists [1 comments, 2 participants]

felipheggaliza · 2026-03-18T21:14:27Z

### Checked other resources

- [x] This is a feature request, not a bug report or usage question.
- [x] I added a clear and descriptive title that summarizes the feature request.
- [x] I used the GitHub search to find a similar feature request and didn't find it.
- [x] I checked the LangChain documentation and API reference to see if this feature already exists.
- [x] This is not related to the langchain-community package.

### Package (Required)

- [x] langchain
- [ ] langchain-openai
- [ ] langchain-anthropic
- [ ] langchain-classic
- [ ] langchain-core
- [ ] langchain-model-profiles
- [ ] langchain-tests
- [ ] langchain-text-splitters
- [ ] langchain-chroma
- [ ] langchain-deepseek
- [ ] langchain-exa
- [ ] langchain-fireworks
- [ ] langchain-groq
- [ ] langchain-huggingface
- [ ] langchain-mistralai
- [ ] langchain-nomic
- [ ] langchain-ollama
- [ ] langchain-openrouter
- [ ] langchain-perplexity
- [ ] langchain-qdrant
- [ ] langchain-xai
- [ ] Other / not sure / general

### Feature Description

Currently, LangChain supports similarity search for a single query embedding against all vectors in a vector store (top-k search), and some vector stores support multi-query (many queries at once). However, there is no native, efficient API for the following use case:

Given one query embedding and a custom list of entity embeddings (not the entire database), efficiently compute the similarity between the query and each entity in the list, ideally in a single call.

### Use Case

- Usage of different embedding strategies in multi-step ranking. One embedding for retrieval and another for re-ranking
- Efficient enrichment of previously retrieved documents with similarity scores for re-ranking or hybrid retrieval workflows

Two-stage retrieval is a common pattern in Search, Recommender Systems and RAG applications.

![Image](https://github.com/user-attachments/assets/5c0d0073-929a-449f-984c-77f8a597da45)

### Proposed Solution

- Add a method to the VectorStore interface, e.g., batch_similarity(query_embedding, entity_ids) or similarity_to_entities(query_embedding, entity_embeddings)
- The method should efficiently compute similarity scores between the query and the provided list of entities, leveraging backend capabilities where possible (e.g., SQL IN clause, vector store filtering)
- Return a list of (entity_id, similarity_score) pairs, sorted by similarity

### Alternatives Considered

_No response_

### Additional Context

_No response_

GitHub stats

langchain-ai/langchain#36090 • Fetched 2026-04-08 00:58:09

Author

felipheggaliza

Participants

felipheggaliza

sasmita016

Timeline (top)

labeled ×3, commented ×1, issue_type_added ×1

extent analysis

Fix Plan

To implement the proposed solution, add a new method to the VectorStore interface in two steps:

Add a new method similarity_to_entities to the VectorStore interface:
- It takes a query_embedding and a list of entity_embeddings.
- It returns a list of (entity_id, similarity_score) tuples. When only embeddings are passed in, the position in the input list can serve as the entity_id.

Implement the similarity_to_entities method in each vector store backend:
- Leverage backend capabilities to compute the scores efficiently, such as a SQL IN clause or vector store filtering when the entities are referenced by ID.
- Return the list of tuples sorted by similarity score, highest first.

Example code:

from typing import List, Tuple
import numpy as np

# Standalone sketch for illustration; this is not LangChain's actual VectorStore base class
class VectorStore:
    def similarity_to_entities(self, query_embedding: np.ndarray, entity_embeddings: List[np.ndarray]) -> List[Tuple[int, float]]:
        # Cosine similarity between the query and each entity; entities are identified by list index
        query_norm = np.linalg.norm(query_embedding)
        similarity_scores = []
        for i, entity_embedding in enumerate(entity_embeddings):
            denom = query_norm * np.linalg.norm(entity_embedding)
            # Zero-norm vectors have no direction; score them 0.0 instead of dividing by zero
            score = float(np.dot(query_embedding, entity_embedding) / denom) if denom > 0 else 0.0
            similarity_scores.append((i, score))
        # Sort by similarity score, highest first
        similarity_scores.sort(key=lambda x: x[1], reverse=True)
        return similarity_scores
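
The example above scores embeddings that are already in memory. For the batch_similarity(query_embedding, entity_ids) variant, the "SQL IN clause" idea from the plan might look roughly like the sketch below. It is only an illustration under stated assumptions: the `entities` table, its JSON-encoded `embedding` column, and the helper itself are hypothetical and are not part of LangChain or any of its integrations.

```python
import json
import math
import sqlite3

def batch_similarity(conn, query_embedding, entity_ids):
    """Hypothetical helper: score only the listed entities, fetched with SQL IN."""
    if not entity_ids:
        return []
    # One "?" placeholder per ID keeps the query parameterized.
    placeholders = ", ".join("?" for _ in entity_ids)
    rows = conn.execute(
        f"SELECT id, embedding FROM entities WHERE id IN ({placeholders})",
        list(entity_ids),
    ).fetchall()

    query_norm = math.sqrt(sum(x * x for x in query_embedding))
    scored = []
    for entity_id, raw in rows:
        vector = json.loads(raw)
        norm = math.sqrt(sum(x * x for x in vector))
        dot = sum(q * v for q, v in zip(query_embedding, vector))
        # Zero-norm vectors get 0.0 rather than a division by zero.
        score = dot / (query_norm * norm) if query_norm and norm else 0.0
        scored.append((entity_id, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored

# Toy in-memory table; the schema (id, JSON-encoded embedding) is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entities (id TEXT PRIMARY KEY, embedding TEXT)")
conn.executemany(
    "INSERT INTO entities VALUES (?, ?)",
    [
        ("a", json.dumps([1.0, 0.0])),
        ("b", json.dumps([0.0, 1.0])),
        ("c", json.dumps([1.0, 1.0])),
    ],
)
result = batch_similarity(conn, [1.0, 0.0], ["a", "c"])
```

Only the rows named in entity_ids are read, so entity "b" is never scored. Databases typically cap the number of bound parameters per statement, so a very long ID list would likely need to be split into chunks.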

Verification

To verify the fix, you can write test cases to check the correctness of the similarity_to_entities method. For example:

import unittest
import numpy as np

# Assumes the VectorStore sketch from the Fix Plan is defined in or imported into this module

class TestVectorStore(unittest.TestCase):
    def test_similarity_to_entities(self):
        vector_store = VectorStore()
        # Sample query and entity embeddings, deliberately not in ranked order
        query_embedding = np.array([1, 2, 3])
        entity_embeddings = [np.array([7, 8, 9]), np.array([4, 5, 6])]
        similarity_scores = vector_store.similarity_to_entities(query_embedding, entity_embeddings)
        # One (index, score) pair per entity
        self.assertEqual(len(similarity_scores), 2)
        # [4, 5, 6] (index 1) is closer in direction to the query, so it ranks first
        self.assertEqual([i for i, _ in similarity_scores], [1, 0])
        self.assertAlmostEqual(similarity_scores[0][1], 0.9746, places=4)

if __name__ == "__main__":
    unittest.main()

Extra Tips

Make sure to handle edge cases, such as empty lists of entity embeddings or query embeddings with zero norm.
Consider supporting other similarity metrics besides cosine similarity, such as dot product or Euclidean distance.
Optimize the implementation for performance, especially for large lists of entity embeddings.
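
One possible way to act on these tips is to stack the entity embeddings into a single matrix and score them in one NumPy operation. The sketch below is only an illustration: the function name `score_entities` and its `metric` parameter are hypothetical, and the choice to score zero-norm vectors as 0.0 under cosine similarity is an assumption rather than established LangChain behavior.

```python
import numpy as np

def score_entities(query_embedding, entity_embeddings, metric="cosine"):
    """Hypothetical helper: score one query against N entities in one matrix op.

    Returns (index, score) pairs sorted best first. For "euclidean" the score is
    a distance, so smaller is better and the sort is ascending.
    """
    query = np.asarray(query_embedding, dtype=float)
    if len(entity_embeddings) == 0:
        return []  # Edge case: nothing to score.
    matrix = np.asarray(entity_embeddings, dtype=float)  # shape (N, dim)

    if metric == "dot":
        scores = matrix @ query
    elif metric == "cosine":
        denom = np.linalg.norm(matrix, axis=1) * np.linalg.norm(query)
        # Zero-norm vectors have no direction; leave their score at 0.0.
        scores = np.divide(
            matrix @ query, denom, out=np.zeros(len(matrix)), where=denom > 0
        )
    elif metric == "euclidean":
        scores = np.linalg.norm(matrix - query, axis=1)
    else:
        raise ValueError(f"unknown metric: {metric}")

    # Similarities sort descending, distances ascending.
    order = np.argsort(scores if metric == "euclidean" else -scores, kind="stable")
    return [(int(i), float(scores[i])) for i in order]

query = [1.0, 2.0, 3.0]
entities = [[4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [0.0, 0.0, 0.0]]
cosine_ranked = score_entities(query, entities)
euclid_ranked = score_entities(query, entities, metric="euclidean")
```

Note that the two metrics can disagree: the all-zero vector ranks last under cosine similarity but first under Euclidean distance here, because it happens to be the closest point to the query.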
