langchain - Add One-to-Many Batch Similarity Search Against Custom Entity Lists [1 comment, 2 participants]

GitHub stats
langchain-ai/langchain#36090 · Fetched 2026-04-08 00:58:09
Comments: 1 · Participants: 2 · Timeline events: 5 · Reactions: 0
Timeline (top): labeled ×3, commented ×1, issue_type_added ×1

Checked other resources

  • This is a feature request, not a bug report or usage question.
  • I added a clear and descriptive title that summarizes the feature request.
  • I used the GitHub search to find a similar feature request and didn't find it.
  • I checked the LangChain documentation and API reference to see if this feature already exists.
  • This is not related to the langchain-community package.

Package (Required)

  • langchain
  • langchain-openai
  • langchain-anthropic
  • langchain-classic
  • langchain-core
  • langchain-model-profiles
  • langchain-tests
  • langchain-text-splitters
  • langchain-chroma
  • langchain-deepseek
  • langchain-exa
  • langchain-fireworks
  • langchain-groq
  • langchain-huggingface
  • langchain-mistralai
  • langchain-nomic
  • langchain-ollama
  • langchain-openrouter
  • langchain-perplexity
  • langchain-qdrant
  • langchain-xai
  • Other / not sure / general

Feature Description

Currently, LangChain supports similarity search for a single query embedding against all vectors in a vector store (top-k search), and some vector stores support multi-query (many queries at once). However, there is no native, efficient API for the following use case:

Given one query embedding and a custom list of entity embeddings (not the entire database), efficiently compute the similarity between the query and each entity in the list, ideally in a single call.

Use Case

  • Using different embedding strategies in multi-step ranking: one embedding for retrieval and another for re-ranking
  • Efficient enrichment of previously retrieved documents with similarity scores for re-ranking or hybrid retrieval workflows

Two-stage retrieval is a common pattern in Search, Recommender Systems and RAG applications.
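
To make the two-stage pattern concrete, here is a minimal NumPy sketch, not LangChain code. It assumes every document has two precomputed embeddings, one in a retrieval space ("A") and one in a re-ranking space ("B"). The names cosine_scores and two_stage_rank are hypothetical and introduced only for illustration.

```python
import numpy as np


def cosine_scores(query: np.ndarray, matrix: np.ndarray) -> np.ndarray:
    """Cosine similarity between one query vector and each row of matrix.

    Assumes no zero-norm vectors, to keep the sketch short.
    """
    q = query / np.linalg.norm(query)
    m = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    return m @ q


def two_stage_rank(query_a, corpus_a, query_b, corpus_b, k):
    """Retrieve k candidates in space A, then re-rank only those in space B."""
    # Stage 1: top-k search over the whole corpus with the retrieval embedding
    stage1 = cosine_scores(query_a, corpus_a)
    candidates = np.argsort(-stage1)[:k]
    # Stage 2: one query against a custom list of entities (the candidates)
    stage2 = cosine_scores(query_b, corpus_b[candidates])
    order = np.argsort(-stage2)
    return [(int(candidates[i]), float(stage2[i])) for i in order]


corpus_a = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
corpus_b = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
ranked = two_stage_rank(
    np.array([1.0, 0.0]), corpus_a, np.array([1.0, 0.0]), corpus_b, k=2
)
```

Stage 2 is the one-to-many operation this request asks for: a single query scored against a caller-supplied subset rather than against the whole store.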

Proposed Solution

  • Add a method to the VectorStore interface, e.g., batch_similarity(query_embedding, entity_ids) or similarity_to_entities(query_embedding, entity_embeddings)
  • The method should efficiently compute similarity scores between the query and the provided list of entities, leveraging backend capabilities where possible (e.g., SQL IN clause, vector store filtering)
  • Return a list of (entity_id, similarity_score) pairs, sorted by similarity
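
As a rough illustration of the ID-based variant, the sketch below uses a toy in-memory store in pure Python. InMemoryEntityStore is hypothetical and not a LangChain class; batch_similarity follows the name suggested above. Skipping unknown IDs and scoring zero-norm vectors as 0.0 are assumptions of this sketch, not behavior specified in the request.

```python
import math
from typing import Dict, List, Sequence, Tuple


class InMemoryEntityStore:
    """Hypothetical toy store mapping entity IDs to embeddings."""

    def __init__(self, vectors: Dict[str, Sequence[float]]) -> None:
        self._vectors = dict(vectors)

    def batch_similarity(
        self, query_embedding: Sequence[float], entity_ids: Sequence[str]
    ) -> List[Tuple[str, float]]:
        """Return (entity_id, cosine similarity) pairs, most similar first."""
        q_norm = math.sqrt(sum(x * x for x in query_embedding))
        results = []
        for entity_id in entity_ids:
            vec = self._vectors.get(entity_id)
            if vec is None:
                continue  # assumption: unknown IDs are skipped, not errors
            v_norm = math.sqrt(sum(x * x for x in vec))
            dot = sum(a * b for a, b in zip(query_embedding, vec))
            denom = q_norm * v_norm
            # assumption: a zero-norm vector gets a score of 0.0
            results.append((entity_id, dot / denom if denom else 0.0))
        results.sort(key=lambda pair: pair[1], reverse=True)
        return results


store = InMemoryEntityStore({"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]})
ranked = store.batch_similarity([1.0, 0.0], ["b", "c", "missing"])
```

A database-backed store could instead fetch only the listed IDs (for example with a SQL IN clause) and compute the scores server-side, as the proposal suggests.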

Alternatives Considered

No response

Additional Context

No response

extent analysis

Fix Plan

To implement the proposed solution, we can add a new method to the VectorStore interface. Here are the steps:

  • Add a new method similarity_to_entities to the VectorStore interface:
    • This method takes a query_embedding and a list of entity_embeddings
    • It returns a list of (index, similarity_score) tuples, where index is the entity's position in the input list; callers that hold entity IDs can map positions back to IDs
  • Implement the similarity_to_entities method in each vector store backend:
    • Leverage backend capabilities to efficiently compute similarity scores, such as using SQL IN clause or vector store filtering
    • Return the list of tuples sorted by similarity score

Example code:

from typing import List, Tuple
import numpy as np

# Standalone illustration only; the real interface is langchain_core.vectorstores.VectorStore
class VectorStore:
    def similarity_to_entities(
        self, query_embedding: np.ndarray, entity_embeddings: List[np.ndarray]
    ) -> List[Tuple[int, float]]:
        """Return (index, cosine similarity) pairs, most similar first."""
        query_norm = np.linalg.norm(query_embedding)
        similarity_scores = []
        for i, entity_embedding in enumerate(entity_embeddings):
            denom = query_norm * np.linalg.norm(entity_embedding)
            # Cosine similarity is undefined for zero vectors; score them as 0.0
            score = float(np.dot(query_embedding, entity_embedding) / denom) if denom else 0.0
            similarity_scores.append((i, score))
        # Sort by similarity score in descending order
        similarity_scores.sort(key=lambda x: x[1], reverse=True)
        return similarity_scores

Verification

To verify the fix, you can write test cases to check the correctness of the similarity_to_entities method. For example:

import unittest
import numpy as np

# Assumes the VectorStore class from the example above is defined or imported here

class TestVectorStore(unittest.TestCase):
    def test_similarity_to_entities(self):
        # Create a sample vector store
        vector_store = VectorStore()
        # Create a sample query embedding and entity embeddings
        query_embedding = np.array([1, 2, 3])
        entity_embeddings = [np.array([4, 5, 6]), np.array([7, 8, 9])]
        # Compute similarity scores
        similarity_scores = vector_store.similarity_to_entities(query_embedding, entity_embeddings)
        # One score per entity, sorted from most to least similar
        self.assertEqual(len(similarity_scores), 2)
        self.assertGreater(similarity_scores[0][1], similarity_scores[1][1])
        # [4, 5, 6] is closer in direction to the query than [7, 8, 9]
        self.assertEqual([i for i, _ in similarity_scores], [0, 1])

if __name__ == "__main__":
    unittest.main()
Extra Tips

  • Make sure to handle edge cases, such as empty lists of entity embeddings or query embeddings with zero norm.
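  • Consider supporting similarity metrics other than cosine similarity (which the example uses), such as dot product or Euclidean distance.
  • Optimize the implementation for large lists of entity embeddings, for example by stacking them into one matrix and computing all scores with a single matrix-vector product instead of a Python loop.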
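
The tips above could be combined roughly as follows. This is a sketch under stated assumptions, not part of any LangChain API: score_entities and its metric parameter are hypothetical names, zero-norm vectors are scored as 0.0 under cosine similarity, and Euclidean distance is negated so that a larger score always means "more similar".

```python
from typing import List, Sequence, Tuple

import numpy as np


def score_entities(
    query: Sequence[float],
    entities: Sequence[Sequence[float]],
    metric: str = "cosine",
) -> List[Tuple[int, float]]:
    """Return (index, score) pairs sorted so the most similar entity is first."""
    q = np.asarray(query, dtype=float)
    m = np.asarray(entities, dtype=float)
    if m.size == 0:
        return []  # edge case: empty entity list
    if q.ndim != 1 or m.ndim != 2 or m.shape[1] != q.shape[0]:
        raise ValueError("entities must be a 2-D array matching the query dimension")

    if metric == "cosine":
        dots = m @ q  # one matrix-vector product instead of a Python loop
        denom = np.linalg.norm(m, axis=1) * np.linalg.norm(q)
        # edge case: rows with a zero denominator keep the default score 0.0
        scores = np.divide(dots, denom, out=np.zeros_like(dots), where=denom > 0)
    elif metric == "dot":
        scores = m @ q
    elif metric == "euclidean":
        scores = -np.linalg.norm(m - q, axis=1)  # negated distance
    else:
        raise ValueError(f"unknown metric: {metric}")

    order = np.argsort(-scores, kind="stable")  # ties keep input order
    return [(int(i), float(scores[i])) for i in order]


cosine_ranked = score_entities([1.0, 0.0], [[0.0, 1.0], [2.0, 0.0], [0.0, 0.0]])
euclid_ranked = score_entities([0.0, 0.0], [[3.0, 4.0], [1.0, 0.0]], metric="euclidean")
```

Returning negated distances keeps the "higher is better" ordering consistent across metrics; a real implementation might prefer to expose raw distances and document the sort direction instead.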
