How It Works
Run dense (vector) and sparse (BM25) retrieval in parallel for the same query, then merge the two result lists with a fusion algorithm. The most common approach is Reciprocal Rank Fusion (RRF): each document receives a score from its rank in each list, those per-list scores are summed, and documents are re-ranked by the total.
RRF Formula
score(d) = Σ 1 / (k + rank_i(d))
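Where k is a constant (typically 60) and rank_i(d) is the 1-based rank of document d in the i-th retrieval list; a document that does not appear in a list contributes nothing for that list. Documents that rank high in both lists get the highest combined score: with k = 60, a document at rank 5 in both lists (2/65 ≈ 0.031) outscores one that is first in only one list (1/61 ≈ 0.016).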
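The formula translates directly into a few lines of Python. This is a minimal sketch assuming each retriever returns a best-first list of document IDs; the function name and the toy IDs are illustrative and not taken from any library.

```python
from collections import defaultdict


def reciprocal_rank_fusion(result_lists, k=60):
    """Fuse several best-first lists of document IDs with RRF.

    Ranks are 1-based; a document absent from a list contributes nothing for it.
    """
    scores = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


dense = ["doc_a", "doc_b", "doc_c"]   # toy dense (vector) results
sparse = ["doc_c", "doc_a", "doc_d"]  # toy sparse (BM25) results
for doc_id, score in reciprocal_rank_fusion([dense, sparse]):
    print(doc_id, round(score, 5))
```

Here doc_a (ranks 1 and 2) finishes first and doc_c (ranks 3 and 1) second, both ahead of doc_b, which the dense retriever ranks second but BM25 never returns. RRF only looks at ranks, so it needs no score normalization across retrievers.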
Native Hybrid Search
Weaviate: Built-in hybrid search with configurable alpha (0 = pure BM25, 1 = pure vector).
Qdrant: dense and sparse vectors stored side by side in the same collection and fused at query time.
Pinecone: sparse-dense vectors in a single index; the index must use the dotproduct metric.
Elasticsearch: kNN + BM25 in a single query.
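Weaviate's alpha blends the two signals rather than ranks. The sketch below shows the general idea with a convex combination of min-max-normalized scores, assuming each retriever returns a dict of raw scores; it illustrates what alpha = 0 and alpha = 1 mean and is not Weaviate's exact fusion code. The function names and toy scores are hypothetical.

```python
def min_max_normalize(scores):
    """Scale raw scores to [0, 1]; if all scores are equal, map them to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def alpha_fusion(vector_scores, bm25_scores, alpha=0.5):
    """Blend normalized scores: alpha=0 is pure BM25, alpha=1 is pure vector.

    A document missing from one retriever gets 0 for that component.
    """
    vec = min_max_normalize(vector_scores)
    kw = min_max_normalize(bm25_scores)
    fused = {
        doc_id: alpha * vec.get(doc_id, 0.0) + (1 - alpha) * kw.get(doc_id, 0.0)
        for doc_id in set(vec) | set(kw)
    }
    # Highest blended score first; ties broken by doc ID for determinism.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Toy raw scores: cosine similarities and BM25 scores live on different scales.
vector_scores = {"doc_a": 0.82, "doc_b": 0.79, "doc_c": 0.55}
bm25_scores = {"doc_c": 14.2, "doc_d": 9.1, "doc_a": 3.0}
for alpha in (0.0, 0.5, 1.0):
    ranking = [doc_id for doc_id, _ in alpha_fusion(vector_scores, bm25_scores, alpha)]
    print(alpha, ranking)
```

Normalization is what makes the blend meaningful, since a BM25 score of 14.2 and a cosine of 0.82 are not comparable as-is. One quirk of min-max: the lowest-scoring hit in each list maps to 0, the same value a document gets when that retriever did not return it at all.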
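# LangChain: EnsembleRetriever (hybrid; merges the lists with weighted RRF)
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever  # requires the rank_bm25 package
# Assumed to exist: `documents` (a list of Documents) and `dense_retriever`
# (e.g. vectorstore.as_retriever() built over the same documents).
bm25_retriever = BM25Retriever.from_documents(documents)
hybrid = EnsembleRetriever(
    retrievers=[dense_retriever, bm25_retriever],
    weights=[0.5, 0.5],  # one weight per retriever, in the same order
)
docs = hybrid.invoke("SKU-4829 refund policy")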
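# Weaviate (Python client v4): native hybrid
import weaviate
client = weaviate.connect_to_local()         # assumes a Weaviate instance on localhost
collection = client.collections.get("Docs")  # "Docs" is a placeholder collection name
results = collection.query.hybrid(
    query="SKU-4829 refund policy",
    alpha=0.5,  # 0 = pure BM25, 1 = pure vector
    limit=10,
)
client.close()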
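Hybrid search is often among the largest single retrieval improvements available to a RAG system: dense retrieval captures paraphrase and meaning, while BM25 catches exact tokens such as product codes (SKU-4829), error codes, and names that embeddings tend to blur. Benchmarks commonly show hybrid retrieval outperforming dense or sparse retrieval alone, although the size of the gain depends on the corpus and the query mix. Start with equal weights (0.5/0.5, or alpha = 0.5) and tune from there on a set of representative queries. If your queries are mostly keyword-heavy, shift weight toward BM25; if they are mostly conceptual, shift toward dense.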
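One way to do that tuning is to sweep the dense weight and compare recall@k. This is a minimal sketch assuming you have a few queries labeled with their relevant document IDs; it fuses with weighted RRF (each list's 1/(k + rank) term multiplied by its weight, similar in spirit to how LangChain's EnsembleRetriever applies `weights`). The function names and the evaluation data are hypothetical.

```python
from collections import defaultdict


def weighted_rrf(dense, sparse, dense_weight, k=60):
    """Weighted RRF over two best-first lists of doc IDs (1-based ranks).

    Dense terms are scaled by dense_weight, BM25 terms by 1 - dense_weight.
    """
    scores = defaultdict(float)
    for weight, results in ((dense_weight, dense), (1 - dense_weight, sparse)):
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += weight / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant doc IDs that appear in the top k results."""
    return len(set(ranked[:k]) & relevant) / len(relevant)


def mean_recall_at_k(eval_set, dense_weight, k=2):
    """Average recall@k over (dense results, BM25 results, relevant IDs) triples."""
    total = sum(
        recall_at_k(weighted_rrf(dense, sparse, dense_weight), relevant, k)
        for dense, sparse, relevant in eval_set
    )
    return total / len(eval_set)


# Hypothetical labeled queries: (dense results, BM25 results, relevant doc IDs).
eval_set = [
    (["d1", "d2", "d3"], ["d9", "d1", "d4"], {"d9"}),        # exact-ID style query
    (["d5", "d6", "d7"], ["d8", "d2", "d5"], {"d5", "d6"}),  # conceptual query
]

for dense_weight in (0.0, 0.25, 0.5, 0.75, 1.0):
    recall = mean_recall_at_k(eval_set, dense_weight)
    print(f"dense_weight={dense_weight:.2f}  recall@2={recall:.2f}")
```

On this toy set, dense weights of 0.25 and 0.5 tie at a mean recall@2 of 0.75, while both extremes drop to 0.5: the exact-ID query needs the BM25 list and the conceptual query needs the dense list. With real data, use many more queries before trusting a particular weight.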