Enhancing AI retrieval with HNSW in RAG applications
Retrieval-Augmented Generation (RAG) improves how AI models find and generate relevant information, making responses more accurate and useful. However, as data grows, fast and efficient retrieval becomes essential. Traditional search methods, such as brute-force similarity search, are slow and do not scale well.
Hierarchical Navigable Small World (HNSW) is a graph-based Approximate Nearest Neighbor (ANN) search algorithm that offers high speed and scalability, making it a great fit for RAG systems. This tutorial explores how HNSW enhances retrieval in AI applications, particularly within IBM's AI solutions, provides a step-by-step implementation guide, and discusses optimizations for large-scale use.
Why HNSW is ideal for RAG
Efficient retrieval is key to AI-driven applications. Large-scale knowledge systems need high accuracy, low latency, and scalability. HNSW meets these needs by offering:
Speed and scalability: Finds results quickly, even with millions of documents.
High recall and accuracy: Outperforms other ANN methods such as LSH and IVFPQ.
Efficient memory use: Balances performance with resource efficiency.
Real-time updates: Supports adding and removing data dynamically.
HNSW is ideal for AI chatbots, enterprise search, recommendation engines, and domain-specific assistants. By using HNSW, developers can improve both speed and precision in RAG applications.
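Step 1. Install the required libraries
The code in this tutorial relies on three Python packages. A minimal setup sketch, assuming a standard Python environment with pip available:
pip install sentence-transformers hnswlib numpy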
Step 2. Generate embeddings
Use a pre-trained transformer model to convert text into vector embeddings:
from sentence_transformers import SentenceTransformer
import numpy as np

# Load a compact, general-purpose embedding model
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

texts = ["IBM AI is revolutionizing enterprise search.", "HNSW accelerates nearest neighbor search."]

# Convert texts to vector embeddings, normalized for cosine similarity
embeddings = model.encode(texts, normalize_embeddings=True)
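The result is one vector per input text; all-MiniLM-L6-v2 produces 384-dimensional embeddings. You can confirm the dimensionality, which the index will need in the next step:
print(embeddings.shape)  # (2, 384)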
Step 3. Build an HNSW index
Create an HNSW index to store and search vector embeddings efficiently:
import hnswlib

dim = embeddings.shape[1]
num_elements = len(embeddings)

# Initialize the HNSW index with cosine distance
p = hnswlib.Index(space="cosine", dim=dim)

# M sets the number of graph links per node; ef_construction trades build time for index quality
p.init_index(max_elements=num_elements, ef_construction=200, M=16)
p.add_items(embeddings, np.arange(num_elements))

# Save the index for future use
p.save_index("rag_hnsw_index.bin")
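HNSW also supports the real-time updates mentioned earlier: the index can grow and shrink without a full rebuild. A brief sketch using hnswlib's resize_index, mark_deleted, and unmark_deleted (the new document text here is illustrative, not part of the original tutorial):
# Grow the index to make room, then add a new document on the fly
new_text = "Watsonx brings generative AI to the enterprise."
texts.append(new_text)
new_vec = model.encode([new_text], normalize_embeddings=True)
p.resize_index(num_elements + 1)
p.add_items(new_vec, [num_elements])

# Soft-delete an item so searches skip it, then restore it
p.mark_deleted(0)
p.unmark_deleted(0)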
Step 4. Perform fast retrieval
Load the HNSW index and retrieve the most relevant documents for a query:
# Load the index from disk (the Index object must use the same space and dim)
p.load_index("rag_hnsw_index.bin")

# Embed the query with the same model used for the documents
query_text = "How does IBM use AI?"
query_embedding = model.encode([query_text], normalize_embeddings=True)

# Retrieve the k nearest neighbors
labels, distances = p.knn_query(query_embedding, k=2)

# Print retrieved results
print(f"Retrieved documents: {[texts[i] for i in labels[0]]}")
This approach makes retrieval much faster than brute-force search while maintaining high accuracy.
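To verify that the approximate search returns the right neighbors, you can compare it against an exact brute-force pass. A minimal sketch reusing the variables above; because the embeddings are normalized, the dot product equals cosine similarity:
# Exact search: cosine similarity via dot product on normalized vectors
exact_scores = embeddings @ query_embedding[0]
exact_top = np.argsort(-exact_scores)[:2]

# Recall@2: overlap between HNSW results and the exact top-2
recall = len(set(labels[0]) & set(exact_top)) / 2
print(f"Recall@2 vs. brute force: {recall:.2f}")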
Optimizing HNSW for large-scale RAG
To improve retrieval efficiency in large-scale systems, consider these optimizations:
Tune M and ef_construction: Adjust these parameters to balance speed, memory use, and recall.
Adjust ef_search dynamically: Higher values improve accuracy but may slow searches (see the first sketch after this list).
Use hybrid search (HNSW + BM25): Combine semantic search with keyword-based retrieval for better precision (see the second sketch after this list).
Optimize memory usage: Apply quantization techniques (e.g., PQ, OPQ) to reduce memory while maintaining quality.
Implement index sharding and distributed search: Splitting the index across multiple nodes improves scalability for billions of vectors.
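In hnswlib, the query-time parameter is set with set_ef. A quick sketch reusing the index p from the steps above; the value 100 is an illustrative starting point, not a recommendation from the original tutorial:
# Raise ef for higher recall (must be >= k); lower it for faster searches
p.set_ef(100)
labels, distances = p.knn_query(query_embedding, k=2)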
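Hybrid retrieval can be as simple as blending the HNSW cosine score with a keyword score. A rough sketch, assuming the rank_bm25 package and an illustrative blending weight alpha (neither appears in the original tutorial):
from rank_bm25 import BM25Okapi

# Keyword scores from BM25 over whitespace-tokenized documents
bm25 = BM25Okapi([doc.lower().split() for doc in texts])
keyword_scores = bm25.get_scores(query_text.lower().split())
if keyword_scores.max() > 0:
    keyword_scores = keyword_scores / keyword_scores.max()  # scale to [0, 1]

# Semantic scores: hnswlib returns cosine distance, so similarity = 1 - distance
semantic_scores = {int(i): 1.0 - float(d) for i, d in zip(labels[0], distances[0])}

# Blend the two signals; alpha weights semantic vs. keyword relevance
alpha = 0.7
hybrid = {i: alpha * semantic_scores.get(i, 0.0) + (1 - alpha) * float(keyword_scores[i])
          for i in range(len(texts))}
best = max(hybrid, key=hybrid.get)
print(f"Top hybrid result: {texts[best]}")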
Conclusion
Integrating HNSW into RAG pipelines improves retrieval speed, scalability, and accuracy. With its fast search time, dynamic updates, and high recall, HNSW is a valuable tool for large-scale AI applications, including IBM’s AI-powered solutions.
For developers aiming to scale AI systems, HNSW enhances performance in chatbots, enterprise search, and AI assistants. Fine-tuning its parameters and combining it with other retrieval methods can further optimize results.
Ready to get started? Explore IBM Developer resources and build faster, smarter retrieval systems today!