Enhancing AI retrieval with HNSW in RAG applications - IBM Developer

Tutorial

Enhancing AI retrieval with HNSW in RAG applications

Explore how HNSW improves retrieval speed and scalability in RAG pipelines with a step-by-step implementation guide

By

Niranjan Khedkar

Retrieval-Augmented Generation (RAG) improves how AI models find and generate relevant information, making responses more accurate and useful. However, as data grows, fast and efficient retrieval becomes essential. Traditional search methods, such as brute-force similarity search, compare the query against every stored vector, so they become slow as the collection grows and do not scale well.

Hierarchical Navigable Small World (HNSW) is a graph-based Approximate Nearest Neighbor (ANN) search algorithm that offers high speed and scalability, making it a great fit for RAG systems. This tutorial explores how HNSW enhances retrieval in AI applications, particularly within IBM’s AI solutions, provides a step-by-step implementation guide, and discusses optimizations for large-scale use.
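
To make the graph idea concrete, here is a minimal single-layer sketch of greedy search on a proximity graph, written in plain Python. It is a toy illustration under simplifying assumptions, not the full HNSW algorithm: real HNSW stacks several such layers and tracks a list of candidates rather than a single current node. The helper names (build_knn_graph, greedy_search) are hypothetical and do not come from any library.

```python
import math

def build_knn_graph(points, k):
    """Link each point to its k nearest neighbours (brute force; fine for a toy)."""
    graph = {}
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        graph[i] = others[:k]
    return graph

def greedy_search(points, graph, query, entry):
    """Walk the graph, always moving to the neighbour closest to the query.

    Stops at a local minimum, which may or may not be the true nearest point.
    """
    current = entry
    current_d = math.dist(points[current], query)
    while True:
        best, best_d = current, current_d
        for nb in graph[current]:
            d = math.dist(points[nb], query)
            if d < best_d:
                best, best_d = nb, d
        if best == current:
            return current, current_d
        current, current_d = best, best_d

# Ten points on a line, each linked to its two nearest neighbours.
points = [(float(i), 0.0) for i in range(10)]
graph = build_knn_graph(points, k=2)
node, distance = greedy_search(points, graph, query=(7.2, 0.0), entry=0)
print(node, round(distance, 3))
```

With only short-range links, the walk from node 0 needs several hops to reach the answer. The upper layers in HNSW add long-range links so that a search can cross the dataset in far fewer steps.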

Why HNSW is ideal for RAG

Efficient retrieval is key to AI-driven applications. Large-scale knowledge systems need high accuracy, low latency, and scalability. HNSW meets these needs by offering:

  • Speed and scalability: Finds results quickly, even with millions of documents.
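  • High recall: Often reaches higher recall at a given query speed than other ANN methods such as LSH and IVFPQ.
  • Tunable memory use: Stores a graph alongside the vectors, so memory grows with the M parameter; tuning M balances recall against footprint.
  • Real-time updates: Supports adding items incrementally and marking items as deleted without rebuilding the index.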

HNSW is ideal for AI chatbots, enterprise search, recommendation engines, and domain-specific assistants. By using HNSW, developers can improve both speed and precision in RAG applications.
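
Because HNSW is approximate, claims about recall are usually checked by comparing its results against an exact brute-force search on the same queries. The sketch below shows one common way to compute recall@k in plain Python; the function name and the sample IDs are made up for illustration.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true top-k neighbours that the ANN search also returned.

    Assumes exact_ids is non-empty and both lists are ordered best-first.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    true_top = set(exact_ids[:k])
    found = set(approx_ids[:k]) & true_top
    return len(found) / len(true_top)

# Hypothetical result lists for a single query.
exact = [7, 2, 9, 4]    # from brute-force search
approx = [7, 9, 5, 2]   # from an ANN index
print(recall_at_k(approx, exact, k=4))  # 3 of the 4 true neighbours were found
```

Averaging this value over a set of representative queries gives a single number to track while tuning index parameters.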

How to use HNSW in a RAG pipeline

A RAG pipeline typically has four main stages:

  1. Document processing and embedding: Convert text documents into vector embeddings using models available through IBM watsonx.ai or Hugging Face Transformers.
  2. Indexing with HNSW: Store embeddings in an HNSW index for fast nearest-neighbor search.
  3. Retrieval and augmentation: Use HNSW to find the most relevant documents for a given query.
  4. Response generation: Feed retrieved data into an LLM (for example, IBM Granite) to generate a response.

Replacing brute-force search with HNSW significantly improves retrieval speed in RAG applications while keeping recall close to that of exact search.
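
The hand-off between stages 3 and 4 can be sketched as a small prompt-assembly helper. This is only one possible approach: the function name, the prompt wording, and the max_chars budget are assumptions for illustration, not part of any IBM or Hugging Face API.

```python
def build_prompt(question, passages, max_chars=2000):
    """Assemble an LLM prompt from retrieved passages (hypothetical template)."""
    context_lines = []
    used = 0
    for rank, passage in enumerate(passages, start=1):
        line = f"[{rank}] {passage}"
        if used + len(line) > max_chars:
            break  # stop once the context budget is exhausted
        context_lines.append(line)
        used += len(line)
    context = "\n".join(context_lines)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_prompt(
    "How does IBM use AI?",
    ["IBM AI is revolutionizing enterprise search."],
)
print(prompt)
```

The returned string would then be sent to whichever LLM the application uses. Passages are added best-first, so a tight budget drops the least relevant ones.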

Implementing HNSW for RAG in Python

Step 1. Install required libraries

Before using HNSW in a RAG pipeline, install the necessary packages:

pip install hnswlib transformers sentence-transformers

Step 2. Generate text embeddings

Use a pre-trained transformer model to convert text into vector embeddings:

from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

texts = ["IBM AI is revolutionizing enterprise search.", "HNSW accelerates nearest neighbor search."]

# Convert texts to vector embeddings
embeddings = model.encode(texts, normalize_embeddings=True)
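
Normalizing the embeddings makes every vector unit length, so the cosine similarity of two vectors equals their dot product, and the cosine distance used in the next step is one minus that value. The following plain-Python sketch checks this relationship on two small made-up vectors; the helper names are illustrative only.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def cosine_distance(a, b):
    """1 - cosine similarity, computed without assuming unit-length inputs."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

a = l2_normalize([3.0, 4.0])
b = l2_normalize([4.0, 3.0])
dot = sum(x * y for x, y in zip(a, b))

# For unit vectors the two quantities agree up to rounding error.
print(round(1.0 - dot, 6), round(cosine_distance(a, b), 6))
```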

Step 3. Build an HNSW index

Create an HNSW index to store and search vector embeddings efficiently:

import hnswlib

dim = embeddings.shape[1]
num_elements = len(embeddings)

# Initialize the HNSW index
p = hnswlib.Index(space="cosine", dim=dim)
p.init_index(max_elements=num_elements, ef_construction=200, M=16)
p.add_items(embeddings, np.arange(num_elements))

# Save the index for future use
p.save_index("rag_hnsw_index.bin")
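
Before scaling this index up, it can help to estimate its memory footprint. The sketch below uses a rough rule of thumb: each element stores its float32 vector plus up to 2 * M four-byte links on the bottom layer. It ignores upper-layer links and bookkeeping overhead, so treat the result as an approximate lower bound. The function name and the one-million-document figure are assumptions for illustration.

```python
def estimate_index_bytes(num_elements, dim, M, bytes_per_float=4, bytes_per_link=4):
    """Rough lower-bound estimate of HNSW index size in bytes."""
    vector_bytes = dim * bytes_per_float
    link_bytes = 2 * M * bytes_per_link  # bottom layer allows up to 2*M links
    return num_elements * (vector_bytes + link_bytes)

# One million 384-dimensional embeddings (the size produced by all-MiniLM-L6-v2)
# with M=16, as in the index above.
size = estimate_index_bytes(num_elements=1_000_000, dim=384, M=16)
print(f"{size / 1e9:.2f} GB")
```

At this dimensionality the vectors dominate the total, so raising M from 16 to 32 adds comparatively little memory.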

Step 4. Perform fast retrieval

Load the HNSW index and retrieve the most relevant documents for a query:

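# Load the index into a fresh Index object, as a separate process would
p = hnswlib.Index(space="cosine", dim=dim)
p.load_index("rag_hnsw_index.bin", max_elements=num_elements)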

# Querying the system
query_text = "How does IBM use AI?"
query_embedding = model.encode([query_text], normalize_embeddings=True)

# Retrieve nearest neighbors
labels, distances = p.knn_query(query_embedding, k=2)

# Print retrieved results
print(f"Retrieved documents: {[texts[i] for i in labels[0]]}")

With only two documents the speedup is not noticeable, but on large collections this approach makes retrieval much faster than brute-force search while maintaining high recall.
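
A nearest-neighbor query always returns k results, even when none of them is a good match. One possible safeguard, sketched below, is to drop hits whose cosine distance exceeds a cutoff before passing them to the LLM. The function name and the 0.6 threshold are hypothetical; a suitable cutoff depends on the embedding model and data.

```python
def select_hits(labels, distances, texts, max_distance=0.6):
    """Keep hits within max_distance and return (text, similarity) pairs.

    labels and distances are one row of a k-NN result, ordered best-first.
    """
    hits = []
    for label, distance in zip(labels, distances):
        if distance <= max_distance:
            similarity = 1.0 - float(distance)  # cosine distance -> similarity
            hits.append((texts[int(label)], similarity))
    return hits

# Made-up result row: document 1 is close, document 0 is far away.
sample_texts = ["unrelated passage", "relevant passage"]
hits = select_hits(labels=[1, 0], distances=[0.2, 0.9], texts=sample_texts)
print(hits)
```

In the retrieval code above, the first row of the query result (labels[0] and distances[0]) would be passed in place of the made-up lists.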

Optimizing HNSW for large-scale RAG

To improve retrieval efficiency in large-scale systems, consider these optimizations:

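  • Tune M and ef_construction: Higher values improve recall but increase memory use and index build time, so adjust them to balance speed, memory use, and recall.
  • Adjust ef_search dynamically: This query-time parameter (set with set_ef in hnswlib) should be at least k. Higher values improve recall but slow searches.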
  • Use hybrid search (HNSW + BM25): Combine semantic search with keyword-based retrieval for better precision.
  • Optimize memory usage: Apply quantization techniques such as product quantization (PQ) or optimized product quantization (OPQ) to reduce memory at a small cost in recall.
  • Implement index sharding and distributed search: Splitting the index across multiple nodes improves scalability for billions of vectors.
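
As an illustration of the hybrid search item, the sketch below merges an HNSW ranking with a BM25 ranking using reciprocal rank fusion (RRF), one common way to combine result lists without calibrating their scores. The tutorial does not prescribe a fusion method, so this choice is an assumption, and the document IDs are made up. The constant k=60 is the value commonly used for RRF.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several best-first ID lists into one list, best first.

    Each document scores 1 / (k + rank) per list it appears in.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical rankings for the same query.
hnsw_ranking = [3, 1, 2]   # semantic search results
bm25_ranking = [1, 4, 3]   # keyword search results
fused = reciprocal_rank_fusion([hnsw_ranking, bm25_ranking])
print(fused)
```

Documents that appear near the top of both lists rise to the front, while documents found by only one retriever are still kept as candidates.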

Conclusion

Integrating HNSW into RAG pipelines improves retrieval speed and scalability while preserving high recall. With its fast search time, dynamic updates, and high recall, HNSW is a valuable tool for large-scale AI applications, including IBM’s AI-powered solutions.

For developers aiming to scale AI systems, HNSW enhances performance in chatbots, enterprise search, and AI assistants. Fine-tuning its parameters and combining it with other retrieval methods can further optimize results.

Ready to get started? Explore IBM Developer resources and build faster, smarter retrieval systems today!