Elasticsearch is packed with new features to help you build the best search solutions for your use case. Dive into our sample notebooks to learn more, start a free cloud trial, or try Elastic on your local machine now.
“When I first typed ‘drone’ into the search and saw results for ‘unmanned aerial vehicles’ without synonyms, I was like, ‘Wow, this thing really gets it.’ That’s when it clicked—it genuinely felt like magic.” — Logan Pashby, Principal Engineer, Cypris.ai
Relevance at scale: Cypris’ search story
Cypris is a platform that helps R&D and innovation teams navigate a massive dataset of over 500 million patents and research papers. Their mission is to make it easier to track innovation, find prior art, and understand the organizations driving new technologies.
But there was a problem. To get relevant results, users had to write complex boolean queries—which was fine for expert users, but a barrier for many others. Cypris needed a way to make search more intuitive and accessible.
The answer was semantic search powered by vector similarity. However, scaling semantic search over a large corpus turned out to be a tough engineering problem. Handling 500 million high-dimensional vectors wasn’t just a matter of pushing them into a system and hitting “search.” “When we first indexed all 500 million vectors, we were looking at 30- to 60-second query times in the worst case.”
Getting query times down would require a series of carefully considered trade-offs between model complexity, hardware resources, and indexing strategy.
Logan Pashby is a Principal Engineer at Cypris, where he focuses on the platform's innovation intelligence features. With expertise in topics such as deep learning, distributed systems, and full-stack development, Logan solves complex data challenges and develops efficient search solutions for R&D and IP teams.
Choosing the right model
Cypris’ first attempt at vector search used 750-dimensional embeddings for every document, but they quickly realized that scaling such large embeddings across 500 million documents would be unmanageable. Using the memory approximation formula without quantization, the estimated RAM required would be around 1,500 GB, which made it clear they needed to adjust their strategy.
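For reference, here is a back-of-the-envelope version of that estimate, using the approximation from the Elasticsearch documentation for float32 HNSW vectors (the numbers are illustrative, not Cypris’ exact figures):

```python
# Rough off-heap RAM estimate for float32 HNSW vectors, based on the
# approximation in the Elasticsearch docs: num_vectors * 4 * (num_dimensions + 12)
num_vectors = 500_000_000   # corpus size
num_dimensions = 750        # embedding size of the original model

bytes_required = num_vectors * 4 * (num_dimensions + 12)
print(f"~{bytes_required / 1e9:,.0f} GB")   # ~1,524 GB, i.e. roughly 1.5 TB of RAM
```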
“We assumed, and we hoped, that the larger the dimension of the vector, the more information we could encode. A richer embedding space should mean better search relevance.”
They considered using sparse vectors like Elastic’s ELSER, which avoids the fixed-dimension limitations of dense embeddings by representing documents as weighted lists of tokens instead. However, at the time, ELSER’s CPU-only inference seemed too slow for Cypris’ dataset. Dense vectors, on the other hand, let them leverage off-cluster GPU acceleration, which improved throughput by 10x to 50x when generating embeddings.

Cypris’ setup included an external GPU-based service to compute vectors, which were then indexed into Elasticsearch.
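As a rough illustration of that kind of pipeline, here is a minimal sketch using the Python Elasticsearch client and a placeholder sentence-transformers model; the model name, index name, and fields are assumptions, not Cypris’ actual stack:

```python
from elasticsearch import Elasticsearch, helpers
from sentence_transformers import SentenceTransformer

es = Elasticsearch("http://localhost:9200")
# Placeholder model name; the article doesn't say which model Cypris uses.
model = SentenceTransformer("your-embedding-model", device="cuda")

def index_batch(docs):
    """Embed a batch of documents on the GPU service, then bulk-index into Elasticsearch."""
    vectors = model.encode([doc["text"] for doc in docs])
    actions = (
        {
            "_index": "patents",   # illustrative index name
            "_id": doc["id"],
            "_source": {"title": doc["title"], "embedding": vector.tolist()},
        }
        for doc, vector in zip(docs, vectors)
    )
    helpers.bulk(es, actions)
```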
The team ultimately decided on lower-dimensional dense vectors that struck a balance: they were compact enough to make indexing and search feasible, yet rich enough to maintain relevance in results.
Making it work with production scale data
Challenges - disk space
Once Cypris had vectors ready to be indexed, they faced the next hurdle: efficiently storing and searching over them in Elasticsearch.
The first step was reducing disk space. “At the end of the day, vectors are just arrays of floats… But when you have 500 million of them, the storage requirements add up quickly.” By default, vectors in Elasticsearch are stored multiple times: first in the _source field (the original JSON document), then in doc_values (columnar storage optimized for retrieval), and finally within the HNSW graph itself. Given that each 750-dimensional float32 vector takes about 3KB, storing 500 million vectors quickly becomes problematic, potentially exceeding 1.5 terabytes per storage layer.
One practical optimization Cypris used was excluding vectors from the source document in Elasticsearch. This helped reduce overhead, but it turned out disk space wasn’t the biggest challenge. The bigger challenge was memory management.
Did You Know?
Elasticsearch allows you to optimize disk space by excluding vectors from the source document. This can significantly reduce storage costs, especially when dealing with large datasets. However, be aware that excluding vectors from the source will impact reindexing performance. For more details, check out the Elasticsearch documentation on source filtering.
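As a sketch, the exclusion is configured in the index mapping. The index and field names below are illustrative, not Cypris’ actual schema, and the dense_vector settings assume a recent 8.x cluster:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Keep the raw vector out of the stored _source to save disk space.
# The field is still indexed for kNN search, but reindexing from _source
# can no longer recover the original vector values.
es.indices.create(
    index="patents",
    mappings={
        "_source": {"excludes": ["embedding"]},
        "properties": {
            "title": {"type": "text"},
            "embedding": {
                "type": "dense_vector",
                "dims": 750,
                "index": True,
                "similarity": "cosine",
            },
        },
    },
)
```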
Challenges - RAM explosion
k-nearest neighbor (kNN) search in Elasticsearch relies on HNSW graphs, which perform best when fully loaded into RAM. With 500 million high-dimensional vectors, that put significant memory demands on the system. “Trying to fit all of those vectors in memory at query time was not an easy thing to do,” Logan adds.
Cypris had to juggle multiple memory requirements: the vectors and their HNSW graphs needed to reside in off-heap memory for fast search performance, while the JVM heap had to remain available for other operations. On top of that, they still needed to support traditional keyword search, and the associated Elasticsearch inverted index would need to stay in memory as well.
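For context, this is roughly what an approximate kNN query against those HNSW graphs looks like with the Python client; the index, field, and embedding helper are placeholders, not Cypris’ actual code:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")
query_vector = embed("drone delivery")  # hypothetical helper returning a 750-dim float list

# Approximate kNN search over the per-segment HNSW graphs; num_candidates
# trades recall for latency on each shard.
resp = es.search(
    index="patents",
    knn={
        "field": "embedding",
        "query_vector": query_vector,
        "k": 10,
        "num_candidates": 100,
    },
)
for hit in resp["hits"]["hits"]:
    print(hit["_score"], hit["_source"].get("title"))
```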
Managing memory with dimensionality reduction, quantization, and segments
Cypris explored multiple approaches to better manage memory and storage; here are three that worked well:
- Lower-dimensional vectors: The Cypris team switched to a smaller model that produced lower-dimensional vectors, reducing resource requirements.
- BBQ (Better Binary Quantization): Cypris was considering int8 quantization, but when Elastic released BBQ, they adopted it quickly. “We tested it out and it didn’t have a huge hit to relevance and was significantly cheaper. So we implemented it right away,” says Logan. BBQ immediately reduced the size of their vector indexes by around 20% (a mapping sketch follows this list).
Did You Know?
Elasticsearch’s Better Binary Quantization (BBQ) can reduce the size of vector indexes by ~20%, with minimal impact on search relevance. BBQ reduces both disk usage, by shrinking index size, and memory usage, since smaller vectors take up less space in RAM during searches. It’s especially helpful when scaling kNN search with HNSW graphs, where keeping everything in memory is critical for performance. Explore how BBQ can optimize your search infrastructure in the Elasticsearch documentation on vector search.
- Segment and shard tuning: Cypris also optimized how Elasticsearch segments and shards were managed. HNSW graphs are built per segment, so searching dense vectors means querying across all segments in a shard. As Logan explains: “HNSW graphs are independent within each segment and each dense vector field search involves finding the nearest neighbors in every segment, making the total cost dependent on the number of segments.”
Fewer segments generally mean faster searches—but aggressively merging them can slow down indexing. Since Cypris ingests new documents daily, they regularly force-merge segments to keep them slightly below the default 5GB threshold, preserving automatic merging and tombstone garbage collection. To balance search speed with indexing throughput, force-merging occurs during low-traffic periods, and shard sizes are maintained within a healthy range (below 50GB) to optimize performance without sacrificing ingestion speed.
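Here is a sketch of what the quantization side of this can look like: the mapping enables BBQ via the bbq_hnsw index option on a dense_vector field. Names and parameters are illustrative, the option’s availability depends on your Elasticsearch version, and this is not Cypris’ exact configuration:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# dense_vector field quantized with BBQ: vectors in the HNSW index are
# compressed to roughly one bit per dimension, cutting RAM and disk needs.
es.indices.create(
    index="patents-bbq",
    mappings={
        "properties": {
            "embedding": {
                "type": "dense_vector",
                "dims": 750,
                "index": True,
                "similarity": "cosine",
                "index_options": {"type": "bbq_hnsw"},
            }
        }
    },
)
```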
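And a sketch of the kind of scheduled force merge described above, with illustrative values; the right max_num_segments depends entirely on shard and segment sizes:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Run during a low-traffic window: fewer segments means fewer HNSW graphs
# to search per shard, at the cost of extra I/O while the merge runs.
es.indices.forcemerge(
    index="patents",
    max_num_segments=10,        # illustrative; chosen so merged segments stay under ~5GB
    wait_for_completion=False,  # let the merge continue in the background
)
```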
More vectors, faster searches, happy users
With these optimizations, Cypris brought query times down from 30–60 seconds to 5–10 seconds. They are also seeing 60–70% of their user queries shift from the previous boolean search experience to the new semantic search interface.
But the team is not stopping here! The goal is to achieve sub-second queries to support fast, iterative search and get most of their users to shift to semantic search.

Cypris’ product handles 500M docs (over 7 TB of data), providing real-time AI search and retrieval, and supports 30% quarterly company growth. The product significantly accelerated search use cases, cutting report generation from weeks down to minutes.
What did the Cypris team learn? … and what’s next?
500 million vectors don’t scale themselves
Handling 500 million vectors isn’t just a storage problem or a search problem—it’s both. Cypris had to balance search relevance, hardware resources, and indexing performance at every step.
Did You Know?
Elasticsearch's _search API includes a profile feature that allows you to analyze the execution time of search queries. This can help identify bottlenecks and optimize query performance. By enabling profiling, you can gain insights into how different components of your query are processed. Learn more about using the profile feature in the Elasticsearch search profiling documentation.
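A minimal example of turning profiling on for a query with the Python client (the index and query are placeholders):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# profile=True returns a per-shard breakdown of where query time was spent,
# alongside the normal hits.
resp = es.search(
    index="patents",
    query={"match": {"title": "drone"}},
    profile=True,
)
for shard in resp["profile"]["shards"]:
    for search in shard["searches"]:
        for q in search["query"]:
            print(shard["id"], q["type"], q["time_in_nanos"])
```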
With search, there’s always a trade-off
BBQ was a major win, but it didn’t eliminate the need to rethink sharding, memory allocation, and indexing strategy. Reducing the number of segments improved search speed, but made indexing slower. Excluding vectors from the source reduced disk space but complicated reindexing, as Elasticsearch doesn’t retain the original vector data needed to efficiently recreate the index. Every optimization came with a cost that had to be carefully weighed.
Prioritize your users, not the model
Cypris didn’t chase the largest models or highest-dimension vectors. They focused on what made sense for their users. “Figure out what relevance means for your data,” Logan advises. “And work backward from there.”
Cypris is now expanding to other datasets, which could double the number of documents they have to index in Elastic. They need to move quickly to stay competitive. “We’re a small team,” Logan says. “So everything we do has to scale—and it has to work.”
To learn more, visit cypris.ai