This is a cache of https://www.elastic.co/search-labs/blog/elasticsearch-pagination-with-collapse-and-cardinality. It is a snapshot of the page at 2025-07-05T00:51:06.728+0000.
Efficient pagination with collapse and cardinality in Elasticsearch - Elasticsearch Labs

Deduplicating product variants in Elasticsearch? Here’s how to determine the correct pagination.

In any large catalog, from e-commerce products to article listings, duplicates are inevitable. While Elasticsearch's collapse feature is excellent for grouping these variants and presenting a clean UI, it introduces a critical challenge for pagination. The total hit count reflects the original number of documents, not the final number of unique groups, making it impossible to build a reliable pager. This post details the definitive pattern to solve this by pairing collapse with a cardinality aggregation to get the true result count.

Pagination with collapse and cardinality

A frequent challenge when building search experiences in Elasticsearch, particularly in e-commerce, is combining deduplication with pagination (dividing a large result set into smaller, manageable pages). For instance, a product catalog might include several variants (sizes, colors, SKUs) of the same product design.

Using the collapse feature, we can group hits by a single field to show a single representative result per group. This avoids flooding the UI with near-duplicates. Let’s walk through a practical example.

Example: Combining SKUs with collapse

Here are some example JSON documents that you can import through Kibana DevTools. These examples are designed to illustrate the scenario described in the article, where you have product variations (sizes and colors) that you want to group.

Explanation of the JSON Structure

Each document represents a specific product variant (SKU).

  • product_id: A unique identifier for the core product. This is the field you will use for the collapse and cardinality aggregation.
  • sku: The unique stock keeping unit for the specific variant.
  • name: The general name of the product.
  • color: The color of the product variant.
  • size: The size of the product variant.
  • price: The price of the specific variant.
  • timestamp: A timestamp to allow for sorting and selecting the most recent version if needed.

To import the following data, paste the lines into Kibana DevTools and press the play button that appears at the right-hand side of the gray outlined request block.
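The original sample documents are not reproduced here, so the following is only a minimal sketch of what such an import could look like. It generates a `_bulk` request body for nine illustrative SKU documents spread across three products, using the fields described above. The index name `products`, the product IDs, names, prices, and timestamps are all assumptions made up for illustration.

```python
import json

INDEX_NAME = "products"  # hypothetical index name, not taken from the article

# Hypothetical catalog: 3 products x 3 variants = 9 SKU documents.
PRODUCTS = {
    "tshirt-001": ("Classic T-Shirt", 19.99),
    "hoodie-002": ("Zip Hoodie", 49.99),
    "jeans-003": ("Slim Jeans", 59.99),
}
VARIANTS = [("red", "S"), ("blue", "M"), ("black", "L")]


def build_bulk_body():
    """Return an NDJSON string: one action line plus one source line per SKU."""
    lines = []
    day = 1
    for product_id, (name, price) in PRODUCTS.items():
        for color, size in VARIANTS:
            doc = {
                "product_id": product_id,
                "sku": f"{product_id}-{color}-{size}",
                "name": name,
                "color": color,
                "size": size,
                "price": price,
                "timestamp": f"2025-01-{day:02d}T00:00:00Z",
            }
            lines.append(json.dumps({"index": {"_index": INDEX_NAME}}))
            lines.append(json.dumps(doc))
            day += 1
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


bulk_body = build_bulk_body()
print("POST _bulk")
print(bulk_body, end="")
```

The printed lines can be pasted into Kibana DevTools as a single bulk request. With default dynamic mapping, the string field `product_id` also gets a `product_id.keyword` sub-field, which is the one used for collapsing.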

Now we can run the search and collapse on the product_id.keyword field to get the results back:
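As a rough sketch of such a request (the original query is not reproduced here), the snippet below builds a search body that collapses on `product_id.keyword`. The `products` index name, the `match_all` query, the sort on `timestamp`, and the page size of 5 are assumptions chosen for illustration.

```python
import json


def build_collapse_search(collapse_field="product_id.keyword", page=1, page_size=5):
    """Build a search body that returns one representative hit per group."""
    return {
        "from": (page - 1) * page_size,  # offset into the collapsed groups
        "size": page_size,
        "query": {"match_all": {}},
        # Keep only the top-sorted document for each distinct field value.
        "collapse": {"field": collapse_field},
        # Assumed sort: the most recent variant represents its group.
        "sort": [{"timestamp": {"order": "desc"}}],
    }


collapse_body = build_collapse_search()
print("GET products/_search")
print(json.dumps(collapse_body, indent=2))
```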

And this is what the results look like:

However, there’s an important caveat:

The hits.total.value returned still reflects the total number of matching documents, not the number of unique groups after collapsing. Relying on hits.total.value for pagination therefore misrepresents the number of result pages and breaks the UX. In this example, with a page size of 5 and a hits.total.value of 9, the UI would offer a page 2, but loading it would return nothing, because only 3 collapsed groups exist.
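The mismatch is easy to see as arithmetic. The short sketch below uses the numbers from the example (9 matching documents, 3 unique groups, 5 results per page); the helper name `page_count` is just an illustrative choice.

```python
import math


def page_count(total_results, page_size):
    """Number of pages needed to show total_results items."""
    return math.ceil(total_results / page_size)


page_size = 5
total_documents = 9  # what hits.total.value reports
unique_groups = 3    # what the collapsed result list actually contains

pages_from_hits_total = page_count(total_documents, page_size)
pages_from_groups = page_count(unique_groups, page_size)

print(pages_from_hits_total)  # 2: the pager offers a second page that is empty
print(pages_from_groups)      # 1: the correct number of pages
```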

Solution: Combine collapse with cardinality

By adding a cardinality aggregation on the same collapse field, we can accurately compute the number of distinct groups, enabling reliable and predictable pagination.

Here’s an example query:
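The original query is not reproduced here, so what follows is a minimal sketch of the pattern rather than the article's exact request. It adds a `cardinality` aggregation on the collapse field and derives the page count from the aggregation value. The aggregation name `unique_groups`, the `products` index, and the hand-written `sample_response` (abbreviated to the two fields of interest) are assumptions for illustration.

```python
import json
import math


def build_paged_collapse_search(collapse_field, page, page_size):
    """Collapse on a field and count its distinct values in the same request."""
    return {
        "from": (page - 1) * page_size,
        "size": page_size,
        "query": {"match_all": {}},
        "collapse": {"field": collapse_field},
        "aggs": {
            # Hypothetical aggregation name; must target the collapse field.
            "unique_groups": {"cardinality": {"field": collapse_field}}
        },
    }


def total_pages(response, page_size):
    """Derive the page count from the cardinality value, not hits.total.value."""
    groups = response["aggregations"]["unique_groups"]["value"]
    return math.ceil(groups / page_size)


paged_body = build_paged_collapse_search("product_id.keyword", page=1, page_size=5)
print("GET products/_search")
print(json.dumps(paged_body, indent=2))

# Hand-written, abbreviated response shape for the 9-document example.
sample_response = {
    "hits": {"total": {"value": 9, "relation": "eq"}},
    "aggregations": {"unique_groups": {"value": 3}},
}
print(total_pages(sample_response, page_size=5))
```

Aggregations run over all documents matching the query, regardless of collapsing, which is why the distinct-value count can be read from the same response that carries the collapsed hits.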

Conclusion

By now, the challenge of pagination with collapsed search results should be clear, as should the definitive solution. Relying on the default hits.total.value when using the collapse feature inevitably leads to a broken user experience, displaying incorrect page counts and frustrating users.

The key takeaway is the robust pattern of pairing the collapse query with a cardinality aggregation on the very same field.

  • collapse: Deduplicates results at query-time by grouping on collapseField.
  • cardinality: Returns an approximate count of unique collapseField values, essential for paginating over grouped results.
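  • sort + from: Still respected. sort determines which document represents each group, and from offsets into the collapsed groups rather than the raw documents.
  • hits.total.value: Reflects the total number of matching documents, not the number of deduplicated groups. Don't use it for pagination in collapsed queries.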
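Because the cardinality count is approximate, a page count derived from it can be slightly off for fields with a very large number of distinct values. The aggregation accepts a `precision_threshold` option (default 3000, values above 40000 are treated as 40000) that trades memory for accuracy. The helper below is a small sketch of how it could be set; the function name and the clamping logic are illustrative assumptions, not part of the article.

```python
MAX_PRECISION_THRESHOLD = 40000  # larger values have the same effect as 40000


def cardinality_agg(field, precision_threshold=3000):
    """Build a cardinality aggregation with an explicit precision threshold."""
    threshold = min(precision_threshold, MAX_PRECISION_THRESHOLD)
    return {"cardinality": {"field": field, "precision_threshold": threshold}}


agg = cardinality_agg("product_id.keyword", precision_threshold=100000)
print(agg)
```

Counts below the threshold are expected to be close to exact, so for catalogs with fewer distinct groups than the threshold the derived page count should be reliable.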
