Almost all Elastic Stack users will, at some point, use a Dashboard. Elastic ships many out-of-the-box dashboards for its integrations, and users create custom dashboards to share with others, to perform root cause analysis, or to generate a PNG report. As a result, the Dashboard app is one of the most used applications in Kibana (for those wondering, the other most popular one is Discover).
In addition, many of the core application components behind Dashboards support other Kibana applications. For example, Kibana’s widget framework for embedding charts and tables was originally developed for Dashboards, as was the data plugin, which brokers search requests to Elasticsearch from the Kibana browser app and is heavily used by all Kibana plugins.
In other words, the Dashboard application – both from a product and technical perspective – is central to Kibana, and deserves to be the best experience for users.
Yet, dashboards in Kibana could feel “sluggish”. We would experience it as developers, and we would hear it from users.
Comparisons with other tools (like Grafana, or to a lesser extent OpenSearch Dashboards) also showed that Dashboards in Kibana can feel slower.
For this reason, the Kibana Team recently undertook an effort to bring down the render time of a Dashboard.
Identifying the challenge
Looking at real user telemetry (we’ll discuss this more below) we see a clear 80/20 division in the render time of Dashboards.
(a) On one hand, some dashboards take tens of seconds to load
The time here is dominated by (very) long-running Elasticsearch queries. The 95th percentile (i.e. the slowest minority) of dashboard rendering times sits significantly above the mean. This is not necessarily unexpected; for example, the searches that the panels issue can span a large time range, or queries hit cold or frozen storage tiers that are not optimized for analytical workloads. These are a minority of dashboards: “the long tail”.

Note also a clear seasonality in the data, with longer render times during weekdays than over the weekend. This likely indicates that the “worst case” is influenced by the overall load on the cluster, including ingest (not just the runtime analytical queries), which tends to be elevated during working hours.
(b) On the other hand, the majority of dashboards load in the low-second range.
The 75th percentile and below (the green, blue, and red lines) are significantly faster, but these dashboards still take 1-3 seconds.
When searches to Elasticsearch complete quickly, then where is the time going in Kibana? Why should a Kibana dashboard ever feel sluggish, especially when a deployment is well provisioned and searches complete quickly?
In this initial phase of the project, which is what we’ll summarize in this blog post, we decided in spring 2024 to focus on improving (b): the render time of the 75th percentile of dashboards, and to make these snappier and more pleasurable to work with.
We did not forget about (a)! At the end of the post, we will highlight the initiatives that will bring down the render time of this slowest 20% as well.
Telemetry
From the start, we realized we had poor measurements of the render time of Dashboards. Existing instrumentation did not capture the various stages of a dashboard’s page load. It was also important to align metrics with what we could make actionable in day-to-day development, and with what we could collect from users in the real world.
a) What to measure
From a high level, we introduced three core metrics, each capturing a specific span of the dashboard load. These can be stacked neatly bottom-to-top when opening a dashboard by its unique URL in a fresh tab.
Metric | What | When | Frequency |
---|---|---|---|
kibana_loaded | Initial asset loading session (“the spinner”) | Opening Kibana for the first time | Once per user session |
dashboard_overhead | Bootstrapping of dashboard app and panels | Opening a specific dashboard | Once per dashboard |
time_to_data | Time for data to appear on screen | Changing a filter, a time-range, a control… | Once per query-state change |
This instrumentation is largely implemented ad-hoc. Conventional Core Web Vitals (https://web.dev/articles/vitals) do not exactly measure what we were looking for, although we can draw some parallels.
Specifically, Time-To-Interactive (TTI) corresponds roughly to the end of “kibana_loaded”. After that, the app is usable, although not all UX may have fully initialized yet. For example, the dashboard controls may not have been initialized yet, even though the dashboard-app technically can start responding to inputs.
Another tricky task was matching the “completion” of a dashboard to the moment all data is on screen, comparable in spirit to Largest Contentful Paint (LCP). This moment is the result of AJAX data requests and front-end rendering that are custom to each panel type.
So to gather it correctly, each panel on a dashboard needs to report its “completion” state.
For some charts, this is fairly straightforward (e.g. a simple metric-chart), but for others this is complicated. For example, a map is only complete when all tiles are rendered on screen. Detecting this is not trivial.
Once all panel types report “completion” correctly, it is then up to the dashboard application itself to observe all these “completion” events and report the “time_to_data” metric after the last completion.
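As a rough sketch of this pattern (not the actual Kibana implementation; the PanelApi shape and function names are made up for illustration), each panel exposes a completion signal and the dashboard combines them to emit a single time_to_data measurement:

```typescript
import { BehaviorSubject, combineLatest, filter, map, take } from 'rxjs';

// Hypothetical panel contract: each panel flips `renderComplete$` to true
// once its data is on screen (for a map, once all tiles have rendered).
interface PanelApi {
  renderComplete$: BehaviorSubject<boolean>;
}

// Hypothetical dashboard-side observer: waits for every panel to report
// completion, then reports a single time_to_data measurement. In practice the
// clock would start at the query-state change, not at subscription time.
export function reportTimeToData(panels: PanelApi[], onReport: (ms: number) => void) {
  const start = performance.now();

  combineLatest(panels.map((p) => p.renderComplete$))
    .pipe(
      filter((states) => states.every(Boolean)), // all panels are done
      take(1),                                   // report only the first full completion
      map(() => performance.now() - start)
    )
    .subscribe(onReport);
}
```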
Additionally, further segmentations of these benchmarks were introduced, as well as additional metadata like the number of panels on a dashboard, or whether the user navigated from within Kibana (some assets are already loaded) or from outside Kibana into the dashboard (no assets have been loaded yet). The app also collects metrics on the server about the duration of data requests.
b) Importance of each segment
Each of these segments has different semantics. Walking the metric list from the bottom up:
When it comes to “snappiness”, the experience is largely dominated by time_to_data. Each time a user manipulates the filter state (e.g. the time range), they need to wait for the new data to appear on screen. It is here that “lag” matters most. Think of a video game: players may tolerate a loading icon before the level starts, but they expect responsive gameplay once they start controlling the game. The same holds for a dashboard. Users interact with charts, controls, filters… these interactions are the “gameplay” of a dashboard, and they determine how users experience responsiveness.

dashboard_overhead is relevant as well. It is the time it takes to load a dashboard configuration (a document retrieval from a system index in Elasticsearch). It also includes some additional code loading, because in the Kibana plugin system some code is loaded ad-hoc. To give an example: suppose a Dashboard has a swimlane panel. The dashboard app would initialize a “swimlane” embeddable. If this is the first time during that Kibana session that a swimlane is being loaded, it is up to the swimlane embeddable to ensure all swimlane-specific code has been loaded before rendering. A deep dive into the Kibana plugin system would take us too far, but in short: some of the code-loading footprint is captured in “dashboard_overhead”, not just in “kibana_loaded”.
kibana_loaded only occurs a single time for the entirety of a Kibana user session, and is not dashboard specific. Users can navigate to Dashboards from many other Kibana pages. For this reason, we wanted to isolate kibana_loaded from the specific experience of Dashboards. While it is the largest “chunk” of time, it is also the least relevant when it comes to the snappiness of the overall Kibana experience.
Benchmarking
With the instrumentation in place, we needed to actually collect these metrics in the appropriate environments: in benchmarking deployments, and in real user deployments.
a) In CI
Performance metrics are collected for a handful of representative dashboards. These have configurations similar to our integrations’ dashboards, each containing a mix of chart types.

An example of a dashboard benchmark
These benchmarks run every three hours on Kibana’s main release branch. The runner spins up a single Elasticsearch and Kibana cluster on dedicated hardware. Metrics are collected from Playwright scripts in a Chromium headless browser.
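For a flavor of what such a run looks like, here is a minimal Playwright sketch. The dashboard URL, the placeholder dashboard id, and the assumption that the instrumentation records a “time_to_data” performance measure are all illustrative, not the actual benchmark suite:

```typescript
import { chromium } from 'playwright';

// Minimal sketch of a benchmark run: open a dashboard in headless Chromium
// and read back custom performance measures emitted by the instrumentation.
async function run() {
  const browser = await chromium.launch({ headless: true });
  const page = await browser.newPage();

  await page.goto('http://localhost:5601/app/dashboards#/view/<dashboard-id>', {
    waitUntil: 'networkidle',
  });

  // Assumption: the instrumentation records a "time_to_data" performance
  // measure once the last panel reports completion.
  const measures = await page.evaluate(() =>
    performance.getEntriesByType('measure').map((m) => ({ name: m.name, duration: m.duration }))
  );

  console.log(measures.filter((m) => m.name === 'time_to_data'));
  await browser.close();
}

run();
```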
b) In the wild
The same metrics are reported for Elastic Cloud and Serverless users as well, and for self-hosted users who opt in to telemetry.
While the benchmarks in CI provide an actionable signal at a point in time, collecting these metrics in the wild provides a backward-looking signal that helps validate whether the movement of our benchmarks is reflected in the real world.
Later in the post, you will read how both evolved for the better part of last year.
A note on process (meetings! alerts!)
There is no single engineer or team who “owns” the dashboard experience, as the display panels have been developed by teams all across Engineering.
To align the effort, some logistical arrangements have proven useful.
Reporting
Twice-weekly reporting provides an opportunity to review the evolution of the telemetry and discuss any ongoing related work. It is easiest to think of these as mini-retrospectives.
Responding to regressions
Benchmarks have proven to be quite reliable, with highly reproducible runs.
When there is negative or unexpected movement in a benchmark, it usually indicates a regression that requires action.
There is no hard and fast rule on how to respond. Significant regressions are generally rolled back immediately. Other times, we may let the offending commit ride, until a bug fix is merged. This is decided on a case by case basis, and really depends on the nature of the change.

A typical smile-pattern in our benchmarks. A regression was detected and resolved.
Ad-hoc runs - validating hypotheses
As familiarity with the tooling has grown, we are also triggering more ad-hoc performance runs on individual PRs.
This allows for more rapid validation of code changes before they land in the main development trunk.
Improvements
With all this yak-shaving out of the way, let’s finally get to the interesting part.
There has been no silver bullet. Kibana spends time everywhere, with each piece adding a marginal overhead that adds up in the aggregate. The improvements to Dashboard rendering have come from improved hygiene in many layers of the application. Below, we break these down into a couple of major themes.
Reducing code and asset loading
Efficient code loading is currently one of the largest challenges in Kibana. Kibana’s plugin architecture is very flexible in that it allows for fast turnaround in adding new pages and apps.
This flexibility does come with a cost, two aspects of which relate critically to JavaScript code loading in the browser. One is that code gets loaded that is never required; the other is that asset loading tends to be fragmented, with multiple small JavaScript assets supporting a single app rather than fewer, larger files.
The first is particularly a problem for Dashboards. Many plugins register widgets with the dashboard app, for example a maps panel, a swimlane panel, etc. However, most dashboards will never display a map or a swimlane.
Example: plugins can add pop-up links to dashboard panels. These are context dependent.

Before: pseudo-code of how a plugin registers a pop-up item for dashboard panels. This pattern causes unnecessary code to be loaded.
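A sketch of roughly what this pattern looks like; the PanelActionRegistry interface, module names, and action id are illustrative stand-ins rather than the exact Kibana APIs:

```typescript
// plugin.ts (before): the action implementation is imported statically, so its
// code, and everything it depends on, ends up in the initial bundle even if
// the user never opens a panel's context menu.
import { caseAction } from './case_action';

// Illustrative stand-in for the dashboard's panel-action registry.
export interface PanelActionRegistry {
  add(action: { id: string; execute(context: unknown): Promise<void> }): void;
}

export function registerPanelActions(registry: PanelActionRegistry) {
  registry.add(caseAction);
}
```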
To avoid this, a new pattern was introduced that allows clients to delay the code loading until it is needed.
After: code is only included when it is required, by isolating the definition in a separate module (./case_action.ts) that ./plugin.ts loads on demand.
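Sketched with the same illustrative registry, the heavy module is now fetched with a dynamic import() only the first time the action actually runs:

```typescript
// ./case_action.ts: the heavy definition lives in its own module, so the
// bundler can split it into a separate chunk that is only fetched on demand.
export const caseAction = {
  id: 'openInCase',
  async execute(context: unknown) {
    /* ...render the pop-up... */
  },
};

// ./plugin.ts (after): only a lightweight descriptor is registered up front;
// the module above is loaded with a dynamic import() the first time it runs.
// (PanelActionRegistry is the illustrative interface from the previous snippet.)
export function registerPanelActions(registry: PanelActionRegistry) {
  registry.add({
    id: 'openInCase',
    async execute(context: unknown) {
      const { caseAction } = await import('./case_action');
      await caseAction.execute(context);
    },
  });
}
```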
The other issue is that the initialization of plugins would block responsiveness, causing a waterfall effect in which asset loading is serialized rather than parallelized.
One example of this is that dashboard controls used to block rendering of the page, forcing all panels to wait until the controls had loaded all of their assets. This is of course not necessary, as rendering should be able to start before the controls are fully initialized.
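In simplified terms (the function names here are illustrative, not the actual dashboard internals), the fix amounts to no longer awaiting control initialization before kicking off panel rendering:

```typescript
// Illustrative stubs standing in for the real dashboard internals.
const initializeControls = async (): Promise<void> => { /* load control assets and data */ };
const renderPanels = (): void => { /* start mounting panel components */ };

// Before: panel rendering waited for the controls' assets, serializing the two loads.
export async function renderDashboardBefore() {
  await initializeControls();
  renderPanels();
}

// After: both start immediately; the controls attach whenever they are ready.
export async function renderDashboardAfter() {
  const controlsReady = initializeControls(); // kicked off, but not awaited up front
  renderPanels();
  await controlsReady;
}
```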
Many of these code-loading issues were addressed, contributing to overall responsiveness. Since Kibana has many plugins (over 200 and counting), sustained attention will be required to root out both of these inefficient code-loading patterns.
Embeddables and rendering
A big effort in Kibana in 2024 was the embeddable refactor. This effort had a few goals, of which performance was largely an incidental concern: enable a few critical features (like collapsible panels), remove a number of unused code paths (largely Angular related), improve testability, and simplify some of the DevX when working with the APIs. You can read more about this here.
One way embeddables have helped tidy up dashboard performance is by consolidating all rendering into a single React tree. Before, each panel was rendered into its own render tree with ReactDOM.render(). This architecture was an artifact of a time when Kibana had both Angular- and React-based (and jQuery, ahem) widgets. That mix of rendering technologies has not existed in Kibana for over four years; React is now the only UX rendering library. However, the Dashboard carried that legacy along with an additional abstraction layer.
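Conceptually, the change looks something like the sketch below; the real embeddable presentation layer has many more moving parts, so treat this as a simplified illustration:

```tsx
import React from 'react';
import { createRoot } from 'react-dom/client';

// Before: each panel mounted into its own DOM node with a separate render call,
// creating one React tree per panel:
//   panels.forEach((panel) =>
//     ReactDOM.render(<PanelRenderer panel={panel} />, document.getElementById(panel.domId))
//   );

// After: the dashboard renders every panel inside a single React tree.
function Dashboard({ panels }: { panels: Array<{ id: string; title: string }> }) {
  return (
    <div className="dashboardGrid">
      {panels.map((panel) => (
        <PanelRenderer key={panel.id} panel={panel} />
      ))}
    </div>
  );
}

function PanelRenderer({ panel }: { panel: { id: string; title: string } }) {
  return <figure>{panel.title}</figure>; // placeholder for the real panel content
}

createRoot(document.getElementById('dashboard')!).render(<Dashboard panels={[]} />);
```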
Reducing the number of state changes and re-renders that panels respond to has been a marginal improvement to the Dashboard app, increasing its overall responsiveness. The reduction in code has also helped shrink the app’s footprint.
Avoiding crufty memory allocation
The code for charts and tables reorganizes the data received from Elasticsearch into a data structure that is easier to manipulate for display. It performs a “flattening” step that takes a nested data structure from the ES response and turns it into a one-dimensional array, where each item of the array corresponds to a single feature (e.g. a row in a table, a bar in a bar chart…).
For example, consider a nested ES document with many sub-fields, or the hierarchical organization of buckets from an ES aggregation response. The implementations for flattening these often allocated short-lived objects, like object literals or lambdas (() => {}). Frequent use of array methods like .map or .reduce is a pattern where such object allocation easily sneaks in.
Since these flattening operations occur in tight recursive loops (thousands of documents, hundreds of buckets), and given that dashboards may contain multiple tables and multiple charts, these allocations add up quickly. Liberal heap allocation like this also cuts into the user experience twice: once at construction, and again as strain on the garbage collector (garbage collection is less predictable, but tends to contribute to hiccups in the frame rate).
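As an illustration (not the actual chart code), compare an allocation-heavy flattening of aggregation buckets with a version that reuses a single output array and avoids per-bucket temporaries:

```typescript
interface Bucket {
  key: string;
  doc_count: number;
  children?: Bucket[];
}

interface Row {
  path: string;
  count: number;
}

// Allocation-heavy: every level creates intermediate arrays and object literals
// via map/concat/flat, plus a new closure per bucket.
function flattenWithTemporaries(buckets: Bucket[], prefix = ''): Row[] {
  return buckets
    .map((b) => {
      const path = prefix ? `${prefix} > ${b.key}` : b.key;
      const self: Row[] = [{ path, count: b.doc_count }];
      return b.children ? self.concat(flattenWithTemporaries(b.children, path)) : self;
    })
    .flat();
}

// Leaner: one output array, a plain loop, and no intermediate arrays.
function flattenInPlace(buckets: Bucket[], prefix = '', out: Row[] = []): Row[] {
  for (let i = 0; i < buckets.length; i++) {
    const b = buckets[i];
    const path = prefix ? `${prefix} > ${b.key}` : b.key;
    out.push({ path, count: b.doc_count });
    if (b.children) flattenInPlace(b.children, path, out);
  }
  return out;
}
```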
Our benchmarks showed meaningful improvements (between 5 and 10%) from removing some of the most glaring allocations in a few of these tight loops.
Data transfer improvements
The data request roundtrip from a Dashboard running in the Kibana browser app to and from Elasticsearch used to be batched. Multiple requests would be collected, and the Kibana server would fan these out to Elasticsearch as individual _async_search requests, then combine the ES JSON responses into a new, Kibana-specific format.
The main reason for this batching was that it side-steps the browser connection limit for HTTP/1, which sits at around six concurrent HTTP requests and is easily exceeded on dashboards with multiple panels.
This approach has two main disadvantages. It adds a small delay while collecting the batches, and it strains the Kibana server, which has to wait for and re-encode the ES responses: unzip them, decode the JSON, concatenate the responses, and gzip the result again. While this re-encoding step is generally cheap, in the worst case (for example, with large responses) it could be significant. It also added significant memory pressure, occasionally causing out-of-memory issues.

Given that the proxies in front of Elastic Cloud and Serverless already support HTTP/2, and that Kibana will start to support HTTP/2 for stateful deployments in 9.0, it was decided to remove this batching. In addition, the Kibana server no longer re-encodes the data and instead streams the original gzipped result from Elasticsearch.
This greatly simplifies the data transfer architecture and, in combination with running over HTTP/2, has shown some nice performance improvements. Apart from the performance benefits (faster, less sensitive to OOM), debuggability has improved a lot thanks to the simplified architecture and the fact that data responses can now easily be inspected in the browser’s debugger.
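The streaming idea, sketched as a generic Node.js proxy rather than Kibana’s actual server code: the upstream gzipped body and its Content-Encoding header are forwarded untouched instead of being decompressed, merged, and re-compressed:

```typescript
import http from 'node:http';
import { pipeline } from 'node:stream';

// Illustrative proxy handler: the Elasticsearch response body is piped straight
// through to the browser, still gzipped, so this process never inflates it.
const server = http.createServer((clientReq, clientRes) => {
  const upstream = http.request(
    {
      host: 'localhost',
      port: 9200,
      path: '/my-index/_async_search', // placeholder path for the sketch
      method: 'POST',
      headers: { 'accept-encoding': 'gzip', 'content-type': 'application/json' },
    },
    (esRes) => {
      clientRes.writeHead(esRes.statusCode ?? 200, {
        'content-type': esRes.headers['content-type'] ?? 'application/json',
        'content-encoding': esRes.headers['content-encoding'] ?? 'identity',
      });
      // No unzip / JSON.parse / re-gzip step: just stream the bytes through.
      pipeline(esRes, clientRes, () => {});
    }
  );
  pipeline(clientReq, upstream, () => {});
});

server.listen(3000);
```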

Outcomes
The aggregate outcome of these changes has been significant, and it is reflected both in the benchmarks and in the user telemetry.
Benchmark evolution
The chart below shows the metrics for a mixture of multiple benchmarks, and how they have evolved over the last six months. We can see an overall drop from around 3500ms to 2000ms. (The most recent uptick is related to an ongoing effort to change the theme in Kibana; during this migration phase we ship multiple themes, which will be removed over time. There are also a few gaps where the CI runs had keeled over.)

Real world users
The real world, as explained in the intro, is harder to measure. We just do not know exactly which dashboards users are running, and how their configurations evolve over time.
However, looking at it from two different perspectives, we can verify the same evolution as in the synthetic benchmarking environment.
First, over time we see a drop in time to render at the 75th percentile. It allows us to say that, on average, the experience of users on a dashboard in January 2025 is significantly better than in June 2024.
Dashboard render time for the 25th, 50th, and 75th percentiles

We can also compare the mean time_to_data by version across all users over the last week. Users on 8.17 wait noticeably less time for data to appear on screen than users on 8.12.

The drop in the real world also lags slightly behind what we observe in our benchmarks, which is roughly in line with the release cadence of the stack.
Looking ahead
So the curve is trending downwards, largely through many small shaves.
There are some significant areas where this approach of trimming the fat will eventually lead to diminishing returns. Below are some areas we believe will benefit from more structural changes on how we approach the problem.
Code loading continued
We have not discussed the kibana_loaded metric very much in this blog post. If we had to characterize it: the Kibana plugin architecture is optimized to let applications load code ad-hoc, with the code-packaging process producing many JavaScript bundles. However, in practice we do see unnecessary code being loaded, as well as “waterfall” code loading where code loading blocks rendering of the UX. All in all, things could improve here (see “Reducing code and asset loading” above).
The team is currently engaging in a wider ranging effort, “Sustainable Kibana”, which includes revisiting how we package and deliver code to the browser.
We anticipate more benefits to materialize here later. When they do, be sure to check the blog post!
Dealing with slow searches
Long searches take a long time. It is what it is. An Elasticsearch search can be slow for many reasons, and this doesn’t even need to be a bug. Consider complex aggregations on a large cluster with terabytes of data, over a long time range, hitting hundreds of nodes on different storage tiers. Depending on how the cluster is provisioned, this could end up being a slow search. Kibana needs to be resilient in the face of this query-dependent limitation.
In such a scenario, we cannot improve the “snappiness” of a Dashboard with low-level improvements in Kibana (like those discussed above). To address inherent slowness, we need a suite of new features that allow users to opt into a “fast” mode.
Consider for example sampling the data, where users can trade speed for accuracy. Or consider improving the perceived performance by incrementally filling in the charts as the data come in. Or allowing users to hide parts of the Dashboards with collapsible panels (they’re coming!).
These changes will straddle the line between product feature and low-level tech improvement.
Chart and query dependent sluggishness
The current effort has mainly sought improvements in lower-level components with broad impact. However, there are instances where dashboards are slow due to very specific chart configurations.
For example:
- computing the “other” bucket,
- unique count aggregations over high-cardinality data,
- …
Identifying these charts and queries will allow for more targeted optimizations. Are the defaults correct (e.g. do all charts require an “other” bucket)? Are there more efficient ways to query for the same data?
Adding Discover to the program
All the pain points of Dashboards are also pain points in Discover (pluggability of charts, data-heavy, the requirement to be responsive…). So we have rolled out this program to guide development in the Discover app as well.
Already, we are seeing some nice gains in Discover, and we’re looking to build on this momentum. This too deserves its own blog post, so stay tuned!
Conclusion
Dashboards are getting faster in Kibana. Recent improvements are due to the compound effect of many lower level optimizations.
To progress even further, we anticipate a two-pronged approach: first, continue this theme of improved hygiene; second, expand into a broader program that will allow us to address the “long tail” of causes contributing to slowness.