This is a cache of https://www.elastic.co/docs/solutions/security/ai/large-language-model-performance-matrix. It is a snapshot of the page at 2025-08-02T01:06:20.291+0000.

Large language model performance matrix

Stack Serverless Security

This page describes the performance of various large language models (LLMs) for different use cases in Elastic Security, based on our internal testing. To learn more about these use cases, refer to Attack Discovery or AI Assistant.

Important

Ratings, from best to worst, are Excellent, Great, Good, and Poor. Models rated Excellent or Great for a use case should produce quality results. Models rated Good or Poor for a use case are not recommended for it.

Models from third-party LLM providers.

| Model | Assistant - General | Assistant - ES\|QL generation | Assistant - Alert questions | Assistant - Knowledge retrieval | Attack Discovery | Automatic Migration |
| --- | --- | --- | --- | --- | --- | --- |
| Claude Opus 4 | Excellent | Excellent | Excellent | Excellent | Excellent | Excellent |
| Claude Sonnet 4 | Excellent | Excellent | Excellent | Excellent | Excellent | Excellent |
| Claude Sonnet 3.7 | Excellent | Excellent | Excellent | Excellent | Excellent | Excellent |
| GPT-4.1 | Excellent | Excellent | Excellent | Excellent | Excellent | Excellent |
| Gemini 2.0 Flash 001 | Excellent | Excellent | Excellent | Excellent | Excellent | Excellent |
| Gemini 2.5 Pro | Excellent | Excellent | Excellent | Excellent | Excellent | Excellent |

Models you can deploy yourself.

| Model | Assistant - General | Assistant - ES\|QL generation | Assistant - Alert questions | Assistant - Knowledge retrieval | Attack Discovery | Automatic Migration |
| --- | --- | --- | --- | --- | --- | --- |
| Mistral-Small-3.2-24B-Instruct-2506 | Excellent | Good | Excellent | Excellent | Good | N/A |
| Mistral-Small-3.1-24B-Instruct-2503 | Excellent | Good | Excellent | Excellent | Good | N/A |
| Mistral Nemo | Good | Good | Great | Good | Poor | Poor |
| Llama 3.2 | Good | Poor | Good | Poor | Poor | Good |
| Llama 3.1 405B | Good | Great | Good | Good | Poor | Poor |
| Llama 3.1 70B | Good | Good | Poor | Poor | Poor | Good |
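
The rating rule on this page (Excellent or Great is recommended, Good or Poor is not) can be applied mechanically to the self-deployed model table. The following is a minimal Python sketch, not an Elastic API: the names `Rating`, `USE_CASES`, `SELF_DEPLOYED`, `is_recommended`, and `recommended_models` are hypothetical and exist only for this illustration. It also assumes that an N/A rating should be treated as "not recommended", which the page does not state.

```python
from enum import IntEnum


class Rating(IntEnum):
    """Rating scale from this page, ordered worst to best."""
    POOR = 1
    GOOD = 2
    GREAT = 3
    EXCELLENT = 4


# Column order of the matrix on this page.
USE_CASES = (
    "Assistant - General",
    "Assistant - ES|QL generation",
    "Assistant - Alert questions",
    "Assistant - Knowledge retrieval",
    "Attack Discovery",
    "Automatic Migration",
)

EX, GR, GO, PO = Rating.EXCELLENT, Rating.GREAT, Rating.GOOD, Rating.POOR

# Ratings copied from the self-deployed table; None stands for N/A.
SELF_DEPLOYED = {
    "Mistral-Small-3.2-24B-Instruct-2506": (EX, GO, EX, EX, GO, None),
    "Mistral-Small-3.1-24B-Instruct-2503": (EX, GO, EX, EX, GO, None),
    "Mistral Nemo": (GO, GO, GR, GO, PO, PO),
    "Llama 3.2": (GO, PO, GO, PO, PO, GO),
    "Llama 3.1 405B": (GO, GR, GO, GO, PO, PO),
    "Llama 3.1 70B": (GO, GO, PO, PO, PO, GO),
}


def is_recommended(rating):
    """Excellent or Great is recommended; N/A (None) is assumed not to be."""
    return rating is not None and rating >= Rating.GREAT


def recommended_models(use_case, matrix=SELF_DEPLOYED):
    """Return the models in `matrix` rated Great or better for `use_case`."""
    column = USE_CASES.index(use_case)
    return [model for model, ratings in matrix.items()
            if is_recommended(ratings[column])]


if __name__ == "__main__":
    for use_case in USE_CASES:
        print(use_case, "->", recommended_models(use_case))
```

Using an ordered enum keeps the threshold in one place, so a stricter policy (for example, Excellent only) would be a one-line change in `is_recommended`.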