Large language model performance matrix

Stack Serverless Security

This page describes the performance of various large language models (LLMs) for different use cases in Elastic Security, based on our internal testing. To learn more about these use cases, refer to Attack discovery or AI Assistant.

Important

Excellent is the best rating, followed by Great, then by Good, and finally by Poor. Models rated Excellent or Great should produce quality results. Models rated Good or Poor are not recommended for that use case.

Proprietary models

Models from third-party LLM providers.

Feature	-	Assistant - General	Assistant - ES\|QL generation	Assistant - Alert questions	Assistant - Knowledge retrieval	Attack Discovery	Automatic Migration
Model	Claude Opus 4	Excellent	Excellent	Excellent	Excellent	Excellent	Excellent
	Claude Sonnet 4	Excellent	Excellent	Excellent	Excellent	Excellent	Excellent
	Claude Sonnet 3.7	Excellent	Excellent	Excellent	Excellent	Excellent	Excellent
	GPT-4.1	Excellent	Excellent	Excellent	Excellent	Excellent	Excellent
	Gemini 2.0 Flash 001	Excellent	Excellent	Excellent	Excellent	Excellent	Excellent
	Gemini 2.5 Pro	Excellent	Excellent	Excellent	Excellent	Excellent	Excellent

Open-source models

Models you can deploy yourself.

Feature	-	Assistant - General	Assistant - ES\|QL generation	Assistant - Alert questions	Assistant - Knowledge retrieval	Attack Discovery	Automatic Migration
Model	Mistral‑Small‑3.2‑24B‑Instruct‑2506	Excellent	Good	Excellent	Excellent	Good	N/A
	Mistral-Small-3.1-24B-Instruct-2503	Excellent	Good	Excellent	Excellent	Good	N/A
	Mistral Nemo	Good	Good	Great	Good	Poor	Poor
	LLama 3.2	Good	Poor	Good	Poor	Poor	Good
	LLama 3.1 405b	Good	Great	Good	Good	Poor	Poor
	LLama 3.1 70b	Good	Good	Poor	Poor	Poor	Good