LLM Benchmarks
Detailed information about the benchmarks used to evaluate language models on our leaderboard.
MMLU
Multi-task language understanding benchmark focused on evaluating models' general knowledge and reasoning abilities across a wide range of academic subjects
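As an illustration of the task format, the sketch below builds a four-choice, MMLU-style prompt and scores the model's answer by exact match on the chosen letter. The question, the prompt template, and the commented-out `ask_model` call are hypothetical, not taken from the actual dataset or evaluation harness.

```python
# Illustrative MMLU-style evaluation sketch (hypothetical question and model call).
CHOICE_LABELS = ["A", "B", "C", "D"]

def format_prompt(question: str, choices: list[str]) -> str:
    """Render a four-choice question as a single prompt string."""
    lines = [question]
    lines += [f"{label}. {choice}" for label, choice in zip(CHOICE_LABELS, choices)]
    lines.append("Answer:")
    return "\n".join(lines)

def score_item(predicted_letter: str, correct_letter: str) -> bool:
    """Exact match on the answer letter; accuracy is the mean over all items."""
    return predicted_letter.strip().upper() == correct_letter

# Example usage with a made-up item:
prompt = format_prompt(
    "Which data structure offers O(1) average-case lookup by key?",
    ["Linked list", "Hash table", "Binary search tree", "Stack"],
)
# prediction = ask_model(prompt)        # hypothetical model call
# print(score_item(prediction, "B"))
```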
MMLU Pro
Multi-task language understanding benchmark that extends MMLU with more challenging, reasoning-focused questions across academic subjects
MMMU
Multimodal understanding and reasoning benchmark aimed at expert-level general AI, covering disciplines such as art & design, business, science, health & medicine, humanities & social sciences, and technology & engineering
HellaSwag
Commonsense natural language inference benchmark focused on sentence completion, assessing models' ability to understand context and reason about everyday situations
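HellaSwag items pair a context with several candidate endings, and a common way to evaluate them is to pick the ending to which the model assigns the highest (often length-normalized) log-likelihood. The sketch below assumes a hypothetical `sequence_logprob(context, ending)` scoring function; it is not the official evaluation code.

```python
# Illustrative HellaSwag-style scoring sketch: choose the most likely ending.
# `sequence_logprob` is a hypothetical function returning the model's total
# log-probability of `ending` given `context`.

def pick_ending(context: str, endings: list[str], sequence_logprob) -> int:
    """Return the index of the ending with the best length-normalized log-likelihood."""
    scores = []
    for ending in endings:
        logprob = sequence_logprob(context, ending)
        scores.append(logprob / max(len(ending.split()), 1))  # crude length normalization
    return max(range(len(endings)), key=scores.__getitem__)

# An item counts as correct when the chosen index matches the labeled ending;
# accuracy is averaged over the dataset.
```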
HumanEval
Code generation benchmark focused on evaluating language models' ability to generate functionally correct Python code from function signatures and docstrings, verified against unit tests
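HumanEval problems present a Python function signature and docstring as the prompt, and a completion counts as correct only if it passes the problem's unit tests (the basis of the pass@k metric). The example below mimics that structure with a made-up problem; it is not an item from the real dataset.

```python
# Illustrative HumanEval-style problem (hypothetical, not from the dataset).

PROMPT = '''
def running_max(numbers: list[int]) -> list[int]:
    """Return a list where element i is the maximum of numbers[:i+1]."""
'''

# A candidate completion produced by the model:
COMPLETION = '''
    result, current = [], None
    for n in numbers:
        current = n if current is None else max(current, n)
        result.append(current)
    return result
'''

def check(candidate) -> None:
    """Unit tests: the completion is functionally correct only if these pass."""
    assert candidate([1, 3, 2, 5, 4]) == [1, 3, 3, 5, 5]
    assert candidate([]) == []

namespace: dict = {}
exec(PROMPT + COMPLETION, namespace)   # assemble and run prompt + completion
check(namespace["running_max"])        # raises AssertionError if incorrect
```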
MATH
Mathematical word problem solving benchmark focused on evaluating models' mathematical reasoning and problem-solving abilities on competition-level problems
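MATH solutions conventionally mark the final answer inside \boxed{...}, so automatic grading typically extracts that answer from the model's solution and compares it to the reference. The sketch below shows the idea with a naive string comparison; real graders normalize LaTeX expressions more carefully, and the helper names here are hypothetical.

```python
import re
from typing import Optional

# Illustrative answer extraction for MATH-style grading (simplified sketch).

def extract_boxed(solution: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in a solution, if any."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", solution)
    return matches[-1].strip() if matches else None

def is_correct(model_solution: str, reference_answer: str) -> bool:
    """Naive exact-match grading; real graders normalize expressions first."""
    answer = extract_boxed(model_solution)
    return answer is not None and answer == reference_answer.strip()

# Example with a made-up solution string:
print(is_correct(r"... so the total is \boxed{42}.", "42"))  # True
```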
MATH500
Mathematical reasoning benchmark, a 500-problem subset of MATH, focused on evaluating AI models' ability to solve high-school-level math problems requiring logical reasoning
GPQA
Graduate-level, "Google-proof" question answering benchmark of expert-written multiple-choice questions in STEM fields (biology, physics, and chemistry) that require deep understanding and reasoning
GPQA Diamond
A more challenging subset of GPQA, restricted to questions that domain experts verified with high confidence
IFEval
Instruction-following benchmark for large language models focused on verifiable natural language instructions, such as word-count, keyword, and formatting constraints, that can be checked programmatically
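Because IFEval's instructions are verifiable, compliance can be checked with simple rules rather than judged by another model. The sketch below checks two such constraints for a hypothetical response; it is not the official checker.

```python
# Illustrative IFEval-style verifiable-instruction checks (simplified sketch).

def has_min_words(response: str, minimum: int) -> bool:
    """Check an instruction like 'answer in at least N words'."""
    return len(response.split()) >= minimum

def mentions_keyword(response: str, keyword: str, times: int = 1) -> bool:
    """Check an instruction like 'mention the word X at least N times'."""
    return response.lower().count(keyword.lower()) >= times

response = "Benchmarks measure model quality. Benchmarks need careful design."
print(has_min_words(response, 5))                   # True
print(mentions_keyword(response, "benchmarks", 2))  # True
```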