AI Benchmark Universe

GSM8K

Grade School Math with 8.5K problems

Mathematics Reasoning
About 8,500 examples

GSM8K Details

A benchmark of grade school math word problems, each of which takes several arithmetic steps to solve.

Evaluation Type:

Mathematics, Reasoning

Example:

"John has 5 apples. He gives 2 to Mary and buys 3 more. How many does he have now?"
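The multi-step reasoning in this example can be written out explicitly. The snippet below is a minimal sketch, not an official evaluation harness: the function names are hypothetical, and it assumes the simple convention that the last number in a model's response is its final answer.

```python
import re
from typing import Optional


def solve_apples(start: int, given_away: int, bought: int) -> int:
    """Work through the example one step at a time."""
    after_giving = start - given_away  # step 1: 5 - 2 = 3
    return after_giving + bought       # step 2: 3 + 3 = 6


def extract_final_number(response: str) -> Optional[float]:
    """Return the last number in a response, or None if there is none.

    Assumption: the final number mentioned is the model's answer.
    """
    # Drop commas so thousands separators such as "1,000" parse as one number.
    numbers = re.findall(r"-?\d+(?:\.\d+)?", response.replace(",", ""))
    return float(numbers[-1]) if numbers else None


def is_correct(response: str, reference: float) -> bool:
    """Compare the extracted final answer with the reference answer."""
    predicted = extract_final_number(response)
    return predicted is not None and abs(predicted - reference) < 1e-6


reference = solve_apples(5, 2, 3)
print(reference)  # prints 6
print(is_correct("5 - 2 = 3, then 3 + 3 = 6. John has 6 apples.", reference))  # prints True
```

Taking the last number is a deliberately simple rule; a response that mentions another number after its answer would be scored incorrectly.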

GPQA

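Graduate-Level Google-Proof Q&A

QA Knowledge
Biology, physics, chemistry

GPQA Details

A benchmark of difficult multiple-choice questions written by domain experts in biology, physics, and chemistry, designed so that the answers are hard to find with a web search.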

Evaluation Type:

Question Answering

Example:

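"A particle is in the ground state of a one-dimensional infinite square well of width L. What is the probability of finding it between L/4 and 3L/4? (A) 0.50 (B) 0.82 (C) 0.61 (D) 0.91"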

MMLU

Massive Multitask Language Understanding

ML Multitask
57 subjects

MMLU Details

Tests knowledge and problem solving across 57 subjects, including STEM, the humanities, and the social sciences, using multiple-choice questions.

Evaluation Type:

Multitask Understanding

Example:

"Which law of thermodynamics states that the entropy of an isolated system never decreases? (A) Zeroth (B) First (C) Second (D) Third"
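
MMLU items are four-option multiple-choice questions, so results are usually reported as accuracy: the fraction of questions where the chosen letter matches the answer key. The snippet below is a minimal sketch of that calculation; the function name and the sample letters are hypothetical and are not taken from the dataset.

```python
from typing import Sequence


def multiple_choice_accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Fraction of predicted letters that match the answer key."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        return 0.0
    # Normalize whitespace and case so that " c" and "C" count as the same choice.
    correct = sum(
        p.strip().upper() == a.strip().upper()
        for p, a in zip(predictions, answers)
    )
    return correct / len(answers)


# Hypothetical predictions and answer key for four questions.
print(multiple_choice_accuracy(["C", "a", "B", "D"], ["C", "A", "D", "D"]))  # prints 0.75
```

With four options per question, random guessing scores about 0.25, which is the baseline to compare against.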