AI Benchmark Universe
GSM8k
Grade School Math with 8.5K problems
Mathematics
Reasoning
8,500 examples
GSM8k Details
A benchmark of roughly 8,500 grade school math word problems, each requiring several steps of basic arithmetic to solve.
Evaluation Type:
Mathematics, Reasoning
Example:
"John has 5 apples. He gives 2 to Mary and buys 3 more. How many does he have now?"
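GSM8K is usually scored by exact match on the final number. The sketch below is one minimal way to do that, not the official evaluation script. It assumes reference solutions end with a line of the form `#### <answer>`, as in the published dataset, and falls back to the last number in the text when a model response has no such marker. The function names and the sample reference solution are hypothetical, written for the apple example above.

```python
import re
from typing import Optional


def extract_final_answer(text: str) -> Optional[str]:
    """Return the final numeric answer from a GSM8K-style solution string."""
    # Preferred: the number after the "####" marker used in reference solutions.
    marked = re.search(r"####\s*(-?[\d,]*\.?\d+)", text)
    if marked:
        raw = marked.group(1)
    else:
        # Fallback (an assumption): treat the last number in the text as the answer.
        numbers = re.findall(r"-?\d[\d,]*\.?\d*", text)
        if not numbers:
            return None
        raw = numbers[-1]
    # Drop thousands separators and any trailing sentence period.
    return raw.replace(",", "").rstrip(".")


def is_correct(prediction: str, reference: str) -> bool:
    """Exact match on the extracted final answers, compared numerically."""
    pred = extract_final_answer(prediction)
    gold = extract_final_answer(reference)
    if pred is None or gold is None:
        return False
    return float(pred) == float(gold)


# Hypothetical reference solution for the apple example: 5 - 2 + 3 = 6.
reference = "John starts with 5 apples. 5 - 2 = 3. 3 + 3 = 6.\n#### 6"
print(is_correct("After giving 2 away he has 3, then buys 3 more, so 6.", reference))
```

Taking the last number is a heuristic and can misfire when a response keeps talking after stating its answer, so prompting the model to emit an explicit final-answer marker is generally more reliable.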
GPQA
Graduate-Level Google-Proof Q&A
QA
Knowledge
Multi-domain
GPQA Details
A benchmark of 448 expert-written multiple-choice questions in biology, physics, and chemistry, designed to be hard to answer correctly even with unrestricted web search.
Evaluation Type:
Question Answering
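Example (illustrates the question-answer format only; actual GPQA questions are graduate-level science questions):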
"What is the capital of France?"
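GPQA questions are multiple choice with four options, so results are typically reported as accuracy against a 25% random-guess baseline. The following is a minimal sketch of that metric, assuming predictions and gold answers are already reduced to option letters. The function name and the sample letters are hypothetical and do not come from the dataset.

```python
from typing import Sequence


def multiple_choice_accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Fraction of questions where the predicted option letter matches the gold letter."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        return 0.0
    # Compare letters case-insensitively and ignore surrounding whitespace.
    correct = sum(
        p.strip().upper() == a.strip().upper()
        for p, a in zip(predictions, answers)
    )
    return correct / len(answers)


# Hypothetical letters for four questions; three of the four match.
print(multiple_choice_accuracy(["a", "B ", "D", "C"], ["A", "B", "C", "C"]))
```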