Qwen
Qwen2.5-72B-Instruct
Install and run this model locally using llmpm, the open-source LLM package manager.
llmpm install Qwen/Qwen2.5-72B-Instruct
llmpm run Qwen/Qwen2.5-72B-Instruct
BENCHMARK SCORES
Instruction-Following Evaluation. Tests the model's ability to follow explicit formatting instructions when generating text. Scored by strict format accuracy.
Big Bench Hard. A collection of challenging tasks across language understanding, mathematical reasoning, and common sense knowledge. Scored by accuracy on multiple-choice questions.
Mathematics Aptitude Test of Heuristics, Level 5. High school competition problems covering complex algebra, geometry, and advanced calculus. Scored by exact match.
Graduate-Level Google-Proof Q&A. PhD-level multiple-choice questions in chemistry, biology, and physics. Scored by accuracy.
Multistep Soft Reasoning. Tests multistep reasoning over long texts, combining language understanding with long-context reasoning. Scored by accuracy.
Massive Multitask Language Understanding – Professional. Expert-reviewed multiple-choice questions across medicine, law, engineering, and mathematics. Scored by accuracy.
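Several of the benchmarks above are scored by exact match or by accuracy. As a rough illustration of what those two metrics mean, here is a minimal Python sketch. The function names (normalize, exact_match, accuracy) are hypothetical and the normalization is deliberately simple; real evaluation harnesses typically apply benchmark-specific answer extraction and normalization, so this is not the scoring code behind any published number.

```python
def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace (an assumed, simplified normalization)."""
    return " ".join(answer.strip().lower().split())


def exact_match(prediction: str, reference: str) -> bool:
    """Return True when the normalized prediction equals the normalized reference."""
    return normalize(prediction) == normalize(reference)


def accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions that exactly match their reference answer."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(exact_match(p, r) for p, r in zip(predictions, references))
    return correct / len(references)


if __name__ == "__main__":
    # Toy data for illustration only; these are not benchmark items.
    preds = ["B", " 42 ", "x"]
    refs = ["b", "42", "y"]
    print(f"accuracy = {accuracy(preds, refs):.3f}")
```

For multiple-choice benchmarks the prediction and reference would be option labels such as "A" or "B"; for exact-match benchmarks they would be the final answer strings.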