MaziyarPanahi

calme-3.2-instruct-78b

Fine-tuned on domain-specific datasets · Qwen2ForCausalLM · bfloat16

Install and run this model locally using llmpm, the open-source LLM package manager.

Install
llmpm install MaziyarPanahi/calme-3.2-instruct-78b
Run
llmpm run MaziyarPanahi/calme-3.2-instruct-78b
Average Score (0–100)
52.1%
Weighted average of normalized scores across all benchmarks. Each benchmark score is normalized to a 0–100 scale; the normalized scores are then averaged.
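
For reference, the headline number can be reproduced from the per-benchmark scores listed below. A minimal Python sketch, assuming the listed scores are already normalized to the 0–100 scale and weighted equally (the weights themselves are not published on this page):

# Sketch: recompute the headline average from the benchmark scores
# listed on this page. Assumes equal weights and that the listed
# scores are already normalized; both are assumptions.
scores = {
    "IFEval": 80.6,
    "BBH": 62.6,
    "MATH Lvl 5": 40.3,
    "GPQA": 20.4,
    "MuSR": 38.5,
    "MMLU-Pro": 70.0,
}
average = sum(scores.values()) / len(scores)
print(f"{average:.1f}%")  # 52.1%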

BENCHMARK SCORES

IFEval: 80.6%

Instruction-Following Evaluation. Tests the model's ability to follow explicit instructions, such as formatting and generation constraints. Scored by strict format accuracy.

BBH: 62.6%

BIG-Bench Hard. A collection of challenging tasks spanning language understanding, mathematical reasoning, and commonsense knowledge. Scored by accuracy on multiple-choice questions.

MATH Lvl 5: 40.3%

Mathematics Aptitude Test of Heuristics, Level 5. High-school competition problems covering advanced algebra, geometry, and precalculus. Scored by exact match.

GPQA: 20.4%

Graduate-Level Google-Proof Q&A. PhD-level multiple-choice questions in chemistry, biology, and physics. Scored by accuracy.

MuSR: 38.5%

Multistep Soft Reasoning. Tests multistep reasoning over long narrative texts, combining language understanding with long-context reasoning. Scored by accuracy.

MMLU-Pro: 70.0%

Massive Multitask Language Understanding – Professional. Expert-reviewed multiple-choice questions across medicine, law, engineering, and mathematics. Scored by accuracy.

MODEL INFO

Architecture
Qwen2ForCausalLM
Precision
bfloat16
Type
Fine-tuned on domain-specific datasets
Weight Type
Original
Parameters
78.0B
Chat Template
Yes
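
Since the listed architecture is Qwen2ForCausalLM in bfloat16 with a chat template, the model can also be loaded directly with Hugging Face transformers. A minimal sketch, assuming the weights are hosted on the Hub under the same repo id and enough GPU memory is available (roughly 156 GB at bfloat16 for 78B parameters):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MaziyarPanahi/calme-3.2-instruct-78b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the listed precision
    device_map="auto",           # shard the 78B weights across available GPUs
)

# "Chat Template: Yes" above, so the tokenizer can format chat turns.
messages = [{"role": "user", "content": "Explain beam search in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))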

METADATA

Upload Date
2024-11-19
Submission Date
2024-11-28
License
other
Base Model
Removed
HF Hearts
112
CO₂ Cost (kg)
66.01