OlympiadBench Reward
A family of reward functions for evaluating OlympiadBench math and physics competition problems.
OlympiadBenchAccuracyReward
Evaluates answer correctness with support for LaTeX normalization, numeric tolerance, and partial matching.
from twinkle.reward import OlympiadBenchAccuracyReward
reward_fn = OlympiadBenchAccuracyReward()
rewards = reward_fn(generated_trajectories, ground_truth_trajectories)
# rewards: List[float], 1.0 for correct, 0.0 for incorrect
The reward function:
Extracts boxed answers from
\boxed{...}with nested brace handlingNormalizes both prediction and ground truth (LaTeX, units, fractions)
Compares via normalized string matching or numeric comparison with tolerance
OlympiadBenchFormatReward
Validates the structural format of model outputs.
from twinkle.reward import OlympiadBenchFormatReward
reward_fn = OlympiadBenchFormatReward()
rewards = reward_fn(trajectories, ground_truths)
# rewards: List[float], scores based on format quality
Scoring criteria:
Presence of
\boxed{...}answerAnswer positioning (should appear near the end)
Answer uniqueness and consistency
OlympiadBenchQualityReward
A composite quality reward combining multiple aspects of response quality.
from twinkle.reward import OlympiadBenchQualityReward
reward_fn = OlympiadBenchQualityReward()
rewards = reward_fn(trajectories, ground_truths)
Quality dimensions:
Reasoning structure: Detection of step-by-step reasoning patterns
Length appropriateness: Smooth penalty curve for responses that are too short or too long
Content uniqueness: Penalizes repetitive content within the response
These rewards can be used individually or combined as a composite reward for GRPO training on olympiad-level math and physics problems.