OlympiadBench Reward

A family of reward functions for scoring model responses to OlympiadBench math and physics competition problems.

OlympiadBenchAccuracyReward

Evaluates answer correctness with support for LaTeX normalization, numeric tolerance, and partial matching.

from twinkle.reward import OlympiadBenchAccuracyReward

reward_fn = OlympiadBenchAccuracyReward()
rewards = reward_fn(generated_trajectories, ground_truth_trajectories)
# rewards: List[float], 1.0 for correct, 0.0 for incorrect

The reward function:

  1. Extracts the answer from \boxed{...}, handling nested braces

  2. Normalizes both prediction and ground truth (LaTeX, units, fractions)

  3. Compares the two by matching the normalized strings or, for numeric answers, by comparing values within a tolerance
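
The three steps above can be sketched in plain Python. This is an illustrative approximation only, not the library's implementation: the helper names (extract_boxed, normalize_answer, to_float, answers_match), the normalization rules, and the tolerance values are all assumptions chosen for the example.

```python
import math
import re
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...}, honoring nested braces."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    begin = start + len(marker)
    depth = 1
    for i in range(begin, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[begin:i]
    return None  # braces never balanced


def normalize_answer(ans: str) -> str:
    """Hypothetical normalization: drop units, rewrite fractions, strip spacing."""
    ans = ans.strip().strip("$")
    ans = re.sub(r"\\text\{[^{}]*\}", "", ans)  # remove \text{...}, e.g. units
    ans = re.sub(r"\\[dt]?frac\{([^{}]+)\}\{([^{}]+)\}", r"(\1)/(\2)", ans)
    ans = ans.replace("\\left", "").replace("\\right", "")
    return re.sub(r"\s+", "", ans)


def to_float(ans: str) -> Optional[float]:
    """Parse a plain number or a normalized (a)/(b) fraction; None if not numeric."""
    frac = re.fullmatch(r"\(([^()]+)\)/\(([^()]+)\)", ans)
    try:
        if frac:
            return float(frac.group(1)) / float(frac.group(2))
        return float(ans)
    except (ValueError, ZeroDivisionError):
        return None


def answers_match(prediction: str, truth: str, rel_tol: float = 1e-6) -> float:
    """Return 1.0 if the boxed prediction matches the ground truth, else 0.0."""
    boxed = extract_boxed(prediction)
    if boxed is None:
        return 0.0
    pred, gold = normalize_answer(boxed), normalize_answer(truth)
    if pred == gold:
        return 1.0
    p, g = to_float(pred), to_float(gold)
    if p is not None and g is not None:
        return 1.0 if math.isclose(p, g, rel_tol=rel_tol, abs_tol=1e-9) else 0.0
    return 0.0
```

In this sketch, string comparison runs first so that symbolic answers can match exactly, and the numeric path is only a fallback for answers that parse as numbers.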

OlympiadBenchFormatReward

Validates the structural format of model outputs.

from twinkle.reward import OlympiadBenchFormatReward

reward_fn = OlympiadBenchFormatReward()
rewards = reward_fn(trajectories, ground_truths)
# rewards: List[float], scores based on format quality

Scoring criteria:

  • Presence of \boxed{...} answer

  • Answer positioning (should appear near the end)

  • Answer uniqueness and consistency
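
As a rough illustration of how the criteria above could be turned into a score, the sketch below awards partial credit per criterion. The function name format_score, the 0.5/0.25/0.25 weights, and the "final 20% of the text" threshold are assumptions made for this example; the actual scoring used by OlympiadBenchFormatReward may differ.

```python
import re


def _boxed_content(text: str, start: int) -> str:
    """Read the brace-balanced content beginning at index `start`."""
    depth = 1
    for i in range(start, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[start:i]
    return text[start:]  # unbalanced braces: take the remainder


def format_score(text: str, tail_fraction: float = 0.2) -> float:
    """Hypothetical format score in [0, 1] built from the three criteria."""
    matches = list(re.finditer(r"\\boxed\{", text))
    if not matches:
        return 0.0  # no boxed answer at all
    score = 0.5  # presence of a boxed answer
    # Positioning: the last boxed answer should start in the final part of the text.
    if matches[-1].start() >= len(text) * (1 - tail_fraction):
        score += 0.25
    # Uniqueness and consistency: every boxed answer holds the same content.
    answers = {_boxed_content(text, m.end()) for m in matches}
    if len(answers) == 1:
        score += 0.25
    return score
```

A response with a single boxed answer at the end scores 1.0 here, while one that boxes conflicting answers or places the answer early loses the corresponding quarter point.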

OlympiadBenchQualityReward

A composite quality reward combining multiple aspects of response quality.

from twinkle.reward import OlympiadBenchQualityReward

reward_fn = OlympiadBenchQualityReward()
rewards = reward_fn(trajectories, ground_truths)

Quality dimensions:

  • Reasoning structure: Detection of step-by-step reasoning patterns

  • Length appropriateness: Smooth penalty curve for responses that are too short or too long

  • Content uniqueness: Penalizes repetitive content within the response
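
To make the three dimensions concrete, here is a minimal sketch of one way each could be measured and averaged. Everything in it is an assumption for illustration: the marker words, the word-count bounds (50 and 800), the Gaussian-shaped length penalty, the trigram-based repetition measure, and the equal weighting are not taken from the library.

```python
import math
import re

# Hypothetical markers of step-by-step reasoning.
STEP_PATTERN = re.compile(
    r"\b(step \d+|first|then|therefore|hence|thus)\b", re.IGNORECASE
)


def reasoning_score(text: str, target_markers: int = 3) -> float:
    """Fraction of the target number of reasoning markers found, capped at 1."""
    return min(len(STEP_PATTERN.findall(text)) / target_markers, 1.0)


def length_score(n_words: int, low: int = 50, high: int = 800) -> float:
    """1.0 inside [low, high]; decays smoothly outside that range."""
    if n_words < low:
        return math.exp(-(((low - n_words) / low) ** 2))
    if n_words > high:
        return math.exp(-(((n_words - high) / high) ** 2))
    return 1.0


def uniqueness_score(text: str, n: int = 3) -> float:
    """Share of distinct word n-grams; repeated content lowers the score."""
    words = text.lower().split()
    if len(words) < n:
        return 1.0
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    return len(set(ngrams)) / len(ngrams)


def quality_score(text: str) -> float:
    """Equal-weight average of the three dimensions (weights are assumed)."""
    parts = (
        reasoning_score(text),
        length_score(len(text.split())),
        uniqueness_score(text),
    )
    return sum(parts) / len(parts)
```

The length penalty is continuous at both bounds, so a response slightly outside the accepted range loses only a little score rather than dropping to zero.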

These rewards can be used individually or combined as a composite reward for GRPO training on olympiad-level math and physics problems.
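
One simple way to combine them is a weighted sum over the per-sample scores. The helper below is a sketch, not part of the library: combine_rewards and the weights are hypothetical, and it assumes each reward is a callable that takes (trajectories, ground_truths) and returns a List[float], as in the snippets above. Stand-in lambdas are used so the example runs without the library installed.

```python
from typing import Callable, List, Sequence

RewardFn = Callable[[list, list], List[float]]


def combine_rewards(
    reward_fns: Sequence[RewardFn],
    weights: Sequence[float],
    trajectories: list,
    ground_truths: list,
) -> List[float]:
    """Weighted sum of several reward functions, computed per sample."""
    if len(reward_fns) != len(weights):
        raise ValueError("need exactly one weight per reward function")
    # One list of scores per reward function, each aligned with trajectories.
    per_fn = [fn(trajectories, ground_truths) for fn in reward_fns]
    return [
        sum(w * r for w, r in zip(weights, sample_scores))
        for sample_scores in zip(*per_fn)
    ]


# Stand-ins for the accuracy and format rewards (fixed scores for two samples).
accuracy = lambda trajectories, ground_truths: [1.0, 0.0]
fmt = lambda trajectories, ground_truths: [0.5, 1.0]

combined = combine_rewards([accuracy, fmt], [1.0, 0.5], ["a", "b"], ["a", "c"])
```

Weighting accuracy above format and quality keeps correctness as the dominant training signal; the exact ratio is a tuning choice.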