GSM8K Reward
Reward functions specifically designed for evaluating GSM8K math problem solutions.
GSM8KAccuracyReward
Evaluates the correctness of GSM8K answers by extracting the final answer, written either as `\boxed{...}` or after `####`, and comparing it with the reference numerically or as a string.
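```python
from twinkle.reward import GSM8KAccuracyReward

# generated_trajectories: model completions to score
# ground_truth_trajectories: reference trajectories containing the correct answers
reward_fn = GSM8KAccuracyReward()
rewards = reward_fn(generated_trajectories, ground_truth_trajectories)
# rewards: List[float], 1.0 for correct, 0.0 for incorrect
```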
The reward function:
- Extracts the answer from the `\boxed{...}` or `#### ...` format in the model's completion.
- Extracts the ground-truth answer from the reference trajectory.
- Performs a numeric comparison (with tolerance) or an exact string match.
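The three steps above can be sketched in plain Python as follows. This is a hypothetical illustration, not twinkle's implementation: the helper names (`extract_answer`, `accuracy_reward`), the regular expressions, the preference for `\boxed{...}` over `####`, the stripping of commas and a leading `$`, and the `1e-6` absolute tolerance are all assumptions.

```python
import math
import re
from typing import Optional

# Hypothetical patterns; twinkle's actual extraction rules may differ.
# The \boxed pattern does not handle nested braces (e.g. \boxed{\frac{1}{2}}).
_BOXED = re.compile(r"\\boxed\{([^{}]*)\}")
_HASH = re.compile(r"####\s*([^\n]+)")


def extract_answer(text: str) -> Optional[str]:
    """Return the last \\boxed{...} content, else the text after the last ####, else None."""
    boxed = _BOXED.findall(text)
    if boxed:
        return boxed[-1].strip()
    hashed = _HASH.findall(text)
    if hashed:
        return hashed[-1].strip()
    return None


def _to_number(answer: str) -> Optional[float]:
    """Parse a numeric answer, ignoring thousands separators and a leading '$' (assumed)."""
    try:
        return float(answer.replace(",", "").lstrip("$"))
    except ValueError:
        return None


def accuracy_reward(completion: str, reference: str, tol: float = 1e-6) -> float:
    """Return 1.0 if the extracted answers match, else 0.0 (hypothetical helper)."""
    pred = extract_answer(completion)
    gold = extract_answer(reference)
    if pred is None or gold is None:
        return 0.0
    p, g = _to_number(pred), _to_number(gold)
    if p is not None and g is not None:
        # Numeric comparison with an assumed absolute tolerance.
        return 1.0 if math.isclose(p, g, rel_tol=0.0, abs_tol=tol) else 0.0
    # Non-numeric answers fall back to exact string matching.
    return 1.0 if pred == gold else 0.0
```

For example, `accuracy_reward("so the answer is \\boxed{18}", "9 * 2 = 18\n#### 18")` scores 1.0 because both sides parse to the number 18.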
GSM8KFormatReward
Checks whether the model output contains a properly formatted answer section.
```python
from twinkle.reward import GSM8KFormatReward

# trajectories: model completions to check
# ground_truths: reference trajectories
reward_fn = GSM8KFormatReward()
rewards = reward_fn(trajectories, ground_truths)
# rewards: List[float], 1.0 if format is valid, 0.0 otherwise
```
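This page does not define what counts as a properly formatted answer section. The sketch below assumes, purely for illustration, that a completion is well formatted when it contains a non-empty `\boxed{...}` or a `####` followed by an answer. `format_reward` is a hypothetical name, and GSM8KFormatReward may apply different or stricter rules.

```python
import re
from typing import List

# Assumed criterion (not confirmed by the docs): a non-empty \boxed{...}
# or a "####" marker followed by some non-whitespace answer text.
_ANSWER_SECTION = re.compile(r"\\boxed\{[^{}]*[^{}\s][^{}]*\}|####\s*\S")


def format_reward(completions: List[str]) -> List[float]:
    """Return 1.0 for each completion with a recognizable answer section, else 0.0."""
    return [1.0 if _ANSWER_SECTION.search(c) else 0.0 for c in completions]


print(format_reward(["The answer is \\boxed{42}.", "#### 42", "It is 42.", "\\boxed{}"]))
# [1.0, 1.0, 0.0, 0.0]
```

Note that a format check like this ignores correctness entirely: `\boxed{17}` passes even when the reference is 18.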
Use GSM8KAccuracyReward and GSM8KFormatReward together as a composite reward for GRPO training on math problem-solving tasks.
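One common way to build such a composite reward is a weighted per-sample sum. The sketch below is generic and does not call twinkle: `composite_reward`, the toy stand-in reward functions, and the weights `1.0` and `0.5` are hypothetical, and how twinkle itself combines rewards is not specified on this page.

```python
from typing import Callable, List, Sequence

# A reward function maps (completions, references) to one score per completion.
RewardFn = Callable[[List[str], List[str]], List[float]]


def composite_reward(
    reward_fns: Sequence[RewardFn],
    weights: Sequence[float],
    completions: List[str],
    references: List[str],
) -> List[float]:
    """Return the weighted per-sample sum of several reward functions (hypothetical helper)."""
    if len(reward_fns) != len(weights):
        raise ValueError("need exactly one weight per reward function")
    totals = [0.0] * len(completions)
    for fn, weight in zip(reward_fns, weights):
        scores = fn(completions, references)
        if len(scores) != len(completions):
            raise ValueError("reward function returned the wrong number of scores")
        for i, score in enumerate(scores):
            totals[i] += weight * score
    return totals


# Toy stand-ins for an accuracy reward and a format reward (not the twinkle classes).
def toy_accuracy(completions: List[str], references: List[str]) -> List[float]:
    # Correct if the text after the last "####" (or the whole text) equals the reference.
    return [1.0 if c.split("####")[-1].strip() == r else 0.0 for c, r in zip(completions, references)]


def toy_format(completions: List[str], references: List[str]) -> List[float]:
    # Well formatted if a "####" marker is present; references are unused.
    return [1.0 if "####" in c else 0.0 for c in completions]


rewards = composite_reward(
    [toy_accuracy, toy_format],
    [1.0, 0.5],  # hypothetical weights
    ["#### 18", "#### 17", "18"],
    ["18", "18", "18"],
)
print(rewards)  # [1.5, 0.5, 1.0]
```

Weighting the format term below the accuracy term keeps format compliance from outweighing correctness. In GRPO, these per-sample totals would then be normalized within each group of completions sampled for the same prompt to compute advantages.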