# GSM8K Reward

Reward functions specifically designed for evaluating GSM8K math problem solutions.

## GSM8KAccuracyReward

Evaluates the correctness of GSM8K answers by extracting boxed or hash-formatted (`####`) answers and performing numeric/string comparison.

```python
from twinkle.reward import GSM8KAccuracyReward

reward_fn = GSM8KAccuracyReward()
rewards = reward_fn(generated_trajectories, ground_truth_trajectories)
# rewards: List[float], 1.0 for correct, 0.0 for incorrect
```

The reward function:

1. Extracts the answer from `\boxed{...}` or `#### ...` format in the model's completion
2. Extracts the ground truth answer from the reference trajectory
3. Performs numeric comparison (with tolerance) or exact string matching

## GSM8KFormatReward

Checks whether the model output contains a properly formatted answer section.

```python
from twinkle.reward import GSM8KFormatReward

reward_fn = GSM8KFormatReward()
rewards = reward_fn(trajectories, ground_truths)
# rewards: List[float], 1.0 if format is valid, 0.0 otherwise
```

> Use GSM8KAccuracyReward and GSM8KFormatReward together as a composite reward for GRPO training on math problem solving tasks.
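
The three steps listed for `GSM8KAccuracyReward`, and the idea of combining it with a format check, can be sketched in plain Python. This is a minimal illustration only, not the twinkle implementation: it works on raw strings rather than trajectories, and every name in it (`extract_answer`, `to_number`, `accuracy_reward`, `format_reward`, `composite_rewards`), the tolerance value, and the weights are hypothetical. It also assumes that a "properly formatted" output simply means one from which an answer can be extracted, which may be looser or stricter than what `GSM8KFormatReward` actually checks.

```python
import re
from typing import List, Optional

# Hypothetical patterns for the two answer formats named in the text.
BOXED_RE = re.compile(r"\\boxed\{([^{}]*)\}")  # \boxed{...} without nested braces
HASH_RE = re.compile(r"####\s*([^\n]+)")       # "#### ..." up to end of line


def extract_answer(text: str) -> Optional[str]:
    """Return the last boxed answer if present, else the last '####' answer."""
    for pattern in (BOXED_RE, HASH_RE):
        matches = pattern.findall(text)
        if matches:
            return matches[-1].strip()
    return None


def to_number(answer: str) -> Optional[float]:
    """Parse strings such as '$1,234.50' into a float; None if not numeric."""
    cleaned = answer.replace(",", "").replace("$", "").strip()
    try:
        return float(cleaned)
    except ValueError:
        return None


def accuracy_reward(completion: str, reference: str, tol: float = 1e-6) -> float:
    """1.0 if the extracted answers agree numerically (within tol) or as strings."""
    predicted = extract_answer(completion)
    expected = extract_answer(reference)
    if predicted is None or expected is None:
        return 0.0
    p, e = to_number(predicted), to_number(expected)
    if p is not None and e is not None:
        return 1.0 if abs(p - e) <= tol else 0.0
    # Fall back to exact string matching when either side is not numeric.
    return 1.0 if predicted == expected else 0.0


def format_reward(completion: str) -> float:
    """1.0 if an answer section can be extracted at all (assumed criterion)."""
    return 1.0 if extract_answer(completion) is not None else 0.0


def composite_rewards(
    completions: List[str],
    references: List[str],
    accuracy_weight: float = 1.0,  # assumed weight, tune per experiment
    format_weight: float = 0.5,    # assumed weight, tune per experiment
) -> List[float]:
    """Weighted sum of the accuracy and format rewards for each sample."""
    return [
        accuracy_weight * accuracy_reward(c, r) + format_weight * format_reward(c)
        for c, r in zip(completions, references)
    ]
```

In this sketch a correct, well-formatted completion scores higher than a well-formatted but wrong one, which in turn scores higher than an output with no extractable answer. A weighted sum is only one way to combine the two signals; the weights here are placeholders, not recommended values.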