# OlympiadBench Reward

A family of reward functions for scoring model responses to OlympiadBench math and physics competition problems.

## OlympiadBenchAccuracyReward

Evaluates answer correctness with support for LaTeX normalization, numeric tolerance, and partial matching.

```python
from twinkle.reward import OlympiadBenchAccuracyReward

reward_fn = OlympiadBenchAccuracyReward()
rewards = reward_fn(generated_trajectories, ground_truth_trajectories)
# rewards: List[float], 1.0 for correct, 0.0 for incorrect
```

The reward function:
1. Extracts boxed answers from `\boxed{...}` with nested brace handling
2. Normalizes both prediction and ground truth (LaTeX, units, fractions)
3. Compares via normalized string matching or numeric comparison with tolerance
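
The three steps above can be sketched as follows. This is a minimal illustration, not the library's implementation: the helper names (`extract_last_boxed`, `normalize`, `to_number`, `is_correct`), the specific normalization rules, and the tolerance values are all assumptions chosen for the example.

```python
import math
import re
from fractions import Fraction
from typing import Optional


def extract_last_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...}, honoring nested braces."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth, chars = 1, []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
    return None  # braces never balanced


def normalize(answer: str) -> str:
    """Illustrative normalization rules (assumed, not exhaustive)."""
    s = answer.strip().strip("$")
    s = s.replace("\\dfrac", "\\frac").replace("\\tfrac", "\\frac")
    s = re.sub(r"\\(?:left|right)(?![A-Za-z])", "", s)
    s = re.sub(r"\\(?:text|mathrm)\{[^{}]*\}", "", s)  # drop unit annotations
    s = re.sub(r"\\frac\{([^{}]+)\}\{([^{}]+)\}", r"\1/\2", s)  # \frac{a}{b} -> a/b
    s = s.replace("\\,", "").replace("\\!", "").replace(" ", "")
    return s.rstrip(".")


def to_number(s: str) -> Optional[float]:
    """Parse integers, decimals, and simple fractions; None if not numeric."""
    try:
        return float(Fraction(s))
    except (ValueError, ZeroDivisionError):
        return None


def is_correct(prediction: str, truth: str, rel_tol: float = 1e-6) -> bool:
    pred = extract_last_boxed(prediction)
    if pred is None:
        return False
    gold = extract_last_boxed(truth) or truth
    p, g = normalize(pred), normalize(gold)
    if p == g:  # normalized string match
        return True
    pn, gn = to_number(p), to_number(g)  # fall back to numeric comparison
    return (pn is not None and gn is not None
            and math.isclose(pn, gn, rel_tol=rel_tol, abs_tol=1e-9))
```

In this sketch, `\boxed{\dfrac{1}{2}}` matches a ground truth of `0.5` through the numeric path, while `\boxed{x+1}` matches `x + 1` through the string path.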

## OlympiadBenchFormatReward

Validates the structural format of model outputs.

```python
from twinkle.reward import OlympiadBenchFormatReward

reward_fn = OlympiadBenchFormatReward()
rewards = reward_fn(trajectories, ground_truths)
# rewards: List[float], scores based on format quality
```

Scoring criteria:
- Presence of `\boxed{...}` answer
- Answer positioning (should appear near the end)
- Answer uniqueness and consistency
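
As a rough illustration of how these criteria might translate into a score, the sketch below awards partial credit for each one. The function names, the 0.5/0.25/0.25 weights, and the `tail_fraction` threshold are hypothetical and are not taken from the library.

```python
def boxed_spans(text):
    """Return (start_index, content) for every balanced \\boxed{...}."""
    marker = "\\boxed{"
    spans = []
    pos = text.find(marker)
    while pos != -1:
        depth, i = 1, pos + len(marker)
        while i < len(text) and depth > 0:
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
            i += 1
        if depth == 0:
            spans.append((pos, text[pos + len(marker):i - 1]))
        pos = text.find(marker, pos + len(marker))
    return spans


def format_score(response, tail_fraction=0.3):
    """Assumed weighting: presence 0.5, position 0.25, consistency 0.25."""
    spans = boxed_spans(response)
    if not spans:
        return 0.0
    score = 0.5  # a boxed answer is present
    # Position: the last boxed answer starts in the final part of the response
    if spans[-1][0] >= len(response) * (1 - tail_fraction):
        score += 0.25
    # Uniqueness/consistency: every boxed answer has the same content
    if len({content.replace(" ", "") for _, content in spans}) == 1:
        score += 0.25
    return score
```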

## OlympiadBenchQualityReward

A composite reward that combines several aspects of response quality.

```python
from twinkle.reward import OlympiadBenchQualityReward

reward_fn = OlympiadBenchQualityReward()
rewards = reward_fn(trajectories, ground_truths)
```

Quality dimensions:
- **Reasoning structure**: Detection of step-by-step reasoning patterns
- **Length appropriateness**: Smooth penalty curve for responses that are too short or too long
- **Content uniqueness**: Penalizes repetitive content within the response
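
One way such dimensions could be computed is sketched below. Everything here is an assumption made for illustration: the marker words, the word-count bounds `lo` and `hi`, the trigram-based repetition measure, and the equal-weight average are not the library's actual choices.

```python
import math
import re

# Hypothetical step markers; the real detection patterns are not documented here.
_STEP_PATTERN = re.compile(
    r"\b(?:step \d+|first|then|next|therefore|thus|hence)\b", re.IGNORECASE
)


def reasoning_score(text: str) -> float:
    """Saturates at 1.0 once three step markers are found."""
    return min(len(_STEP_PATTERN.findall(text)) / 3.0, 1.0)


def length_score(n_words: int, lo: int = 50, hi: int = 800) -> float:
    """1.0 inside [lo, hi]; smooth Gaussian decay in log-length outside."""
    if n_words <= 0:
        return 0.0
    if n_words < lo:
        d = math.log(lo / n_words)
    elif n_words > hi:
        d = math.log(n_words / hi)
    else:
        return 1.0
    return math.exp(-d * d)


def uniqueness_score(text: str, n: int = 3) -> float:
    """Share of distinct word n-grams; repeated content lowers the score."""
    words = text.lower().split()
    if len(words) < n:
        return 1.0
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    return len(set(ngrams)) / len(ngrams)


def quality_score(text: str) -> float:
    """Equal-weight average of the three dimensions (assumed weighting)."""
    parts = (
        reasoning_score(text),
        length_score(len(text.split())),
        uniqueness_score(text),
    )
    return sum(parts) / len(parts)
```

Measuring the length penalty in log-length keeps the curve continuous at the `lo` and `hi` boundaries, so a response slightly outside the range loses only a little credit.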

> These rewards can be used individually or combined into a single composite reward for GRPO training on olympiad-level math and physics problems.
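
A weighted sum is one simple way to combine them. The helper `combine_rewards` below is hypothetical and not part of `twinkle`; it only assumes that each reward is a callable taking `(trajectories, ground_truths)` and returning one float per trajectory, as in the examples above. Stand-in functions replace the real reward classes so the sketch runs on its own.

```python
from typing import Callable, List, Sequence, Tuple

RewardFn = Callable[[Sequence, Sequence], List[float]]


def combine_rewards(weighted_fns: Sequence[Tuple[RewardFn, float]]) -> RewardFn:
    """Build a reward function that returns the weighted sum of its parts."""
    def combined(trajectories, ground_truths):
        total = [0.0] * len(trajectories)
        for fn, weight in weighted_fns:
            scores = fn(trajectories, ground_truths)
            total = [t + weight * s for t, s in zip(total, scores)]
        return total
    return combined


# Stand-ins for illustration only; real reward instances would be passed instead.
def fake_accuracy(trajectories, ground_truths):
    return [1.0, 0.0]


def fake_format(trajectories, ground_truths):
    return [0.5, 1.0]


reward_fn = combine_rewards([(fake_accuracy, 1.0), (fake_format, 0.5)])
rewards = reward_fn(["response a", "response b"], ["truth a", "truth b"])
```

The weights here are arbitrary; in practice they would be tuned so that correctness dominates and the format and quality terms act as secondary signals.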