# DPOMetric

`DPOMetric` tracks statistics specific to preference optimization during DPO (Direct Preference Optimization) training.

```python
from twinkle.metric import DPOMetric

metric = DPOMetric(device_mesh=..., process_group=...)

# Accumulate after each forward pass
metric.accumulate(inputs, outputs, ref_outputs=ref_outputs)

# Calculate aggregated metrics
result = metric.calculate()
```

**Tracked metrics:**
- `chosen_logps`: Average log-probability of chosen responses
- `rejected_logps`: Average log-probability of rejected responses
- `ref_chosen_logps`: Reference model log-probability of chosen responses
- `ref_rejected_logps`: Reference model log-probability of rejected responses
- `rewards/chosen`: Implicit reward for chosen responses
- `rewards/rejected`: Implicit reward for rejected responses
- `accuracy`: Fraction of pairs in which the chosen response receives a higher implicit reward than the rejected response
- `margin`: Average difference between the chosen and rejected implicit rewards
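
To make the relationship between these quantities concrete, here is a minimal sketch in plain Python. It assumes the standard DPO definition of the implicit reward, `beta * (policy_logp - ref_logp)`. The helper name `dpo_pair_stats` and the `beta` parameter are hypothetical and used only for illustration; `DPOMetric` may compute or scale these values differently.

```python
def dpo_pair_stats(chosen_logps, rejected_logps,
                   ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Illustrative only: not the twinkle implementation.

    Each argument is a list of per-pair sequence log-probabilities.
    `beta` is the DPO temperature (assumed; the real metric may differ).
    """
    n = len(chosen_logps)
    # Implicit reward: scaled log-ratio between policy and reference model.
    rewards_chosen = [beta * (p - r)
                      for p, r in zip(chosen_logps, ref_chosen_logps)]
    rewards_rejected = [beta * (p - r)
                        for p, r in zip(rejected_logps, ref_rejected_logps)]
    # Per-pair margin: how much more reward chosen gets than rejected.
    margins = [c - r for c, r in zip(rewards_chosen, rewards_rejected)]
    return {
        "rewards/chosen": sum(rewards_chosen) / n,
        "rewards/rejected": sum(rewards_rejected) / n,
        # A pair counts as correct when its margin is strictly positive.
        "accuracy": sum(1 for m in margins if m > 0) / n,
        "margin": sum(margins) / n,
    }


stats = dpo_pair_stats(
    chosen_logps=[-10.0, -12.0],
    rejected_logps=[-11.0, -11.5],
    ref_chosen_logps=[-10.5, -11.0],
    ref_rejected_logps=[-10.5, -12.0],
    beta=0.5,
)
print(stats)
```

In this toy input the first pair favors the chosen response and the second does not, so `accuracy` comes out at one half while the average `margin` is negative.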

> `DPOMetric` aggregates its statistics across all data-parallel (DP) ranks. It supports both interleaved and separate chosen/rejected batch formats.
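
Why aggregation across ranks matters can be shown with a small simulation. The sketch below assumes each rank contributes a running sum and a pair count, and that the global value is the total sum divided by the total count. This is a common pattern, not a description of twinkle's internals, and `aggregate_mean` is a hypothetical helper.

```python
def aggregate_mean(per_rank):
    """Combine (sum, count) tuples from every rank into one global mean.

    Illustrative only: in a real run the sums and counts would be
    combined with a collective such as all-reduce over the process group.
    """
    total_sum = sum(s for s, _ in per_rank)
    total_count = sum(c for _, c in per_rank)
    # Return 0.0 when no pairs were seen, to avoid dividing by zero.
    return total_sum / total_count if total_count else 0.0


# Rank 0 saw 3 pairs, all ranked correctly; rank 1 saw 1 pair, ranked wrongly.
per_rank = [(3.0, 3), (0.0, 1)]

global_accuracy = aggregate_mean(per_rank)
naive_mean_of_means = sum(s / c for s, c in per_rank) / len(per_rank)

print(global_accuracy, naive_mean_of_means)
```

Weighting by count keeps the result correct when ranks process different numbers of pairs; simply averaging each rank's local mean would give every rank equal weight regardless of how many pairs it saw.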