DPOMetric
DPOMetric tracks statistics specific to preference optimization during DPO (Direct Preference Optimization) training.
```python
from twinkle.metric import DPOMetric

metric = DPOMetric(device_mesh=..., process_group=...)

# Accumulate after each forward pass; ref_outputs holds the reference model's outputs
metric.accumulate(inputs, outputs, ref_outputs=ref_outputs)

# Calculate aggregated metrics
result = metric.calculate()
```
Tracked metrics:
- chosen_logps: Average log-probability of chosen responses
- rejected_logps: Average log-probability of rejected responses
- ref_chosen_logps: Reference model log-probability of chosen responses
- ref_rejected_logps: Reference model log-probability of rejected responses
- rewards/chosen: Implicit reward for chosen responses
- rewards/rejected: Implicit reward for rejected responses
- accuracy: Fraction of pairs where chosen is preferred over rejected
- margin: Average reward margin between chosen and rejected
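To make the relationship between these statistics concrete, here is a minimal pure-Python sketch. It assumes the standard DPO definition of the implicit reward, beta * (policy log-probability minus reference log-probability), and assumes that accuracy counts pairs whose chosen reward is strictly greater than the rejected reward. The helper `dpo_pair_stats` and the `beta` parameter (with its 0.1 default) are hypothetical illustrations, not DPOMetric's API. Whether the per-sequence log-probabilities are token-summed or length-normalized is not specified above, so the sketch takes them as given.

```python
from statistics import mean


def dpo_pair_stats(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Hypothetical helper: DPO statistics for one batch of preference pairs.

    Each argument is a list of per-sequence log-probabilities, one entry per
    pair. beta is the DPO temperature; its name and default are assumptions.
    """
    # Implicit reward r = beta * (log pi_policy(y|x) - log pi_ref(y|x))
    rewards_chosen = [beta * (lp - ref_lp) for lp, ref_lp in zip(policy_chosen, ref_chosen)]
    rewards_rejected = [beta * (lp - ref_lp) for lp, ref_lp in zip(policy_rejected, ref_rejected)]
    # Per-pair reward margin: chosen reward minus rejected reward
    margins = [c - r for c, r in zip(rewards_chosen, rewards_rejected)]
    return {
        "chosen_logps": mean(policy_chosen),
        "rejected_logps": mean(policy_rejected),
        "ref_chosen_logps": mean(ref_chosen),
        "ref_rejected_logps": mean(ref_rejected),
        "rewards/chosen": mean(rewards_chosen),
        "rewards/rejected": mean(rewards_rejected),
        # A pair counts as correct when its chosen reward strictly exceeds the rejected one
        "accuracy": sum(m > 0 for m in margins) / len(margins),
        "margin": mean(margins),
    }


stats = dpo_pair_stats(
    policy_chosen=[-10.0, -12.0],
    policy_rejected=[-15.0, -11.0],
    ref_chosen=[-11.0, -12.0],
    ref_rejected=[-14.0, -12.0],
)
print(stats["accuracy"])  # 0.5: only the first pair has a positive margin
```

Under this definition, accuracy and margin depend only on the policy-versus-reference differences. A policy that shifts all log-probabilities uniformly, without changing the chosen/rejected ordering relative to the reference, therefore leaves them unchanged.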
DPOMetric aggregates its statistics across all data-parallel (DP) ranks, so the reported values cover every rank's share of the batch rather than a single rank's shard. It supports both interleaved and separate chosen/rejected batch formats.
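The sketch below illustrates both ideas under stated assumptions. First, it assumes that "interleaved" means alternating sequences ([c0, r0, c1, r1, ...]) and that "separate" means all chosen sequences followed by all rejected ones ([c0, c1, ..., r0, r1, ...]). Second, it shows one common way to make aggregation DP-aware: each rank contributes a (sum, count) pair, and the global mean is total sum over total count. This is not necessarily how DPOMetric implements it. In a real job the per-rank sums would typically be combined with an all-reduce such as `torch.distributed.all_reduce`; here ranks are simulated as list entries. `split_pairs` and `dp_mean` are hypothetical names.

```python
from typing import Sequence


def split_pairs(values: Sequence[float], layout: str):
    """Hypothetical helper: split a flat per-sequence batch into (chosen, rejected).

    layout="interleaved": [c0, r0, c1, r1, ...]   (assumed layout)
    layout="separate":    [c0, c1, ..., r0, r1, ...]   (assumed layout)
    """
    if len(values) % 2:
        raise ValueError("a preference batch must contain an even number of sequences")
    if layout == "interleaved":
        return list(values[0::2]), list(values[1::2])
    if layout == "separate":
        half = len(values) // 2
        return list(values[:half]), list(values[half:])
    raise ValueError(f"unknown layout: {layout!r}")


def dp_mean(per_rank_sums: Sequence[float], per_rank_counts: Sequence[int]) -> float:
    """Hypothetical helper: global mean from per-rank (sum, count) pairs.

    Each list entry stands in for one simulated data-parallel rank.
    """
    total = sum(per_rank_sums)
    count = sum(per_rank_counts)
    # No samples on any rank: the mean is undefined
    return total / count if count else float("nan")


# Two simulated ranks: rank 0 saw 3 pairs (all correct), rank 1 saw 1 pair (wrong).
print(dp_mean([3.0, 0.0], [3, 1]))  # 0.75
```

Weighting by count matters when ranks see different numbers of pairs. Averaging the per-rank accuracies in the example (1.0 and 0.0) would report 0.5, while the pair-weighted global accuracy is 0.75.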