CompletionRewardMetric
The CompletionRewardMetric aggregates key statistics during RLHF training, including generation time, weight synchronization time, reward scores, and completion lengths.
from twinkle.metric import CompletionRewardMetric
metric = CompletionRewardMetric(device_mesh=..., process_group=...)
# Accumulate during training loop
metric.accumulate(
    inputs,
    outputs,
    generation_time=gen_time,
    weight_sync_time=sync_time,
    rewards=reward_values,
    completions=completion_texts,
)
# Calculate aggregated metrics
result = metric.calculate()
# result contains: generation_time, weight_sync_time, mean_reward, mean_completion_length, etc.
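To make the accumulate/calculate pattern concrete, here is a minimal single-process sketch. `ToyCompletionRewardMetric` is a hypothetical stand-in written for illustration and is not twinkle's implementation. It assumes completions are plain strings and measures length in characters, whereas the real metric may well count tokens, and it only produces the four result keys named above.

```python
from dataclasses import dataclass, field
from statistics import fmean


@dataclass
class ToyCompletionRewardMetric:
    """Hypothetical stand-in that mimics the accumulate/calculate pattern."""

    generation_times: list = field(default_factory=list)
    weight_sync_times: list = field(default_factory=list)
    rewards: list = field(default_factory=list)
    completion_lengths: list = field(default_factory=list)

    def accumulate(self, *, generation_time, weight_sync_time, rewards, completions):
        # One timing value per training step, one reward/length per completion.
        self.generation_times.append(generation_time)
        self.weight_sync_times.append(weight_sync_time)
        self.rewards.extend(rewards)
        # Assumption: length is measured in characters, not tokens.
        self.completion_lengths.extend(len(c) for c in completions)

    def calculate(self):
        # Reduce everything accumulated so far to per-key averages.
        return {
            "generation_time": fmean(self.generation_times),
            "weight_sync_time": fmean(self.weight_sync_times),
            "mean_reward": fmean(self.rewards),
            "mean_completion_length": fmean(self.completion_lengths),
        }


toy = ToyCompletionRewardMetric()
toy.accumulate(generation_time=2.0, weight_sync_time=0.5,
               rewards=[1.0, 0.0], completions=["ab", "abcd"])
toy.accumulate(generation_time=4.0, weight_sync_time=1.5,
               rewards=[1.0, 1.0], completions=["abc", "abc"])
toy_result = toy.calculate()
```

Keeping raw values until `calculate()` is called keeps the sketch short; a production metric would more likely keep running sums and counts to bound memory.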
This metric is designed for GRPO and other RL training loops where monitoring generation quality and system performance is essential.
CompletionRewardMetric performs data-parallel (DP) aware aggregation: metrics are averaged across all data-parallel ranks, so the reported values reflect the whole job rather than a single rank.
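The sketch below illustrates one common way to average across data-parallel ranks, simulated in a single process with plain lists. It is an assumption about what DP-aware averaging can look like, not a description of twinkle's internals: the source does not say whether ranks are weighted by sample count. The helper names are hypothetical. In a real job, the per-rank sums and counts would typically be combined with a collective such as `torch.distributed.all_reduce`.

```python
def naive_mean_of_means(per_rank_values):
    """Average each rank locally, then average the rank means (ignores counts)."""
    means = [sum(values) / len(values) for values in per_rank_values]
    return sum(means) / len(means)


def dp_aware_mean(per_rank_values):
    """Combine per-rank (sum, count) pairs so every sample has equal weight."""
    # Each rank would compute its own (local_sum, local_count) pair.
    local = [(sum(values), len(values)) for values in per_rank_values]
    # Stand-in for a sum all-reduce over the data-parallel group.
    total = sum(local_sum for local_sum, _ in local)
    count = sum(local_count for _, local_count in local)
    return total / count


# Rank 0 holds three rewards, rank 1 holds one.
rank_rewards = [[1.0, 1.0, 1.0], [0.0]]
naive = naive_mean_of_means(rank_rewards)
weighted = dp_aware_mean(rank_rewards)
```

The two results differ whenever ranks hold different numbers of samples, which is why reducing sums and counts separately is the safer default.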