EmbeddingMetric
The EmbeddingMetric tracks embedding quality during contrastive (InfoNCE) training. It reports anchor-positive cosine similarity statistics (mean, minimum, and maximum), the mean in-batch negative similarity, the average loss, and the gradient norm.
Usage
from twinkle.metric import EmbeddingMetric
metric = EmbeddingMetric(device_mesh=device_mesh, process_group=process_group)
# During training
metric.accumulate(inputs, outputs)
# At log interval
results = metric.calculate()
# results: {
# 'pos_sim': '0.8523', # Mean anchor-positive cosine similarity
# 'pos_sim_min': '0.7102', # Min across batch
# 'pos_sim_max': '0.9451', # Max across batch
# 'neg_sim': '0.2134', # Mean anchor-negative similarity
# 'loss': '0.3412', # Average InfoNCE loss
# 'grad_norm': '1.234567', # Gradient norm
# }
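The values in the example output above are shown as formatted strings. If that is the case in your setup, they may need converting before plotting or thresholding. A minimal sketch, assuming a dictionary shaped like the one in the comment (the `results` literal here is copied from that example, not produced by the metric):

```python
# Example dictionary copied from the usage comment; values are strings.
results = {
    'pos_sim': '0.8523',
    'pos_sim_min': '0.7102',
    'pos_sim_max': '0.9451',
    'neg_sim': '0.2134',
    'loss': '0.3412',
    'grad_norm': '1.234567',
}

# Convert every reported value to a float for numeric use.
parsed = {name: float(value) for name, value in results.items()}

# Gap between mean positive and mean negative similarity (a derived
# quantity for illustration; it is not one of the reported metrics).
margin = parsed['pos_sim'] - parsed['neg_sim']
```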
Reported Metrics
| Metric | Description |
|---|---|
| `pos_sim` | Mean cosine similarity between anchors and their positives |
| `pos_sim_min` | Minimum anchor-positive similarity in the batch |
| `pos_sim_max` | Maximum anchor-positive similarity in the batch |
| `neg_sim` | Mean similarity between anchors and other positives (in-batch negatives) |
| `loss` | Average contrastive loss value |
| `grad_norm` | Gradient norm (passed via kwargs) |
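The similarity statistics described above can be sketched in plain Python. This illustrates the definitions only and is not the library's implementation. It assumes `anchors[i]` is paired with `positives[i]` and that every other positive in the batch acts as a negative for anchor `i`. The helper names `cosine` and `similarity_stats` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def similarity_stats(anchors, positives):
    """Hypothetical helper: anchors[i] is assumed to pair with positives[i]."""
    n = len(anchors)
    # Full anchor-by-positive similarity matrix.
    sim = [[cosine(a, p) for p in positives] for a in anchors]
    # Diagonal entries are the true pairs; off-diagonal entries are
    # the in-batch negatives.
    pos = [sim[i][i] for i in range(n)]
    neg = [sim[i][j] for i in range(n) for j in range(n) if i != j]
    return {
        'pos_sim': sum(pos) / n,
        'pos_sim_min': min(pos),
        'pos_sim_max': max(pos),
        'neg_sim': sum(neg) / len(neg) if neg else 0.0,
    }

anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[1.0, 0.0], [1.0, 1.0]]
stats = similarity_stats(anchors, positives)
```

In practice the same quantities would usually be computed with batched tensor operations on normalized embeddings; the nested lists here just keep the sketch dependency-free.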
Cross-Rank Gathering
EmbeddingMetric performs an all_gather to compute similarity statistics across all DP ranks, providing a global view of embedding quality even under data-parallel training.
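To see why gathering matters, the sketch below simulates ranks as plain Python lists; no real collective is called, and `gather_and_summarize` is a hypothetical name. It assumes each rank holds its own anchor-positive similarities and that ranks may hold different numbers of samples.

```python
def gather_and_summarize(per_rank_pos_sims):
    """Simulate an all_gather: concatenate every rank's values, then reduce."""
    gathered = [s for rank_vals in per_rank_pos_sims for s in rank_vals]
    return {
        'pos_sim': sum(gathered) / len(gathered),
        'pos_sim_min': min(gathered),
        'pos_sim_max': max(gathered),
    }

# Two simulated ranks with uneven sample counts.
rank0 = [0.75, 1.0]
rank1 = [0.25, 0.5, 0.5]

global_stats = gather_and_summarize([rank0, rank1])

# Averaging per-rank means weights each rank equally, regardless of
# how many samples it holds, so it can differ from the global mean.
mean_of_rank_means = (sum(rank0) / len(rank0) + sum(rank1) / len(rank1)) / 2
```

With these values, rank 0 alone would never see the global minimum, and the mean of per-rank means differs from the mean over all gathered samples. This is the kind of discrepancy a global gather avoids.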
This metric pairs with `InfonceLoss` for embedding/retrieval training tasks.