# InfoNCE Loss

The `InfonceLoss` implements contrastive learning with in-batch negatives and optional cross-rank gathering. It is designed for embedding/retrieval model training.

## Usage

```python
from twinkle.loss import InfonceLoss

loss_fn = InfonceLoss(
    temperature=0.1,
    use_batch=True,           # Enable in-batch negatives
    hard_negatives=7,         # Fix negative count per sample
    mask_fake_negative=True,  # Mask false negatives
    fake_neg_margin=0.1,      # Margin for false negative detection
)
model.set_loss(loss_fn)
```

## Input Format

Each sample is laid out as `anchor(1) + positive(1) + negatives(n)` in a flat embedding tensor. The `inputs['labels']` is a 1-D mask where `1` marks the start of each group.

```
embeddings: [a0, p0, n0_1, n0_2, a1, p1, n1_1, n1_2, ...]
labels:     [ 1,  0,    0,    0,  1,  0,    0,    0, ...]
```

## Parameters

| Parameter | Type | Default | Description |
|:----------|:-----|:--------|:------------|
| `temperature` | float | 0.1 | Logit scaling factor |
| `use_batch` | bool | True | Use cross-sample in-batch negatives |
| `hard_negatives` | int | None | Fix per-sample negative count (truncate/upsample) |
| `mask_fake_negative` | bool | False | Mask logits > positive + margin |
| `fake_neg_margin` | float | 0.1 | Threshold for false negative masking |
| `include_qq` | bool | False | Add query-query similarity block |
| `include_dd` | bool | False | Add doc-doc similarity block |

## Cross-Rank Gathering

When `use_batch=True` and distributed training is active, embeddings are gathered from all DP ranks to maximize in-batch negative diversity. Only the local shard retains gradients.

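
The flat layout and `temperature` scaling described above can be illustrated with a small pure-Python sketch. It is not the `twinkle` implementation. The helper names `split_groups` and `infonce_qd` are hypothetical, similarity is assumed to be a plain dot product over already-normalized vectors, and the options `hard_negatives`, `mask_fake_negative`, and cross-rank gathering are left out.

```python
import math


def split_groups(embeddings, labels):
    """Split a flat list of vectors into groups; a label of 1 starts a new group."""
    groups = []
    for vec, flag in zip(embeddings, labels):
        if flag == 1:
            groups.append([])
        if not groups:
            raise ValueError("the first label must be 1")
        groups[-1].append(vec)
    return groups


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def infonce_qd(embeddings, labels, temperature=0.1):
    """Mean Q->D InfoNCE loss where every document in the batch is a candidate."""
    groups = split_groups(embeddings, labels)
    anchors = [g[0] for g in groups]
    docs, pos_index = [], []
    for g in groups:
        pos_index.append(len(docs))  # the positive is the first document of its group
        docs.extend(g[1:])           # positive followed by that group's negatives
    losses = []
    for q, target in zip(anchors, pos_index):
        logits = [dot(q, d) / temperature for d in docs]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        losses.append(log_z - logits[target])  # cross-entropy against the positive
    return sum(losses) / len(losses)


# Toy batch: two groups, each anchor + positive + one negative.
embeddings = [
    [1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0],  # a0, p0, n0_1
    [0.0, 0.0, 1.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0],  # a1, p1, n1_1
]
labels = [1, 0, 0, 1, 0, 0]
loss = infonce_qd(embeddings, labels, temperature=0.1)
```

In this toy batch each anchor matches its positive exactly and is orthogonal to every other document, so the loss is close to zero. A lower `temperature` sharpens the softmax and pushes it closer still.
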
## Similarity Blocks

The loss supports three similarity blocks for comprehensive contrastive learning:

- **Q→D (default)**: Query to all documents; the primary contrastive signal
- **Q→Q** (`include_qq=True`): Query to all other queries; prevents query collapse
- **D→D** (`include_dd=True`): Document to all other documents; Qwen3-Embedding style

## Example: Embedding Training

```python
from twinkle.loss import InfonceLoss
from twinkle.metric import EmbeddingMetric

# `model`, `mesh`, `pg`, and `dataloader` are assumed to be created elsewhere.

# Configure model for embedding
model.set_loss(InfonceLoss(temperature=0.05, use_batch=True, include_qq=True))
model.set_metric(EmbeddingMetric(device_mesh=mesh, process_group=pg))

# Training loop
for batch in dataloader:
    model.forward_backward(batch)
    model.clip_grad_and_step()
```
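
The example above enables `include_qq=True`. As a rough illustration of what a Q→Q block adds, the sketch below appends each query's similarities to the other queries as extra candidates in the softmax denominator. This is one common formulation and an assumption here. How `InfonceLoss` combines the blocks internally is not specified in this document, and the names `contrastive_loss` and `pos_index` are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def log_sum_exp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def contrastive_loss(queries, docs, pos_index, temperature=0.05, include_qq=False):
    """Mean loss over queries; pos_index[i] is the position of query i's positive in docs."""
    losses = []
    for i, q in enumerate(queries):
        # Q->D block: the query against every document in the batch.
        logits = [dot(q, d) / temperature for d in docs]
        target = logits[pos_index[i]]
        if include_qq:
            # Q->Q block: the query against every *other* query (self excluded).
            logits += [dot(q, other) / temperature
                       for j, other in enumerate(queries) if j != i]
        losses.append(log_sum_exp(logits) - target)
    return sum(losses) / len(losses)


queries = [[1.0, 0.0], [0.8, 0.6]]
docs = [[1.0, 0.0], [0.6, 0.8]]
pos_index = [0, 1]

base = contrastive_loss(queries, docs, pos_index)
with_qq = contrastive_loss(queries, docs, pos_index, include_qq=True)
```

Under this formulation, similar queries add weight to the denominator, so `with_qq` is larger than `base`. The optimizer is therefore pushed to separate queries from one another as well as from non-matching documents.
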