# Chunked Cross Entropy

A memory-efficient variant of cross-entropy loss that processes the vocabulary dimension in chunks to reduce peak GPU memory usage.

```python
from twinkle.loss import ChunkedCrossEntropyLoss

loss_fn = ChunkedCrossEntropyLoss(
    chunk_size=1024,           # vocabulary chunk size
    reduction='mean',
)

model.set_loss(loss_fn)
```

**Parameters:**
- `chunk_size`: Number of vocabulary tokens to process per chunk (default: 1024)
- `reduction`: Reduction mode — `sum`, `mean`, or `none`

The implementation uses a custom autograd function that splits the logit-to-loss computation into chunks along the vocabulary dimension. This avoids materializing the full `[batch*seq_len, vocab_size]` probability tensor, significantly reducing memory for large vocabularies.

> Useful when training with large vocabulary models where standard cross-entropy causes OOM errors.