# Cross Entropy

Cross entropy is the most commonly used loss for supervised fine-tuning (SFT) and pre-training (PT). It fits the model's predicted probability distribution to the labels.

```python
class CrossEntropyLoss(Loss):

    def __init__(self, **kwargs):
        # Same options as torch.nn.CrossEntropyLoss; defaults to 'mean'.
        self.reduction = kwargs.get('reduction', 'mean')

    def __call__(self, inputs, outputs, **kwargs):
        import torch
        # Flatten logits to (num_tokens, vocab_size) and labels to (num_tokens,).
        logits = outputs['logits'].view(-1, outputs['logits'].shape[-1])
        labels = inputs['labels'].view(-1)
        return torch.nn.CrossEntropyLoss(reduction=self.reduction)(logits, labels)
```

The `reduction` parameter can be passed in at construction time. It accepts `sum`, `mean`, and `none`, the same values as the `reduction` argument of `torch.nn.CrossEntropyLoss`.

> Transformers models currently use `sum`. This allows the number of valid tokens to be counted before `optimizer.step`, so that the per-token average is taken at the gradient level.
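
The note on `sum` can be illustrated with a small worked example. The following is a minimal sketch in plain Python, not the library's implementation: it assumes padded positions are marked with `-100` (the default `ignore_index` of `torch.nn.CrossEntropyLoss`), and the helper names `token_nll` and `summed_loss` as well as the two micro-batches are hypothetical. It compares dividing the summed loss once by the total number of valid tokens against averaging per micro-batch first.

```python
import math

# Assumption: padded positions carry this label, matching the default
# ignore_index of torch.nn.CrossEntropyLoss.
IGNORE_INDEX = -100


def token_nll(logits, label):
    """Negative log-likelihood of `label` under softmax(logits)."""
    peak = max(logits)  # subtract the max for numerical stability
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_norm - logits[label]


def summed_loss(batch_logits, batch_labels):
    """Return (sum of per-token losses, number of valid tokens) for one micro-batch."""
    total, valid = 0.0, 0
    for logits, label in zip(batch_logits, batch_labels):
        if label == IGNORE_INDEX:
            continue  # skip padded positions
        total += token_nll(logits, label)
        valid += 1
    return total, valid


# Two hypothetical micro-batches with different numbers of valid tokens.
micro_batches = [
    ([[2.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
     [0, IGNORE_INDEX, IGNORE_INDEX]),
    ([[0.0, 0.0, 3.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0]],
     [0, 1, 2]),
]

results = [summed_loss(logits, labels) for logits, labels in micro_batches]
loss_sum = sum(total for total, _ in results)
num_valid = sum(valid for _, valid in results)

# Summing first and dividing once weights every valid token equally.
per_token_mean = loss_sum / num_valid

# Averaging inside each micro-batch and then across micro-batches gives the
# single valid token in the first micro-batch as much weight as all three
# valid tokens in the second.
mean_of_means = sum(total / valid for total, valid in results) / len(results)

print(f"per-token mean: {per_token_mean:.4f}")
print(f"mean of means:  {mean_of_means:.4f}")
```

The two results differ whenever micro-batches contain different numbers of valid tokens. Because gradients are linear in the loss, the same single division can be applied to the accumulated gradients instead of the loss, which is consistent with the per-token averaging at the gradient level described above.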