CosineWarmupScheduler

This LRScheduler warms up the learning rate at the start of training, then decays it along a cosine curve once the specified learning rate has been reached.

class CosineWarmupScheduler:

    def __init__(self, optimizer, num_warmup_steps: int, num_training_steps: int, num_cycles: float = 0.5):
        ...

    ...

Constructor parameters:

  • optimizer: optimizer instance

  • num_warmup_steps: Number of warmup steps

  • num_training_steps: Total number of training steps

  • num_cycles: Number of cosine cycles covered during the decay phase. The default of 0.5 is half a cosine cycle, which decays from the maximum learning rate to the minimum. Setting it to 1 gives a full cycle, which decays from the maximum learning rate to the minimum and then rises back to the maximum.
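
To make the effect of these parameters concrete, the following is a minimal sketch of how the learning-rate multiplier could be computed at each step. It is not taken from this library's source: the function name cosine_warmup_factor is hypothetical, and the sketch assumes a linear warmup and the cosine formula commonly used for schedules of this kind, which matches the num_cycles behavior described above.

```python
import math


def cosine_warmup_factor(step: int,
                         num_warmup_steps: int,
                         num_training_steps: int,
                         num_cycles: float = 0.5) -> float:
    """Return the multiplier applied to the base learning rate at `step`.

    Hypothetical helper for illustration only; assumes linear warmup
    followed by cosine decay.
    """
    if step < num_warmup_steps:
        # Warmup phase (assumed linear): ramp from 0 up to 1.
        return step / max(1, num_warmup_steps)
    # Fraction of the decay phase completed, in [0, 1].
    progress = (step - num_warmup_steps) / max(1, num_training_steps - num_warmup_steps)
    # num_cycles=0.5 sweeps half a cosine cycle (1 -> 0);
    # num_cycles=1 sweeps a full cycle (1 -> 0 -> 1).
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * 2.0 * num_cycles * progress)))


# Multiplier at a few steps for the settings used on this page.
for s in (0, 5, 10, 55, 100):
    print(s, round(cosine_warmup_factor(s, 10, 100, 0.5), 4))
```

Under these assumptions, the multiplier reaches 1 at the end of warmup. With num_cycles=0.5 it falls to 0 at the final training step, and with num_cycles=1 it returns to 1 at the final step.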

These parameters can be set through the model’s set_lr_scheduler:

model.set_lr_scheduler(CosineWarmupScheduler, num_warmup_steps=10, num_training_steps=100, num_cycles=0.5)

The optimizer argument does not need to be passed in; the model module supplies it internally.

Megatron models do not support this scheduler.