CosineWarmupScheduler
This LR scheduler warms up the learning rate at the beginning of training and, once the specified learning rate is reached, decays it along a cosine curve.
class CosineWarmupScheduler:
    def __init__(self, optimizer, num_warmup_steps: int, num_training_steps: int, num_cycles: float = 0.5):
        ...
    ...
Construction parameters:
optimizer: optimizer instance
num_warmup_steps: Number of warmup steps
num_training_steps: Total training steps
num_cycles: Number of cosine cycles covered during the decay phase. The default of 0.5 is half a cycle, so the learning rate decays from the maximum to the minimum. Setting it to 1 covers a full cycle, so the learning rate decays from the maximum to the minimum and then rises back to the maximum.
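The effect of num_warmup_steps, num_training_steps, and num_cycles can be illustrated with a small standalone sketch. It assumes the widely used formulation of this schedule (linear warmup from 0, then a cosine multiplier on the base learning rate, clipped at 0); the source does not confirm that CosineWarmupScheduler uses exactly this formula, and the helper name cosine_warmup_factor is hypothetical.

```python
import math


def cosine_warmup_factor(step, num_warmup_steps, num_training_steps, num_cycles=0.5):
    """Hypothetical helper: multiplier applied to the base learning rate at `step`.

    Assumption: linear warmup followed by cosine decay; the actual
    CosineWarmupScheduler may differ in details.
    """
    if step < num_warmup_steps:
        # Warmup phase: the factor grows linearly from 0 towards 1.
        return step / max(1, num_warmup_steps)
    # Decay phase: fraction of the post-warmup steps completed so far, in [0, 1].
    progress = (step - num_warmup_steps) / max(1, num_training_steps - num_warmup_steps)
    # num_cycles=0.5 sweeps half a cosine cycle (1 -> 0);
    # num_cycles=1 sweeps a full cycle (1 -> 0 -> 1).
    return max(0.0, 0.5 * (1.0 + math.cos(2.0 * math.pi * num_cycles * progress)))


if __name__ == "__main__":
    # Print the factor at a few steps for the settings used in this section.
    for step in (0, 5, 10, 55, 100):
        print(step, round(cosine_warmup_factor(step, 10, 100, 0.5), 4))
```

Under these assumptions, with num_cycles=0.5 the factor reaches its maximum at the end of warmup and falls to the minimum at num_training_steps, while with num_cycles=1 it hits the minimum halfway through the decay phase and returns to the maximum at the end.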
These parameters can be set through the model’s set_lr_scheduler method:
model.set_lr_scheduler(CosineWarmupScheduler, num_warmup_steps=10, num_training_steps=100, num_cycles=0.5)
The optimizer parameter does not need to be passed in; the model module will automatically add it internally.
Megatron models do not support this scheduler.