MultiLoraTransformersModel
This model inherits from TransformersModel. In addition to the base class's functionality, it can run multiple LoRAs on a time-sharing basis, which is mainly useful for multi-tenant training.
class MultiLoraTransformersModel:

    def __init__(self,  # noqa
                 model_cls = AutoModelForCausalLM,
                 model_id: Optional[str] = None,
                 config: Optional[PretrainedConfig] = None,
                 device_mesh: Optional[DeviceMesh] = None,
                 mixed_precision: Literal['no', 'fp8', 'fp16', 'bf16'] = 'bf16',
                 grad_scaler_config: Dict[str, Any] = None,
                 max_loras: int = 5,
                 max_r: int = 32,
                 max_length: int = 8192,
                 **kwargs):
        ...

    ...
In addition to the parameters of the base class, this class provides several parameters for multi-LoRA configuration:
max_loras: Maximum number of LoRAs
max_r: Maximum LoRA rank
max_length: Maximum supported training length
The max_loras and max_r parameters exist because Twinkle’s multi-LoRA approach adds LoRAs up to max_loras before the model is wrapped in DDP. LoRAs added after wrapping could not be managed by DDP.
Because of this, a user’s LoRA rank r must be less than or equal to max_r. During actual training, only part of the pre-allocated LoRA rank is used in the computation.
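The pre-allocation idea described above can be illustrated with a small, pure-Python sketch. It is not Twinkle's implementation: `LoraSlotPool`, `acquire`, `release`, and `lora_delta` are hypothetical names introduced only for illustration, and the sketch assumes that "using part of the rank" means using the first r of the max_r pre-allocated rank dimensions.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class LoraSlotPool:
    """Hypothetical fixed pool of LoRA slots, all created up front."""
    max_loras: int = 5
    max_r: int = 32
    # Each entry is (tenant, rank in use), or None when the slot is free.
    slots: List[Optional[Tuple[str, int]]] = field(init=False, default_factory=list)

    def __post_init__(self) -> None:
        # Every slot exists from the start, so no slot is added later
        # (mirroring "add LoRAs up to max_loras before the DDP wrap").
        self.slots = [None] * self.max_loras

    def acquire(self, tenant: str, r: int) -> int:
        """Assign a free slot to a tenant whose LoRA rank is r."""
        if not 1 <= r <= self.max_r:
            raise ValueError(f"rank {r} must be between 1 and max_r={self.max_r}")
        for idx, slot in enumerate(self.slots):
            if slot is None:
                self.slots[idx] = (tenant, r)
                return idx
        raise RuntimeError("all LoRA slots are in use")

    def release(self, idx: int) -> None:
        """Free a slot so another tenant can reuse it."""
        self.slots[idx] = None


def lora_delta(x: List[float], A: List[List[float]],
               B: List[List[float]], r: int) -> List[float]:
    """Compute B[:, :r] @ (A[:r] @ x) with plain lists.

    A has max_r rows of length in_dim; B has out_dim rows of length max_r.
    Only the first r rank dimensions contribute (an assumption of this sketch).
    """
    hidden = [sum(a * xi for a, xi in zip(A[k], x)) for k in range(r)]
    return [sum(row[k] * hidden[k] for k in range(r)) for row in B]
```

In this sketch a tenant requesting a rank above max_r is rejected, and a tenant with r smaller than max_r simply leaves the remaining pre-allocated rank dimensions unused.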
MultiLoraTransformersModel supports the @remote_class decorator and device_mesh, which means it can run in Ray workers.