Sampler
The Sampler is Twinkle's component for generating model outputs, used primarily to produce samples during RLHF training. It supports multiple inference engines, including vLLM and native PyTorch.
Basic Interface
class Sampler(ABC):

    @abstractmethod
    def sample(
        self,
        inputs: Union[InputFeature, List[InputFeature], Trajectory, List[Trajectory]],
        sampling_params: Optional[SamplingParams] = None,
        adapter_name: str = '',
        *,
        num_samples: int = 1,
    ) -> List[SampleResponse]:
        """Sample from given inputs"""
        ...

    def add_adapter_to_model(self, adapter_name: str, config_or_dir, **kwargs):
        """Add LoRA adapter"""
        ...

    def set_template(self, template_cls: Union[Template, Type[Template], str], **kwargs):
        """Set template"""
        ...
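The core method of the sampler is sample. It accepts a single InputFeature or Trajectory, or a list of either, together with optional sampling parameters and an adapter name, and returns a list of SampleResponse objects. The num_samples argument is keyword-only and defaults to 1.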
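To make the calling convention concrete, here is a minimal sketch of a class that implements sample. It is not Twinkle code: InputFeature, SamplingParams and SampleResponse are hypothetical stand-ins defined only so the example runs on its own, EchoSampler is an invented toy class, and the flat "one response per input per sample" layout of the returned list is an assumption of this sketch that the interface itself does not confirm.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Optional, Union


# Hypothetical stand-ins for Twinkle's real types (assumption: the actual
# classes carry more fields than shown here).
@dataclass
class InputFeature:
    prompt: str


@dataclass
class SamplingParams:
    temperature: float = 1.0


@dataclass
class SampleResponse:
    prompt: str
    text: str


class Sampler(ABC):
    """Trimmed copy of the interface, keeping only `sample`."""

    @abstractmethod
    def sample(
        self,
        inputs: Union[InputFeature, List[InputFeature]],
        sampling_params: Optional[SamplingParams] = None,
        adapter_name: str = '',
        *,
        num_samples: int = 1,
    ) -> List[SampleResponse]:
        ...


class EchoSampler(Sampler):
    """Toy sampler that 'generates' by echoing the prompt (illustration only)."""

    def sample(self, inputs, sampling_params=None, adapter_name='', *, num_samples=1):
        # The interface allows a single input or a list; normalize to a list.
        if not isinstance(inputs, list):
            inputs = [inputs]
        # sampling_params and adapter_name are accepted but ignored by this toy.
        responses = []
        for feature in inputs:
            for i in range(num_samples):
                responses.append(
                    SampleResponse(prompt=feature.prompt, text=f"echo:{feature.prompt}#{i}")
                )
        return responses


if __name__ == "__main__":
    sampler = EchoSampler()
    batch = [InputFeature("hello"), InputFeature("world")]
    # num_samples must be passed by keyword because of the bare `*`.
    for response in sampler.sample(batch, num_samples=2):
        print(response.prompt, "->", response.text)
```

Because of the bare `*` in the signature, passing num_samples positionally raises a TypeError, which keeps call sites explicit about how many samples they request.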
Available Samplers
Twinkle provides two sampler implementations:
vLLMSampler
vLLMSampler uses the vLLM engine for efficient inference, supporting high-throughput batch sampling.
High Performance: Uses PagedAttention and continuous batching
LoRA Support: Supports dynamic loading and switching of LoRA adapters
Multi-Sample Generation: Can generate multiple samples per prompt
Tensor Parallel: Supports tensor parallelism to accelerate large model inference
See: vLLMSampler
TorchSampler
TorchSampler uses native PyTorch and transformers for inference, suitable for small-scale sampling or debugging.
Easy to Use: Based on transformers’ standard interface
High Flexibility: Easy to customize and extend
Low Memory Footprint: Suitable for small-scale sampling
See: TorchSampler
How to Choose
vLLMSampler: Suitable for production environments and large-scale training that require high throughput
TorchSampler: Suitable for debugging, small-scale experiments, or custom requirements
In RLHF training, the sampler is typically deployed separately from the Actor model, on different hardware resources, so that inference and training do not interfere with each other.