# Multi-Turn Rollout The Rollout module provides multi-turn conversation rollout engines for agentic RLHF training. Two implementations are available: `MultiTurnRollout` for batched vLLM sampling and `APIMultiTurnRollout` for OpenAI-compatible API endpoints. ## Rollout Base Class ```python from abc import ABC, abstractmethod from twinkle.data_format import Trajectory class Rollout(ABC): @abstractmethod def __call__(self, trajectories: List[Trajectory], **kwargs) -> List[Trajectory]: raise NotImplementedError() ``` All rollouts accept a list of trajectories and return the same number of trajectories with additional fields (`messages`, `turns`, `stop_reason`, `truncated`). ## MultiTurnRollout Batched multi-turn rollout engine that uses a vLLM sampler for generation. All active trajectories are sampled in a single batched call per turn for maximum throughput. ### Per-turn Loop 1. Encode each trajectory into an `InputFeature` with a generation prompt 2. Batch `sampler.sample(active_pifs)` — all live trajectories in parallel 3. Check termination: `stop_reason == 'length'`, no tool calls, or max turns reached 4. Dispatch tools via `ToolManager`, append tool responses 5. Compute bridge tokens (tool turns + generation prompt) with `labels = -100` 6. Repeat until all trajectories are done ```python from twinkle_agentic.rollout.multi_turn import MultiTurnRollout from twinkle_agentic.tools.tool_manager import ToolManager from twinkle.data_format.sampling import SamplingParams rollout = MultiTurnRollout( sampler=vllm_sampler, template=template, tool_manager=tool_manager, sampling_params=SamplingParams(temperature=0.7, max_tokens=4096), max_turns=6, max_trajectory_tokens=8192, trace_dir='rollout_traces/', ) # Run rollout results = rollout(trajectories) ``` ### Parameters | Parameter | Type | Description | |-----------|------|-------------| | `sampler` | Sampler | vLLM sampler instance for batched generation. | | `template` | `Template` | Chat template for encoding/decoding. | | `tool_manager` | `ToolManager` | Tool dispatcher. Can also be passed per-call. | | `sampling_params` | `SamplingParams` | Default sampling parameters. | | `max_turns` | `int` | Maximum number of turns per trajectory (default: 6). | | `max_trajectory_tokens` | `int` | Max total token length; exceeding truncates the trajectory. | | `trace_dir` | `str` | Directory for per-trajectory JSON trace dumps. | | `trace_callback` | `Callable` | Decides whether to store a trajectory trace. | | `success_callback` | `Callable` | Decides filename prefix (`ok-` vs `fail-`). | ### Output Fields Each output trajectory dict includes: | Field | Type | Description | |-------|------|-------------| | `messages` | `List[Dict]` | Full conversation including tool turns. | | `input_ids` | `List[int]` | Token IDs of the full sequence. | | `labels` | `List[int]` | Training labels (`-100` for non-trainable tokens). | | `turns` | `int` | Number of turns performed. | | `stop_reason` | `str` | `'stop'` / `'length'` | | `truncated` | `bool` | Whether the trajectory was truncated. | | `logprobs` | `List` | Per-token log probabilities (if available). | ### Ray Remote Support `MultiTurnRollout` is decorated with `@remote_class()`, enabling transparent deployment as a Ray actor: ```python # The rollout can run as a Ray remote actor rollout_actor = MultiTurnRollout.remote(sampler=sampler, template=template, ...) results = ray.get(rollout_actor.__call__.remote(trajectories)) ``` ## APIMultiTurnRollout Multi-turn rollout over an OpenAI-compatible chat-completions API. Each trajectory runs independently in a thread pool for network concurrency. ```python from twinkle_agentic.rollout.api_multi_turn import APIMultiTurnRollout from twinkle_agentic.protocol.openai import OpenAI api = OpenAI(model='qwen3.5-32b', base_url='http://localhost:8000/v1') rollout = APIMultiTurnRollout( api=api, tool_manager=tool_manager, sampling_params=SamplingParams(temperature=0.7), max_turns=6, concurrency=8, trace_dir='api_traces/', ) results = rollout(trajectories) ``` ### Parameters | Parameter | Type | Description | |-----------|------|-------------| | `api` | `OpenAI` | OpenAI-compatible API client. | | `tool_manager` | `ToolManager` | Tool dispatcher (single or per-trajectory list). | | `sampling_params` | `SamplingParams` | Default sampling parameters. | | `max_turns` | `int` | Maximum turns per trajectory (default: 6). | | `concurrency` | `int` | Thread pool size for parallel API calls (default: 8). | | `extra_body` | `Dict` | Extra fields to include in API requests. | | `trace_dir` | `str` | Directory for trace dumps. | ### Stop Reasons | Reason | Description | |--------|-------------| | `stop` | Assistant responded without tool calls (natural end). | | `length` | API returned `finish_reason='length'` (token limit). | | `max_turns` | Reached `max_turns` limit. | | `api_error` | API call or tool execution raised an exception. | ## Choosing Between Rollouts | Feature | MultiTurnRollout | APIMultiTurnRollout | |---------|-----------------|---------------------| | **Backend** | vLLM sampler (local GPU) | OpenAI-compatible API | | **Training integration** | Produces `input_ids` / `labels` for GRPO | Messages only (for data collection) | | **Batching** | GPU-level batch parallelism | Network-level thread concurrency | | **Use case** | Online RLHF training loop | Offline data generation / evaluation |