MultiModal Reward
Reward function for evaluating multimodal visual question answering (VQA) tasks.
MultiModalAccuracyReward
Evaluates the correctness of multimodal VQA answers with a fallback to symbolic math verification.
from twinkle.reward import MultiModalAccuracyReward
reward_fn = MultiModalAccuracyReward()
rewards = reward_fn(generated_trajectories, ground_truth_trajectories)
# rewards: List[float], 1.0 for correct, 0.0 for incorrect
The reward function:
Extracts the model’s answer from the completion text
Compares it with the ground truth using exact string matching
Falls back to math_verify for symbolic expression comparison when string matching fails
Designed for visual reasoning tasks such as CLEVR, where answers may be numeric, boolean, or short text.
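The extract, compare, and fall-back steps listed above can be illustrated with a minimal sketch. This is not the twinkle implementation: the helper names (extract_answer, numeric_equal, accuracy_reward) are hypothetical, the <answer>...</answer> tag convention is an assumption about the completion format, and a simple Fraction-based numeric check stands in for math_verify so the example runs with the standard library only.

```python
import re
from fractions import Fraction
from typing import Callable


def extract_answer(completion: str) -> str:
    """Hypothetical extraction step.

    Assumes the answer may be wrapped in <answer>...</answer> tags;
    if no tags are found, the whole completion is treated as the answer.
    """
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    return (match.group(1) if match else completion).strip()


def numeric_equal(predicted: str, truth: str) -> bool:
    """Stand-in for the symbolic fallback (NOT math_verify).

    Treats both strings as exact rationals, so "3.0" equals "3"
    and "0.5" equals "1/2". Non-numeric text never matches here.
    """
    try:
        return Fraction(predicted) == Fraction(truth)
    except (ValueError, ZeroDivisionError):
        return False


def accuracy_reward(
    completion: str,
    ground_truth: str,
    fallback: Callable[[str, str], bool] = numeric_equal,
) -> float:
    """Return 1.0 for a correct answer and 0.0 otherwise."""
    predicted = extract_answer(completion)
    truth = ground_truth.strip()
    # Step 1: exact string match.
    if predicted == truth:
        return 1.0
    # Step 2: fall back to a more tolerant comparison.
    return 1.0 if fallback(predicted, truth) else 0.0
```

With this sketch, a completion of "<answer>3.0</answer>" scored against the ground truth "3" fails the exact string match but passes the fallback, so it receives 1.0, while "yes" against "no" receives 0.0. Passing the fallback as a parameter makes it straightforward to swap in a real symbolic checker.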