TRL documentation
You are viewing the documentation for v0.20.0. A newer version, v1.5.1, is available.
Reward Functions
This module contains some useful reward functions, primarily intended for use with the GRPOTrainer.
Format rewards
think_format_reward
trl.rewards.think_format_reward(completions: list, **kwargs) → list[float]
Parameters
- completions (list[list[dict[str, str]]]): List of completions to be evaluated. Each completion must be a list of one message, i.e. a dictionary containing the key "content" with the value being the text of the completion.
- **kwargs: Additional keyword arguments. This function does not use them, but they are required in the function signature to ensure compatibility with trainers like GRPOTrainer.
Returns
list[float]
A list of rewards, where each reward is 1.0 if the completion matches the expected format, otherwise 0.0.
Reward function that checks if the reasoning process is enclosed within "<think>" and "</think>" tags. The
function returns a reward of 1.0 if the format is correct, otherwise 0.0.
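
The behavior described above can be illustrated with a small stand-alone sketch. This is not the library's implementation: the function name `sketch_think_format_reward` and the regular expression are assumptions made for illustration. The sketch assumes that "correct format" means the completion starts with "<think>", contains no second opening tag, and closes the reasoning with "</think>".

```python
import re

# Assumed format rule (hypothetical, for illustration only): the text starts
# with "<think>", has no further "<think>" tag, and contains "</think>".
_THINK_PATTERN = re.compile(r"^<think>(?!.*<think>)(.*?)</think>.*$", re.DOTALL)


def sketch_think_format_reward(completions, **kwargs):
    """Return 1.0 for each completion matching the assumed format, else 0.0.

    `completions` is a list of completions, each a list holding one message
    dict with a "content" key. Extra keyword arguments are accepted and
    ignored, mirroring the signature documented above.
    """
    contents = [completion[0]["content"] for completion in completions]
    return [1.0 if _THINK_PATTERN.match(text) else 0.0 for text in contents]


completions = [
    # Reasoning is opened and closed: reward 1.0
    [{"content": "<think>\nThis is my reasoning.\n</think>\nThis is my answer."}],
    # Closing tag is missing: reward 0.0
    [{"content": "<think>\nThis is my reasoning.\nThis is my answer."}],
]
rewards = sketch_think_format_reward(completions)
print(rewards)  # [1.0, 0.0]
```

Because the function accepts `**kwargs`, a caller can pass extra per-batch arguments (for example a hypothetical `prompts=[...]`) without changing the result.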