Reward Models 06-2025 Collection Nemotron reward models for use in RLHF pipelines and LLM-as-a-Judge. • 8 items • Updated about 4 hours ago • 23
OLMo-1B-as_fm3_tg_omi2 Collection OLMo 1B model pretrained with Algebraic Stack, FineMath3, TinyGSM, and OpenMathInstruct2. Includes checkpoints from PPO training on the GSM8K train split. • 25 items • Updated Jan 26 • 1
OLMo-1B-as_fm3_tg_omi1_omi2 Collection OLMo 1B model pretrained with Algebraic Stack, FineMath3, TinyGSM, OMI1, and OMI2. Includes checkpoints from PPO training on the GSM8K train split. • 25 items • Updated Jan 26 • 2