SAGE Collection Self-Hinting Language Models Enhance Reinforcement Learning • 23 items • Updated Mar 27 • 3
Self-Hinting Language Models Enhance Reinforcement Learning Paper • 2602.03143 • Published Feb 3 • 31
3-in-1: 2D Rotary Adaptation for Efficient Finetuning, Efficient Batching and Composability Paper • 2409.00119 • Published Aug 28, 2024 • 1
Reinforce-Ada Collection Training & test sets and finetuned models • 19 items • Updated Oct 26, 2025 • 3
Reinforce-Ada: An Adaptive Sampling Framework for Reinforce-Style LLM Training Paper • 2510.04996 • Published Oct 6, 2025 • 16
Article Gaia2 and ARE: Empowering the community to study agents clefourrier, gregmialz, mlcu, mortimerp9, XciD, tfrere, evijit, RomainFroger, dheeraj7596, CarolinePascal, upiter • Sep 22, 2025 • 134
Unilogit: Robust Machine Unlearning for LLMs Using Uniform-Target Self-Distillation Paper • 2505.06027 • Published May 9, 2025 • 18
Article Navigating the RLHF Landscape: From Policy Gradients to PPO, GAE, and DPO for LLM Alignment NormalUhr • Feb 11, 2025 • 121
Reward-Guided Speculative Decoding for Efficient LLM Reasoning Paper • 2501.19324 • Published Jan 31, 2025 • 39