WALAR - a lyf07 Collection

WALAR

updated 4 days ago

Mending the Holes: Mitigating Reward Hacking in Reinforcement Learning for Multilingual Translation

lyf07/LLaMAX3-8B-Alpaca-WALAR

Translation • 8B • Updated about 1 hour ago • 50

lyf07/Qwen3-8B-WALAR

Translation • 8B • Updated about 1 hour ago • 61

lyf07/Translategemma-4B-it-WALAR

Translation • 769k • Updated about 1 hour ago • 52

Mending the Holes: Mitigating Reward Hacking in Reinforcement Learning for Multilingual Translation

Paper • 2603.13045 • Published 8 days ago • 1