Lost in Cultural Translation: Do LLMs Struggle with Math Across Cultural Contexts? Paper • 2503.18018 • Published Mar 23, 2025 • 7
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification Paper • 2508.05629 • Published Aug 7, 2025 • 190
Reasoning Models Struggle to Control their Chains of Thought Paper • 2603.05706 • Published about 1 month ago • 36
FrenchBench Evaluation datasets Collection • These datasets are used to evaluate models' French-language performance with https://github.com/EleutherAI/lm-evaluation-harness (from the CroissantLLM paper) • 11 items • Updated Jun 7, 2024 • 8