Reinforcement Learning for Reasoning in Large Language Models with One Training Example Paper • 2504.20571 • Published Apr 29 • 98 • 15