DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research Paper • 2511.19399 • Published Nov 24, 2025 • 60
VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks Paper • 2504.05118 • Published Apr 7, 2025 • 26
TransMLA: Multi-head Latent Attention Is All You Need Paper • 2502.07864 • Published Feb 11, 2025 • 57
Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning Paper • 2502.03275 • Published Feb 5, 2025 • 18
The Lessons of Developing Process Reward Models in Mathematical Reasoning Paper • 2501.07301 • Published Jan 13, 2025 • 99
kenhktsui/open-react-retrieval-multi-neg-result-new-kw Viewer • Updated Aug 7, 2023 • 25.2k • 40 • 3
shenzhi-wang/Llama3-8B-Chinese-Chat Text Generation • 8B • Updated Jul 4, 2024 • 7.96k • • 685