🧙 Guru: Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective
Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective • Paper • 2506.14965 • Published Jun 17, 2025 • 50
LLM360/guru-RL-92k • Viewer • Updated Aug 20, 2025 • 91.9k • 1.34k • 45
LLM360/guru-7B • Text Generation • 8B • Updated Jun 19, 2025 • 5 • • 3
LLM360/guru-32B • Text Generation • 33B • Updated Jun 19, 2025 • 8 • 2
🐙 OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling
OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling • Paper • 2506.20512 • Published Jun 25, 2025 • 47
OctoThinker/MegaMath-Web-Pro-Max • Viewer • Updated Jul 6, 2025 • 69.2M • 3.29k • 36
OctoThinker/OctoThinker-8B-Long-Base • Text Generation • 8B • Updated Jul 6, 2025 • 5
OctoThinker/OctoThinker-8B-Hybrid-Base • Text Generation • 8B • Updated Jul 6, 2025 • 507 • 2