VADE-Models
FloSophorae/Qwen2.5VL-3B-Instruct-VADE-GRPO 4B • Updated Nov 24, 2025 • 4
FloSophorae/Qwen2.5VL-7B-Instruct-VADE-GRPO 8B • Updated Nov 24, 2025 • 4
FloSophorae/Qwen2.5VL-3B-Instruct-VADE-GSPO 4B • Updated Nov 24, 2025 • 3
FloSophorae/Qwen2.5VL-7B-Instruct-VADE-GSPO 8B • Updated Nov 24, 2025 • 4
VLM-RL
Robix: A Unified Model for Robot Interaction, Reasoning and Planning Paper • 2509.01106 • Published Sep 1, 2025 • 51
R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning Paper • 2508.21113 • Published Aug 28, 2025 • 110
Agentic-RL
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey Paper • 2509.02547 • Published Sep 2, 2025 • 228