Advancing Multimodal Reasoning Capabilities of Multimodal Large Language Models via Visual Perception Reward
Paper: arxiv.org/abs/2506.07218
Please refer to the GitHub repo for detailed usage: https://github.com/tongxiao2002/Perception-R1. If you find our model helpful, we'd appreciate a ⭐!