- Can Large Language Models Understand Context?
  Paper • 2402.00858 • Published • 23
- OLMo: Accelerating the Science of Language Models
  Paper • 2402.00838 • Published • 85
- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 151
- SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
  Paper • 2401.17072 • Published • 25

Collections including paper arxiv:2507.07966

- Scaling RL to Long Videos
  Paper • 2507.07966 • Published • 159
- Group Sequence Policy Optimization
  Paper • 2507.18071 • Published • 315
- CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning
  Paper • 2507.14111 • Published • 23
- MaPPO: Maximum a Posteriori Preference Optimization with Prior Knowledge
  Paper • 2507.21183 • Published • 14

- vrgamedevgirl84/Wan14BT2VFusioniX
  Text-to-Video • Updated • 592
- TheStageAI/Elastic-mochi-1-preview
  Text-to-Video • Updated • 27 • 2
- nesaorg/animatediff-base
  Text-to-Video • Updated • 107
- 4Real-Video-V2: Fused View-Time Attention and Feedforward Reconstruction for 4D Scene Generation
  Paper • 2506.18839 • Published • 12

- Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning
  Paper • 2506.04207 • Published • 48
- SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models
  Paper • 2504.11468 • Published • 30
- RLPR: Extrapolating RLVR to General Domains without Verifiers
  Paper • 2506.18254 • Published • 31
- Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning
  Paper • 2507.05255 • Published • 74

- Self-Forcing++: Towards Minute-Scale High-Quality Video Generation
  Paper • 2510.02283 • Published • 95
- Paper2Video: Automatic Video Generation from Scientific Papers
  Paper • 2510.05096 • Published • 117
- LongLive: Real-time Interactive Long Video Generation
  Paper • 2509.22622 • Published • 184
- HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning
  Paper • 2509.08519 • Published • 128

- Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers
  Paper • 2506.23918 • Published • 89
- LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale
  Paper • 2504.16030 • Published • 36
- Time Blindness: Why Video-Language Models Can't See What Humans Can?
  Paper • 2505.24867 • Published • 80
- GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
  Paper • 2507.01006 • Published • 242

- Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning
  Paper • 2505.15966 • Published • 53
- GRIT: Teaching MLLMs to Think with Images
  Paper • 2505.15879 • Published • 12
- Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models
  Paper • 2505.16854 • Published • 11
- VLM-R³: Region Recognition, Reasoning, and Refinement for Enhanced Multimodal Chain-of-Thought
  Paper • 2505.16192 • Published • 12