Collections

Discover the best community collections!

Collections including paper arxiv:2507.07966

AI Paper of the Day

A collection of papers that I think are interesting, one added each day

about 16 hours ago

Can Large Language Models Understand Context?

Paper • 2402.00858 • Published Feb 1, 2024 • 23
OLMo: Accelerating the Science of Language Models

Paper • 2402.00838 • Published Feb 1, 2024 • 85
Self-Rewarding Language Models

Paper • 2401.10020 • Published Jan 18, 2024 • 151
SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity

Paper • 2401.17072 • Published Jan 30, 2024 • 25

Boost AI's long-context ability while keeping it efficient. Models in this collection include LongVILA, LongVILA-R1, and LongLive.

Efficient-Large-Model/LongVILA-R1-7B

Updated Jul 31 • 1.76k • 12
Efficient-Large-Model/qwen2-7b-longvila-256f

Updated Nov 28, 2024 • 265
Efficient-Large-Model/qwen2-1.5b-longvila-256f

Updated Nov 28, 2024 • 8
Efficient-Large-Model/qwen2-7b-longvila-1M

Updated Jan 14 • 12 • 2

Scaling RL to Long Videos

Paper • 2507.07966 • Published Jul 10 • 159
Group Sequence Policy Optimization

Paper • 2507.18071 • Published Jul 24 • 315
CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning

Paper • 2507.14111 • Published Jul 18 • 23
MaPPO: Maximum a Posteriori Preference Optimization with Prior Knowledge

Paper • 2507.21183 • Published Jul 27 • 14

vrgamedevgirl84/Wan14BT2VFusioniX

Text-to-Video • Updated Jun 21 • 592
TheStageAI/Elastic-mochi-1-preview

Text-to-Video • Updated Oct 12 • 27 • 2
nesaorg/animatediff-base

Text-to-Video • Updated Jun 22 • 107
4Real-Video-V2: Fused View-Time Attention and Feedforward Reconstruction for 4D Scene Generation

Paper • 2506.18839 • Published Jun 18 • 12

reinforcement-learning

Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning

Paper • 2506.04207 • Published Jun 4 • 48
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models

Paper • 2504.11468 • Published Apr 10 • 30
RLPR: Extrapolating RLVR to General Domains without Verifiers

Paper • 2506.18254 • Published Jun 23 • 31
Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning

Paper • 2507.05255 • Published Jul 7 • 74

Video Generation

Self-Forcing++: Towards Minute-Scale High-Quality Video Generation

Paper • 2510.02283 • Published Oct 2 • 95
Paper2Video: Automatic Video Generation from Scientific Papers

Paper • 2510.05096 • Published Oct 6 • 117
LongLive: Real-time Interactive Long Video Generation

Paper • 2509.22622 • Published Sep 26 • 184
HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning

Paper • 2509.08519 • Published Sep 10 • 128

Scaling RL to Long Videos

Paper • 2507.07966 • Published Jul 10 • 159

Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers

Paper • 2506.23918 • Published Jun 30 • 89
LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale

Paper • 2504.16030 • Published Apr 22 • 36
Time Blindness: Why Video-Language Models Can't See What Humans Can?

Paper • 2505.24867 • Published May 30 • 80
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Paper • 2507.01006 • Published Jul 1 • 242

Images are Worth Variable Length of Representations

Paper • 2506.03643 • Published Jun 4 • 4
PyVision: Agentic Vision with Dynamic Tooling

Paper • 2507.07998 • Published Jul 10 • 31
Scaling RL to Long Videos

Paper • 2507.07966 • Published Jul 10 • 159

Vision Reasoning

Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning

Paper • 2505.15966 • Published May 21 • 53
GRIT: Teaching MLLMs to Think with Images

Paper • 2505.15879 • Published May 21 • 12
Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models

Paper • 2505.16854 • Published May 22 • 11
VLM-R^3: Region Recognition, Reasoning, and Refinement for Enhanced Multimodal Chain-of-Thought

Paper • 2505.16192 • Published May 22 • 12
