MolmoPoint: Better Pointing for VLMs with Grounding Tokens Paper • 2603.28069 • Published 2 days ago • 2
HISA: Efficient Hierarchical Indexing for Fine-Grained Sparse Attention Paper • 2603.28458 • Published 1 day ago • 2
DreamLite: A Lightweight On-Device Unified Model for Image Generation and Editing Paper • 2603.28713 • Published 1 day ago • 9
Gen-Searcher: Reinforcing Agentic Search for Image Generation Paper • 2603.28767 • Published 1 day ago • 43
Marco DeepResearch: Unlocking Efficient Deep Research Agents via Verification-Centric Design Paper • 2603.28376 • Published 1 day ago • 10
Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale Paper • 2603.25040 • Published 6 days ago • 120
Vega: Learning to Drive with Natural Language Instructions Paper • 2603.25741 • Published 5 days ago • 4
Less Gaussians, Texture More: 4K Feed-Forward Textured Splatting Paper • 2603.25745 • Published 5 days ago • 11
UI-Voyager: A Self-Evolving GUI Agent Learning via Failed Experience Paper • 2603.24533 • Published 6 days ago • 41
CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents Paper • 2603.24440 • Published 6 days ago • 90
Toward Physically Consistent Driving Video World Models under Challenging Trajectories Paper • 2603.24506 • Published 6 days ago • 4
OmniWeaving: Towards Unified Video Generation with Free-form Composition and Reasoning Paper • 2603.24458 • Published 6 days ago • 5
GameplayQA: A Benchmarking Framework for Decision-Dense POV-Synced Multi-Video Understanding of 3D Virtual Agents Paper • 2603.24329 • Published 6 days ago • 21
RealMaster: Lifting Rendered Scenes into Photorealistic Video Paper • 2603.23462 • Published 7 days ago • 31
MinerU-Diffusion: Rethinking Document OCR as Inverse Rendering via Diffusion Decoding Paper • 2603.22458 • Published 8 days ago • 131
Sparse but Critical: A Token-Level Analysis of Distributional Shifts in RLVR Fine-Tuning of LLMs Paper • 2603.22446 • Published 8 days ago • 6
ABot-PhysWorld: Interactive World Foundation Model for Robotic Manipulation with Physics Alignment Paper • 2603.23376 • Published 7 days ago • 3
WildWorld: A Large-Scale Dataset for Dynamic World Modeling with Actions and Explicit State toward Generative ARPG Paper • 2603.23497 • Published 7 days ago • 88