OneThinker: All-in-one Reasoning Model for Image and Video • Paper • 2512.03043 • Published Dec 2, 2025 • 32
NaTex: Seamless Texture Generation as Latent Color Diffusion • Paper • 2511.16317 • Published Nov 20, 2025 • 15
FARMER: Flow AutoRegressive Transformer over Pixels • Paper • 2510.23588 • Published Oct 27, 2025 • 58
Understand Before You Generate: Self-Guided Training for Autoregressive Image Generation • Paper • 2509.15185 • Published Sep 18, 2025 • 29
Transition Models: Rethinking the Generative Learning Objective • Paper • 2509.04394 • Published Sep 4, 2025 • 28
InterActHuman: Multi-Concept Human Animation with Layout-Aligned Audio Conditions • Paper • 2506.09984 • Published Jun 11, 2025 • 14
Unleashing Vecset Diffusion Model for Fast Shape Generation • Paper • 2503.16302 • Published Mar 20, 2025 • 43
Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines • Paper • 2410.21220 • Published Oct 28, 2024 • 11
Scaling Up Your Kernels: Large Kernel Design in ConvNets towards Universal Representations • Paper • 2410.08049 • Published Oct 10, 2024 • 8
Explore the Limits of Omni-modal Pretraining at Scale • Paper • 2406.09412 • Published Jun 13, 2024 • 11
InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions • Paper • 2402.03040 • Published Feb 5, 2024 • 19
Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities • Paper • 2401.14405 • Published Jan 25, 2024 • 13