STARFlow2: Bridging Language Models and Normalizing Flows for Unified Multimodal Generation Paper • 2605.08029 • Published 14 days ago • 11
Continuous-Time Distribution Matching for Few-Step Diffusion Distillation Paper • 2605.06376 • Published 15 days ago • 26
SenseNova-U1 Collection SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-Unify Architecture • 8 items • Updated 6 days ago • 66
GenLIP Collection Model weights for the paper "Let ViT Speak: Generative Language-Image Pre-training" • 6 items • Updated 16 days ago • 5
World-R1: Reinforcing 3D Constraints for Text-to-Video Generation Paper • 2604.24764 • Published 25 days ago • 118
AVControl: Efficient Framework for Training Audio-Visual Controls Paper • 2603.24793 • Published Mar 25 • 28
MOSS-Audio Collection An open-source audio understanding model supporting speech recognition, environmental sound analysis, music understanding, time-aware QA, and complex • 7 items • Updated 19 days ago • 55
HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds Paper • 2604.14268 • Published Apr 15 • 121
Seedance 2.0: Advancing Video Generation for World Complexity Paper • 2604.14148 • Published Apr 15 • 162