ALIVE: Animate Your World with Lifelike Audio-Video Generation Paper • 2602.08682 • Published Feb 10 • 2
SwanVoice: Expressive Long-Form Zero-Shot Speech Synthesis for Both Monologue and Dialogue Paper • 2605.30993 • Published 19 days ago • 58
Towards Streaming Synchronized Spatial Audio Generation via Autoregressive Diffusion Transformer Paper • 2605.30940 • Published 19 days ago • 37
Comprehensive Benchmarking of Long-Form Speech Generation in Diverse Scenarios Paper • 2605.28618 • Published 21 days ago • 32
Synthetic Singers: A Review of Deep-Learning-based Singing Voice Synthesis Approaches Paper • 2601.13910 • Published Jan 20
MRSAudio: A Large-Scale Multimodal Recorded Spatial Audio Dataset with Refined Annotations Paper • 2510.10396 • Published Oct 12, 2025
Conan: A Chunkwise Online Network for Zero-Shot Adaptive Voice Conversion Paper • 2507.14534 • Published Jul 19, 2025
Leveraging Pretrained Diffusion Models for Zero-Shot Part Assembly Paper • 2505.00426 • Published May 1, 2025