CAPE: A CLIP-Aware Pointing Ensemble of Complementary Heatmap Cues for Embodied Reference Understanding Paper • 2507.21888 • Published Jul 29, 2025
Audio-driven Talking Face Generation with Stabilized Synchronization Loss Paper • 2307.09368 • Published Jul 18, 2023
Audio-Visual Speech Representation Expert for Enhanced Talking Face Video Generation and Evaluation Paper • 2405.04327 • Published May 7, 2024