Yu Zhang

AaronZ345

https://aaronz345.github.io

AI & ML interests

Multi-Modal Generative AI (Spatial Audio/Music/Singing/Speech).

Recent Activity

authored a paper 15 days ago

ALIVE: Animate Your World with Lifelike Audio-Video Generation

authored a paper 15 days ago

SwanVoice: Expressive Long-Form Zero-Shot Speech Synthesis for Both Monologue and Dialogue

authored a paper 15 days ago

Towards Streaming Synchronized Spatial Audio Generation via Autoregressive Diffusion Transformer

authored 5 papers 15 days ago

ALIVE: Animate Your World with Lifelike Audio-Video Generation

Paper • 2602.08682 • Published Feb 10 • 2

SwanVoice: Expressive Long-Form Zero-Shot Speech Synthesis for Both Monologue and Dialogue

Paper • 2605.30993 • Published 19 days ago • 58

Towards Streaming Synchronized Spatial Audio Generation via Autoregressive Diffusion Transformer

Paper • 2605.30940 • Published 19 days ago • 37

Comprehensive Benchmarking of Long-Form Speech Generation in Diverse Scenarios

Paper • 2605.28618 • Published 21 days ago • 32

Synthetic Singers: A Review of Deep-Learning-based Singing Voice Synthesis Approaches

Paper • 2601.13910 • Published Jan 20

upvoted 2 papers 16 days ago

Towards Streaming Synchronized Spatial Audio Generation via Autoregressive Diffusion Transformer

Paper • 2605.30940 • Published 19 days ago • 37

Comprehensive Benchmarking of Long-Form Speech Generation in Diverse Scenarios

Paper • 2605.28618 • Published 21 days ago • 32

submitted 2 papers to Daily Papers 16 days ago

Comprehensive Benchmarking of Long-Form Speech Generation in Diverse Scenarios

Paper • 2605.28618 • Published 21 days ago • 32

Towards Streaming Synchronized Spatial Audio Generation via Autoregressive Diffusion Transformer

Paper • 2605.30940 • Published 19 days ago • 37

upvoted a paper 16 days ago

SwanVoice: Expressive Long-Form Zero-Shot Speech Synthesis for Both Monologue and Dialogue

Paper • 2605.30993 • Published 19 days ago • 58

submitted a paper to Daily Papers 16 days ago

SwanVoice: Expressive Long-Form Zero-Shot Speech Synthesis for Both Monologue and Dialogue

Paper • 2605.30993 • Published 19 days ago • 58

New activity in GTSinger/GTSinger about 1 month ago

Annotation quality is very low, not usable for training

#8 opened 2 months ago by da1sypetals-iota

authored a paper 8 months ago

MRSAudio: A Large-Scale Multimodal Recorded Spatial Audio Dataset with Refined Annotations

Paper • 2510.10396 • Published Oct 12, 2025

authored a paper 10 months ago

ASAudio: A Survey of Advanced Spatial Audio Research

Paper • 2508.10924 • Published Aug 8, 2025 • 2

upvoted a paper 10 months ago

ASAudio: A Survey of Advanced Spatial Audio Research

Paper • 2508.10924 • Published Aug 8, 2025 • 2

updated a dataset 10 months ago

AaronZ345/MRSDrama

Preview • Updated Aug 10, 2025 • 3.78k • 2

updated a dataset 11 months ago

AaronZ345/GTSinger

Viewer • Updated Jul 24, 2025 • 28.6k • 2.42k • 15

authored a paper 11 months ago

Conan: A Chunkwise Online Network for Zero-Shot Adaptive Voice Conversion

Paper • 2507.14534 • Published Jul 19, 2025

liked a dataset 11 months ago

AaronZ345/MRSDrama

Preview • Updated Aug 10, 2025 • 3.78k • 2

authored a paper 11 months ago

Leveraging Pretrained Diffusion Models for Zero-Shot Part Assembly

Paper • 2505.00426 • Published May 1, 2025
