Running on Zero • Featured • 1.74k • Dia 1.6B 👯 • Generate realistic dialogue from a script, using Dia!
Running on Zero • Featured • 229 • Spark TTS 🌖 • A text-to-speech model powered by SparkAudio and Mobvoi.
HuggingFaceTB/SmolVLM2-500M-Video-Instruct Image-Text-to-Text • 0.5B • Updated Apr 8, 2025 • 90.1k • 112
microsoft/Phi-4-multimodal-instruct Automatic Speech Recognition • 6B • Updated 25 days ago • 216k • 1.56k
Running • Featured • 350 • Kokoro Text-to-Speech (WebGPU) 🗣 • High-quality speech synthesis powered by Kokoro TTS
mlx-community/SmolVLM2-500M-Video-Instruct-mlx Video-Text-to-Text • Updated Feb 20, 2025 • 1.41k • 18
FlipSketch: Flipping Static Drawings to Text-Guided Sketch Animations Paper • 2411.10818 • Published Nov 16, 2024 • 26
Kosmos-2: Grounding Multimodal Large Language Models to the World Paper • 2306.14824 • Published Jun 26, 2023 • 34