AbstractPhila PRO
AbstractPhil
AI & ML interests
datasets, research papers, experimentation, vision, classification, text encoders, tokenization, llms, diffusion, distillation, and more.
Recent Activity
updated a model about 23 hours ago: AbstractPhil/agatha-diffusion-proto
published a model about 24 hours ago: AbstractPhil/agatha-diffusion-proto
replied to prithivMLmods's post 4 days ago
One speech model with seven voices, streamlined with multimodal capabilities for vision tasks. It performs vision (image-text) to audio inference with Qwen2.5-VL + VibeVoice-Realtime-0.5B. Vision to VibeVoice (EN): the demo is live.
Vision-to-VibeVoice-en [Demo]: https://huggingface.co/spaces/prithivMLmods/Vision-to-VibeVoice-en
Collection: https://huggingface.co/collections/prithivMLmods/multimodal-implementations
Speech [VibeVoice-Realtime-0.5B]: https://huggingface.co/microsoft/VibeVoice-Realtime-0.5B
Vision [Qwen2.5-VL]: https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct
To learn more, visit the app page or the respective model page!