---
title: AI Assistant For Visually Impaired
emoji: 💬
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: 5.42.0
app_file: app.py
pinned: false
hf_oauth: true
hf_oauth_scopes:
- inference-api
license: mit
short_description: AI-Assistant-for-Visually-Impaired
tags:
- mcp-in-action-track-consumer
- building-mcp-track-consumer
---

# Accessibility Voice Agent: MCP Tools

### **Track:** mcp-in-action-track-consumer
### **Team:** Team
### **Author:** @subhash4face - *Subhash Mankunnu*
### **Author:** *Athira AR*

Model Context Protocol (MCP) + Gradio 6 + HF Inference + ElevenLabs

A fully accessible, voice-driven AI assistant demonstrating how MCP tools can enable **speech-to-text**, **image understanding**, and **text-to-speech** workflows for low-vision and visually impaired users.

This project showcases a real-world use case of MCP tools working together inside an agent-style UI.

---

## 🔄 Workflow Diagram: MCP Tools

![Workflow Diagram](./WorkFlow.png)

## 🚀 Demo Video

👉 *https://youtu.be/af4Y89g2HPE*

## 🚀 Social Media Post - LinkedIn

👉 *https://www.linkedin.com/posts/subhashmankunnu_hugginface-share-7400924735989010432-a9sH?utm_source=share&utm_medium=member_desktop&rcm=ACoAAASVxnsB9ojyfy-Kef3IWvBPf4c3pUSOaWw*

---

## 🌟 Key Features

### 🔊 Text-to-Speech (TTS) via ElevenLabs

**MCP Tool:** `speak_text`

- Converts any assistant message to natural speech
- Returns base64 audio + WAV playback
- Helps low-vision users receive spoken responses

---

### 🎤 Speech-to-Text (STT) via Whisper / Local fallback

**MCP Tool:** `transcribe_audio`

- OpenAI Whisper STT or local fallback
- Great for hands-free usage
- Tool-call log shows backend + duration

---

### 🖼 Image Description via OpenAI / Gemini / HF Inference

**MCP Tool:** `describe_image`

- Multimodal accessibility
- Describes any uploaded image in plain language
- Hugging Face Inference API used instead of local BLIP

---

### 🧩 Fully MCP-powered

Every capability is wrapped as an MCP tool, making this app a template for:

- Agents
- Assistive technologies
- Multimodal accessibility apps
- Voice-driven workflows
- Cross-backend tool orchestration

---

## 💡 Real Use Case: Accessibility

Designed for:

- Low-vision users
- Voice-interface users
- Anyone needing automated image descriptions
- Hands-free workflows
- Assistive technology research

---

## 🛠 Tech Stack

- MCP Server (Python)
- Gradio 6
- OpenAI Whisper (STT)
- ElevenLabs (TTS)
- Gemini Vision (optional)
- Hugging Face Inference API (image captioning)
- Python

---

## 🏁 How to Run Locally

```bash
pip install -r requirements.txt
python app.py
```
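
To make the "every capability is wrapped as an MCP tool" idea from Key Features more concrete, here is a minimal, dependency-free sketch of how the three tools might be registered and chained, with a tool-call log that records backend and duration. It is an illustration only, not the code in `app.py`: `ToolRegistry`, `ToolCall`, the result dictionary keys, and the stub backends are all hypothetical, and the real tools call Whisper, ElevenLabs, and the Hugging Face Inference API instead of the placeholders used here.

```python
import base64
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class ToolCall:
    """One entry in the tool-call log (hypothetical structure)."""
    tool: str
    backend: str
    duration_s: float


@dataclass
class ToolRegistry:
    """Hypothetical registry that dispatches tool calls by name and logs them."""
    tools: Dict[str, Callable[..., Dict[str, Any]]] = field(default_factory=dict)
    log: List[ToolCall] = field(default_factory=list)

    def register(self, name: str, fn: Callable[..., Dict[str, Any]]) -> None:
        self.tools[name] = fn

    def call(self, name: str, **kwargs: Any) -> Dict[str, Any]:
        if name not in self.tools:
            raise KeyError(f"unknown tool: {name}")
        start = time.perf_counter()
        result = self.tools[name](**kwargs)
        elapsed = time.perf_counter() - start
        # Record which backend answered and how long the call took.
        self.log.append(ToolCall(name, result.get("backend", "unknown"), elapsed))
        return result


# Stub backends: placeholders standing in for the real STT, vision, and TTS services.
def transcribe_audio(audio_bytes: bytes) -> Dict[str, Any]:
    return {"backend": "local-stub", "text": f"<{len(audio_bytes)} bytes of audio>"}


def describe_image(image_bytes: bytes) -> Dict[str, Any]:
    return {"backend": "vision-stub", "description": f"<image of {len(image_bytes)} bytes>"}


def speak_text(text: str) -> Dict[str, Any]:
    fake_audio = text.encode("utf-8")  # placeholder for synthesized audio
    return {"backend": "tts-stub", "audio_b64": base64.b64encode(fake_audio).decode("ascii")}


registry = ToolRegistry()
for fn in (transcribe_audio, describe_image, speak_text):
    registry.register(fn.__name__, fn)

# Example workflow: describe an uploaded image, then read the description aloud.
caption = registry.call("describe_image", image_bytes=b"\x89PNG")["description"]
speech = registry.call("speak_text", text=caption)

for entry in registry.log:
    print(f"{entry.tool}: backend={entry.backend}, duration={entry.duration_s:.6f}s")
```

Keeping each tool behind a name-based dispatcher is one way to swap backends (for example, Whisper versus a local fallback) without changing the calling code, since only the `backend` field in the result and log changes.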