title: AI Digital Library Assistant
emoji: π
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.38.0
app_file: app.py
pinned: false
license: mit
tags:
- mcp-in-action-track-consumer
- mcp-in-action-track-creative
- building-mcp-track-consumer
- building-mcp-track-creative
- MCP-1st-Birthday
Demo link: https://youtu.be/09Lls0zJ-QE
Social media post: https://x.com/nihald2000/status/1995198714156286290?s=20
The AI Digital Library Assistant is a next-generation knowledge management tool built for the MCP 1st Birthday Hackathon. It transforms your static document collection into an interactive, living library.
Unlike traditional RAG (Retrieval-Augmented Generation) apps, this project uses the Model Context Protocol (MCP) to expose a modular set of tools (Ingestion, Search, and Podcast Generation) that work together to help you consume information in the way that suits you best.
graph TD
User((User))
subgraph "Frontend (Gradio)"
UI[Web Interface]
PodcastUI[Podcast Studio]
end
subgraph "MCP Server Layer"
MCPServer[Content Organizer MCP Server]
subgraph "MCP Tools"
IngestTool[Ingestion Tool]
SearchTool[Search Tool]
GenTool[Generative Tool]
PodTool[Podcast Tool]
end
end
subgraph "Service Layer"
VecStore[(Vector Store)]
DocStore[(Document Store)]
LLM["LLM Service (OpenAI / Nebius AI)"]
ElevenLabs[ElevenLabs API]
LlamaIndex[LlamaIndex Agent]
end
User <--> UI
UI <--> MCPServer
MCPServer --> IngestTool
MCPServer --> SearchTool
MCPServer --> GenTool
MCPServer --> PodTool
IngestTool --> VecStore
IngestTool --> DocStore
SearchTool --> VecStore
GenTool --> LLM
PodTool --> LlamaIndex
PodTool --> ElevenLabs
PodTool --> LLM
Quick Start
Check out QUICKSTART.md for detailed local setup instructions.
- Clone & Install: `git clone https://huggingface.co/spaces/Nihal2000/AiDigitalLibraryAssistant`, then run `pip install -r requirements.txt` inside the cloned directory.
- Configure: Add your `OPENAI_API_KEY` and `ELEVENLABS_API_KEY` to `.env`.
- Run: `python app.py`
How It Works
1. The MCP Core
At the heart of the application is the `AiDigitalLibraryAssistant` MCP server. It exposes atomic capabilities (tools) that the frontend consumes, so the same tools powering this UI can also be connected to Claude Desktop or any other MCP client. For example, an MCP client configuration entry that reaches the hosted server through `mcp-remote`:
```json
{
"mcpServers": {
"ai-library": {
"command": "npx",
"args": [
"-y",
"mcp-remote",
"https://huggingface.co/proxy/mcp-1st-birthday-ai-digital-library-assistant.hf.space/gradio_api/mcp/sse"
]
}
}
}
```
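If you would rather add this entry programmatically than edit the file by hand, the merge can be sketched in Python. This is a minimal sketch under stated assumptions: `add_mcp_server` is a hypothetical helper that is not part of this project, and it only builds the dictionary. Where your MCP client keeps its configuration file, and how you write it back, is left to you.

```python
import json

# URL copied from the configuration above; the server name is arbitrary.
SSE_URL = (
    "https://huggingface.co/proxy/"
    "mcp-1st-birthday-ai-digital-library-assistant.hf.space/gradio_api/mcp/sse"
)


def add_mcp_server(config: dict, name: str, sse_url: str) -> dict:
    """Return a copy of `config` with an mcp-remote entry added under `mcpServers`.

    Hypothetical helper: existing servers are preserved, and an entry with the
    same name is overwritten.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = {"command": "npx", "args": ["-y", "mcp-remote", sse_url]}
    merged["mcpServers"] = servers
    return merged


if __name__ == "__main__":
    # Example: a config that already lists another (made-up) server.
    existing = {"mcpServers": {"other-server": {"command": "node", "args": ["server.js"]}}}
    print(json.dumps(add_mcp_server(existing, "ai-library", SSE_URL), indent=2))
```

Returning a new dictionary instead of mutating the input makes it easy to compare the result with the original before writing anything to disk.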
2. Podcast Studio (Star Feature)
Turn your reading list into a playlist! The Podcast Studio is a flagship feature that transforms any selection of documents into an engaging, multi-speaker audio podcast.
- Intelligent Scripting: Uses LlamaIndex and OpenAI/Nebius AI to analyze your documents and generate a natural, conversational script.
- Multi-Speaker Synthesis: Leverages ElevenLabs to bring the script to life with distinct, realistic voices for each host.
- Customizable: Choose your style (Educational, Casual, Teaching) and duration.
Features
Document Management
- Multi-format Support: PDF, DOCX, TXT, and image files (PNG, JPG, JPEG)
- Intelligent OCR: Automatic text extraction from images and scanned documents
- Semantic Chunking: Documents automatically split into meaningful segments for better retrieval
- Metadata Tracking: Comprehensive document metadata including file size, type, creation date, and custom tags
- Vector Embeddings: All documents indexed with dense vector embeddings for semantic search
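To make the chunking step concrete, here is a simplified sketch. It is an assumption, not the project's actual chunker: it packs whole sentences into chunks up to a target size and repeats the last sentence of each chunk at the start of the next one. The function name `chunk_text` and its parameters are hypothetical.

```python
import re


def chunk_text(text: str, max_chars: int = 500, overlap_sentences: int = 1) -> list[str]:
    """Pack whole sentences into chunks, starting a new chunk once adding a
    sentence would push the current one past `max_chars`.

    The last `overlap_sentences` sentences of a chunk are repeated at the start
    of the next chunk so context is not lost at boundaries. A chunk can exceed
    `max_chars` when a single sentence (plus overlap) is longer than the limit.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: list[str] = []
    current: list[str] = []
    for sentence in sentences:
        candidate = " ".join(current + [sentence])
        if current and len(candidate) > max_chars:
            chunks.append(" ".join(current))
            current = current[-overlap_sentences:] if overlap_sentences > 0 else []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    for chunk in chunk_text("One. Two. Three. Four.", max_chars=10):
        print(repr(chunk))
```

Splitting on sentence boundaries keeps each chunk readable on its own, which matters when chunks are later shown as source excerpts.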
Advanced Search
- Semantic Search: Find documents by meaning, not just keywords
- Configurable Results: Adjust the number of results (1-20) based on your needs
- Relevance Scoring: Each result includes a confidence score
- Source Attribution: Direct links to source documents with highlighted excerpts
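The ranking logic behind a top-k search with relevance scores can be sketched as follows. This is illustrative only: the `embed` function below is a toy word-count stand-in, so it matches keywords rather than meaning, whereas the application uses dense vector embeddings. The names `embed`, `cosine`, and `search` are hypothetical. The cosine-similarity ranking and the 1-20 clamp on the result count are the parts that carry over.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for dense vectors)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(query: str, chunks: dict[str, str], top_k: int = 5) -> list[tuple[str, float]]:
    """Return up to `top_k` (chunk_id, score) pairs, best match first."""
    top_k = max(1, min(20, top_k))  # same 1-20 range as the results slider
    q = embed(query)
    scored = [(chunk_id, cosine(q, embed(text))) for chunk_id, text in chunks.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    library = {
        "a": "solar power and wind energy",
        "b": "medieval castle architecture",
        "c": "wind turbines generate power",
    }
    for chunk_id, score in search("wind power", library, top_k=2):
        print(chunk_id, round(score, 3))
```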
Content Studio
Transform your documents with 8 powerful AI tools:
- Summarize: Generate concise, detailed, bullet-point, or executive summaries
- Generate Outline: Create structured outlines from topics or documents (3-10 sections)
- Explain Concept: Get explanations tailored to different audiences (general, technical, beginner, expert)
- Paraphrase: Rewrite text in various styles (formal, casual, academic, simple, technical)
- Categorize: Automatically classify content into user-defined categories
- Key Insights: Extract the most important points from any document
- Generate Questions: Create comprehension, analysis, application, creative, or factual questions
- Extract Key Info: Pull out structured information (entities, dates, facts) in JSON format
Smart Tagging
- AI-Generated Tags: Automatically generate 3-15 relevant tags for any document
- Persistent Storage: Tags saved directly to document metadata
- Batch Processing: Tag multiple documents or custom text snippets
RAG-Powered Q&A
- Context-Aware Answers: Ask questions and get answers grounded in your documents
- Source Citations: Every answer includes relevant source excerpts
- Confidence Scoring: Transparency about answer reliability
- Multi-Document Synthesis: Answers can draw from multiple documents simultaneously
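One common way to ground answers and keep citations traceable is to number the retrieved excerpts in the prompt and return the matching document IDs alongside it. The sketch below shows that idea. The `RetrievedChunk` type, the `build_rag_prompt` function, and the prompt wording are all assumptions for illustration, not the project's actual prompt.

```python
from dataclasses import dataclass


@dataclass
class RetrievedChunk:
    document_id: str
    text: str
    score: float


def build_rag_prompt(
    question: str, chunks: list[RetrievedChunk], max_chunks: int = 3
) -> tuple[str, list[str]]:
    """Build a grounded prompt and the ordered list of cited document IDs."""
    # Keep only the highest-scoring excerpts so the prompt stays short.
    best = sorted(chunks, key=lambda c: c.score, reverse=True)[:max_chunks]
    context = "\n".join(
        f"[{i}] ({c.document_id}) {c.text}" for i, c in enumerate(best, start=1)
    )
    prompt = (
        "Answer the question using only the numbered context below. "
        "Cite sources as [n]. If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    # De-duplicate document IDs while keeping their ranking order.
    sources = list(dict.fromkeys(c.document_id for c in best))
    return prompt, sources


if __name__ == "__main__":
    retrieved = [
        RetrievedChunk("doc-1", "Alpha.", 0.2),
        RetrievedChunk("doc-2", "Beta.", 0.9),
        RetrievedChunk("doc-3", "Gamma.", 0.5),
    ]
    prompt, sources = build_rag_prompt("What is covered?", retrieved, max_chunks=2)
    print(prompt)
    print(sources)
```

Because excerpts from several documents can appear in the same context block, this layout also supports multi-document synthesis without extra logic.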
Podcast Studio
Convert documents into engaging audio conversations:
- AI Voice Generation: Ultra-realistic voices powered by ElevenLabs
- Two-Host Format: Dynamic dialogue between two AI personalities
- Multiple Styles: Conversational, educational, technical, or casual
- Custom Duration: 5-30 minute podcasts
- Voice Selection: Choose from 7+ professional AI voices
- Full Transcripts: Complete text transcripts for every generated podcast
- Podcast Library: Browse, play, and manage all generated podcasts
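A two-host script has to be sized to the requested duration and split into per-speaker turns before each turn can be sent to a voice. The sketch below illustrates both steps under loud assumptions: the 150 words-per-minute speaking rate, the `Host 1:` / `Host 2:` line format, and the names `word_budget`, `parse_script`, and `Turn` are all hypothetical and not taken from the project.

```python
from dataclasses import dataclass

WORDS_PER_MINUTE = 150  # assumed speaking rate, not a value from the project


@dataclass
class Turn:
    speaker: str
    text: str


def word_budget(duration_minutes: int) -> int:
    """Approximate script length for a 5-30 minute episode."""
    if not 5 <= duration_minutes <= 30:
        raise ValueError("duration must be between 5 and 30 minutes")
    return duration_minutes * WORDS_PER_MINUTE


def parse_script(script: str, hosts: tuple[str, str] = ("Host 1", "Host 2")) -> list[Turn]:
    """Split 'Host 1: ...' lines into turns, skipping lines from unknown speakers."""
    turns: list[Turn] = []
    for line in script.splitlines():
        # Only the first colon separates speaker from text.
        speaker, sep, text = line.partition(":")
        if sep and speaker.strip() in hosts and text.strip():
            turns.append(Turn(speaker.strip(), text.strip()))
    return turns


if __name__ == "__main__":
    draft = "Host 1: Welcome back.\nHost 2: Glad to be here.\nHost 1: Let's begin: chapter one."
    print(word_budget(10))
    for turn in parse_script(draft):
        print(turn.speaker, "->", turn.text)
```

Keeping turns as structured data means each one can be paired with the voice chosen for that host, and the same list doubles as the transcript.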
Dashboard & Analytics
- Real-time Stats: Track total documents, vector chunks, and storage usage
- Recent Activity: View recently added documents at a glance
- System Health: Monitor vector store, LLM service, and voice service status
Data Flow
Document Ingestion:
- Files → OCR → Text Extraction → Chunking → Embedding Generation → Vector Store
Semantic Search:
- Query → Embedding → Vector Search → Relevance Ranking → Results
RAG Q&A:
- Question → Search → Context Retrieval → LLM Generation → Answer + Sources
Podcast Generation:
- Documents → Content Analysis → Script Generation → Voice Synthesis → Audio File
Basic Workflow
- Upload Documents: Navigate to the "Upload Documents" tab:
  - Click "Select a document" or drag-and-drop files
  - Supported formats: PDF, DOCX, TXT, PNG, JPG, JPEG
  - Click "Process & Add to Library"
  - Wait for processing to complete (OCR runs automatically for images)
  - Note the Document ID from the output
- Search Your Library: Go to "Search Documents":
  - Enter a natural language query (e.g., "What are the key findings about climate change?")
  - Adjust the "Number of Results" slider (1-20)
  - Click "Search"
  - Review results with relevance scores and source excerpts
- Ask Questions: Navigate to "Ask Questions":
  - Type your question about uploaded documents
  - Click "Get Answer"
  - Receive an AI-generated answer with source citations
  - Check the confidence level and review the source documents
- Generate Content: Open "Content Studio":
  - Select a document from the dropdown OR paste custom text
  - Choose a task from the dropdown:
    - Summarize, Outline, Explain, Paraphrase, etc.
  - Configure task-specific options in "Advanced Options"
  - Click "Run Task"
  - Copy or download the generated content
- Create Podcasts: Visit "Podcast Studio":
  - Select 1-5 documents using checkboxes
  - Choose Style (conversational, educational, technical, casual)
  - Set Duration (5-30 minutes)
  - Select voices for Host 1 and Host 2
  - Click "Generate Podcast"
  - Listen to the generated audio and read the transcript
  - Browse past podcasts in the Podcast Library
- Generate Tags: Go to "Generate Tags":
  - Select a document OR paste custom text
  - Adjust the "Number of Tags" slider (3-15)
  - Click "Generate Tags"
Hackathon Tracks
We are submitting to:
- Building MCP: For our custom `AiDigitalLibraryAssistant` MCP server implementation.
- MCP in Action (Consumer/Creative): For the innovative Podcast interface that makes personal knowledge management accessible and fun.
License
MIT License. Built with ❤️ for the AI community.
Acknowledgements & Sponsors
This project was built for the MCP 1st Birthday Hackathon and proudly leverages technology from:
- OpenAI: Providing the foundational intelligence for our document analysis and content generation.
- Nebius AI: Powering our high-performance inference needs.
- LlamaIndex: The backbone of our data orchestration, enabling sophisticated RAG and agentic workflows for the Podcast Studio.
- ElevenLabs: Bringing our podcasts to life with industry-leading, hyper-realistic text-to-speech.
- Hugging Face: Hosting our application on Spaces and providing the Gradio framework for our beautiful, responsive UI.
- Anthropic: For pioneering the Model Context Protocol (MCP) that makes this modular architecture possible.
Connect to Claude
Want to use these tools directly inside Claude Desktop? Check out our Client Setup Guide to connect this MCP server to your local Claude instance!
