title: AI Digital Library Assistant
emoji: 📚
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.38.0
app_file: app.py
pinned: false
license: mit
tags:
  - mcp-in-action-track-consumer
  - mcp-in-action-track-creative
  - building-mcp-track-consumer
  - building-mcp-track-creative
  - MCP-1st-Birthday

Demo link: https://youtu.be/09Lls0zJ-QE

Social media post: https://x.com/nihald2000/status/1995198714156286290?s=20

The AI Digital Library Assistant is a next-generation knowledge management tool built for the MCP 1st Birthday Hackathon. It transforms your static document collection into an interactive, living library.

Unlike traditional RAG (Retrieval-Augmented Generation) apps, this project uses the Model Context Protocol (MCP) to expose a modular set of tools (Ingestion, Search, and Podcast Generation) that work together so you can consume information in the way that suits you best.

graph TD
    User((👤 User))
    
    subgraph "Frontend (Gradio)"
        UI[Web Interface]
        PodcastUI[Podcast Studio]
    end
    
    subgraph "MCP Server Layer"
        MCPServer[Content Organizer MCP Server]
        
        subgraph "MCP Tools"
            IngestTool[📥 Ingestion Tool]
            SearchTool[🔍 Search Tool]
            GenTool[✨ Generative Tool]
            PodTool[🎧 Podcast Tool]
        end
    end
    
    subgraph "Service Layer"
        VecStore[(Vector Store)]
        DocStore[(Document Store)]
        LLM["LLM Service (OpenAI / Nebius AI)"]
        ElevenLabs[ElevenLabs API]
        LlamaIndex[LlamaIndex Agent]
    end

    User <--> UI
    UI <--> MCPServer
    
    MCPServer --> IngestTool
    MCPServer --> SearchTool
    MCPServer --> GenTool
    MCPServer --> PodTool
    
    IngestTool --> VecStore
    IngestTool --> DocStore
    SearchTool --> VecStore
    GenTool --> LLM
    PodTool --> LlamaIndex
    PodTool --> ElevenLabs
    PodTool --> LLM

🚀 Quick Start

Check out QUICKSTART.md for detailed local setup instructions.

Clone & Install:

git clone https://huggingface.co/spaces/Nihal2000/AiDigitalLibraryAssistant
cd AiDigitalLibraryAssistant
pip install -r requirements.txt

Configure: Add your OPENAI_API_KEY and ELEVENLABS_API_KEY to .env.
Run: python app.py
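
Before launching, it can help to confirm that both keys are actually present. The following is a minimal sketch and not part of the project: `parse_env` and `missing_keys` are hypothetical helper names, and the parser only handles plain `KEY=value` lines (the app itself may load `.env` differently).

```python
from pathlib import Path

REQUIRED_KEYS = ("OPENAI_API_KEY", "ELEVENLABS_API_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def missing_keys(env: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]


if __name__ == "__main__":
    env_file = Path(".env")
    env = parse_env(env_file.read_text()) if env_file.exists() else {}
    for key in missing_keys(env):
        print(f"Missing {key} in .env")
```

Running this script from the project root prints one line per missing key and nothing when both are set.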

💡 How It Works

1. The MCP Core

At the heart of the application is the AiDigitalLibraryAssistant MCP server. It exposes atomic capabilities (tools) that the frontend consumes, so the same tools that power this UI can also be connected to Claude Desktop or any other MCP client, for example with a client configuration like the following:

```json
{
  "mcpServers": {
    "ai-library": {
      "command": "npx",
      "args": [
        "-y",
        "mcp-remote",
        "https://huggingface.co/proxy/mcp-1st-birthday-ai-digital-library-assistant.hf.space/gradio_api/mcp/sse"
      ]
    }
  }
}
```
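
If you manage several MCP client configurations, the entry above can also be built programmatically. This is a minimal sketch, assuming the same `npx mcp-remote` bridge shown above. The helper name `mcp_remote_entry` is hypothetical, and where the resulting JSON must be saved depends on your MCP client.

```python
import json


def mcp_remote_entry(server_name: str, sse_url: str) -> dict:
    """Build an mcpServers entry that bridges to a remote SSE endpoint via mcp-remote."""
    return {
        "mcpServers": {
            server_name: {
                "command": "npx",
                "args": ["-y", "mcp-remote", sse_url],
            }
        }
    }


# Same server name and endpoint as in the configuration above.
config = mcp_remote_entry(
    "ai-library",
    "https://huggingface.co/proxy/mcp-1st-birthday-ai-digital-library-assistant.hf.space/gradio_api/mcp/sse",
)
print(json.dumps(config, indent=2))
```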

2. 🎧 Podcast Studio (Star Feature)

Turn your reading list into a playlist! The Podcast Studio is a flagship feature that transforms any selection of documents into an engaging, multi-speaker audio podcast.

Intelligent Scripting: Uses LlamaIndex and OpenAI/Nebius AI to analyze your documents and generate a natural, conversational script.
Multi-Speaker Synthesis: Leverages ElevenLabs to bring the script to life with distinct, realistic voices for each host.
Customizable: Choose a style (conversational, educational, technical, or casual) and a duration (5-30 minutes).

✨ Features

📚 Document Management

- Multi-format Support: PDF, DOCX, TXT, and image files (PNG, JPG, JPEG)
- Intelligent OCR: Automatic text extraction from images and scanned documents
- Semantic Chunking: Documents automatically split into meaningful segments for better retrieval
- Metadata Tracking: Comprehensive document metadata including file size, type, creation date, and custom tags
- Vector Embeddings: All documents indexed with dense vector embeddings for semantic search

🔍 Advanced Search

- Semantic Search: Find documents by meaning, not just keywords
- Configurable Results: Adjust the number of results (1-20) based on your needs
- Relevance Scoring: Each result includes a confidence score
- Source Attribution: Direct links to source documents with highlighted excerpts

🎨 Content Studio

Transform your documents with 8 powerful AI tools:

- Summarize: Generate concise, detailed, bullet-point, or executive summaries
- Generate Outline: Create structured outlines from topics or documents (3-10 sections)
- Explain Concept: Get explanations tailored to different audiences (general, technical, beginner, expert)
- Paraphrase: Rewrite text in various styles (formal, casual, academic, simple, technical)
- Categorize: Automatically classify content into user-defined categories
- Key Insights: Extract the most important points from any document
- Generate Questions: Create comprehension, analysis, application, creative, or factual questions
- Extract Key Info: Pull out structured information (entities, dates, facts) in JSON format

🏷️ Smart Tagging

- AI-Generated Tags: Automatically generate 3-15 relevant tags for any document
- Persistent Storage: Tags saved directly to document metadata
- Batch Processing: Tag multiple documents or custom text snippets

❓ RAG-Powered Q&A

- Context-Aware Answers: Ask questions and get answers grounded in your documents
- Source Citations: Every answer includes relevant source excerpts
- Confidence Scoring: Transparency about answer reliability
- Multi-Document Synthesis: Answers can draw from multiple documents simultaneously

🎙️ Podcast Studio

Convert documents into engaging audio conversations:

- AI Voice Generation: Ultra-realistic voices powered by ElevenLabs
- Two-Host Format: Dynamic dialogue between two AI personalities
- Multiple Styles: Conversational, educational, technical, or casual
- Custom Duration: 5-30 minute podcasts
- Voice Selection: Choose from 7+ professional AI voices
- Full Transcripts: Complete text transcripts for every generated podcast
- Podcast Library: Browse, play, and manage all generated podcasts

📊 Dashboard & Analytics

- Real-time Stats: Track total documents, vector chunks, and storage usage
- Recent Activity: View recently added documents at a glance
- System Health: Monitor vector store, LLM service, and voice service status

Data Flow

Document Ingestion:

- Files → OCR → Text Extraction → Chunking → Embedding Generation → Vector Store
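
The chunking step in the ingestion flow can be sketched as follows. This is an illustrative simplification and not the project's implementation: it packs whole paragraphs into chunks up to a word budget, whereas the app's semantic chunking may use a different strategy. `chunk_text` and `max_words` are hypothetical names.

```python
def chunk_text(text: str, max_words: int = 200) -> list[str]:
    """Pack whole paragraphs into chunks of at most max_words words.

    A single paragraph longer than max_words is split on word boundaries.
    """
    chunks: list[str] = []
    current: list[str] = []  # words of the chunk being built

    for paragraph in text.split("\n\n"):
        words = paragraph.split()
        if not words:
            continue
        # Start a new chunk if this paragraph would overflow the current one.
        if current and len(current) + len(words) > max_words:
            chunks.append(" ".join(current))
            current = []
        # Split oversized paragraphs into max_words-sized pieces.
        while len(words) > max_words:
            chunks.append(" ".join(words[:max_words]))
            words = words[max_words:]
        current.extend(words)

    if current:
        chunks.append(" ".join(current))
    return chunks


# Made-up sample text; a tiny budget makes the chunk boundaries visible.
doc = "Alpha beta gamma.\n\nDelta epsilon.\n\nZeta eta theta iota."
for index, chunk in enumerate(chunk_text(doc, max_words=5)):
    print(index, chunk)
```

Keeping paragraphs intact where possible is one simple way to avoid cutting a thought in half before the embedding step. Each resulting chunk would then be embedded and written to the vector store.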

Semantic Search:

- Query → Embedding → Vector Search → Relevance Ranking → Results
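
The search flow (query, embedding, vector search, ranking) can be illustrated with a small self-contained sketch. The bag-of-words `embed` function below is only a stand-in for a real dense embedding model, and the in-memory list stands in for the vector store. All function names are hypothetical and do not come from the project.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (stand-in for a dense model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, chunks: list[str], top_k: int = 5) -> list[tuple[float, str]]:
    """Return up to top_k (score, chunk) pairs, most similar first."""
    top_k = max(1, min(top_k, 20))  # mirror the 1-20 results range of the UI
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Made-up sample chunks for illustration.
library = [
    "solar power reduces emissions",
    "bread recipes with rye flour",
    "wind and solar energy costs",
]
for score, chunk in search("solar energy", library, top_k=2):
    print(f"{score:.2f}  {chunk}")
```

The score returned with each chunk plays the role of the relevance score shown in the search results. A real embedding model would also match paraphrases, which this word-overlap stand-in cannot do.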

RAG Q&A:

- Question → Search → Context Retrieval → LLM Generation → Answer + Sources
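
The context-retrieval and generation steps can be sketched as below. This is an assumption-laden illustration rather than the project's code: `build_prompt` and `answer_question` are hypothetical names, the retrieved passages are passed in as `(source, excerpt)` pairs, and `llm` is any callable that maps a prompt string to a completion, so no particular provider API is implied.

```python
from typing import Callable


def build_prompt(question: str, passages: list[tuple[str, str]]) -> str:
    """Number each retrieved passage so the model can cite it as [n]."""
    context = "\n".join(
        f"[{i}] ({source}) {excerpt}"
        for i, (source, excerpt) in enumerate(passages, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "Cite passages as [n].\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


def answer_question(
    question: str,
    passages: list[tuple[str, str]],
    llm: Callable[[str], str],
) -> dict:
    """Generate an answer and return it together with its sources."""
    answer = llm(build_prompt(question, passages))
    return {"answer": answer, "sources": [source for source, _ in passages]}


def fake_llm(prompt: str) -> str:
    """Placeholder for a real LLM call; always returns a canned answer."""
    return "Emissions fell [1]."


# Made-up passage and question for illustration.
passages = [("report.pdf", "Emissions fell 4% in 2023.")]
result = answer_question("What happened to emissions?", passages, fake_llm)
print(result)
```

Passing the model in as a callable keeps the retrieval and prompting logic independent of whichever LLM service is configured.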

Podcast Generation:

- Documents → Content Analysis → Script Generation → Voice Synthesis → Audio File
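
The last two steps, from generated script to audio, can be sketched as follows. Everything here is hypothetical: the `SPEAKER: text` script format, the placeholder voice IDs, and the `synthesize` callable, which stands in for a real text-to-speech call (the actual ElevenLabs client is not shown). Joining raw audio bytes is also a simplification of real audio assembly.

```python
from typing import Callable


def parse_script(script: str) -> list[tuple[str, str]]:
    """Split a 'SPEAKER: text' script into (speaker, text) turns."""
    turns = []
    for line in script.splitlines():
        speaker, sep, text = line.partition(":")
        # Skip lines without a speaker label or without spoken text.
        if sep and text.strip():
            turns.append((speaker.strip(), text.strip()))
    return turns


def render_podcast(
    script: str,
    voices: dict[str, str],
    synthesize: Callable[[str, str], bytes],
) -> bytes:
    """Synthesize each turn with its speaker's voice and join the audio."""
    audio = b""
    for speaker, text in parse_script(script):
        audio += synthesize(voices[speaker], text)
    return audio


def fake_tts(voice_id: str, text: str) -> bytes:
    """Stand-in for a real TTS call; returns tagged bytes instead of audio."""
    return f"<{voice_id}:{text}>".encode()


script = "HOST1: Welcome to the show.\nHOST2: Glad to be here.\n"
voices = {"HOST1": "voice-a", "HOST2": "voice-b"}  # placeholder voice IDs
audio = render_podcast(script, voices, fake_tts)
print(len(audio), "bytes")
```

Mapping speakers to voices in a plain dictionary is what lets each host keep a distinct voice across the whole episode.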

Basic Workflow

Upload Documents

Navigate to the "📄 Upload Documents" tab:

- Click "Select a document" or drag-and-drop files
- Supported formats: PDF, DOCX, TXT, PNG, JPG, JPEG
- Click "🚀 Process & Add to Library"
- Wait for processing to complete (OCR runs automatically for images)
- Note the Document ID from the output

Search Your Library

Go to "🔍 Search Documents":

- Enter a natural language query (e.g., "What are the key findings about climate change?")
- Adjust "Number of Results" slider (1-20)
- Click "🔍 Search"
- Review results with relevance scores and source excerpts

Ask Questions

Navigate to "❓ Ask Questions":

- Type your question about uploaded documents
- Click "❓ Get Answer"
- Receive AI-generated answer with source citations
- Check confidence level and review source documents

Generate Content

Open "📝 Content Studio":

- Select a document from the dropdown OR paste custom text
- Choose a task from the dropdown (Summarize, Outline, Explain, Paraphrase, etc.)
- Configure task-specific options in "⚙️ Advanced Options"
- Click "🚀 Run Task"
- Copy or download the generated content

Create Podcasts

Visit "🎧 Podcast Studio":

- Select 1-5 documents using checkboxes
- Choose Style (conversational, educational, technical, casual)
- Set Duration (5-30 minutes)
- Select voices for Host 1 and Host 2
- Click "🎙️ Generate Podcast"
- Listen to the generated audio and read the transcript
- Browse past podcasts in the Podcast Library

Generate Tags

Go to "🏷️ Generate Tags":

- Select a document OR paste custom text
- Adjust "Number of Tags" slider (3-15)
- Click "🏷️ Generate Tags"

🏆 Hackathon Tracks

We are submitting to:

Building MCP: For our custom AiDigitalLibraryAssistant MCP server implementation.
MCP in Action (Consumer/Creative): For the innovative Podcast interface that makes personal knowledge management accessible and fun.

📜 License

MIT License. Built with ❤️ for the AI community.

🙏 Acknowledgements & Sponsors

This project was built for the MCP 1st Birthday Hackathon and proudly leverages technology from:

OpenAI: Providing the foundational intelligence for our document analysis and content generation.
Nebius AI: Powering our high-performance inference needs.
LlamaIndex: The backbone of our data orchestration, enabling sophisticated RAG and agentic workflows for the Podcast Studio.
ElevenLabs: Bringing our podcasts to life with industry-leading, hyper-realistic text-to-speech.
Hugging Face: Hosting our application on Spaces and providing the Gradio framework for our beautiful, responsive UI.
Anthropic: For pioneering the Model Context Protocol (MCP) that makes this modular architecture possible.

🔌 Connect to Claude

Want to use these tools directly inside Claude Desktop? Check out our Client Setup Guide to connect this MCP server to your local Claude instance!