---
title: CXR-Findings-AI
emoji: 🫁
colorFrom: blue
colorTo: indigo
sdk: gradio
app_file: app.py
license: mit
pinned: true
tags:
  - gradio
  - pytorch
  - computer-vision
  - nlp
  - multimodal
  - vision-language
  - image-to-text
  - chest-xray
  - radiology
  - medical-ai
  - attention
  - attention-visualization
  - interpretability
  - explainable-ai
  - xai
  - cpu-inference
  - healthcare
  - demo
short_description: Generate chest X-ray findings and explore attention.
---
# 🫁 CXR-Findings-AI — Chest X-Ray Findings Generator + Attention Explorer

### **Live Demo (CPU-Only, thanks to Hugging Face Spaces)**

🔗 **[https://huggingface.co/spaces/manu02/CXR-Findings-AI](https://huggingface.co/spaces/manu02/CXR-Findings-AI)**

---
# 🧠 Overview

**CXR-Findings-AI** is an interactive Gradio application that:

### ✅ **Generates radiology findings from a chest X-ray image**
### ✅ **Visualizes multimodal attention (image ↔ text) across layers and heads**
### ✅ **Runs entirely on CPU**, showcasing the efficiency of the underlying 246M-parameter model

The system lets researchers, clinicians, and students explore **how different image regions influence each generated word**, enabling deeper interpretability in medical AI.

---
# 🔍 What This App Provides

### 🫁 **1. Findings Generation**

A lightweight multimodal model produces chest X-ray findings directly from the uploaded image.

### 👁️ **2. Layer-wise & Head-wise Attention Visualization**

Inspect how the model distributes attention (a minimal reshaping sketch follows this list):

* Across **transformer layers**
* Across **attention heads**
* Between **image tokens** (32×32 grid → 1024 tokens)
* And **generated text tokens**
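A minimal sketch of that reduction, assuming a Transformers-style `cross_attentions` output (one tuple per generated token, each holding per-layer tensors of shape `(batch, heads, query_len, image_tokens)`); the exact structure of this project's model output may differ:

```python
# Minimal sketch, not the app's exact code. Assumes `cross_attentions` is the
# tuple returned by generate(..., output_attentions=True, return_dict_in_generate=True):
# one entry per generated token, each a tuple of per-layer tensors of shape
# (batch, num_heads, query_len, num_image_tokens).
import torch

def word_attention_map(cross_attentions, word_idx, layer=None, head=None, grid=32):
    """Return a (grid, grid) attention map for one generated word.

    Passing layer=None and/or head=None averages over that dimension,
    mirroring the "mean" option in the UI.
    """
    # (layers, heads, query_len, image_tokens) for the selected word (batch of 1).
    per_layer = torch.stack(cross_attentions[word_idx], dim=0).squeeze(1)
    attn = per_layer[:, :, -1, :]           # attention of the word being generated
    if layer is not None:
        attn = attn[layer:layer + 1]        # keep a single layer
    if head is not None:
        attn = attn[:, head:head + 1]       # keep a single head
    attn = attn.mean(dim=(0, 1))            # (num_image_tokens,)
    assert attn.numel() == grid * grid, "expected a 32x32 = 1024 image-token grid"
    return attn.reshape(grid, grid)
```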
### 🎨 **3. Three Synchronized Views**

For each selected word:

1. **Original Image**
2. **Overlay View:** Image + blended attention map (a blending sketch follows this list)
3. **Pure Heatmap:** Visualizes raw attention intensities
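One way to produce the overlay view (a hypothetical helper, not the app's actual rendering code): normalize the 32×32 map, upsample it to the image resolution, colorize it, and alpha-blend it onto the X-ray.

```python
# Hypothetical overlay helper, not the app's exact code.
import numpy as np
from PIL import Image
from matplotlib import cm

def overlay_attention(image: Image.Image, attn_map: np.ndarray, alpha: float = 0.45) -> Image.Image:
    """Blend a (32, 32) attention map over the original chest X-ray."""
    # Normalize to [0, 1] and upsample to the image resolution.
    attn = (attn_map - attn_map.min()) / (attn_map.max() - attn_map.min() + 1e-8)
    heat = Image.fromarray((attn * 255).astype(np.uint8)).resize(image.size, Image.BILINEAR)
    # Colorize with a matplotlib colormap and drop the alpha channel.
    heat_rgb = (cm.jet(np.asarray(heat) / 255.0)[..., :3] * 255).astype(np.uint8)
    # Alpha-blend the colorized heatmap onto the X-ray.
    return Image.blend(image.convert("RGB"), Image.fromarray(heat_rgb), alpha)
```

An attention map produced by the sketch in the previous section (converted to NumPy) could be passed in as `attn_map`.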
### 🧩 **4. Word-Level Interpretability**

Click any word in the generated report to reveal its cross-modal attention patterns.
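This kind of click-to-inspect interaction can be wired up in Gradio roughly as follows (a hypothetical sketch with placeholder data; the component names, the `word_maps` cache, and the callback are illustrative, not the app's actual implementation):

```python
# Hypothetical Gradio wiring for word selection; placeholder data, not the app's code.
import gradio as gr
import numpy as np

word_maps: dict[int, np.ndarray] = {}  # hypothetical cache: word index -> (32, 32) map

def on_word_click(evt: gr.SelectData):
    """Return the attention map of the clicked word as a grayscale image."""
    attn = word_maps.get(evt.index, np.zeros((32, 32)))
    return (255 * attn / (attn.max() + 1e-8)).astype(np.uint8)

with gr.Blocks() as demo:
    findings = gr.HighlightedText(
        label="Generated findings (click a word)",
        value=[("No", None), ("acute", None), ("findings", None)],  # placeholder text
    )
    heatmap = gr.Image(label="Attention for selected word")
    findings.select(on_word_click, inputs=None, outputs=heatmap)

demo.launch()
```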
---

# 🚀 Quickstart (Local Usage)

### 1) Clone

```bash
git clone https://github.com/devMuniz02/Image-Attention-Visualizer
cd Image-Attention-Visualizer
```

### 2) (Optional) Create a virtual environment

**Windows:**

```powershell
python -m venv venv
.\venv\Scripts\Activate.ps1
```

**macOS / Linux:**

```bash
python3 -m venv venv
source venv/bin/activate
```

### 3) Install requirements

```bash
pip install -r requirements.txt
```

### 4) Run the app

```bash
python app.py
```

Then open:

```
http://127.0.0.1:7860
```

---
# 🧭 How to Use the Interface

1. **Upload a chest X-ray** (or load a sample)
2. Adjust:
   * Max new tokens
   * Layer selection
   * Head selection
   * Or choose *mean* attention across all layers/heads
3. Click **Generate Findings**
4. Click any generated word to visualize:
   * Image ↔ text attention
   * Heatmaps
   * Cross-token relationships

---
# 🧩 Repository Structure

| File | Description |
| ---- | ----------- |
| `app.py` | Main Gradio interface and visualization logic |
| `utils/models/complete_model.py` | Full multimodal model assembly |
| `utils/processing.py` | Image preprocessing |
| `assets/` | UI images & examples |
| `requirements.txt` | Dependencies |
| `README.md` | This file |

---
# 🛠️ Troubleshooting

* **Blank heatmap** → Ensure `output_attentions=True` is passed to `.generate()` (a quick check is sketched below)
* **Distorted attention** → Check that the image-token count is 1024 (a 32×32 grid)
* **Tokenizer errors** → Confirm `model.decoder.tokenizer` is loaded
* **OOM on local machine** → Reduce `max_new_tokens` or run with CPU-only settings
* **Slow inference** → The model is sized for CPU inference, but generation is still slower there; a GPU is recommended for higher throughput
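The first two items can be checked with a few lines (a hedged sketch: it assumes `model` and `pixel_values` are already loaded, for example as in the integration sketch below, and that the model returns a Transformers-style `cross_attentions` field):

```python
# Hedged debugging sketch: verify attentions are returned and the image-token
# count matches the 32x32 grid. Field names and shapes are assumptions about a
# Transformers-style encoder-decoder output, not guaranteed by this README.
out = model.generate(
    pixel_values=pixel_values,
    max_new_tokens=8,
    output_attentions=True,
    return_dict_in_generate=True,
)
assert out.cross_attentions is not None, "pass output_attentions=True to .generate()"
# One tuple per generated token; each holds per-layer tensors of shape
# (batch, heads, query_len, image_tokens).
num_image_tokens = out.cross_attentions[0][0].shape[-1]
assert num_image_tokens == 32 * 32, f"expected 1024 image tokens, got {num_image_tokens}"
```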
---

# 🧪 Model Integration Notes

Compatible with any encoder–decoder or vision–language model that (an adapter sketch follows these requirements):

* Accepts `pixel_values`
* Returns attentions when calling

```python
model.generate(..., output_attentions=True)
```

* Provides a decoder tokenizer:

```python
model.decoder.tokenizer
```
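For example, a stock Hugging Face `VisionEncoderDecoderModel` could be adapted to this interface roughly as follows (a hypothetical sketch: the checkpoint names and the image path are placeholders, and the resulting image-token grid depends on the chosen encoder):

```python
# Hypothetical adapter sketch; checkpoint names and file path are placeholders.
from PIL import Image
from transformers import AutoImageProcessor, AutoTokenizer, VisionEncoderDecoderModel

model = VisionEncoderDecoderModel.from_pretrained("your-org/your-cxr-checkpoint")
processor = AutoImageProcessor.from_pretrained("your-org/your-cxr-checkpoint")
tokenizer = AutoTokenizer.from_pretrained("your-org/your-cxr-checkpoint")

# 1) Accepts `pixel_values`: produced by the image processor.
image = Image.open("example_cxr.png").convert("RGB")  # placeholder path
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# 2) Returns attentions from `.generate()`.
out = model.generate(
    pixel_values=pixel_values,
    max_new_tokens=64,
    output_attentions=True,
    return_dict_in_generate=True,
)
text = tokenizer.decode(out.sequences[0], skip_special_tokens=True)

# 3) Exposes a decoder tokenizer where the app expects to find it.
model.decoder.tokenizer = tokenizer
```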
Ideal for research in:

* Medical AI
* Vision–language alignment
* Cross-modal interpretability
* Attention visualization
* Explainable AI (XAI)

---
# ❤️ Acknowledgments

* Powered by **Gradio** and **Hugging Face Transformers**
* Based on and expanded from the **Token-Attention-Viewer** project
  🔗 [https://github.com/devMuniz02/Image-Attention-Visualizer](https://github.com/devMuniz02/Image-Attention-Visualizer)
* Created as part of a thesis on **efficient and explainable multimodal medical AI**