---
title: CXR-Findings-AI
emoji: 🫁
colorFrom: blue
colorTo: indigo
sdk: gradio
app_file: app.py
license: mit
pinned: true
tags:
- gradio
- pytorch
- computer-vision
- nlp
- multimodal
- vision-language
- image-to-text
- chest-xray
- radiology
- medical-ai
- attention
- attention-visualization
- interpretability
- explainable-ai
- xai
- cpu-inference
- healthcare
- demo
short_description: Generate chest X-ray findings and explore attention.
---
# 🫁 CXR-Findings-AI — Chest X-Ray Findings Generator + Attention Explorer
### **Live Demo (CPU-Only, thanks to Hugging Face Spaces)**
🔗 **[https://huggingface.co/spaces/manu02/CXR-Findings-AI](https://huggingface.co/spaces/manu02/CXR-Findings-AI)**

---
# 🧠 Overview
**CXR-Findings-AI** is an interactive Gradio application that:
### ✅ **Generates radiology findings from a chest X-ray image**
### ✅ **Visualizes multimodal attention (image ↔ text) across layers and heads**
### ✅ **Runs entirely on CPU**, showcasing the efficiency of the underlying 246M-parameter model
The system lets researchers, clinicians, and students explore **how different image regions influence each generated word**, enabling deeper interpretability in medical AI.
---
# 🔍 What This App Provides
### 🫁 **1. Findings Generation**
A lightweight multimodal model produces chest X-ray findings directly from the uploaded image.
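For orientation, here is a minimal, illustrative sketch of the generation step, assuming the assembled model follows a Hugging Face-style generate/decode API; `preprocess` and `model` are placeholders, not the exact names used in this repo:

```python
from PIL import Image
import torch

# Illustrative sketch only: `preprocess` and `model` stand in for the components
# assembled in utils/processing.py and utils/models/complete_model.py.
image = Image.open("example_cxr.png").convert("RGB")
pixel_values = preprocess(image)            # assumed to return a (1, 3, H, W) tensor

with torch.no_grad():
    ids = model.generate(pixel_values=pixel_values, max_new_tokens=128)

findings = model.decoder.tokenizer.decode(ids[0], skip_special_tokens=True)
print(findings)
```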
### 👁️ **2. Layer-wise & Head-wise Attention Visualization**
Inspect how the model distributes attention:
* Across **transformer layers**
* Across **attention heads**
* Between **image tokens** (32×32 grid → 1024 tokens) and **generated text tokens** (a minimal extraction sketch follows this list)
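The sketch below shows how such per-token attention maps can be pulled out of a Hugging Face-style `generate()` call, assuming the model exposes `cross_attentions` in its generate output; the indices and variable names are illustrative:

```python
import torch

# Illustrative sketch, assuming a Hugging Face-style encoder-decoder model that
# returns cross-attentions from generate(); `model` and `pixel_values` are
# placeholders, not the exact objects used in app.py.
out = model.generate(
    pixel_values=pixel_values,          # preprocessed X-ray, (1, 3, H, W)
    max_new_tokens=64,
    output_attentions=True,
    return_dict_in_generate=True,
)

step, layer, head = 5, 3, 0             # generated-token index, layer, head

# cross_attentions: one tuple per generated token, one tensor per decoder layer,
# each of shape (batch, num_heads, tgt_len, num_image_tokens)
attn = out.cross_attentions[step][layer][0, head, -1]      # (1024,)
attn_map = attn.reshape(32, 32)                            # 32×32 image grid

# "Mean" mode: average the same step over all layers and heads
mean_map = torch.stack(
    [layer_attn[0, :, -1] for layer_attn in out.cross_attentions[step]]
).mean(dim=(0, 1)).reshape(32, 32)
```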
### 🎨 **3. Three Synchronized Views**
For each selected word:
1. **Original Image**
2. **Overlay View:** Image + blended attention map
3. **Pure Heatmap:** Raw attention intensities without the underlying image (a rendering sketch follows this list)
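A sketch of how the overlay and pure-heatmap views can be produced from a 32×32 attention map, assuming PIL and matplotlib; the colormap and blending factor actually used by the app may differ:

```python
import numpy as np
from PIL import Image
from matplotlib import cm

def render_views(image: Image.Image, attn_map: np.ndarray, alpha: float = 0.5):
    """Build the overlay and pure-heatmap views from a 32×32 attention map.

    Illustrative sketch; the app's exact colormap and blending may differ.
    """
    # Normalize attention to [0, 1], colorize it, and upsample to the image size
    a = attn_map - attn_map.min()
    a = a / (a.max() + 1e-8)
    heat = Image.fromarray(np.uint8(cm.jet(a) * 255)).convert("RGB")
    heat = heat.resize(image.size, Image.BILINEAR)

    # Overlay view: blend the colorized heatmap onto the original X-ray
    overlay = Image.blend(image.convert("RGB"), heat, alpha)
    return overlay, heat   # the untouched original serves as the first view
```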
### 🧩 **4. Word-Level Interpretability**
Click any word in the generated report to reveal its cross-modal attention patterns.
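Because a displayed word may span several sub-word tokens, the per-token maps have to be grouped back into words before they are shown. One hedged way to do that, assuming a Hugging Face-style tokenizer that marks word boundaries with BPE ("Ġ") or SentencePiece ("▁") prefixes:

```python
# Illustrative helper (not part of the repo): group per-token attention maps
# into per-word maps. `token_ids` are the generated ids and `token_maps` the
# matching 32×32 attention maps (e.g. built as in the sketch above).
def word_level_maps(token_ids, token_maps, tokenizer):
    words, maps = [], []
    current_word, current_maps = "", []
    for tid, attn_map in zip(token_ids, token_maps):
        piece = tokenizer.convert_ids_to_tokens([int(tid)])[0]
        if piece.startswith(("Ġ", "▁")) and current_maps:   # a new word starts here
            words.append(current_word)
            maps.append(sum(current_maps) / len(current_maps))
            current_word, current_maps = "", []
        current_word += piece.lstrip("Ġ▁")
        current_maps.append(attn_map)
    if current_maps:                                         # flush the last word
        words.append(current_word)
        maps.append(sum(current_maps) / len(current_maps))
    return words, maps
```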
---
# 🚀 Quickstart (Local Usage)
### 1) Clone
```bash
git clone https://github.com/devMuniz02/Image-Attention-Visualizer
cd Image-Attention-Visualizer
```
### 2) (Optional) Create a virtual environment
**Windows:**
```powershell
python -m venv venv
.\venv\Scripts\Activate.ps1
```
**macOS / Linux:**
```bash
python3 -m venv venv
source venv/bin/activate
```
### 3) Install requirements
```bash
pip install -r requirements.txt
```
### 4) Run the app
```bash
python app.py
```
Then open:
```
http://127.0.0.1:7860
```
---
# 🧭 How to Use the Interface
1. **Upload a chest X-ray** (or load a sample)
2. Adjust:
* Max new tokens
* Layer selection
* Head selection
* Or choose *mean* attention across all layers/heads
3. Click **Generate Findings**
4. Click any generated word to visualize:
* Image ↔ text attention
* Heatmaps
* Cross-token relationships
---
# 🧩 Repository Structure
| File | Description |
| -------------------------------- | --------------------------------------------- |
| `app.py` | Main Gradio interface and visualization logic |
| `utils/models/complete_model.py` | Full multimodal model assembly |
| `utils/processing.py` | Image preprocessing |
| `assets/` | UI images & examples |
| `requirements.txt` | Dependencies |
| `README.md` | This file |
---
# 🛠️ Troubleshooting
* **Blank heatmap** → Ensure `output_attentions=True` in `.generate()`
* **Distorted attention** → Check that the image-token count equals 1024 (32×32 grid); see the sanity-check sketch below
* **Tokenizer errors** → Confirm `model.decoder.tokenizer` is loaded
* **OOM on local machine** → Reduce `max_new_tokens` or use CPU-only settings
* **Slow inference** → CPU mode is intentionally lightweight; GPU recommended for higher throughput
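For the first two items, a small helper like the following (illustrative, not part of the repo) can confirm that attentions are present and that the image tokens form a square grid:

```python
import math

def check_attention_grid(out):
    """Illustrative helper: `out` is the return value of
    model.generate(..., output_attentions=True, return_dict_in_generate=True)."""
    assert getattr(out, "cross_attentions", None), \
        "No attentions returned: pass output_attentions=True to generate()"
    num_image_tokens = out.cross_attentions[0][0].shape[-1]
    side = math.isqrt(num_image_tokens)
    assert side * side == num_image_tokens, \
        f"{num_image_tokens} image tokens do not form a square grid"
    print(f"{num_image_tokens} image tokens → {side}×{side} grid")
```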
---
# 🧪 Model Integration Notes
Compatible with any encoder–decoder or vision–language model that:
* Accepts `pixel_values`
* Returns attentions when calling
```python
model.generate(..., output_attentions=True)
```
* Provides a decoder tokenizer:
```python
model.decoder.tokenizer
```
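A rough duck-typing check of these three requirements might look like this (illustrative sketch; some models accept `pixel_values` only via `**kwargs`):

```python
import inspect

def looks_compatible(model) -> bool:
    """Rough, illustrative check of the three requirements listed above."""
    accepts_pixels = "pixel_values" in inspect.signature(model.forward).parameters
    has_tokenizer = hasattr(getattr(model, "decoder", None), "tokenizer")
    can_generate = hasattr(model, "generate")
    return accepts_pixels and has_tokenizer and can_generate
```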
Ideal for research in:
* Medical AI
* Vision–language alignment
* Cross-modal interpretability
* Attention visualization
* Explainable AI (XAI)
---
# ❤️ Acknowledgments
* Powered by **Gradio** and **Hugging Face Transformers**
* Based on and expanded from the **Token-Attention-Viewer** project
🔗 [https://github.com/devMuniz02/Image-Attention-Visualizer](https://github.com/devMuniz02/Image-Attention-Visualizer)
* Created as part of a thesis on **efficient and explainable multimodal medical AI**