---
title: CXR-Findings-AI
emoji: 🫁
colorFrom: blue
colorTo: indigo
sdk: gradio
app_file: app.py
license: mit
pinned: true
tags:
- gradio
- pytorch
- computer-vision
- nlp
- multimodal
- vision-language
- image-to-text
- chest-xray
- radiology
- medical-ai
- attention
- attention-visualization
- interpretability
- explainable-ai
- xai
- cpu-inference
- healthcare
- demo
short_description: Generate chest X-ray findings and explore attention.
---

# 🫁 CXR-Findings-AI — Chest X-Ray Findings Generator + Attention Explorer

### **Live Demo (CPU-Only, thanks to Hugging Face Spaces)**

🔗 **[https://huggingface.co/spaces/manu02/CXR-Findings-AI](https://huggingface.co/spaces/manu02/CXR-Findings-AI)**


![App working](assets/app_view.png)

---

# 🧠 Overview

**CXR-Findings-AI** is an interactive Gradio application that:

### ✅ **Generates radiology findings from a chest X-ray image**

### ✅ **Visualizes multimodal attention (image ↔ text) across layers and heads**

### ✅ **Runs entirely on CPU**, showcasing the efficiency of the underlying 246M-parameter model

The system lets researchers, clinicians, and students explore **how different image regions influence each generated word**, enabling deeper interpretability in medical AI.

---

# 🔍 What This App Provides

### 🫁 **1. Findings Generation**

A lightweight multimodal model produces chest X-ray findings directly from the uploaded image.

### 👁️ **2. Layer-wise & Head-wise Attention Visualization**

Inspect how the model distributes attention (see the minimal slicing sketch after this list):

* Across **transformer layers**
* Across **attention heads**
* Between **image tokens** (32×32 grid → 1024 tokens)
* And **generated text tokens**
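
In code, this typically amounts to indexing or averaging a single cross-attention tensor. Below is a minimal, self-contained sketch with made-up dimensions and names (`cross_attn`, `select_attention`); the app's real tensors come out of the model's `.generate()` call and may be laid out differently.

```python
from typing import Optional

import torch

# Hypothetical layout: (num_layers, num_heads, num_text_tokens, num_image_tokens)
num_layers, num_heads, num_text_tokens, num_image_tokens = 6, 8, 40, 1024
cross_attn = torch.rand(num_layers, num_heads, num_text_tokens, num_image_tokens)

def select_attention(attn: torch.Tensor,
                     layer: Optional[int] = None,
                     head: Optional[int] = None) -> torch.Tensor:
    """Pick a single layer/head, or average over them when None ('mean' mode)."""
    a = attn if layer is None else attn[layer:layer + 1]   # keep the layer dim so the mean is uniform
    a = a.mean(dim=0)                                      # -> (heads, text_tokens, image_tokens)
    a = a if head is None else a[head:head + 1]
    return a.mean(dim=0)                                   # -> (text_tokens, image_tokens)

attn_map = select_attention(cross_attn, layer=3)           # layer 3, mean over all heads
print(attn_map.shape)                                      # torch.Size([40, 1024])
```
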

### 🎨 **3. Three Synchronized Views**

For each selected word:

1. **Original Image**
2. **Overlay View:** Image + blended attention map
3. **Pure Heatmap:** Visualizes raw attention intensities
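
The overlay in view 2 is essentially the 32×32 attention grid upsampled to the image size and alpha-blended on top of the X-ray. A minimal sketch using only NumPy and Pillow; the single-channel red heat layer and the 0.5 blend weight are illustrative choices, not necessarily what the app uses:

```python
import numpy as np
from PIL import Image

def overlay_attention(image: Image.Image, grid: np.ndarray, alpha: float = 0.5) -> Image.Image:
    """Blend a normalized 32x32 attention grid onto the image as a simple heat layer."""
    heat = np.zeros((*grid.shape, 3), dtype=np.uint8)
    heat[..., 0] = (np.clip(grid, 0.0, 1.0) * 255).astype(np.uint8)   # attention -> red channel
    heat_img = Image.fromarray(heat).resize(image.size, Image.BILINEAR)
    return Image.blend(image.convert("RGB"), heat_img, alpha)

# Usage with a gray placeholder image and a random attention grid:
demo_image = Image.new("RGB", (512, 512), color=(128, 128, 128))
demo_grid = np.random.rand(32, 32).astype(np.float32)
overlay_attention(demo_image, demo_grid).save("overlay_preview.png")
```
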

### 🧩 **4. Word-Level Interpretability**

Click any word in the generated report to reveal its cross-modal attention patterns.
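
Under the hood, selecting a word amounts to taking one row of the text-to-image attention matrix and folding it back onto the 32×32 patch grid. A self-contained sketch with a made-up matrix; the app's actual token indexing and normalization may differ:

```python
import torch

# Hypothetical (num_text_tokens, num_image_tokens) matrix, e.g. the attn_map from the earlier sketch.
attn_map = torch.rand(40, 1024)

word_index = 7                                        # index of the clicked word (illustrative)
word_attention = attn_map[word_index]                 # (1024,) attention over image patches
grid = word_attention.reshape(32, 32)                 # fold back onto the 32x32 patch grid
grid = (grid - grid.min()) / (grid.max() - grid.min() + 1e-8)   # normalize to [0, 1] for display
```
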

---

# 🚀 Quickstart (Local Usage)

### 1) Clone

```bash
git clone https://github.com/devMuniz02/Image-Attention-Visualizer
cd Image-Attention-Visualizer
```

### 2) (Optional) Create a virtual environment

**Windows:**

```powershell
python -m venv venv
.\venv\Scripts\Activate.ps1
```

**macOS / Linux:**

```bash
python3 -m venv venv
source venv/bin/activate
```

### 3) Install requirements

```bash
pip install -r requirements.txt
```

### 4) Run the app

```bash
python app.py
```

Then open:

```
http://127.0.0.1:7860
```

---

# 🧭 How to Use the Interface

1. **Upload a chest X-ray** (or load a sample)
2. Adjust:

   * Max new tokens
   * Layer selection
   * Head selection
   * Or choose *mean* attention across all layers/heads
3. Click **Generate Findings**
4. Click any generated word to visualize:

   * Image ↔ text attention
   * Heatmaps
   * Cross-token relationships

---

# 🧩 Repository Structure

| File                             | Description                                   |
| -------------------------------- | --------------------------------------------- |
| `app.py`                         | Main Gradio interface and visualization logic |
| `utils/models/complete_model.py` | Full multimodal model assembly                |
| `utils/processing.py`            | Image preprocessing                           |
| `assets/`                        | UI images & examples                          |
| `requirements.txt`               | Dependencies                                  |
| `README.md`                      | This file                                     |

---

# 🛠️ Troubleshooting

* **Blank heatmap** → Ensure `output_attentions=True` in `.generate()`
* **Distorted attention** → Check token count = 1024 (32×32)
* **Tokenizer errors** → Confirm `model.decoder.tokenizer` is loaded
* **OOM on local machine** → Reduce `max_new_tokens` or use CPU-only settings
* **Slow inference** → The model is sized to keep CPU inference practical, but generation is still slower than on a GPU; use a GPU for higher throughput
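
For the first two items, a quick shape check on whatever `.generate()` returns can save debugging time. The attribute names below assume a Hugging Face-style output object and are therefore assumptions; adapt them to the actual model:

```python
def check_attentions(gen_output, grid_size: int = 32) -> None:
    """Sanity-check a generate() output before plotting (attribute names are assumptions)."""
    cross = getattr(gen_output, "cross_attentions", None)
    if not cross:
        raise ValueError("No cross_attentions returned - pass output_attentions=True "
                         "(and return_dict_in_generate=True) to .generate()")
    num_image_tokens = cross[0][0].shape[-1]          # generation step 0, layer 0
    if num_image_tokens != grid_size * grid_size:
        raise ValueError(f"Expected {grid_size * grid_size} image tokens, got {num_image_tokens}")
```
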

---

# 🧪 Model Integration Notes

Compatible with any encoder–decoder or vision–language model that:

* Accepts `pixel_values`
* Returns attentions when calling

  ```python
  model.generate(..., output_attentions=True)
  ```
* Provides a decoder tokenizer:

  ```python
  model.decoder.tokenizer
  ```
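
Putting those requirements together, a minimal end-to-end call might look like the sketch below. `load_model` and `preprocess` are placeholders for whatever the integrated project provides (see `utils/models/complete_model.py` and `utils/processing.py` in this repo), and `out.sequences` assumes a Hugging Face-style generate output:

```python
import torch
from PIL import Image

model = load_model()                                    # placeholder: project-specific model loader
model.eval()

image = Image.open("sample_cxr.png").convert("RGB")     # any chest X-ray image
pixel_values = preprocess(image)                        # placeholder: project-specific preprocessing -> (1, 3, H, W)

with torch.no_grad():
    out = model.generate(pixel_values=pixel_values,
                         max_new_tokens=128,
                         output_attentions=True,
                         return_dict_in_generate=True)  # return_dict flag assumes an HF-style API

findings = model.decoder.tokenizer.decode(out.sequences[0], skip_special_tokens=True)
print(findings)
```
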

Ideal for research in:

* Medical AI
* Vision–language alignment
* Cross-modal interpretability
* Attention visualization
* Explainable AI (XAI)

---

# ❤️ Acknowledgments

* Powered by **Gradio** and **Hugging Face Transformers**
* Based on and expanded from the **Token-Attention-Viewer** project:
  🔗 [https://github.com/devMuniz02/Image-Attention-Visualizer](https://github.com/devMuniz02/Image-Attention-Visualizer)
* Created as part of a thesis on **efficient and explainable multimodal medical AI**