manu02 committed
Commit 2a8d9e3 · verified · 1 Parent(s): 9ee9ac7

Update README.md

Files changed (1)
  1. README.md +203 -160
README.md CHANGED
@@ -1,160 +1,203 @@
1
- ---
2
- title: Image-Attention-Visualizer
3
- emoji: 🔥
4
- colorFrom: blue
5
- colorTo: purple
6
- sdk: gradio
7
- app_file: app.py
8
- license: mit
9
- pinned: true
10
- tags:
11
- - gradio
12
- - pytorch
13
- - computer-vision
14
- - nlp
15
- - multimodal
16
- - vision-language
17
- - image-to-text
18
- - attention
19
- - attention-visualization
20
- - interpretability
21
- - explainability
22
- - xai
23
- - demo
24
- ---
25
-
26
- # [Github repo](https://github.com/devMuniz02/Image-Attention-Visualizer)
27
- # [TRY IT NOW ON HUGGING FACE SPACES !!](https://huggingface.co/spaces/manu02/image-attention-visualizer)
28
-
29
- ![App working](assets/app_view.png)
30
-
31
- # Image-Attention-Visualizer
32
-
33
- Image Attention Visualizer is an interactive Gradio app that visualizes **cross-modal attention** between image tokens and generated text tokens in a custom multimodal model. It allows researchers and developers to see how different parts of an image influence the model’s textual output, token by token.
34
-
35
- # Image-to-Text Attention Visualizer (Gradio)
36
-
37
- An interactive Gradio app to **generate text from an image using a custom multimodal model** and **visualize attention in real time**.
38
- It provides 3 synchronized views — original image, attention overlay, and heatmap — plus a **word-level visualization** showing how each generated word attends to visual regions.
39
-
40
- ---
41
-
42
- ## ✨ What the app does
43
-
44
- * **Generates text** from an image input using your custom model (`create_complete_model`).
45
- * Displays **three synchronized views**:
46
-
47
- 1. 🖼️ **Original image**
48
- 2. 🔥 **Overlay** (original + attention heatmap)
49
- 3. 🌈 **Heatmap alone**
50
- * **Word-level attention viewer**: select any generated word to see how its attention is distributed across the image and previously generated words.
51
- * Works directly with your **custom tokenizer (`model.decoder.tokenizer`)**.
52
- * Fixed-length **1024 image tokens (32×32 grid)** projected as a visual heatmap.
53
- * Adjustable options: **Layer**, **Head**, or **Mean Across Layers/Heads**.
54
-
55
- ---
56
-
57
- ## 🚀 Quickstart
58
-
59
- ### 1) Clone
60
-
61
- ```bash
62
- git clone https://github.com/devMuniz02/Image-Attention-Visualizer
63
- cd Image-Attention-Visualizer
64
- ```
65
-
66
- ### 2) (Optional) Create a virtual environment
67
-
68
- **Windows (PowerShell):**
69
-
70
- ```powershell
71
- python -m venv venv
72
- .\venv\Scripts\Activate.ps1
73
- ```
74
-
75
- **macOS / Linux (bash/zsh):**
76
-
77
- ```bash
78
- python3 -m venv venv
79
- source venv/bin/activate
80
- ```
81
-
82
- ### 3) Install requirements
83
-
84
- ```bash
85
- pip install -r requirements.txt
86
- ```
87
-
88
- ### 4) Run the app
89
-
90
- ```bash
91
- python app.py
92
- ```
93
-
94
- You should see something like:
95
-
96
- ```
97
- Running on local URL: http://127.0.0.1:7860
98
- ```
99
-
100
- ### 5) Open in your browser
101
-
102
- Navigate to `http://127.0.0.1:7860` to use the app.
103
-
104
- ---
105
-
106
- ## 🧭 How to use
107
-
108
- 1. **Upload an image** or load a random sample from your dataset folder.
109
- 2. **Set generation parameters**:
110
-
111
- * Max New Tokens
112
- * Layer/Head selection (or average across all)
113
- 3. Click **Generate** — the model will produce a textual description or continuation.
114
- 4. **Select a generated word** from the list:
115
-
116
- * The top row will show:
117
-
118
- * Left → **Original image**
119
- * Center → **Overlay (attention on image regions)**
120
- * Right → **Colored heatmap**
121
- * The bottom section highlights attention strength over the generated words.
122
-
123
- ---
124
-
125
- ## 🧩 Files
126
-
127
- * `app.py` — Main Gradio interface and visualization logic.
128
- * `utils/models/complete_model.py` — Model definition and generation method.
129
- * `utils/processing.py` — Image preprocessing utilities.
130
- * `requirements.txt` — Dependencies.
131
- * `README.md` This file.
132
-
133
- ---
134
-
135
- ## 🛠️ Troubleshooting
136
-
137
- * **Black or blank heatmap:** Ensure your model returns `output_attentions=True` in `.generate()`.
138
- * **Low resolution or distortion:** Adjust `img_size` or the interpolation method inside `_make_overlay`.
139
- * **Tokenizer error:** Make sure `model.decoder.tokenizer` exists and is loaded correctly.
140
- * **OOM errors:** Reduce `max_new_tokens` or use a smaller model checkpoint.
141
- * **Color or shape mismatch:** Verify that your image tokens length = 1024 (for a 32×32 layout).
142
-
143
- ---
144
-
145
- ## 🧪 Model integration notes
146
-
147
- * The app is compatible with any **encoder–decoder or vision–language model** that:
148
-
149
- * Accepts `pixel_values` as input.
150
- * Returns `generate(..., output_attentions=True)` with `(gen_ids, gen_text, attentions)`.
151
- * Uses the tokenizer from `model.decoder.tokenizer`.
152
- * Designed for research in **vision-language interpretability**, **cross-modal explainability**, and **attention visualization**.
153
-
154
- ---
155
-
156
- ## 📣 Acknowledgments
157
-
158
- * Built with [Gradio](https://www.gradio.app/) and [Hugging Face Transformers](https://huggingface.co/docs/transformers).
159
- * Inspired by the original [Token-Attention-Viewer](https://github.com/devMuniz02/Token-Attention-Viewer) project.
160
- * Special thanks to the open-source community advancing **vision-language interpretability**.
1
+ ---
2
+ title: CXR-Findings-AI
3
+ emoji: 🫁
4
+ colorFrom: blue
5
+ colorTo: indigo
6
+ sdk: gradio
7
+ app_file: app.py
8
+ license: mit
9
+ pinned: true
10
+ tags:
11
+ - gradio
12
+ - pytorch
13
+ - computer-vision
14
+ - nlp
15
+ - multimodal
16
+ - vision-language
17
+ - image-to-text
18
+ - chest-xray
19
+ - radiology
20
+ - medical-ai
21
+ - attention
22
+ - attention-visualization
23
+ - interpretability
24
+ - explainable-ai
25
+ - xai
26
+ - cpu-inference
27
+ - healthcare
28
+ - demo
29
+ short_description: Generate chest X-ray findings and explore attention.
30
+ ---
31
+
32
+ # 🫁 CXR-Findings-AI — Chest X-Ray Findings Generator + Attention Explorer
33
+
34
+ ### **Live Demo (CPU-Only, thanks to Hugging Face Spaces)**
35
+
36
+ 🔗 **[https://huggingface.co/spaces/manu02/CXR-Findings-AI](https://huggingface.co/spaces/manu02/CXR-Findings-AI)**
37
+
38
+
39
+ ![App working](assets/app_view.png)
40
+
41
+ ---
42
+
43
+ # 🧠 Overview
44
+
45
+ **CXR-Findings-AI** is an interactive Gradio application that:
46
+
47
+ ### **Generates radiology findings from a chest X-ray image**
48
+
49
+ ### **Visualizes multimodal attention (image ↔ text) across layers and heads**
50
+
51
+ ### **Runs entirely on CPU**, showcasing the efficiency of the underlying 246M-parameter model
52
+
53
+ The system lets researchers, clinicians, and students explore **how different image regions influence each generated word**, enabling deeper interpretability in medical AI.
54
+
55
+ ---
56
+
57
+ # 🔍 What This App Provides
58
+
59
+ ### 🫁 **1. Findings Generation**
60
+
61
+ A lightweight multimodal model produces chest X-ray findings directly from the uploaded image.
62
+
63
+ ### 👁️ **2. Layer-wise & Head-wise Attention Visualization**
64
+
65
+ Inspect how the model distributes attention:
66
+
67
+ * Across **transformer layers**
68
+ * Across **attention heads**
69
+ * Between **image tokens** (32×32 grid → 1024 tokens)
70
+ * And **generated text tokens**
71
+
72
+ ### 🎨 **3. Three Synchronized Views**
73
+
74
+ For each selected word:
75
+
76
+ 1. **Original Image**
77
+ 2. **Overlay View:** Image + blended attention map
78
+ 3. **Pure Heatmap:** Visualizes raw attention intensities
79
+
80
+ ### 🧩 **4. Word-Level Interpretability**
81
+
82
+ Click any word in the generated report to reveal its cross-modal attention patterns.
83
+
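+ Under the hood, the overlay and heatmap views come from reshaping a word's attention over the 1024 image tokens back into the 32×32 grid and upsampling it to the image resolution. The snippet below is only a minimal sketch of that idea, not the app's actual implementation: it assumes `attn` is a NumPy vector of length 1024 and `image` is a PIL image.
+
+ ```python
+ # Sketch: per-word attention over 1024 image tokens -> blended overlay.
+ # Assumes attn.shape == (1024,) and image is a PIL.Image; the real rendering
+ # lives in app.py and may differ.
+ import numpy as np
+ from PIL import Image
+
+ def make_attention_overlay(image: Image.Image, attn: np.ndarray, alpha: float = 0.5) -> Image.Image:
+     grid = attn.reshape(32, 32)                                    # 1024 tokens -> 32x32 grid
+     grid = (grid - grid.min()) / (grid.max() - grid.min() + 1e-8)  # normalize to [0, 1]
+     heat = Image.fromarray((grid * 255).astype(np.uint8), mode="L")
+     heat = heat.resize(image.size, Image.BILINEAR)                 # upsample to image resolution
+     zeros = Image.new("L", image.size)
+     heat_rgb = Image.merge("RGB", (heat, zeros, zeros))            # single-channel heatmap as RGB
+     return Image.blend(image.convert("RGB"), heat_rgb, alpha)      # overlay view
+ ```
+
+ The pure heatmap view is just the upsampled grid (optionally passed through a colormap) without the blend step.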
84
+ ---
85
+
86
+ # 🚀 Quickstart (Local Usage)
87
+
88
+ ### 1) Clone
89
+
90
+ ```bash
91
+ git clone https://github.com/devMuniz02/Image-Attention-Visualizer
92
+ cd Image-Attention-Visualizer
93
+ ```
94
+
95
+ ### 2) (Optional) Create a virtual environment
96
+
97
+ **Windows:**
98
+
99
+ ```powershell
100
+ python -m venv venv
101
+ .\venv\Scripts\Activate.ps1
102
+ ```
103
+
104
+ **macOS / Linux:**
105
+
106
+ ```bash
107
+ python3 -m venv venv
108
+ source venv/bin/activate
109
+ ```
110
+
111
+ ### 3) Install requirements
112
+
113
+ ```bash
114
+ pip install -r requirements.txt
115
+ ```
116
+
117
+ ### 4) Run the app
118
+
119
+ ```bash
120
+ python app.py
121
+ ```
122
+
123
+ Then open:
124
+
125
+ ```
126
+ http://127.0.0.1:7860
127
+ ```
128
+
129
+ ---
130
+
131
+ # 🧭 How to Use the Interface
132
+
133
+ 1. **Upload a chest X-ray** (or load a sample)
134
+ 2. Adjust:
135
+
136
+ * Max new tokens
137
+ * Layer selection
138
+ * Head selection
139
+ * Or choose *mean* attention across all layers/heads (see the selection sketch after this list)
140
+ 3. Click **Generate Findings**
141
+ 4. Click any generated word to visualize:
142
+
143
+ * Image ↔ text attention
144
+ * Heatmaps
145
+ * Cross-token relationships
146
+
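+ The Layer / Head / mean options reduce the model's attention stack to a single vector per word. The sketch below illustrates that reduction only; it assumes the attentions for one generated word have already been stacked into a tensor of shape `(num_layers, num_heads, 1024)`, which may not match your model's exact output format.
+
+ ```python
+ # Sketch: pick one layer/head, or average over them, for a single generated word.
+ # `attn` is assumed to be shaped (num_layers, num_heads, 1024); None means "mean".
+ from typing import Optional
+ import torch
+
+ def select_attention(attn: torch.Tensor, layer: Optional[int], head: Optional[int]) -> torch.Tensor:
+     layers = attn if layer is None else attn[layer : layer + 1]     # keep the layer dimension
+     heads = layers if head is None else layers[:, head : head + 1]  # keep the head dimension
+     return heads.mean(dim=(0, 1))                                   # -> (1024,) vector for the heatmap
+ ```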
147
+ ---
148
+
149
+ # 🧩 Repository Structure
150
+
151
+ | File | Description |
152
+ | -------------------------------- | --------------------------------------------- |
153
+ | `app.py` | Main Gradio interface and visualization logic |
154
+ | `utils/models/complete_model.py` | Full multimodal model assembly |
155
+ | `utils/processing.py` | Image preprocessing |
156
+ | `assets/` | UI images & examples |
157
+ | `requirements.txt` | Dependencies |
158
+ | `README.md` | This file |
159
+
160
+ ---
161
+
162
+ # 🛠️ Troubleshooting
163
+
164
+ * **Blank heatmap** → Ensure `output_attentions=True` in `.generate()`
165
+ * **Distorted attention** → Check that the image token count is 1024 (32×32); see the shape check after this list
166
+ * **Tokenizer errors** → Confirm `model.decoder.tokenizer` is loaded
167
+ * **OOM on local machine** → Reduce `max_new_tokens` or use CPU-only settings
168
+ * **Slow inference** → the model is kept lightweight so CPU inference stays practical; a GPU is recommended for higher throughput
169
+
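+ For the heatmap-shape items above, a quick sanity check like the following can help. It assumes `attn` is the attention vector over image tokens for one generated word (a hypothetical shape; adapt it to your model's output).
+
+ ```python
+ # Sketch: sanity-check the image-token attention before plotting it.
+ import math
+ import numpy as np
+
+ def check_image_attention(attn: np.ndarray) -> None:
+     n = attn.shape[-1]
+     side = math.isqrt(n)
+     assert side * side == n, f"expected a square token grid, got {n} image tokens"
+     assert n == 1024, f"this app assumes 32x32 = 1024 image tokens, got {n}"
+     assert np.isfinite(attn).all() and attn.max() > 0, "attention map is empty or invalid (blank heatmap?)"
+ ```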
170
+ ---
171
+
172
+ # 🧪 Model Integration Notes
173
+
174
+ Compatible with any encoder–decoder or vision–language model that:
175
+
176
+ * Accepts `pixel_values`
177
+ * Returns attentions when calling
178
+
179
+ ```python
180
+ model.generate(..., output_attentions=True)
181
+ ```
182
+ * Provides a decoder tokenizer:
183
+
184
+ ```python
185
+ model.decoder.tokenizer
186
+ ```
187
+
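+ Putting these requirements together, the expected call pattern looks roughly like the sketch below. The `(gen_ids, gen_text, attentions)` return signature follows this project's earlier README; `create_complete_model` comes from `utils/models/complete_model.py` with its arguments omitted here, and the dummy `pixel_values` stands in for an image preprocessed with `utils/processing.py`.
+
+ ```python
+ # Hedged sketch of the integration contract; details may differ for your model.
+ import torch
+ from utils.models.complete_model import create_complete_model
+
+ model = create_complete_model()               # custom multimodal model (arguments omitted)
+ model.eval()
+
+ pixel_values = torch.randn(1, 3, 224, 224)    # stand-in for a preprocessed chest X-ray
+ with torch.no_grad():
+     gen_ids, gen_text, attentions = model.generate(
+         pixel_values=pixel_values,
+         max_new_tokens=64,
+         output_attentions=True,
+     )
+
+ # If the decoder tokenizer follows the Hugging Face interface:
+ tokens = model.decoder.tokenizer.convert_ids_to_tokens(gen_ids[0])
+ ```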
188
+ Ideal for research in:
189
+
190
+ * Medical AI
191
+ * Vision–language alignment
192
+ * Cross-modal interpretability
193
+ * Attention visualization
194
+ * Explainable AI (XAI)
195
+
196
+ ---
197
+
198
+ # ❤️ Acknowledgments
199
+
200
+ * Powered by **Gradio** and **Hugging Face Transformers**
201
+ * Based on and expanded from the **Token-Attention-Viewer** project
202
+ 🔗 [https://github.com/devMuniz02/Token-Attention-Viewer](https://github.com/devMuniz02/Token-Attention-Viewer)
203
+ * Created as part of a thesis on **efficient and explainable multimodal medical AI**