manu02 committed on
Commit
110fd76
·
1 Parent(s): b4d2ee0

Upload 3 files

Files changed (3)
  1. .gitattributes +1 -0
  2. README.md +160 -14
  3. requirements.txt +6 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ assets/app_view.png filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,14 +1,160 @@
- ---
- title: Image Attention Visualizer
- emoji: 🏢
- colorFrom: yellow
- colorTo: yellow
- sdk: gradio
- sdk_version: 5.49.1
- app_file: app.py
- pinned: false
- license: mit
- short_description: Image Attention Visualizer is an interactive Gradio app that
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ ---
+ title: Image-Attention-Visualizer
+ emoji: 🔥
+ colorFrom: blue
+ colorTo: purple
+ sdk: gradio
+ app_file: app.py
+ license: mit
+ pinned: true
+ tags:
+ - gradio
+ - pytorch
+ - computer-vision
+ - nlp
+ - multimodal
+ - vision-language
+ - image-to-text
+ - attention
+ - attention-visualization
+ - interpretability
+ - explainability
+ - xai
+ - demo
+ ---
+
+ # [GitHub repo](https://github.com/devMuniz02/Image-Attention-Visualizer)
+ # [TRY IT NOW ON HUGGING FACE SPACES!!](https://huggingface.co/spaces/manu02/image-attention-visualizer)
+
+ ![App working](assets/app_view.png)
+
+ # Image-Attention-Visualizer
+
+ Image Attention Visualizer is an interactive Gradio app that visualizes **cross-modal attention** between image tokens and generated text tokens in a custom multimodal model, letting researchers and developers see, token by token, how different parts of an image influence the model's textual output.
+
+ The app **generates text from an image** and **visualizes attention in real time**. It provides three synchronized views (original image, attention overlay, and heatmap), plus a **word-level visualization** showing how each generated word attends to visual regions.
+
+ ---
+
+ ## ✨ What the app does
+
+ * **Generates text** from an image input using your custom model (`create_complete_model`).
+ * Displays **three synchronized views**:
+   1. 🖼️ **Original image**
+   2. 🔥 **Overlay** (original + attention heatmap)
+   3. 🌈 **Heatmap alone**
+ * **Word-level attention viewer**: select any generated word to see how its attention is distributed across the image and previously generated words.
+ * Works directly with your **custom tokenizer** (`model.decoder.tokenizer`).
+ * Fixed-length **1024 image tokens (32×32 grid)** projected as a visual heatmap.
+ * Adjustable options: **Layer**, **Head**, or **Mean Across Layers/Heads**.
+
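As a sketch of how the fixed 1024 image tokens above could become a visual heatmap, the per-token attention vector can be reshaped to a 32×32 grid, normalized, and upsampled. The function below is an illustrative stand-in, not the app's actual code:

```python
import numpy as np

def attention_to_heatmap(attn_weights, grid=32, upscale=8):
    """Reshape a flat per-image-token attention vector into a 2D heatmap.

    attn_weights: 1D array of length grid*grid (one weight per image token).
    Returns a (grid*upscale, grid*upscale) float map normalized to [0, 1].
    """
    attn = np.asarray(attn_weights, dtype=np.float32)
    assert attn.size == grid * grid, "expected one weight per image token"
    heat = attn.reshape(grid, grid)
    # Normalize to [0, 1] so the map can be passed through a colormap.
    heat = (heat - heat.min()) / (heat.max() - heat.min() + 1e-8)
    # Nearest-neighbor upsample so the 32x32 grid covers the display image.
    return np.kron(heat, np.ones((upscale, upscale), dtype=np.float32))

heatmap = attention_to_heatmap(np.random.rand(1024))
print(heatmap.shape)  # (256, 256)
```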
+ ---
+
+ ## 🚀 Quickstart
+
+ ### 1) Clone
+
+ ```bash
+ git clone https://github.com/devMuniz02/Image-Attention-Visualizer
+ cd Image-Attention-Visualizer
+ ```
+
+ ### 2) (Optional) Create a virtual environment
+
+ **Windows (PowerShell):**
+
+ ```powershell
+ python -m venv venv
+ .\venv\Scripts\Activate.ps1
+ ```
+
+ **macOS / Linux (bash/zsh):**
+
+ ```bash
+ python3 -m venv venv
+ source venv/bin/activate
+ ```
+
+ ### 3) Install requirements
+
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ ### 4) Run the app
+
+ ```bash
+ python app.py
+ ```
+
+ You should see something like:
+
+ ```
+ Running on local URL: http://127.0.0.1:7860
+ ```
+
+ ### 5) Open in your browser
+
+ Navigate to `http://127.0.0.1:7860` to use the app.
+
+ ---
+
+ ## 🧭 How to use
+
+ 1. **Upload an image** or load a random sample from your dataset folder.
+ 2. **Set generation parameters**:
+    * Max New Tokens
+    * Layer/Head selection (or average across all)
+ 3. Click **Generate**: the model will produce a textual description or continuation.
+ 4. **Select a generated word** from the list:
+    * The top row will show:
+      * Left → **Original image**
+      * Center → **Overlay** (attention on image regions)
+      * Right → **Colored heatmap**
+    * The bottom section highlights attention strength over the generated words.
+
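As a rough illustration of the overlay view described above (the app's real `_make_overlay` likely applies a proper matplotlib colormap; this simplified stand-in tints only the red channel):

```python
import numpy as np

def make_overlay(image, heatmap, alpha=0.5):
    """Alpha-blend a normalized attention heatmap onto an RGB image.

    image: (H, W, 3) float array in [0, 1].
    heatmap: (H, W) float array in [0, 1], already resized to the image.
    """
    colored = np.zeros_like(image)
    colored[..., 0] = heatmap  # red channel carries attention strength
    return (1 - alpha) * image + alpha * colored

# Gray image with one strongly attended square region.
img = np.ones((256, 256, 3), dtype=np.float32) * 0.5
heat = np.zeros((256, 256), dtype=np.float32)
heat[100:150, 100:150] = 1.0
overlay = make_overlay(img, heat)
print(overlay.shape)  # (256, 256, 3)
```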
+ ---
+
+ ## 🧩 Files
+
+ * `app.py` — Main Gradio interface and visualization logic.
+ * `utils/models/complete_model.py` — Model definition and generation method.
+ * `utils/processing.py` — Image preprocessing utilities.
+ * `requirements.txt` — Dependencies.
+ * `README.md` — This file.
+
+ ---
+
+ ## 🛠️ Troubleshooting
+
+ * **Black or blank heatmap:** Ensure your model is called with `output_attentions=True` in `.generate()`.
+ * **Low resolution or distortion:** Adjust `img_size` or the interpolation method inside `_make_overlay`.
+ * **Tokenizer error:** Make sure `model.decoder.tokenizer` exists and is loaded correctly.
+ * **OOM errors:** Reduce `max_new_tokens` or use a smaller model checkpoint.
+ * **Color or shape mismatch:** Verify that the image-token sequence length is 1024 (a 32×32 layout).
+
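For the blank-heatmap case in particular, a quick structural sanity check of the returned attentions can help. The helper below is a hypothetical example that uses plain nested lists in place of real attention tensors:

```python
import math

def check_attentions(attentions):
    """Sanity-check attentions from generate(..., output_attentions=True).

    Expects one collection of per-layer attention matrices per generated
    step; returns a list of human-readable problems found.
    """
    problems = []
    if not attentions:
        problems.append("attentions is empty: was output_attentions=True set?")
        return problems
    for step, layers in enumerate(attentions):
        if not layers:
            problems.append(f"step {step}: no per-layer attention returned")
            continue
        flat = [w for layer in layers for row in layer for w in row]
        if any(not math.isfinite(w) for w in flat):
            problems.append(f"step {step}: non-finite attention weights (NaN/Inf)")
        if all(w == 0 for w in flat):
            problems.append(f"step {step}: all-zero attention (renders as a blank heatmap)")
    return problems

# Fake two-step, one-layer structure: healthy first step, all-zero second.
fake = [
    [[[0.25, 0.75], [0.5, 0.5]]],
    [[[0.0, 0.0], [0.0, 0.0]]],
]
print(check_attentions(fake))
```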
+ ---
+
+ ## 🧪 Model integration notes
+
+ * The app is compatible with any **encoder–decoder or vision–language model** that:
+   * Accepts `pixel_values` as input.
+   * Implements `generate(..., output_attentions=True)` returning `(gen_ids, gen_text, attentions)`.
+   * Uses the tokenizer from `model.decoder.tokenizer`.
+ * Designed for research in **vision-language interpretability**, **cross-modal explainability**, and **attention visualization**.
+
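The integration contract above can be sketched with a toy stand-in. Everything here except the `generate(...)` signature, `output_attentions`, and the `model.decoder.tokenizer` attribute is hypothetical and not the real `create_complete_model`:

```python
from types import SimpleNamespace

class ToyTokenizer:
    """Stand-in tokenizer exposing the decode() the app relies on."""
    vocab = {0: "<bos>", 1: "a", 2: "cat", 3: "<eos>"}

    def decode(self, ids):
        return " ".join(self.vocab[i] for i in ids)

class ToyVisionLanguageModel:
    """Minimal shape of a model the app can drive."""

    def __init__(self):
        # The app reads the tokenizer from model.decoder.tokenizer.
        self.decoder = SimpleNamespace(tokenizer=ToyTokenizer())

    def generate(self, pixel_values, max_new_tokens=3, output_attentions=True):
        # A real model would attend over 1024 image tokens; here we fake
        # one uniform attention row per generated token.
        gen_ids = [1, 2, 3][:max_new_tokens]
        gen_text = self.decoder.tokenizer.decode(gen_ids)
        attentions = [[1.0 / 1024] * 1024 for _ in gen_ids] if output_attentions else None
        return gen_ids, gen_text, attentions

model = ToyVisionLanguageModel()
ids, text, attn = model.generate(pixel_values=None, max_new_tokens=2)
print(text)          # a cat
print(len(attn[0]))  # 1024
```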
+ ---
+
+ ## 📣 Acknowledgments
+
+ * Built with [Gradio](https://www.gradio.app/) and [Hugging Face Transformers](https://huggingface.co/docs/transformers).
+ * Inspired by the original [Token-Attention-Viewer](https://github.com/devMuniz02/Token-Attention-Viewer) project.
+ * Special thanks to the open-source community advancing **vision-language interpretability**.
requirements.txt ADDED
@@ -0,0 +1,6 @@
+ transformers
+ gradio
+ torch
+ torchvision
+ fsspec
+ matplotlib