manu02 committed on
Commit
110fd76
·
1 Parent(s): b4d2ee0

Upload 3 files

Files changed (3)
  1. .gitattributes +1 -0
  2. README.md +160 -14
  3. requirements.txt +6 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ assets/app_view.png filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,14 +1,160 @@
- ---
- title: Image Attention Visualizer
- emoji: 🏢
- colorFrom: yellow
- colorTo: yellow
- sdk: gradio
- sdk_version: 5.49.1
- app_file: app.py
- pinned: false
- license: mit
- short_description: Image Attention Visualizer is an interactive Gradio app that
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ ---
+ title: Image-Attention-Visualizer
+ emoji: 🔥
+ colorFrom: blue
+ colorTo: purple
+ sdk: gradio
+ app_file: app.py
+ license: mit
+ pinned: true
+ tags:
+ - gradio
+ - pytorch
+ - computer-vision
+ - nlp
+ - multimodal
+ - vision-language
+ - image-to-text
+ - attention
+ - attention-visualization
+ - interpretability
+ - explainability
+ - xai
+ - demo
+ ---
+
+ # [GitHub repo](https://github.com/devMuniz02/Image-Attention-Visualizer)
+ # [TRY IT NOW ON HUGGING FACE SPACES!!](https://huggingface.co/spaces/manu02/image-attention-visualizer)
+
+ ![App working](assets/app_view.png)
+
+ # Image-Attention-Visualizer
+
+ Image Attention Visualizer is an interactive Gradio app that visualizes **cross-modal attention** between image tokens and generated text tokens in a custom multimodal model, letting researchers and developers see, token by token, how different parts of an image influence the model's textual output.
+
+ The app **generates text from an image** and **visualizes attention in real time**. It provides three synchronized views (original image, attention overlay, and heatmap), plus a **word-level visualization** showing how each generated word attends to visual regions.
+
+ ---
+
+ ## ✨ What the app does
+
+ * **Generates text** from an image input using your custom model (`create_complete_model`).
+ * Displays **three synchronized views**:
+   1. 🖼️ **Original image**
+   2. 🔥 **Overlay** (original + attention heatmap)
+   3. 🌈 **Heatmap alone**
+ * **Word-level attention viewer**: select any generated word to see how its attention is distributed across the image and previously generated words.
+ * Works directly with your **custom tokenizer** (`model.decoder.tokenizer`).
+ * Fixed-length **1024 image tokens (32×32 grid)** projected as a visual heatmap.
+ * Adjustable options: **Layer**, **Head**, or **Mean Across Layers/Heads**.
+
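As a sketch of how the fixed 1024 image tokens above could become a visual heatmap, the per-token attention vector can be reshaped to a 32×32 grid, normalized, and upsampled. The function below is an illustrative stand-in, not the app's actual code:

```python
import numpy as np

def attention_to_heatmap(attn_weights, grid=32, upscale=8):
    """Reshape a flat per-image-token attention vector into a 2D heatmap.

    attn_weights: 1D array of length grid*grid (one weight per image token).
    Returns a (grid*upscale, grid*upscale) float map normalized to [0, 1].
    """
    attn = np.asarray(attn_weights, dtype=np.float32)
    assert attn.size == grid * grid, "expected one weight per image token"
    heat = attn.reshape(grid, grid)
    # Normalize to [0, 1] so the map can be passed through a colormap.
    heat = (heat - heat.min()) / (heat.max() - heat.min() + 1e-8)
    # Nearest-neighbor upsample so the 32x32 grid covers the display image.
    return np.kron(heat, np.ones((upscale, upscale), dtype=np.float32))

heatmap = attention_to_heatmap(np.random.rand(1024))
print(heatmap.shape)  # (256, 256)
```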
+ ---
+
+ ## 🚀 Quickstart
+
+ ### 1) Clone
+
+ ```bash
+ git clone https://github.com/devMuniz02/Image-Attention-Visualizer
+ cd Image-Attention-Visualizer
+ ```
+
+ ### 2) (Optional) Create a virtual environment
+
+ **Windows (PowerShell):**
+
+ ```powershell
+ python -m venv venv
+ .\venv\Scripts\Activate.ps1
+ ```
+
+ **macOS / Linux (bash/zsh):**
+
+ ```bash
+ python3 -m venv venv
+ source venv/bin/activate
+ ```
+
+ ### 3) Install requirements
+
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ ### 4) Run the app
+
+ ```bash
+ python app.py
+ ```
+
+ You should see something like:
+
+ ```
+ Running on local URL: http://127.0.0.1:7860
+ ```
+
+ ### 5) Open in your browser
+
+ Navigate to `http://127.0.0.1:7860` to use the app.
+
+ ---
+
+ ## 🧭 How to use
+
+ 1. **Upload an image** or load a random sample from your dataset folder.
+ 2. **Set generation parameters**:
+    * Max New Tokens
+    * Layer/Head selection (or average across all)
+ 3. Click **Generate**: the model will produce a textual description or continuation.
+ 4. **Select a generated word** from the list:
+    * The top row will show:
+      * Left → **Original image**
+      * Center → **Overlay** (attention on image regions)
+      * Right → **Colored heatmap**
+    * The bottom section highlights attention strength over the generated words.
+
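As a rough illustration of the overlay view described above (the app's real `_make_overlay` likely applies a proper matplotlib colormap; this simplified stand-in tints only the red channel):

```python
import numpy as np

def make_overlay(image, heatmap, alpha=0.5):
    """Alpha-blend a normalized attention heatmap onto an RGB image.

    image: (H, W, 3) float array in [0, 1].
    heatmap: (H, W) float array in [0, 1], already resized to the image.
    """
    colored = np.zeros_like(image)
    colored[..., 0] = heatmap  # red channel carries attention strength
    return (1 - alpha) * image + alpha * colored

# Gray image with one strongly attended square region.
img = np.ones((256, 256, 3), dtype=np.float32) * 0.5
heat = np.zeros((256, 256), dtype=np.float32)
heat[100:150, 100:150] = 1.0
overlay = make_overlay(img, heat)
print(overlay.shape)  # (256, 256, 3)
```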
+ ---
+
+ ## 🧩 Files
+
+ * `app.py` — Main Gradio interface and visualization logic.
+ * `utils/models/complete_model.py` — Model definition and generation method.
+ * `utils/processing.py` — Image preprocessing utilities.
+ * `requirements.txt` — Dependencies.
+ * `README.md` — This file.
+
+ ---
+
+ ## 🛠️ Troubleshooting
+
+ * **Black or blank heatmap:** Ensure your model is called with `output_attentions=True` in `.generate()`.
+ * **Low resolution or distortion:** Adjust `img_size` or the interpolation method inside `_make_overlay`.
+ * **Tokenizer error:** Make sure `model.decoder.tokenizer` exists and is loaded correctly.
+ * **OOM errors:** Reduce `max_new_tokens` or use a smaller model checkpoint.
+ * **Color or shape mismatch:** Verify that the image-token sequence length is 1024 (a 32×32 layout).
+
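For the blank-heatmap case in particular, a quick structural sanity check of the returned attentions can help. The helper below is a hypothetical example that uses plain nested lists in place of real attention tensors:

```python
import math

def check_attentions(attentions):
    """Sanity-check attentions from generate(..., output_attentions=True).

    Expects one collection of per-layer attention matrices per generated
    step; returns a list of human-readable problems found.
    """
    problems = []
    if not attentions:
        problems.append("attentions is empty: was output_attentions=True set?")
        return problems
    for step, layers in enumerate(attentions):
        if not layers:
            problems.append(f"step {step}: no per-layer attention returned")
            continue
        flat = [w for layer in layers for row in layer for w in row]
        if any(not math.isfinite(w) for w in flat):
            problems.append(f"step {step}: non-finite attention weights (NaN/Inf)")
        if all(w == 0 for w in flat):
            problems.append(f"step {step}: all-zero attention (renders as a blank heatmap)")
    return problems

# Fake two-step, one-layer structure: healthy first step, all-zero second.
fake = [
    [[[0.25, 0.75], [0.5, 0.5]]],
    [[[0.0, 0.0], [0.0, 0.0]]],
]
print(check_attentions(fake))
```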
+ ---
+
+ ## 🧪 Model integration notes
+
+ * The app is compatible with any **encoder–decoder or vision–language model** that:
+   * Accepts `pixel_values` as input.
+   * Implements `generate(..., output_attentions=True)` returning `(gen_ids, gen_text, attentions)`.
+   * Uses the tokenizer from `model.decoder.tokenizer`.
+ * Designed for research in **vision-language interpretability**, **cross-modal explainability**, and **attention visualization**.
+
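The integration contract above can be sketched with a toy stand-in. Everything here except the `generate(...)` signature, `output_attentions`, and the `model.decoder.tokenizer` attribute is hypothetical and not the real `create_complete_model`:

```python
from types import SimpleNamespace

class ToyTokenizer:
    """Stand-in tokenizer exposing the decode() the app relies on."""
    vocab = {0: "<bos>", 1: "a", 2: "cat", 3: "<eos>"}

    def decode(self, ids):
        return " ".join(self.vocab[i] for i in ids)

class ToyVisionLanguageModel:
    """Minimal shape of a model the app can drive."""

    def __init__(self):
        # The app reads the tokenizer from model.decoder.tokenizer.
        self.decoder = SimpleNamespace(tokenizer=ToyTokenizer())

    def generate(self, pixel_values, max_new_tokens=3, output_attentions=True):
        # A real model would attend over 1024 image tokens; here we fake
        # one uniform attention row per generated token.
        gen_ids = [1, 2, 3][:max_new_tokens]
        gen_text = self.decoder.tokenizer.decode(gen_ids)
        attentions = [[1.0 / 1024] * 1024 for _ in gen_ids] if output_attentions else None
        return gen_ids, gen_text, attentions

model = ToyVisionLanguageModel()
ids, text, attn = model.generate(pixel_values=None, max_new_tokens=2)
print(text)          # a cat
print(len(attn[0]))  # 1024
```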
+ ---
+
+ ## 📣 Acknowledgments
+
+ * Built with [Gradio](https://www.gradio.app/) and [Hugging Face Transformers](https://huggingface.co/docs/transformers).
+ * Inspired by the original [Token-Attention-Viewer](https://github.com/devMuniz02/Token-Attention-Viewer) project.
+ * Special thanks to the open-source community advancing **vision-language interpretability**.
requirements.txt ADDED
@@ -0,0 +1,6 @@
+ transformers
+ gradio
+ torch
+ torchvision
+ fsspec
+ matplotlib