manu02 committed
Commit 2a8d9e3 · verified · 1 Parent(s): 9ee9ac7

Update README.md

Files changed (1)
  1. README.md +203 -160
README.md CHANGED
@@ -1,160 +1,203 @@
1
- ---
2
- title: Image-Attention-Visualizer
3
- emoji: 🔥
4
- colorFrom: blue
5
- colorTo: purple
6
- sdk: gradio
7
- app_file: app.py
8
- license: mit
9
- pinned: true
10
- tags:
11
- - gradio
12
- - pytorch
13
- - computer-vision
14
- - nlp
15
- - multimodal
16
- - vision-language
17
- - image-to-text
18
- - attention
19
- - attention-visualization
20
- - interpretability
21
- - explainability
22
- - xai
23
- - demo
24
- ---
25
-
26
- # [Github repo](https://github.com/devMuniz02/Image-Attention-Visualizer)
27
- # [TRY IT NOW ON HUGGING FACE SPACES !!](https://huggingface.co/spaces/manu02/image-attention-visualizer)
28
-
29
- ![App working](assets/app_view.png)
30
-
31
- # Image-Attention-Visualizer
32
-
33
- Image Attention Visualizer is an interactive Gradio app that visualizes **cross-modal attention** between image tokens and generated text tokens in a custom multimodal model. It allows researchers and developers to see how different parts of an image influence the model’s textual output, token by token.
34
-
35
- # Image-to-Text Attention Visualizer (Gradio)
36
-
37
- An interactive Gradio app to **generate text from an image using a custom multimodal model** and **visualize attention in real time**.
38
- It provides 3 synchronized views — original image, attention overlay, and heatmap — plus a **word-level visualization** showing how each generated word attends to visual regions.
39
-
40
- ---
41
-
42
- ## ✨ What the app does
43
-
44
- * **Generates text** from an image input using your custom model (`create_complete_model`).
45
- * Displays **three synchronized views**:
46
-
47
- 1. 🖼️ **Original image**
48
- 2. 🔥 **Overlay** (original + attention heatmap)
49
- 3. 🌈 **Heatmap alone**
50
- * **Word-level attention viewer**: select any generated word to see how its attention is distributed across the image and previously generated words.
51
- * Works directly with your **custom tokenizer (`model.decoder.tokenizer`)**.
52
- * Fixed-length **1024 image tokens (32×32 grid)** projected as a visual heatmap.
53
- * Adjustable options: **Layer**, **Head**, or **Mean Across Layers/Heads**.
54
-
55
- ---
56
-
57
- ## 🚀 Quickstart
58
-
59
- ### 1) Clone
60
-
61
- ```bash
62
- git clone https://github.com/devMuniz02/Image-Attention-Visualizer
63
- cd Image-Attention-Visualizer
64
- ```
65
-
66
- ### 2) (Optional) Create a virtual environment
67
-
68
- **Windows (PowerShell):**
69
-
70
- ```powershell
71
- python -m venv venv
72
- .\venv\Scripts\Activate.ps1
73
- ```
74
-
75
- **macOS / Linux (bash/zsh):**
76
-
77
- ```bash
78
- python3 -m venv venv
79
- source venv/bin/activate
80
- ```
81
-
82
- ### 3) Install requirements
83
-
84
- ```bash
85
- pip install -r requirements.txt
86
- ```
87
-
88
- ### 4) Run the app
89
-
90
- ```bash
91
- python app.py
92
- ```
93
-
94
- You should see something like:
95
-
96
- ```
97
- Running on local URL: http://127.0.0.1:7860
98
- ```
99
-
100
- ### 5) Open in your browser
101
-
102
- Navigate to `http://127.0.0.1:7860` to use the app.
103
-
104
- ---
105
-
106
- ## 🧭 How to use
107
-
108
- 1. **Upload an image** or load a random sample from your dataset folder.
109
- 2. **Set generation parameters**:
110
-
111
- * Max New Tokens
112
- * Layer/Head selection (or average across all)
113
- 3. Click **Generate** — the model will produce a textual description or continuation.
114
- 4. **Select a generated word** from the list:
115
-
116
- * The top row will show:
117
-
118
- * Left → **Original image**
119
- * Center → **Overlay (attention on image regions)**
120
- * Right → **Colored heatmap**
121
- * The bottom section highlights attention strength over the generated words.
122
-
123
- ---
124
-
125
- ## 🧩 Files
126
-
127
- * `app.py` — Main Gradio interface and visualization logic.
128
- * `utils/models/complete_model.py` — Model definition and generation method.
129
- * `utils/processing.py` — Image preprocessing utilities.
130
- * `requirements.txt` — Dependencies.
131
- * `README.md` This file.
132
-
133
- ---
134
-
135
- ## 🛠️ Troubleshooting
136
-
137
- * **Black or blank heatmap:** Ensure your model returns `output_attentions=True` in `.generate()`.
138
- * **Low resolution or distortion:** Adjust `img_size` or the interpolation method inside `_make_overlay`.
139
- * **Tokenizer error:** Make sure `model.decoder.tokenizer` exists and is loaded correctly.
140
- * **OOM errors:** Reduce `max_new_tokens` or use a smaller model checkpoint.
141
- * **Color or shape mismatch:** Verify that your image tokens length = 1024 (for a 32×32 layout).
142
-
143
- ---
144
-
145
- ## 🧪 Model integration notes
146
-
147
- * The app is compatible with any **encoder–decoder or vision–language model** that:
148
-
149
- * Accepts `pixel_values` as input.
150
- * Returns `generate(..., output_attentions=True)` with `(gen_ids, gen_text, attentions)`.
151
- * Uses the tokenizer from `model.decoder.tokenizer`.
152
- * Designed for research in **vision-language interpretability**, **cross-modal explainability**, and **attention visualization**.
153
-
154
- ---
155
-
156
- ## 📣 Acknowledgments
157
-
158
- * Built with [Gradio](https://www.gradio.app/) and [Hugging Face Transformers](https://huggingface.co/docs/transformers).
159
- * Inspired by the original [Token-Attention-Viewer](https://github.com/devMuniz02/Token-Attention-Viewer) project.
160
- * Special thanks to the open-source community advancing **vision-language interpretability**.
1
+ ---
2
+ title: CXR-Findings-AI
3
+ emoji: 🫁
4
+ colorFrom: blue
5
+ colorTo: indigo
6
+ sdk: gradio
7
+ app_file: app.py
8
+ license: mit
9
+ pinned: true
10
+ tags:
11
+ - gradio
12
+ - pytorch
13
+ - computer-vision
14
+ - nlp
15
+ - multimodal
16
+ - vision-language
17
+ - image-to-text
18
+ - chest-xray
19
+ - radiology
20
+ - medical-ai
21
+ - attention
22
+ - attention-visualization
23
+ - interpretability
24
+ - explainable-ai
25
+ - xai
26
+ - cpu-inference
27
+ - healthcare
28
+ - demo
29
+ short_description: Generate chest X-ray findings and explore attention.
30
+ ---
31
+
32
+ # 🫁 CXR-Findings-AI — Chest X-Ray Findings Generator + Attention Explorer
33
+
34
+ ### **Live Demo (CPU-Only, thanks to Hugging Face Spaces)**
35
+
36
+ 🔗 **[https://huggingface.co/spaces/manu02/CXR-Findings-AI](https://huggingface.co/spaces/manu02/CXR-Findings-AI)**
37
+
38
+
39
+ ![App working](assets/app_view.png)
40
+
41
+ ---
42
+
43
+ # 🧠 Overview
44
+
45
+ **CXR-Findings-AI** is an interactive Gradio application that:
46
+
47
+ ### **Generates radiology findings from a chest X-ray image**
48
+
49
+ ### **Visualizes multimodal attention (image ↔ text) across layers and heads**
50
+
51
+ ### **Runs entirely on CPU**, showcasing the efficiency of the underlying 246M-parameter model
52
+
53
+ The system lets researchers, clinicians, and students explore **how different image regions influence each generated word**, enabling deeper interpretability in medical AI.
54
+
55
+ ---
56
+
57
+ # 🔍 What This App Provides
58
+
59
+ ### 🫁 **1. Findings Generation**
60
+
61
+ A lightweight multimodal model produces chest X-ray findings directly from the uploaded image.
62
+
63
+ ### 👁️ **2. Layer-wise & Head-wise Attention Visualization**
64
+
65
+ Inspect how the model distributes attention:
66
+
67
+ * Across **transformer layers**
68
+ * Across **attention heads**
69
+ * Between **image tokens** (32×32 grid → 1024 tokens)
70
+ * And **generated text tokens**
71
+
72
+ ### 🎨 **3. Three Synchronized Views**
73
+
74
+ For each selected word:
75
+
76
+ 1. **Original Image**
77
+ 2. **Overlay View:** Image + blended attention map
78
+ 3. **Pure Heatmap:** Visualizes raw attention intensities
79
+
80
+ ### 🧩 **4. Word-Level Interpretability**
81
+
82
+ Click any word in the generated report to reveal its cross-modal attention patterns.
83
+
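+ Under the hood, the overlay and heatmap views come from reshaping a word's attention over the 1024 image tokens back into the 32×32 grid and upsampling it to the image resolution. The snippet below is only a minimal sketch of that idea, not the app's actual implementation: it assumes `attn` is a NumPy vector of length 1024 and `image` is a PIL image.
+
+ ```python
+ # Sketch: per-word attention over 1024 image tokens -> blended overlay.
+ # Assumes attn.shape == (1024,) and image is a PIL.Image; the real rendering
+ # lives in app.py and may differ.
+ import numpy as np
+ from PIL import Image
+
+ def make_attention_overlay(image: Image.Image, attn: np.ndarray, alpha: float = 0.5) -> Image.Image:
+     grid = attn.reshape(32, 32)                                    # 1024 tokens -> 32x32 grid
+     grid = (grid - grid.min()) / (grid.max() - grid.min() + 1e-8)  # normalize to [0, 1]
+     heat = Image.fromarray((grid * 255).astype(np.uint8), mode="L")
+     heat = heat.resize(image.size, Image.BILINEAR)                 # upsample to image resolution
+     zeros = Image.new("L", image.size)
+     heat_rgb = Image.merge("RGB", (heat, zeros, zeros))            # single-channel heatmap as RGB
+     return Image.blend(image.convert("RGB"), heat_rgb, alpha)      # overlay view
+ ```
+
+ The pure heatmap view is just the upsampled grid (optionally passed through a colormap) without the blend step.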
84
+ ---
85
+
86
+ # 🚀 Quickstart (Local Usage)
87
+
88
+ ### 1) Clone
89
+
90
+ ```bash
91
+ git clone https://github.com/devMuniz02/Image-Attention-Visualizer
92
+ cd Image-Attention-Visualizer
93
+ ```
94
+
95
+ ### 2) (Optional) Create a virtual environment
96
+
97
+ **Windows:**
98
+
99
+ ```powershell
100
+ python -m venv venv
101
+ .\venv\Scripts\Activate.ps1
102
+ ```
103
+
104
+ **macOS / Linux:**
105
+
106
+ ```bash
107
+ python3 -m venv venv
108
+ source venv/bin/activate
109
+ ```
110
+
111
+ ### 3) Install requirements
112
+
113
+ ```bash
114
+ pip install -r requirements.txt
115
+ ```
116
+
117
+ ### 4) Run the app
118
+
119
+ ```bash
120
+ python app.py
121
+ ```
122
+
123
+ Then open:
124
+
125
+ ```
126
+ http://127.0.0.1:7860
127
+ ```
128
+
129
+ ---
130
+
131
+ # 🧭 How to Use the Interface
132
+
133
+ 1. **Upload a chest X-ray** (or load a sample)
134
+ 2. Adjust:
135
+
136
+ * Max new tokens
137
+ * Layer selection
138
+ * Head selection
139
+ * Or choose *mean* attention across all layers/heads (see the selection sketch after this list)
140
+ 3. Click **Generate Findings**
141
+ 4. Click any generated word to visualize:
142
+
143
+ * Image ↔ text attention
144
+ * Heatmaps
145
+ * Cross-token relationships
146
+
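+ The Layer / Head / mean options reduce the model's attention stack to a single vector per word. The sketch below illustrates that reduction only; it assumes the attentions for one generated word have already been stacked into a tensor of shape `(num_layers, num_heads, 1024)`, which may not match your model's exact output format.
+
+ ```python
+ # Sketch: pick one layer/head, or average over them, for a single generated word.
+ # `attn` is assumed to be shaped (num_layers, num_heads, 1024); None means "mean".
+ from typing import Optional
+ import torch
+
+ def select_attention(attn: torch.Tensor, layer: Optional[int], head: Optional[int]) -> torch.Tensor:
+     layers = attn if layer is None else attn[layer : layer + 1]     # keep the layer dimension
+     heads = layers if head is None else layers[:, head : head + 1]  # keep the head dimension
+     return heads.mean(dim=(0, 1))                                   # -> (1024,) vector for the heatmap
+ ```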
147
+ ---
148
+
149
+ # 🧩 Repository Structure
150
+
151
+ | File | Description |
152
+ | -------------------------------- | --------------------------------------------- |
153
+ | `app.py` | Main Gradio interface and visualization logic |
154
+ | `utils/models/complete_model.py` | Full multimodal model assembly |
155
+ | `utils/processing.py` | Image preprocessing |
156
+ | `assets/` | UI images & examples |
157
+ | `requirements.txt` | Dependencies |
158
+ | `README.md` | This file |
159
+
160
+ ---
161
+
162
+ # 🛠️ Troubleshooting
163
+
164
+ * **Blank heatmap** → Ensure `output_attentions=True` in `.generate()`
165
+ * **Distorted attention** → Check that the image token count is 1024 (32×32); see the shape check after this list
166
+ * **Tokenizer errors** → Confirm `model.decoder.tokenizer` is loaded
167
+ * **OOM on local machine** → Reduce `max_new_tokens` or use CPU-only settings
168
+ * **Slow inference** → the model is kept lightweight so CPU inference stays practical; a GPU is recommended for higher throughput
169
+
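+ For the heatmap-shape items above, a quick sanity check like the following can help. It assumes `attn` is the attention vector over image tokens for one generated word (a hypothetical shape; adapt it to your model's output).
+
+ ```python
+ # Sketch: sanity-check the image-token attention before plotting it.
+ import math
+ import numpy as np
+
+ def check_image_attention(attn: np.ndarray) -> None:
+     n = attn.shape[-1]
+     side = math.isqrt(n)
+     assert side * side == n, f"expected a square token grid, got {n} image tokens"
+     assert n == 1024, f"this app assumes 32x32 = 1024 image tokens, got {n}"
+     assert np.isfinite(attn).all() and attn.max() > 0, "attention map is empty or invalid (blank heatmap?)"
+ ```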
170
+ ---
171
+
172
+ # 🧪 Model Integration Notes
173
+
174
+ Compatible with any encoder–decoder or vision–language model that:
175
+
176
+ * Accepts `pixel_values`
177
+ * Returns attentions when calling
178
+
179
+ ```python
180
+ model.generate(..., output_attentions=True)
181
+ ```
182
+ * Provides a decoder tokenizer:
183
+
184
+ ```python
185
+ model.decoder.tokenizer
186
+ ```
187
+
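+ Putting these requirements together, the expected call pattern looks roughly like the sketch below. The `(gen_ids, gen_text, attentions)` return signature follows this project's earlier README; `create_complete_model` comes from `utils/models/complete_model.py` with its arguments omitted here, and the dummy `pixel_values` stands in for an image preprocessed with `utils/processing.py`.
+
+ ```python
+ # Hedged sketch of the integration contract; details may differ for your model.
+ import torch
+ from utils.models.complete_model import create_complete_model
+
+ model = create_complete_model()               # custom multimodal model (arguments omitted)
+ model.eval()
+
+ pixel_values = torch.randn(1, 3, 224, 224)    # stand-in for a preprocessed chest X-ray
+ with torch.no_grad():
+     gen_ids, gen_text, attentions = model.generate(
+         pixel_values=pixel_values,
+         max_new_tokens=64,
+         output_attentions=True,
+     )
+
+ # If the decoder tokenizer follows the Hugging Face interface:
+ tokens = model.decoder.tokenizer.convert_ids_to_tokens(gen_ids[0])
+ ```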
188
+ Ideal for research in:
189
+
190
+ * Medical AI
191
+ * Vision–language alignment
192
+ * Cross-modal interpretability
193
+ * Attention visualization
194
+ * Explainable AI (XAI)
195
+
196
+ ---
197
+
198
+ # ❤️ Acknowledgments
199
+
200
+ * Powered by **Gradio** and **Hugging Face Transformers**
201
+ * Based on and expanded from the **Token-Attention-Viewer** project
202
+ 🔗 [https://github.com/devMuniz02/Token-Attention-Viewer](https://github.com/devMuniz02/Token-Attention-Viewer)
203
+ * Created as part of a thesis on **efficient and explainable multimodal medical AI**