---
title: CXR-Findings-AI
emoji: 🫁
colorFrom: blue
colorTo: indigo
sdk: gradio
app_file: app.py
license: mit
pinned: true
tags:
  - gradio
  - pytorch
  - computer-vision
  - nlp
  - multimodal
  - vision-language
  - image-to-text
  - chest-xray
  - radiology
  - medical-ai
  - attention
  - attention-visualization
  - interpretability
  - explainable-ai
  - xai
  - cpu-inference
  - healthcare
  - demo
short_description: Generate chest X-ray findings and explore attention.
---
# 🫁 CXR-Findings-AI — Chest X-Ray Findings Generator + Attention Explorer

### **Live Demo (CPU-Only, thanks to Hugging Face Spaces)**

🔗 **[https://huggingface.co/spaces/manu02/CXR-Findings-AI](https://huggingface.co/spaces/manu02/CXR-Findings-AI)**

---
# 🧠 Overview

**CXR-Findings-AI** is an interactive Gradio application that:

### ✅ **Generates radiology findings from a chest X-ray image**
### ✅ **Visualizes multimodal attention (image ↔ text) across layers and heads**
### ✅ **Runs entirely on CPU**, showcasing the efficiency of the underlying 246M-parameter model

The system lets researchers, clinicians, and students explore **how different image regions influence each generated word**, enabling deeper interpretability in medical AI.

---
# 🔍 What This App Provides

### 🫁 **1. Findings Generation**

A lightweight multimodal model produces chest X-ray findings directly from the uploaded image.

### 👁️ **2. Layer-wise & Head-wise Attention Visualization**

Inspect how the model distributes attention (a minimal reshaping sketch follows this list):

* Across **transformer layers**
* Across **attention heads**
* Between **image tokens** (32×32 grid → 1024 tokens)
* And **generated text tokens**
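A minimal sketch of that reduction, assuming a Transformers-style `cross_attentions` output (one tuple per generated token, each holding per-layer tensors of shape `(batch, heads, query_len, image_tokens)`); the exact structure of this project's model output may differ:

```python
# Minimal sketch, not the app's exact code. Assumes `cross_attentions` is the
# tuple returned by generate(..., output_attentions=True, return_dict_in_generate=True):
# one entry per generated token, each a tuple of per-layer tensors of shape
# (batch, num_heads, query_len, num_image_tokens).
import torch

def word_attention_map(cross_attentions, word_idx, layer=None, head=None, grid=32):
    """Return a (grid, grid) attention map for one generated word.

    Passing layer=None and/or head=None averages over that dimension,
    mirroring the "mean" option in the UI.
    """
    # (layers, heads, query_len, image_tokens) for the selected word (batch of 1).
    per_layer = torch.stack(cross_attentions[word_idx], dim=0).squeeze(1)
    attn = per_layer[:, :, -1, :]           # attention of the word being generated
    if layer is not None:
        attn = attn[layer:layer + 1]        # keep a single layer
    if head is not None:
        attn = attn[:, head:head + 1]       # keep a single head
    attn = attn.mean(dim=(0, 1))            # (num_image_tokens,)
    assert attn.numel() == grid * grid, "expected a 32x32 = 1024 image-token grid"
    return attn.reshape(grid, grid)
```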
### 🎨 **3. Three Synchronized Views**

For each selected word:

1. **Original Image**
2. **Overlay View:** Image + blended attention map (a blending sketch follows this list)
3. **Pure Heatmap:** Visualizes raw attention intensities
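One way to produce the overlay view (a hypothetical helper, not the app's actual rendering code): normalize the 32×32 map, upsample it to the image resolution, colorize it, and alpha-blend it onto the X-ray.

```python
# Hypothetical overlay helper, not the app's exact code.
import numpy as np
from PIL import Image
from matplotlib import cm

def overlay_attention(image: Image.Image, attn_map: np.ndarray, alpha: float = 0.45) -> Image.Image:
    """Blend a (32, 32) attention map over the original chest X-ray."""
    # Normalize to [0, 1] and upsample to the image resolution.
    attn = (attn_map - attn_map.min()) / (attn_map.max() - attn_map.min() + 1e-8)
    heat = Image.fromarray((attn * 255).astype(np.uint8)).resize(image.size, Image.BILINEAR)
    # Colorize with a matplotlib colormap and drop the alpha channel.
    heat_rgb = (cm.jet(np.asarray(heat) / 255.0)[..., :3] * 255).astype(np.uint8)
    # Alpha-blend the colorized heatmap onto the X-ray.
    return Image.blend(image.convert("RGB"), Image.fromarray(heat_rgb), alpha)
```

An attention map produced by the sketch in the previous section (converted to NumPy) could be passed in as `attn_map`.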
### 🧩 **4. Word-Level Interpretability**

Click any word in the generated report to reveal its cross-modal attention patterns.
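This kind of click-to-inspect interaction can be wired up in Gradio roughly as follows (a hypothetical sketch with placeholder data; the component names, the `word_maps` cache, and the callback are illustrative, not the app's actual implementation):

```python
# Hypothetical Gradio wiring for word selection; placeholder data, not the app's code.
import gradio as gr
import numpy as np

word_maps: dict[int, np.ndarray] = {}  # hypothetical cache: word index -> (32, 32) map

def on_word_click(evt: gr.SelectData):
    """Return the attention map of the clicked word as a grayscale image."""
    attn = word_maps.get(evt.index, np.zeros((32, 32)))
    return (255 * attn / (attn.max() + 1e-8)).astype(np.uint8)

with gr.Blocks() as demo:
    findings = gr.HighlightedText(
        label="Generated findings (click a word)",
        value=[("No", None), ("acute", None), ("findings", None)],  # placeholder text
    )
    heatmap = gr.Image(label="Attention for selected word")
    findings.select(on_word_click, inputs=None, outputs=heatmap)

demo.launch()
```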
---

# 🚀 Quickstart (Local Usage)

### 1) Clone

```bash
git clone https://github.com/devMuniz02/Image-Attention-Visualizer
cd Image-Attention-Visualizer
```

### 2) (Optional) Create a virtual environment

**Windows:**

```powershell
python -m venv venv
.\venv\Scripts\Activate.ps1
```

**macOS / Linux:**

```bash
python3 -m venv venv
source venv/bin/activate
```

### 3) Install requirements

```bash
pip install -r requirements.txt
```

### 4) Run the app

```bash
python app.py
```

Then open:

```
http://127.0.0.1:7860
```

---
# 🧭 How to Use the Interface

1. **Upload a chest X-ray** (or load a sample)
2. Adjust:
   * Max new tokens
   * Layer selection
   * Head selection
   * Or choose *mean* attention across all layers/heads
3. Click **Generate Findings**
4. Click any generated word to visualize:
   * Image ↔ text attention
   * Heatmaps
   * Cross-token relationships

---
# 🧩 Repository Structure

| File | Description |
| ---- | ----------- |
| `app.py` | Main Gradio interface and visualization logic |
| `utils/models/complete_model.py` | Full multimodal model assembly |
| `utils/processing.py` | Image preprocessing |
| `assets/` | UI images & examples |
| `requirements.txt` | Dependencies |
| `README.md` | This file |

---
# 🛠️ Troubleshooting

* **Blank heatmap** → Ensure `output_attentions=True` is passed to `.generate()` (a quick check is sketched below)
* **Distorted attention** → Check that the image-token count is 1024 (a 32×32 grid)
* **Tokenizer errors** → Confirm `model.decoder.tokenizer` is loaded
* **OOM on local machine** → Reduce `max_new_tokens` or run with CPU-only settings
* **Slow inference** → The model is sized for CPU inference, but generation is still slower there; a GPU is recommended for higher throughput
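The first two items can be checked with a few lines (a hedged sketch: it assumes `model` and `pixel_values` are already loaded, for example as in the integration sketch below, and that the model returns a Transformers-style `cross_attentions` field):

```python
# Hedged debugging sketch: verify attentions are returned and the image-token
# count matches the 32x32 grid. Field names and shapes are assumptions about a
# Transformers-style encoder-decoder output, not guaranteed by this README.
out = model.generate(
    pixel_values=pixel_values,
    max_new_tokens=8,
    output_attentions=True,
    return_dict_in_generate=True,
)
assert out.cross_attentions is not None, "pass output_attentions=True to .generate()"
# One tuple per generated token; each holds per-layer tensors of shape
# (batch, heads, query_len, image_tokens).
num_image_tokens = out.cross_attentions[0][0].shape[-1]
assert num_image_tokens == 32 * 32, f"expected 1024 image tokens, got {num_image_tokens}"
```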
---

# 🧪 Model Integration Notes

Compatible with any encoder–decoder or vision–language model that (an adapter sketch follows these requirements):

* Accepts `pixel_values`
* Returns attentions when calling

```python
model.generate(..., output_attentions=True)
```

* Provides a decoder tokenizer:

```python
model.decoder.tokenizer
```
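For example, a stock Hugging Face `VisionEncoderDecoderModel` could be adapted to this interface roughly as follows (a hypothetical sketch: the checkpoint names and the image path are placeholders, and the resulting image-token grid depends on the chosen encoder):

```python
# Hypothetical adapter sketch; checkpoint names and file path are placeholders.
from PIL import Image
from transformers import AutoImageProcessor, AutoTokenizer, VisionEncoderDecoderModel

model = VisionEncoderDecoderModel.from_pretrained("your-org/your-cxr-checkpoint")
processor = AutoImageProcessor.from_pretrained("your-org/your-cxr-checkpoint")
tokenizer = AutoTokenizer.from_pretrained("your-org/your-cxr-checkpoint")

# 1) Accepts `pixel_values`: produced by the image processor.
image = Image.open("example_cxr.png").convert("RGB")  # placeholder path
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# 2) Returns attentions from `.generate()`.
out = model.generate(
    pixel_values=pixel_values,
    max_new_tokens=64,
    output_attentions=True,
    return_dict_in_generate=True,
)
text = tokenizer.decode(out.sequences[0], skip_special_tokens=True)

# 3) Exposes a decoder tokenizer where the app expects to find it.
model.decoder.tokenizer = tokenizer
```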
Ideal for research in:

* Medical AI
* Vision–language alignment
* Cross-modal interpretability
* Attention visualization
* Explainable AI (XAI)

---
# ❤️ Acknowledgments

* Powered by **Gradio** and **Hugging Face Transformers**
* Based on and expanded from the **Token-Attention-Viewer** project
  🔗 [https://github.com/devMuniz02/Image-Attention-Visualizer](https://github.com/devMuniz02/Image-Attention-Visualizer)
* Created as part of a thesis on **efficient and explainable multimodal medical AI**