---
title: CXR-Findings-AI
emoji: 🫁
colorFrom: blue
colorTo: indigo
sdk: gradio
app_file: app.py
license: mit
pinned: true
tags:
  - gradio
  - pytorch
  - computer-vision
  - nlp
  - multimodal
  - vision-language
  - image-to-text
  - chest-xray
  - radiology
  - medical-ai
  - attention
  - attention-visualization
  - interpretability
  - explainable-ai
  - xai
  - cpu-inference
  - healthcare
  - demo
short_description: Generate chest X-ray findings and explore attention.
---

# 🫁 CXR-Findings-AI — Chest X-Ray Findings Generator + Attention Explorer

**Live Demo (CPU-only, thanks to Hugging Face Spaces):**

🔗 https://huggingface.co/spaces/manu02/CXR-Findings-AI

*(Screenshot: the app working)*


## 🧠 Overview

CXR-Findings-AI is an interactive Gradio application that:

- Generates radiology findings from a chest X-ray image
- Visualizes multimodal attention (image ↔ text) across layers and heads
- Runs entirely on CPU, showcasing the efficiency of the underlying 246M-parameter model

The system lets researchers, clinicians, and students explore how different image regions influence each generated word, enabling deeper interpretability in medical AI.


## 🔍 What This App Provides

### 🫁 1. Findings Generation

A lightweight multimodal model produces chest X-ray findings directly from the uploaded image.
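
For a programmatic version of this step, something along the following lines should work. This is a minimal sketch, assuming a Hugging Face-style `generate()` API; the `CompleteModel` class name and `preprocess_image` helper are guesses based on the files listed under Repository Structure below, so check those modules for the actual entry points.

```python
# Hedged sketch: generate findings for one chest X-ray.
# `CompleteModel` and `preprocess_image` are assumed names inferred from
# utils/models/complete_model.py and utils/processing.py; the real API may differ.
import torch
from PIL import Image

from utils.models.complete_model import CompleteModel  # assumed class name
from utils.processing import preprocess_image          # assumed function name

model = CompleteModel()            # assumed constructor; weights may load elsewhere
model.eval()

image = Image.open("example_cxr.png").convert("RGB")
pixel_values = preprocess_image(image)                 # expected shape: (1, C, H, W)

with torch.no_grad():
    output = model.generate(
        pixel_values=pixel_values,
        max_new_tokens=128,
        output_attentions=True,        # required later for the attention explorer
        return_dict_in_generate=True,  # assuming Hugging Face-style generate()
    )

findings = model.decoder.tokenizer.decode(output.sequences[0], skip_special_tokens=True)
print(findings)
```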

### 👁️ 2. Layer-wise & Head-wise Attention Visualization

Inspect how the model distributes attention (a small reshaping sketch follows this list):

- Across transformer layers
- Across attention heads
- Between image tokens (32×32 grid → 1024 tokens)
- And generated text tokens
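
The sketch below shows one way to collapse a single generated token's cross-attention into the 32×32 grid, with optional layer/head selection or a mean over both. It assumes Hugging Face-style `generate()` output where `cross_attentions[step][layer]` has shape `(batch, heads, 1, 1024)`; the app's internal code may store attentions differently.

```python
import torch

# Hedged sketch: reduce cross-attention for one generated token to a 32x32 map.
def token_attention_map(cross_attentions, step, layer=None, head=None, grid=32):
    step_attn = cross_attentions[step]               # tuple of per-layer tensors
    if layer is None:
        attn = torch.stack(step_attn).mean(dim=0)    # mean over layers
    else:
        attn = step_attn[layer]
    attn = attn[0]                                   # drop batch dim -> (heads, 1, 1024)
    if head is None:
        attn = attn.mean(dim=0)                      # mean over heads -> (1, 1024)
    else:
        attn = attn[head]
    return attn.reshape(grid, grid)                  # 1024 image tokens -> 32x32 grid

# Example: attention of the 5th generated token, layer 3, head 7
# heatmap = token_attention_map(output.cross_attentions, step=5, layer=3, head=7)
```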

### 🎨 3. Three Synchronized Views

For each selected word:

1. Original Image
2. Overlay View: Image + blended attention map (see the blending sketch below)
3. Pure Heatmap: Visualizes raw attention intensities
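
One common way to produce the overlay view is a colormap blend like the sketch below; it mirrors the idea (image + blended attention map) but is not necessarily the exact code in `app.py`.

```python
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt

# Hedged sketch: blend a 32x32 attention map onto the original X-ray.
def overlay_attention(image: Image.Image, attn_map: np.ndarray, alpha: float = 0.5) -> Image.Image:
    amin, amax = float(attn_map.min()), float(attn_map.max())
    attn = (attn_map - amin) / (amax - amin + 1e-8)             # normalize to [0, 1]
    heat = plt.get_cmap("jet")(attn)[..., :3]                   # RGBA -> RGB heatmap
    heat_img = Image.fromarray((heat * 255).astype(np.uint8)).resize(image.size, Image.BILINEAR)
    return Image.blend(image.convert("RGB"), heat_img, alpha)   # overlay view
```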

### 🧩 4. Word-Level Interpretability

Click any word in the generated report to reveal its cross-modal attention patterns.


## 🚀 Quickstart (Local Usage)

### 1) Clone

```bash
git clone https://github.com/devMuniz02/Image-Attention-Visualizer
cd Image-Attention-Visualizer
```

### 2) (Optional) Create a virtual environment

Windows:

```powershell
python -m venv venv
.\venv\Scripts\Activate.ps1
```

macOS / Linux:

```bash
python3 -m venv venv
source venv/bin/activate
```

### 3) Install requirements

```bash
pip install -r requirements.txt
```

### 4) Run the app

```bash
python app.py
```

Then open:

http://127.0.0.1:7860

## 🧭 How to Use the Interface

1. Upload a chest X-ray (or load a sample)
2. Adjust:
   - Max new tokens
   - Layer selection
   - Head selection
   - Or choose mean attention across all layers/heads
3. Click Generate Findings
4. Click any generated word to visualize:
   - Image ↔ text attention
   - Heatmaps
   - Cross-token relationships
## 🧩 Repository Structure

| File | Description |
| --- | --- |
| `app.py` | Main Gradio interface and visualization logic |
| `utils/models/complete_model.py` | Full multimodal model assembly |
| `utils/processing.py` | Image preprocessing |
| `assets/` | UI images & examples |
| `requirements.txt` | Dependencies |
| `README.md` | This file |

## 🛠️ Troubleshooting

- **Blank heatmap** → Ensure `output_attentions=True` in `.generate()`
- **Distorted attention** → Check token count = 1024 (32×32)
- **Tokenizer errors** → Confirm `model.decoder.tokenizer` is loaded
- **OOM on local machine** → Reduce `max_new_tokens` or use CPU-only settings
- **Slow inference** → CPU mode is intentionally lightweight; GPU recommended for higher throughput

## 🧪 Model Integration Notes

Compatible with any encoder–decoder or vision–language model that (see the compatibility check sketch below):

- Accepts `pixel_values`
- Returns attentions when calling `model.generate(..., output_attentions=True)`
- Provides a decoder tokenizer: `model.decoder.tokenizer`
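
As a quick sanity check, something like the sketch below should run without errors for a compatible model. It assumes a Hugging Face-style `generate()` that returns `cross_attentions` when `return_dict_in_generate=True`; adapt it to your model's actual output format.

```python
import torch

# Hedged sketch: verify the three requirements listed above.
def check_compatibility(model, pixel_values: torch.Tensor, grid: int = 32) -> None:
    assert hasattr(model.decoder, "tokenizer"), "model.decoder.tokenizer is required"

    out = model.generate(
        pixel_values=pixel_values,
        max_new_tokens=4,
        output_attentions=True,
        return_dict_in_generate=True,  # assuming Hugging Face-style generate()
    )
    assert getattr(out, "cross_attentions", None), "generate() did not return attentions"

    # The visualizer expects a square grid of image tokens (32 x 32 = 1024 here).
    num_image_tokens = out.cross_attentions[0][0].shape[-1]
    assert num_image_tokens == grid * grid, (
        f"expected {grid * grid} image tokens, got {num_image_tokens}"
    )
```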
    

Ideal for research in:

- Medical AI
- Vision–language alignment
- Cross-modal interpretability
- Attention visualization
- Explainable AI (XAI)

## ❤️ Acknowledgments