---
title: CXR-Findings-AI
emoji: 🫁
colorFrom: blue
colorTo: indigo
sdk: gradio
app_file: app.py
license: mit
pinned: true
tags:
  - gradio
  - pytorch
  - computer-vision
  - nlp
  - multimodal
  - vision-language
  - image-to-text
  - chest-xray
  - radiology
  - medical-ai
  - attention
  - attention-visualization
  - interpretability
  - explainable-ai
  - xai
  - cpu-inference
  - healthcare
  - demo
short_description: Generate chest X-ray findings and explore attention.
---

# 🫁 CXR-Findings-AI — Chest X-Ray Findings Generator + Attention Explorer

**Live Demo (CPU-only, thanks to Hugging Face Spaces):**

🔗 https://huggingface.co/spaces/manu02/CXR-Findings-AI

*(Screenshot: the app working)*


## 🧠 Overview

CXR-Findings-AI is an interactive Gradio application that:

- Generates radiology findings from a chest X-ray image
- Visualizes multimodal attention (image ↔ text) across layers and heads
- Runs entirely on CPU, showcasing the efficiency of the underlying 246M-parameter model

The system lets researchers, clinicians, and students explore how different image regions influence each generated word, enabling deeper interpretability in medical AI.


## 🔍 What This App Provides

### 🫁 1. Findings Generation

A lightweight multimodal model produces chest X-ray findings directly from the uploaded image.
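
For a programmatic version of this step, something along the following lines should work. This is a minimal sketch, assuming a Hugging Face-style `generate()` API; the `CompleteModel` class name and `preprocess_image` helper are guesses based on the files listed under Repository Structure below, so check those modules for the actual entry points.

```python
# Hedged sketch: generate findings for one chest X-ray.
# `CompleteModel` and `preprocess_image` are assumed names inferred from
# utils/models/complete_model.py and utils/processing.py; the real API may differ.
import torch
from PIL import Image

from utils.models.complete_model import CompleteModel  # assumed class name
from utils.processing import preprocess_image          # assumed function name

model = CompleteModel()            # assumed constructor; weights may load elsewhere
model.eval()

image = Image.open("example_cxr.png").convert("RGB")
pixel_values = preprocess_image(image)                 # expected shape: (1, C, H, W)

with torch.no_grad():
    output = model.generate(
        pixel_values=pixel_values,
        max_new_tokens=128,
        output_attentions=True,        # required later for the attention explorer
        return_dict_in_generate=True,  # assuming Hugging Face-style generate()
    )

findings = model.decoder.tokenizer.decode(output.sequences[0], skip_special_tokens=True)
print(findings)
```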

### 👁️ 2. Layer-wise & Head-wise Attention Visualization

Inspect how the model distributes attention (a small reshaping sketch follows this list):

- Across transformer layers
- Across attention heads
- Between image tokens (32×32 grid → 1024 tokens)
- And generated text tokens
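
The sketch below shows one way to collapse a single generated token's cross-attention into the 32×32 grid, with optional layer/head selection or a mean over both. It assumes Hugging Face-style `generate()` output where `cross_attentions[step][layer]` has shape `(batch, heads, 1, 1024)`; the app's internal code may store attentions differently.

```python
import torch

# Hedged sketch: reduce cross-attention for one generated token to a 32x32 map.
def token_attention_map(cross_attentions, step, layer=None, head=None, grid=32):
    step_attn = cross_attentions[step]               # tuple of per-layer tensors
    if layer is None:
        attn = torch.stack(step_attn).mean(dim=0)    # mean over layers
    else:
        attn = step_attn[layer]
    attn = attn[0]                                   # drop batch dim -> (heads, 1, 1024)
    if head is None:
        attn = attn.mean(dim=0)                      # mean over heads -> (1, 1024)
    else:
        attn = attn[head]
    return attn.reshape(grid, grid)                  # 1024 image tokens -> 32x32 grid

# Example: attention of the 5th generated token, layer 3, head 7
# heatmap = token_attention_map(output.cross_attentions, step=5, layer=3, head=7)
```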

### 🎨 3. Three Synchronized Views

For each selected word:

1. Original Image
2. Overlay View: Image + blended attention map (see the blending sketch below)
3. Pure Heatmap: Visualizes raw attention intensities
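
One common way to produce the overlay view is a colormap blend like the sketch below; it mirrors the idea (image + blended attention map) but is not necessarily the exact code in `app.py`.

```python
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt

# Hedged sketch: blend a 32x32 attention map onto the original X-ray.
def overlay_attention(image: Image.Image, attn_map: np.ndarray, alpha: float = 0.5) -> Image.Image:
    amin, amax = float(attn_map.min()), float(attn_map.max())
    attn = (attn_map - amin) / (amax - amin + 1e-8)             # normalize to [0, 1]
    heat = plt.get_cmap("jet")(attn)[..., :3]                   # RGBA -> RGB heatmap
    heat_img = Image.fromarray((heat * 255).astype(np.uint8)).resize(image.size, Image.BILINEAR)
    return Image.blend(image.convert("RGB"), heat_img, alpha)   # overlay view
```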

### 🧩 4. Word-Level Interpretability

Click any word in the generated report to reveal its cross-modal attention patterns.


## 🚀 Quickstart (Local Usage)

### 1) Clone

```bash
git clone https://github.com/devMuniz02/Image-Attention-Visualizer
cd Image-Attention-Visualizer
```

### 2) (Optional) Create a virtual environment

Windows:

```powershell
python -m venv venv
.\venv\Scripts\Activate.ps1
```

macOS / Linux:

```bash
python3 -m venv venv
source venv/bin/activate
```

### 3) Install requirements

```bash
pip install -r requirements.txt
```

### 4) Run the app

```bash
python app.py
```

Then open:

http://127.0.0.1:7860

## 🧭 How to Use the Interface

1. Upload a chest X-ray (or load a sample)
2. Adjust:
   - Max new tokens
   - Layer selection
   - Head selection
   - Or choose mean attention across all layers/heads
3. Click Generate Findings
4. Click any generated word to visualize:
   - Image ↔ text attention
   - Heatmaps
   - Cross-token relationships
## 🧩 Repository Structure

| File | Description |
| --- | --- |
| `app.py` | Main Gradio interface and visualization logic |
| `utils/models/complete_model.py` | Full multimodal model assembly |
| `utils/processing.py` | Image preprocessing |
| `assets/` | UI images & examples |
| `requirements.txt` | Dependencies |
| `README.md` | This file |

## 🛠️ Troubleshooting

- **Blank heatmap** → Ensure `output_attentions=True` in `.generate()`
- **Distorted attention** → Check token count = 1024 (32×32)
- **Tokenizer errors** → Confirm `model.decoder.tokenizer` is loaded
- **OOM on local machine** → Reduce `max_new_tokens` or use CPU-only settings
- **Slow inference** → CPU mode is intentionally lightweight; GPU recommended for higher throughput

## 🧪 Model Integration Notes

Compatible with any encoder–decoder or vision–language model that (see the compatibility check sketch below):

- Accepts `pixel_values`
- Returns attentions when calling `model.generate(..., output_attentions=True)`
- Provides a decoder tokenizer: `model.decoder.tokenizer`
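
As a quick sanity check, something like the sketch below should run without errors for a compatible model. It assumes a Hugging Face-style `generate()` that returns `cross_attentions` when `return_dict_in_generate=True`; adapt it to your model's actual output format.

```python
import torch

# Hedged sketch: verify the three requirements listed above.
def check_compatibility(model, pixel_values: torch.Tensor, grid: int = 32) -> None:
    assert hasattr(model.decoder, "tokenizer"), "model.decoder.tokenizer is required"

    out = model.generate(
        pixel_values=pixel_values,
        max_new_tokens=4,
        output_attentions=True,
        return_dict_in_generate=True,  # assuming Hugging Face-style generate()
    )
    assert getattr(out, "cross_attentions", None), "generate() did not return attentions"

    # The visualizer expects a square grid of image tokens (32 x 32 = 1024 here).
    num_image_tokens = out.cross_attentions[0][0].shape[-1]
    assert num_image_tokens == grid * grid, (
        f"expected {grid * grid} image tokens, got {num_image_tokens}"
    )
```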
    

Ideal for research in:

- Medical AI
- Vision–language alignment
- Cross-modal interpretability
- Attention visualization
- Explainable AI (XAI)

## ❤️ Acknowledgments