---
license: cc-by-4.0
---
# SurGrID: Controllable Surgical Simulation via Scene Graph to Image Diffusion (IPCAI 2025)

[![arXiv](https://img.shields.io/badge/arXiv-2502.07945-b31b1b.svg)](https://arxiv.org/abs/2502.07945)
[![Paper](https://img.shields.io/badge/Paper-Visit-blue)](https://rdcu.be/em4E2)
[![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/SsharvienKumar/SurGrID)

## 💡 Key Features

- We show that scene graphs (SGs) can encode surgical scenes in a human-readable format.
- We propose a novel pre-training step that encodes global and local information from (image, mask, SG) triplets. The learned embeddings are used to condition graph-to-image diffusion for high-quality and precisely controllable surgical simulation.
- We evaluate our generative approach on scenes from cataract surgeries using quantitative fidelity and diversity measurements, followed by an extensive user study involving clinical experts.

## 🛠 Setup

```bash
git clone https://github.com/MECLabTUDA/SurGrID.git
cd SurGrID
conda create -n surgrid python=3.8.5 pip=20.3.3
conda activate surgrid
pip install torch==2.0.1 torchvision==0.15.2 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
```

## 🏁 Model Checkpoints and Dataset

Download the checkpoints of all the necessary models from the provided sources and place them in [`results`](./results). We also provide the processed CADIS dataset, containing images, segmentation masks and their scene graphs. Update the dataset paths in [`configs`](./configs).

- `Checkpoints`: [VQGANs, GraphEncoders, Diffusion Model](https://huggingface.co/SsharvienKumar/SurGrID/tree/main/checkpoints)
- `Processed Dataset`: [CADIS](https://huggingface.co/SsharvienKumar/SurGrID/tree/main/dataset)

## 💥 Sampling SurGrID

```bash
python script/sampler_diffusion.py --conf configs/eval/eval_combined_emb.yaml
```

## ⏳ Training SurGrID

**Step 1:** Train Separate VQGANs for Images and Segmentation Masks

```bash
python surgrid/taming/main.py --base configs/vqgan/vqgan_image_cadis.yaml -t --gpus 0,
python surgrid/taming/main.py --base configs/vqgan/vqgan_segmentation_cadis.yaml -t --gpus 0,
```

**Step 2:** Train Both Graph Encoders

```bash
python script/trainer_graph.py --mode masked --conf configs/graph/graph_cadis.yaml
python script/trainer_graph.py --mode segclip --conf configs/graph/graph_cadis.yaml
```

**Step 3:** Train the Diffusion Model

```bash
python script/trainer_diffusion.py --conf configs/trainer/combined_emb.yaml
```

## 🔄 Training SurGrID on a New Dataset

The files below need to be adapted:

- [Configs](./configs)
- [SurGrID Dataset](./surgrid/dataset/cadis_dataset.py)
- [VQGAN Dataset](./surgrid/taming/taming/data/cadis.py)
- [CADIS Specifications in Graph Encoder Pre-training](./surgrid/graph/graph_masked_segclip.py)

## 🥼 Clinical Expert Assessment

```bash
python script/demo_surgrid.py --conf configs/trainer/combined_emb.yaml
```

Our demo GUI allows for loading ground-truth graphs along with the ground-truth image. The graph's nodes can be moved, deleted, or have their class changed. We instruct our participants to load four different ground-truth graphs and sequentially perform the following actions on each. They are requested to score the samples' realism and coherence with the graph input using a Likert scale of 1 to 7:

- First, participants are instructed to generate a batch of four samples from the ground-truth SG without modifications.
- Second, the participants are requested to spatially move nodes in the canvas and again judge the synthesised samples.
- Third, participants change the class of one of the instrument nodes and judge the generated images.
- Lastly, participants are instructed to remove one of the instruments or miscellaneous classes and judge the synthesised image a final time.
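
The three graph edits in this protocol (moving a node, changing its class, removing it) can be illustrated on a toy scene graph. This is only a sketch: the `Node` and `SceneGraph` classes, the normalised `(x, y)` coordinates and the placeholder class labels are hypothetical and do not reflect the data structures used in the SurGrID code base or the demo GUI.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class Node:
    cls: str  # semantic class label (placeholder names, not CADIS labels)
    x: float  # assumed convention: normalised position in [0, 1]
    y: float


@dataclass
class SceneGraph:
    nodes: Dict[int, Node] = field(default_factory=dict)
    edges: Set[Tuple[int, int]] = field(default_factory=set)

    def move_node(self, node_id: int, x: float, y: float) -> None:
        """Spatial modification: relocate a node on the canvas."""
        node = self.nodes[node_id]
        node.x, node.y = x, y

    def change_class(self, node_id: int, new_cls: str) -> None:
        """Tool modification: swap the class of an existing node."""
        self.nodes[node_id].cls = new_cls

    def remove_node(self, node_id: int) -> None:
        """Tool removal: drop the node and every edge that touches it."""
        del self.nodes[node_id]
        self.edges = {e for e in self.edges if node_id not in e}


# Hypothetical graph: one anatomy node connected to two instrument nodes.
graph = SceneGraph(
    nodes={
        0: Node("anatomy_a", 0.5, 0.5),
        1: Node("instrument_a", 0.3, 0.4),
        2: Node("instrument_b", 0.7, 0.6),
    },
    edges={(0, 1), (0, 2)},
)

graph.move_node(1, 0.6, 0.4)           # second step of the protocol
graph.change_class(1, "instrument_c")  # third step
graph.remove_node(2)                   # last step
```

In the study itself, the edited graph is passed to the diffusion model after each step and the generated samples are scored; the sketch covers only the graph manipulation.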
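
| Clinician | Synthesisation from GT (Realism) | Synthesisation from GT (Coherence) | Spatial Modification (Realism) | Spatial Modification (Coherence) | Tool Modification (Realism) | Tool Modification (Coherence) | Tool Removal (Realism) | Tool Removal (Coherence) |
|-----------|---------|---------|---------|---------|---------|---------|---------|---------|
| P1 | 6.5±0.5 | 6.5±1.0 | 6.3±0.9 | 6.3±0.9 | 5.3±1.2 | 4.5±1.9 | 6.3±0.9 | 5.5±2.3 |
| P2 | 5.3±0.9 | 5.3±0.5 | 4.5±0.5 | 4.3±2.0 | 5.3±0.9 | 5.8±0.9 | 5.5±1.2 | 5.5±1.9 |
| P3 | 6.3±0.9 | 6.3±0.9 | 6.5±1.0 | 5.5±0.5 | 6.0±0.8 | 6.8±0.5 | 6.3±0.5 | 6.5±0.5 |
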
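
For a quick per-condition overview, the per-participant means reported above can be averaged across the three clinicians. The snippet below is a minimal sketch of that arithmetic. It is an unweighted mean of means, it ignores the reported ± spreads, and it is not an aggregate reported by the authors. The names `CONDITIONS`, `SCORES` and `average_over_participants` are introduced here for illustration only.

```python
from statistics import mean

# Condition names and per-participant mean scores copied from the results above.
# Each tuple is (realism, coherence); the ± spreads are intentionally left out.
CONDITIONS = [
    "Synthesisation from GT",
    "Spatial Modification",
    "Tool Modification",
    "Tool Removal",
]
SCORES = {
    "P1": [(6.5, 6.5), (6.3, 6.3), (5.3, 4.5), (6.3, 5.5)],
    "P2": [(5.3, 5.3), (4.5, 4.3), (5.3, 5.8), (5.5, 5.5)],
    "P3": [(6.3, 6.3), (6.5, 5.5), (6.0, 6.8), (6.3, 6.5)],
}


def average_over_participants(scores):
    """Return {condition: (mean realism, mean coherence)}, rounded to 2 decimals."""
    summary = {}
    for i, condition in enumerate(CONDITIONS):
        realism = mean(rows[i][0] for rows in scores.values())
        coherence = mean(rows[i][1] for rows in scores.values())
        summary[condition] = (round(realism, 2), round(coherence, 2))
    return summary


summary = average_over_participants(SCORES)
for condition, (realism, coherence) in summary.items():
    print(f"{condition}: realism={realism}, coherence={coherence}")
```

With only three participants, these averages are a reading aid and carry no statistical weight.
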
## 📜 Citations

If you use SurGrID in your work, please cite the following paper:

```
@article{frisch2025surgrid,
  title={SurGrID: Controllable Surgical Simulation via Scene Graph to Image Diffusion},
  author={Frisch, Yannik and Sivakumar, Ssharvien Kumar and K{\"o}ksal, {\c{C}}a{\u{g}}han and B{\"o}hm, Elsa and Wagner, Felix and Gericke, Adrian and Ghazaei, Ghazal and Mukhopadhyay, Anirban},
  journal={arXiv preprint arXiv:2502.07945},
  year={2025}
}
```

## ⭐ Acknowledgement

We thank the following projects and theoretical works, which we have either used or drawn inspiration from:

- [VQGAN](https://github.com/CompVis/taming-transformers)
- [Lucidrains' DDPM](https://github.com/lucidrains/denoising-diffusion-pytorch)
- [SGDiff](https://github.com/YangLing0818/SGDiff)
- [Endora's README](https://github.com/CUHK-AIM-Group/Endora)