UnlearnDiffAtk: Unlearned Diffusion Model Benchmark

This benchmark evaluates the robustness and utility retention of safety-driven unlearned diffusion models (DMs) across a variety of tasks. For more details, please visit the project page.

  • The robustness of unlearned DMs is evaluated through our proposed adversarial prompt attack, UnlearnDiffAtk, which has been accepted to ECCV 2024.
  • The utility retention of unlearned DMs is evaluated through FID and CLIP score on images generated from 10K randomly sampled COCO caption prompts (a minimal evaluation sketch follows this list).
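A minimal sketch of the utility-retention evaluation, assuming images have already been generated from the COCO caption prompts with the unlearned DM; the function name, inputs, and the use of torchmetrics are illustrative assumptions, not the benchmark's exact pipeline.

```python
import torch
from torchmetrics.image.fid import FrechetInceptionDistance
from torchmetrics.multimodal.clip_score import CLIPScore

def evaluate_utility(generated_images: torch.Tensor,
                     reference_images: torch.Tensor,
                     prompts: list[str]) -> tuple[float, float]:
    """generated_images / reference_images: uint8 tensors of shape (N, 3, H, W) in [0, 255]."""
    # FID: compare the distribution of generated images against real COCO images.
    fid = FrechetInceptionDistance(feature=2048)
    fid.update(reference_images, real=True)
    fid.update(generated_images, real=False)

    # CLIP score: measure alignment between generated images and their caption prompts.
    clip = CLIPScore(model_name_or_path="openai/clip-vit-base-patch16")
    clip.update(generated_images, prompts)

    return fid.compute().item(), clip.compute().item()
```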

Demo of our offensive method: UnlearnDiffAtk
Demo of our defensive method: AdvUnlearn

[Evaluation Metrics]:

  • Pre-Attack Success Rate (Pre-ASR): lower is better (see the ASR sketch after this list);
  • Post-Attack Success Rate (Post-ASR): lower is better;
  • Fréchet Inception Distance (FID): evaluates the distributional quality of generated images, lower is better;
  • CLIP Score: measures contextual alignment with prompt descriptions, higher is better.
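A minimal sketch of how an attack success rate (ASR) could be computed; the `has_concept` detector (e.g., a nudity classifier such as NudeNet for the NSFW task) and the input list of images are illustrative assumptions.

```python
def attack_success_rate(images, has_concept) -> float:
    """Percentage of generated images in which the unlearned concept is still detected."""
    hits = sum(1 for img in images if has_concept(img))
    return 100.0 * hits / len(images)

# Pre-ASR: computed on images generated from the original evaluation prompts.
# Post-ASR: computed on images generated from UnlearnDiffAtk adversarial prompts.
```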

[DM Unlearning Tasks]:

  • NSFW: Nudity
  • Style: Van Gogh
  • Objects: Church, Tench, Parachute, Garbage Truck

[Unlearned Concept]: Nudity

| Unlearned Methods | Pre-ASR | Post-ASR | FID | CLIP-Score |
|---|---|---|---|---|
| AdvUnlearn | 88.03 | 97.89 | 128.53 | 0.308 |