ntk1507 nielsr HF Staff committed on
Commit fb9851e · verified · 1 Parent(s): 2283326

Add model card (#2)


- Add model card (c7817644b60c1ac1092ca2fdffb0841b025f8c7b)


Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +22 -53
README.md CHANGED
@@ -1,74 +1,43 @@
  ---
  license: apache-2.0
- pipeline_tag: text-to-image
  library_name: diffusers
  ---
 
- # Universal Few-Shot Spatial Control for Diffusion Models
-
- This repository contains the official implementation of **Universal Few-Shot Spatial Control for Diffusion Models (UFC)**, as presented in the paper [Universal Few-Shot Spatial Control for Diffusion Models](https://huggingface.co/papers/2509.07530).
 
- For the official code and usage instructions, please refer to the [GitHub repository](https://github.com/kietngt00/UFC).
 
- ![Results Visualization](https://github.com/kietngt00/UFC/raw/main/assets/results.png)
 
- *Figure 1: Results of our method (**UNet**) learned with **30 examples** on **unseen** spatial conditions. The proposed control adapter guides the pre-trained T2I models in a versatile and data-efficient manner.*
-
- ## 🚀 Introduction
- This repository contains the official implementation of **Universal Few-Shot Spatial Control for Diffusion Models (UFC)**.
-
- **UFC** is a versatile few-shot control adapter capable of generalizing to novel spatial conditions, thereby enabling fine-grained control over the structure of generated images. Our method is applicable to both UNet and DiT diffusion backbones.
 
  ## Abstract
- > Spatial conditioning in pretrained text-to-image diffusion models has significantly improved fine-grained control over the structure of generated images. However, existing control adapters exhibit limited adaptability and incur high training costs when encountering novel spatial control conditions that differ substantially from the training tasks. To address this limitation, we propose Universal Few-Shot Control (UFC), a versatile few-shot control adapter capable of generalizing to novel spatial conditions. Given a few image-condition pairs of an unseen task and a query condition, UFC leverages the analogy between query and support conditions to construct task-specific control features, instantiated by a matching mechanism and an update on a small set of task-specific parameters. Experiments on six novel spatial control tasks show that UFC, fine-tuned with only 30 annotated examples of novel tasks, achieves fine-grained control consistent with the spatial conditions. Notably, when fine-tuned with 0.1% of the full training data, UFC achieves competitive performance with the fully supervised baselines in various control tasks. We also show that UFC is applicable agnostically to various diffusion backbones and demonstrate its effectiveness on both UNet and DiT architectures.
 
- ## 💡 Method
- ![System Architecture](https://github.com/kietngt00/UFC/raw/main/assets/architecture.png)
 
- ## 📁 Model Checkpoints
 
- #### Checkpoints using **UNet** backbone
 
  | Few-shot Task | Few-shot (30-shot) Fine-tuned Model | Base Meta-trained Model | Description |
  |:-------------:|:-----------------------------------:|:-----------------------:|:-----------:|
- | `Canny` | [UNet_canny](https://huggingface.co/ntk1507/UFC/tree/main/unet_tuning_logs/UNet_taskgr23_canny_30) | [UNet_taskgr23](https://huggingface.co/ntk1507/UFC/tree/main/unet_logs/UNet_taskgr23) | The base model is trained with 4 tasks: `[Depth, Normal, Pose, Densepose]`|
- | `Hed` | [UNet_hed](https://huggingface.co/ntk1507/UFC/tree/main/unet_tuning_logs/UNet_taskgr23_hed_30) | [UNet_taskgr23](https://huggingface.co/ntk1507/UFC/tree/main/unet_logs/UNet_taskgr23) | The base model is trained with 4 tasks: `[Depth, Normal, Pose, Densepose]`|
- | `Depth` | [UNet_depth](https://huggingface.co/ntk1507/UFC/tree/main/unet_tuning_logs/UNet_taskgr13_depth_30) | [UNet_taskgr13](https://huggingface.co/ntk1507/UFC/tree/main/unet_logs/UNet_taskgr13) | The base model is trained with 4 tasks: `[Canny, HED, Pose, Densepose]`|
- | `Normal` | [UNet_normal](https://huggingface.co/ntk1507/UFC/tree/main/unet_tuning_logs/UNet_taskgr13_normal_30) | [UNet_taskgr13](https://huggingface.co/ntk1507/UFC/tree/main/unet_logs/UNet_taskgr13) | The base model is trained with 4 tasks: `[Canny, HED, Pose, Densepose]`|
- | `Pose` | [UNet_pose](https://huggingface.co/ntk1507/UFC/tree/main/unet_tuning_logs/UNet_taskgr12_pose_30) | [UNet_taskgr12](https://huggingface.co/ntk1507/UFC/tree/main/unet_logs/UNet_taskgr12) | The base model is trained with 4 tasks: `[Canny, HED, Depth, Normal]`|
  |`Densepose`| [UNet_densepose](https://huggingface.co/ntk1507/UFC/tree/main/unet_tuning_logs/UNet_taskgr12_densepose_30) | [UNet_taskgr12](https://huggingface.co/ntk1507/UFC/tree/main/unet_logs/UNet_taskgr12) | The base model is trained with 4 tasks: `[Canny, HED, Depth, Normal]`|
 
- #### Checkpoints using **DiT** backbone
 
  | Few-shot Task | Few-shot (30-shot) Fine-tuned Model | Base Meta-trained Model | Description |
  |:-------------:|:-----------------------------------:|:-----------------------:|:-----------:|
- | `Canny` | [DiT_canny](https://huggingface.co/ntk1507/UFC/tree/main/DiT_tuning_logs/DiT_taskgr23_canny) | [DiT_taskgr23](https://huggingface.co/ntk1507/UFC/tree/main/DiT_logs/DiT_taskgr23) | The base model is trained with 4 tasks: `[Depth, Normal, Pose, Densepose]`|
- | `Hed` | [DiT_hed](https://huggingface.co/ntk1507/UFC/tree/main/DiT_tuning_logs/DiT_taskgr23_hed) | [DiT_taskgr23](https://huggingface.co/ntk1507/UFC/tree/main/DiT_logs/DiT_taskgr23) | The base model is trained with 4 tasks: `[Depth, Normal, Pose, Densepose]`|
- | `Depth` | [DiT_depth](https://huggingface.co/ntk1507/UFC/tree/main/DiT_tuning_logs/DiT_taskgr13_depth) | [DiT_taskgr13](https://huggingface.co/ntk1507/UFC/tree/main/DiT_logs/DiT_taskgr13) | The base model is trained with 4 tasks: `[Canny, HED, Pose, Densepose]`|
- | `Normal` | [DiT_normal](https://huggingface.co/ntk1507/UFC/tree/main/DiT_tuning_logs/DiT_taskgr13_normal) | [DiT_taskgr13](https://huggingface.co/ntk1507/UFC/tree/main/DiT_logs/DiT_taskgr13) | The base model is trained with 4 tasks: `[Canny, HED, Pose, Densepose]`|
- | `Pose` | [DiT_pose](https://huggingface.co/ntk1507/UFC/tree/main/DiT_tuning_logs/DiT_taskgr12_pose) | [DiT_taskgr12](https://huggingface.co/ntk1507/UFC/tree/main/DiT_logs/DiT_taskgr12) | The base model is trained with 4 tasks: `[Canny, HED, Depth, Normal]`|
- |`Densepose`| [DiT_densepose](https://huggingface.co/ntk1507/UFC/tree/main/DiT_tuning_logs/DiT_taskgr12_densepose) | [DiT_taskgr12](https://huggingface.co/ntk1507/UFC/tree/main/DiT_logs/DiT_taskgr12) | The base model is trained with 4 tasks: `[Canny, HED, Depth, Normal]`|
-
- ## 🖼️ Image Generation
-
- Script for UFC with **UNet** backbone:
- ```bash
- PYTHONPATH=. python eval/UNet_generation.py \
- --config </path/to/config> \
- --ckpt_path </path/to/meta_train_checkpoint> \
- --task_ckpt_path </path/to/finetune_checkpoint> \
- --task <task> --shots 5 --batch_size 8 \
- ```
-
- Script for UFC with **DiT** backbone is similar, but replacing `UNet_generation.py` with `DiT_generation.py`.
-
- ## 🙏 Acknowledgements
- We develop our method based on the [diffusers](https://github.com/huggingface/diffusers) library, the official code of [OminiControl](https://github.com/Yuanshi9815/OminiControl/tree/main), [VTM](https://github.com/GitGyun/visual_token_matching) and [ControlNet](https://github.com/lllyasviel/ControlNet). We gratefully acknowledge the authors for making their code publicly available.
-
- ## 📖 Citation
- If you find our work useful, please consider citing our paper:
-
- ```bibtex
- @article{
- }
- ```
  ---
  license: apache-2.0
  library_name: diffusers
+ pipeline_tag: text-to-image
  ---
 
+ # Universal Few-Shot Spatial Control for Diffusion Models (UFC)
 
+ This repository presents **Universal Few-Shot Spatial Control for Diffusion Models (UFC)**, a versatile few-shot control adapter for generalizing to novel spatial conditions in text-to-image diffusion models. Our method is applicable to both UNet and DiT diffusion backbones.
 
+ The model was presented in the paper [Universal Few-Shot Spatial Control for Diffusion Models](https://huggingface.co/papers/2509.07530).
 
+ Official code and more details can be found at the [GitHub repository](https://github.com/kietngt00/UFC).
 
  ## Abstract
 
+ Spatial conditioning in pretrained text-to-image diffusion models has significantly improved fine-grained control over the structure of generated images. However, existing control adapters exhibit limited adaptability and incur high training costs when encountering novel spatial control conditions that differ substantially from the training tasks. To address this limitation, we propose Universal Few-Shot Control (UFC), a versatile few-shot control adapter capable of generalizing to novel spatial conditions. Given a few image-condition pairs of an unseen task and a query condition, UFC leverages the analogy between query and support conditions to construct task-specific control features, instantiated by a matching mechanism and an update on a small set of task-specific parameters. Experiments on six novel spatial control tasks show that UFC, fine-tuned with only 30 annotated examples of novel tasks, achieves fine-grained control consistent with the spatial conditions. Notably, when fine-tuned with 0.1% of the full training data, UFC achieves competitive performance with the fully supervised baselines in various control tasks. We also show that UFC is applicable agnostically to various diffusion backbones and demonstrate its effectiveness on both UNet and DiT architectures.
+
+ ## Model Checkpoints
 
+ This project provides various model checkpoints for UFC with both UNet and DiT backbones, fine-tuned for different few-shot tasks.
 
+ ### Checkpoints using **UNet** backbone
 
  | Few-shot Task | Few-shot (30-shot) Fine-tuned Model | Base Meta-trained Model | Description |
  |:-------------:|:-----------------------------------:|:-----------------------:|:-----------:|
+ | `Canny` | [UNet_canny](https://huggingface.co/ntk1507/UFC/tree/main/unet_tuning_logs/UNet_taskgr23_canny_30) | [UNet_taskgr23](https://huggingface.co/ntk1507/UFC/tree/main/unet_logs/UNet_taskgr23) | The base model is trained with 4 tasks: `[Depth, Normal, Pose, Densepose]`|
+ | `Hed` | [UNet_hed](https://huggingface.co/ntk1507/UFC/tree/main/unet_tuning_logs/UNet_taskgr23_hed_30) | [UNet_taskgr23](https://huggingface.co/ntk1507/UFC/tree/main/unet_logs/UNet_taskgr23) | The base model is trained with 4 tasks: `[Depth, Normal, Pose, Densepose]`|
+ | `Depth` | [UNet_depth](https://huggingface.co/ntk1507/UFC/tree/main/unet_tuning_logs/UNet_taskgr13_depth_30) | [UNet_taskgr13](https://huggingface.co/ntk1507/UFC/tree/main/unet_logs/UNet_taskgr13) | The base model is trained with 4 tasks: `[Canny, HED, Pose, Densepose]`|
+ | `Normal` | [UNet_normal](https://huggingface.co/ntk1507/UFC/tree/main/unet_tuning_logs/UNet_taskgr13_normal_30) | [UNet_taskgr13](https://huggingface.co/ntk1507/UFC/tree/main/unet_logs/UNet_taskgr13) | The base model is trained with 4 tasks: `[Canny, HED, Pose, Densepose]`|
+ | `Pose` | [UNet_pose](https://huggingface.co/ntk1507/UFC/tree/main/unet_tuning_logs/UNet_taskgr12_pose_30) | [UNet_taskgr12](https://huggingface.co/ntk1507/UFC/tree/main/unet_logs/UNet_taskgr12) | The base model is trained with 4 tasks: `[Canny, HED, Depth, Normal]`|
  |`Densepose`| [UNet_densepose](https://huggingface.co/ntk1507/UFC/tree/main/unet_tuning_logs/UNet_taskgr12_densepose_30) | [UNet_taskgr12](https://huggingface.co/ntk1507/UFC/tree/main/unet_logs/UNet_taskgr12) | The base model is trained with 4 tasks: `[Canny, HED, Depth, Normal]`|
 
+ ### Checkpoints using **DiT** backbone
 
  | Few-shot Task | Few-shot (30-shot) Fine-tuned Model | Base Meta-trained Model | Description |
  |:-------------:|:-----------------------------------:|:-----------------------:|:-----------:|
+ | `Canny` | [DiT_canny](https://huggingface.co/ntk1507/UFC/tree/main/DiT_tuning_logs/DiT_taskgr23_canny) | [DiT_taskgr23](https://huggingface.co/ntk1507/UFC/tree/main/DiT_logs/DiT_taskgr23) | The base model is trained with 4 tasks: `[Depth, Normal, Pose, Densepose]`|
+ | `Hed` | [DiT_hed](https://huggingface.co/ntk1507/UFC/tree/main/DiT_tuning_logs/DiT_taskgr23_hed) | [DiT_taskgr23](https://huggingface.co/ntk1507/UFC/tree/main/DiT_logs/DiT_taskgr23) | The base model is trained with 4 tasks: `[Depth, Normal, Pose, Densepose]`|
+ | `Depth` | [DiT_depth](https://huggingface.co/ntk1507/UFC/tree/main/DiT_tuning_logs/DiT_taskgr13_depth) | [DiT_taskgr13](https://huggingface.co/ntk1507/UFC/tree/main/DiT_logs/DiT_taskgr13) | The base model is trained with 4 tasks: `[Canny, HED, Pose, Densepose]`|
+ | `Normal` | [DiT_normal](https://huggingface.co/ntk1507/UFC/tree/main/DiT_tuning_logs/DiT_taskgr13_normal) | [DiT_taskgr13](https://huggingface.co/ntk1507/UFC/tree/main/DiT_logs/DiT_taskgr13) | The base model is trained with 4 tasks: `[Canny, HED, Pose, Densepose]`|
+ | `Pose` | [DiT_pose](https://huggingface.co/ntk1507/UFC/tree/main/DiT_tuning_logs/DiT_taskgr12_pose) | [DiT_taskgr12](https://huggingface.co/ntk1507/UFC/tree/main/DiT_logs/DiT_taskgr12) | The base model is trained with 4 tasks: `[Canny, HED, Depth, Normal]`|
+ |`Densepose`| [DiT_densepose](https://huggingface.co/ntk1507/UFC/tree/main/DiT_tuning_logs/DiT_taskgr12_densepose) | [DiT_taskgr12](https://huggingface.co/ntk1507/UFC/tree/main/DiT_logs/DiT_taskgr12) | The base model is trained with 4 tasks: `[Canny, HED, Depth, Normal]`|
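The checkpoint tables in the card follow a regular naming scheme (backbone, task group, task). As a minimal sketch of how one might resolve those repo subfolders programmatically, the helper below is hypothetical (not part of the official UFC code); only the task-group assignments and folder names are taken from the tables above.

```python
# Hypothetical helper reproducing the checkpoint-path scheme from the model
# card's tables. Not part of the official UFC release.

# Task -> task group of its base meta-trained model (from the tables above).
_TASK_GROUP = {
    "canny": "23", "hed": "23",
    "depth": "13", "normal": "13",
    "pose": "12", "densepose": "12",
}

def checkpoint_paths(task: str, backbone: str = "unet") -> dict:
    """Return the repo subfolders for the fine-tuned and base checkpoints."""
    task, backbone = task.lower(), backbone.lower()
    group = _TASK_GROUP[task]
    if backbone == "unet":
        return {
            "finetuned": f"unet_tuning_logs/UNet_taskgr{group}_{task}_30",
            "base": f"unet_logs/UNet_taskgr{group}",
        }
    if backbone == "dit":
        return {
            "finetuned": f"DiT_tuning_logs/DiT_taskgr{group}_{task}",
            "base": f"DiT_logs/DiT_taskgr{group}",
        }
    raise ValueError(f"unknown backbone: {backbone}")

if __name__ == "__main__":
    paths = checkpoint_paths("depth", "unet")
    print(paths["finetuned"])  # unet_tuning_logs/UNet_taskgr13_depth_30
    # To fetch just that folder from the Hub, one could use huggingface_hub:
    # from huggingface_hub import snapshot_download
    # snapshot_download(repo_id="ntk1507/UFC",
    #                   allow_patterns=f"{paths['finetuned']}/*")
```

The commented `snapshot_download` call avoids pulling the whole repository by restricting the download to the selected checkpoint folder via `allow_patterns`.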