Hi everyone,
I’m training a conditional diffusion model from scratch on ~13k paired images, each 1×224×224.
My UNet2DModel takes 2 channels (noisy_opd + condition image) and outputs 1 channel, with block_out_channels (128,128,128,256,256) and attention on the last down/up block.
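For reference, a minimal sketch of that configuration (simplified; the exact down/up block types here are my interpretation of "attention on the last down/up block"):

```python
from diffusers import UNet2DModel

# 2 input channels (noisy OPD + condition), 1 output channel,
# attention only at the deepest resolution level.
model = UNet2DModel(
    sample_size=224,
    in_channels=2,             # concatenated [noisy_opd, condition]
    out_channels=1,            # predicted noise for the OPD channel
    layers_per_block=2,
    block_out_channels=(128, 128, 128, 256, 256),
    down_block_types=(
        "DownBlock2D", "DownBlock2D", "DownBlock2D", "DownBlock2D", "AttnDownBlock2D",
    ),
    up_block_types=(
        "AttnUpBlock2D", "UpBlock2D", "UpBlock2D", "UpBlock2D", "UpBlock2D",
    ),
)
```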
Training uses batch size 8 with gradient accumulation 2 (effective batch size 16), AdamW with lr 2e-4, EMA decay 0.9995, a classifier-free guidance (CFG) condition-drop probability of 0.1, a cosine LR schedule with 500 warmup steps, and a DDIMScheduler with 1000 training timesteps (prediction_type="epsilon").
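The training step looks roughly like this (simplified sketch; variable names like `opd`, `cond`, and `total_steps` are just for illustration):

```python
import torch
import torch.nn.functional as F
from diffusers import DDIMScheduler
from diffusers.optimization import get_cosine_schedule_with_warmup
from diffusers.training_utils import EMAModel

# Assumes `model` is the UNet above and each batch gives `opd` / `cond`
# tensors of shape (B, 1, 224, 224); `total_steps` is the planned optimizer-step count.
noise_scheduler = DDIMScheduler(num_train_timesteps=1000, prediction_type="epsilon")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)
lr_scheduler = get_cosine_schedule_with_warmup(
    optimizer, num_warmup_steps=500, num_training_steps=total_steps
)
ema = EMAModel(model.parameters(), decay=0.9995)

def training_step(opd, cond):
    noise = torch.randn_like(opd)
    timesteps = torch.randint(
        0, noise_scheduler.config.num_train_timesteps, (opd.shape[0],), device=opd.device
    )
    # Forward diffusion: add noise to the clean OPD at the sampled timesteps.
    noisy_opd = noise_scheduler.add_noise(opd, noise, timesteps)

    # CFG training: zero out the condition for ~10% of the samples.
    drop = torch.rand(opd.shape[0], device=opd.device) < 0.1
    cond = torch.where(drop[:, None, None, None], torch.zeros_like(cond), cond)

    # Predict epsilon from the concatenated [noisy_opd, condition] input.
    noise_pred = model(torch.cat([noisy_opd, cond], dim=1), timesteps).sample
    return F.mse_loss(noise_pred, noise)

# In the loop (gradient accumulation of 2 omitted for brevity):
#   loss = training_step(opd, cond); loss.backward()
#   optimizer.step(); lr_scheduler.step(); ema.step(model.parameters()); optimizer.zero_grad()
```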
So far I’ve trained for ~150 epochs, which is around 121,800 optimizer steps. For sampling I run DDIM with 50 inference steps, starting from noise and concatenating [noisy, condition] at each timestep.
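And the sampling loop, roughly (guidance at inference omitted for brevity):

```python
import torch
from diffusers import DDIMScheduler

scheduler = DDIMScheduler(num_train_timesteps=1000, prediction_type="epsilon")
scheduler.set_timesteps(50)

@torch.no_grad()
def sample(model, cond):
    # EMA weights would normally be copied into the model first, e.g. ema.copy_to(model.parameters()).
    x = torch.randn(cond.shape[0], 1, 224, 224, device=cond.device)  # start from pure noise
    for t in scheduler.timesteps:
        model_input = torch.cat([x, cond], dim=1)      # [noisy, condition]
        noise_pred = model(model_input, t).sample
        x = scheduler.step(noise_pred, t, x).prev_sample
    return x
```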
The problem is that even after all this training, the generated OPD images remain very noisy and never converge to clean outputs.
Is DDIMScheduler appropriate for training from scratch, or should I use DDPM for training and DDIM only for inference? Could my setup (UNet size, scheduler choice, EMA, or number of inference steps) explain why the model still outputs so much noise?
Any advice would be greatly appreciated!