TL;DR: Built a complete diffusion model from scratch. It turns random garbage into beautiful sine waves. Magic? Nah, just math and PyTorch.
Status: ✅ Actually works | ⚡ Trains in 2 mins | 🧠 Teaches you diffusion models
You know how you can "enhance" blurry images in movies? (Total BS, btw)
Well, diffusion models actually do something cooler: create things from pure noise.
This project teaches you how by starting simple - turning random static into smooth sine waves.
Random Noise 🎵 → [Model does 100 magic steps] → Perfect Sine Wave ✨
Watch how we destroy a perfectly good sine wave by adding noise (training), then reverse it to generate new ones!
This is literally how Stable Diffusion, Midjourney, and DALL-E work. Just with images instead of 1D signals.
Learn it here first where it's:
- ⚡ Fast (2 min training on CPU)
- 📊 Easy to visualize (simple plots, not scary tensors)
- 🧠 Actually understandable (no PhD required)
Then go build the next DALL-E 4. I believe in you! 🚀
Take clean sine waves → add noise gradually → get pure chaos

```python
clean_wave + gaussian_noise = total_garbage
```

Teach it to predict: "What noise was added?"

```python
if model.can_predict_noise():
    model.can_remove_noise()  # big brain time
```

Start with pure noise → ask model to remove noise 100 times → profit!

```python
x = random_noise()
for _ in range(100):
    x = model.denoise(x)  # slowly becoming beautiful
return x  # chef's kiss 👌
```

```bash
# Install stuff
pip install torch numpy matplotlib pyyaml

# Run everything
python main.py
```

That's it. Seriously.

What happens:
- ⏰ Trains for 2 minutes
- 💾 Saves model to `outputs/`
- 🎨 Generates 16 new sine waves
- 📊 Shows you a pretty plot

Check `outputs/sampled_sequences.png` to see your AI's artwork!
Loss goes down = Model learns = We're cooking! 🔥

Initial: 0.66 → Final: 0.18 (that's a 73% improvement, if you're into stats)
Generated samples - MLP vs UNet:

| MLP (Simple) | UNet (Advanced) |
|---|---|
| *(sample image)* | *(sample image)* |

MLP: Simple 3-layer network (fast, ~50K params)
UNet: Multi-scale architecture with skip connections (~500K params)

Both generate beautiful sine waves, but UNet captures finer details! 🎨
Edit `config.py`:

```python
@dataclass
class DiffusionConfig:
    model_type: str = "mlp"  # "mlp" or "unet" - Switch models!
    num_epochs: int = 10     # More epochs = better (but slower)
    timesteps: int = 100     # More steps = smoother results
    batch_size: int = 64     # GPU go brrr? Increase this
    hidden_dim: int = 128    # Model capacity (bigger = more powerful)
    device: str = "cuda"     # Got GPU? Use it!
```

Model Comparison:
| Feature | MLP | UNet |
|---|---|---|
| Parameters | ~50K | ~500K |
| Training Speed | ⚡ Fast | 🐢 Slower (10x) |
| Sample Quality | ✅ Good | ✨ Excellent |
| Memory Usage | 🟢 Low | 🟡 Higher |
| Architecture | Simple feedforward | Multi-scale + skip connections |
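For a rough sanity check on the parameter counts above, here is a hedged sketch of a 3-layer MLP at `hidden_dim = 128`. The layer sizes are illustrative assumptions; the repo's actual `model.py` also conditions on the timestep, so the real count lands higher:

```python
import torch.nn as nn

# Illustrative 3-layer MLP over length-64 signals (not the repo's exact model).
mlp = nn.Sequential(
    nn.Linear(64, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 64),
)
# Count every weight and bias in the network.
n_params = sum(p.numel() for p in mlp.parameters())
print(n_params)  # ~33K here; timestep conditioning pushes the real model toward ~50K
```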
Pro tips:
- 🎯 Try UNet first! Set `model_type = "unet"` for best quality
- 🐢 CPU only? Set `device = "cpu"` and `batch_size = 32`
- 🏎️ Want it faster? Use `model_type = "mlp"`, `timesteps = 50`, `num_epochs = 5`
- 🎨 Want better quality? Use `model_type = "unet"`, `timesteps = 1000`, `num_epochs = 50`
- 💾 Low memory? Set `hidden_dim = 64` to reduce model size
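The recipes above are just overrides of the dataclass defaults. A self-contained sketch (the dataclass is mirrored here so it runs standalone; the real one lives in `config.py`):

```python
from dataclasses import dataclass

# Mirror of the DiffusionConfig shown above, for illustration only.
@dataclass
class DiffusionConfig:
    model_type: str = "mlp"
    num_epochs: int = 10
    timesteps: int = 100
    batch_size: int = 64
    hidden_dim: int = 128
    device: str = "cuda"

# The "better quality" recipe from the pro tips: override only what you need.
cfg = DiffusionConfig(model_type="unet", timesteps=1000, num_epochs=50)
```

Untouched fields keep their defaults, so a recipe is just the handful of keyword arguments you change.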
```
diffusion-1d/
├── main.py        ← Press play here 🎮
├── config.py      ← Tweak knobs here 🎛️
├── model.py       ← MLP brain 🧠 (simple)
├── model_unet.py  ← UNet brain 🧠🔥 (advanced)
├── diffusion.py   ← The magic ✨
├── train.py       ← The learning 📚
├── data.py        ← Sine wave factory 🏭
├── utils.py       ← Helper stuff 🔧
└── sampling.py    ← Generation station 🎨
```
Files ranked by importance:
1. `main.py` - Start here
2. `config.py` - Switch between MLP/UNet here!
3. `diffusion.py` - Where magic happens
4. `model.py` & `model_unet.py` - Two different AI architectures
5. Everything else - Supporting cast
🔥 Click if you want the actual math

Forward process (adding noise):

```
x_t = √(ᾱ_t) · x_0 + √(1 - ᾱ_t) · ε
```

Translation: Mix clean data with noise based on timestep t

Reverse process (removing noise):

```
x_{t-1} = (x_t - β_t · ε_θ(x_t, t) / √(1 - ᾱ_t)) / √(α_t) + σ_t · z
```

Translation: Predict noise, subtract it, repeat 100 times

Training objective:

```
loss = MSE(predicted_noise, actual_noise)
```

Translation: "Guess the noise. Wrong? Do better next time."
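The forward equation can be sketched directly in PyTorch. Everything here (the linear beta schedule, the name `q_sample`) is an assumption for illustration, not necessarily how `diffusion.py` spells it:

```python
import torch

T = 100
betas = torch.linspace(1e-4, 0.02, T)      # assumed linear noise schedule
alphas = 1.0 - betas
alpha_bar = torch.cumprod(alphas, dim=0)   # ᾱ_t: how much clean signal survives at step t

def q_sample(x0, t, noise):
    """x_t = √(ᾱ_t) · x_0 + √(1 - ᾱ_t) · ε — jump straight to step t."""
    a = alpha_bar[t].sqrt().unsqueeze(-1)
    b = (1.0 - alpha_bar[t]).sqrt().unsqueeze(-1)
    return a * x0 + b * noise

# One training step: corrupt a batch of clean waves at random timesteps.
# The loss would then be MSE(model(x_t, t), noise) — "guess the noise added".
x0 = torch.sin(torch.linspace(0, 2 * torch.pi, 64)).repeat(8, 1)
t = torch.randint(0, T, (8,))
noise = torch.randn_like(x0)
x_t = q_sample(x0, t, noise)
```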
💡 Wait, so how does this even work?
The Insight:
If you know what noise was added, you can subtract it!
The Process:
- Train model to predict noise at any corruption level
- Start from pure noise (t=100)
- Ask model: "What noise is here?"
- Remove predicted noise
- Repeat for t=99, 98, 97... down to 0
- Boom! Clean signal appears
Why it works:
The model learns the structure of sine waves by seeing them at every corruption level. It knows what "sine wave under noise" looks like, so it can gradually recover it!
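The process above, written out as a hedged DDPM-style sampling sketch. The schedule and the variance choice σ_t = √β_t are assumptions, and `model` stands in for the trained noise predictor ε_θ:

```python
import torch

T = 100
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bar = torch.cumprod(alphas, dim=0)

@torch.no_grad()
def sample(model, shape=(16, 64)):
    x = torch.randn(shape)                             # start from pure noise (t = T)
    for t in reversed(range(T)):                       # walk t back down to 0
        eps = model(x, torch.full((shape[0],), t))     # "what noise is here?"
        coef = betas[t] / (1.0 - alpha_bar[t]).sqrt()
        x = (x - coef * eps) / alphas[t].sqrt()        # remove predicted noise
        if t > 0:
            x = x + betas[t].sqrt() * torch.randn_like(x)  # σ_t · z, skipped at t = 0
    return x                                           # clean signal appears (ideally)

# Smoke test with a dummy predictor that always guesses zero noise:
waves = sample(lambda x, t: torch.zeros_like(x))
```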
🏗️ Why UNet Works Better (Architecture Deep Dive)

MLP:

```
Input [64] → Flatten → Dense → Dense → Output [64]
```

Problem: Treats every position independently, loses spatial structure.
UNet:

```
Input [64]
    ↓
[Encoder Path - Downsampling]
64 → 32 → 16 → 8   (learn hierarchical features)
    ↓
[Bottleneck]
8   (deepest understanding)
    ↓
[Decoder Path - Upsampling + Skip Connections]
8 → 16 → 32 → 64   (reconstruct with fine details)
    ↓
Output [64]
```
Key Innovation: Skip Connections

The encoder's high-resolution features jump directly to the decoder:
- Encoder 64 → Skip → Decoder 64 (preserves fine details!)
- Encoder 32 → Skip → Decoder 32 (preserves medium features)
- Encoder 16 → Skip → Decoder 16 (preserves structure)
Why This Matters:
- Multi-scale processing - Understands both "big picture" and "fine details"
- Skip connections - Preserves information lost in downsampling
- Convolutions - Learns position-independent patterns (works on shifted signals)
This is why Stable Diffusion, DALL-E, and all top diffusion models use U-Net!
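A minimal 1D U-Net in the spirit of the diagram above. Channel counts, pooling choices, and the crude additive timestep conditioning are all illustrative assumptions, not `model_unet.py` verbatim:

```python
import torch
import torch.nn as nn

class TinyUNet1D(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        # Encoder: two conv stages, halving resolution between them (64 → 32 → 16).
        self.enc1 = nn.Sequential(nn.Conv1d(1, ch, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv1d(ch, ch * 2, 3, padding=1), nn.ReLU())
        self.bottleneck = nn.Sequential(nn.Conv1d(ch * 2, ch * 2, 3, padding=1), nn.ReLU())
        # Decoder: input channels doubled by the concatenated skip connections.
        self.dec2 = nn.Sequential(nn.Conv1d(ch * 4, ch, 3, padding=1), nn.ReLU())
        self.dec1 = nn.Conv1d(ch * 2, 1, 3, padding=1)
        self.down = nn.MaxPool1d(2)
        self.up = nn.Upsample(scale_factor=2)

    def forward(self, x, t):
        # Crude timestep conditioning: broadcast-add a scaled t onto the signal.
        x = x.unsqueeze(1) + (t.float() / 100.0).view(-1, 1, 1)
        e1 = self.enc1(x)                    # [B, ch,  64]
        e2 = self.enc2(self.down(e1))        # [B, 2ch, 32]
        b = self.bottleneck(self.down(e2))   # [B, 2ch, 16] - deepest understanding
        d2 = self.dec2(torch.cat([self.up(b), e2], dim=1))   # skip: 32-res detail
        d1 = self.dec1(torch.cat([self.up(d2), e1], dim=1))  # skip: 64-res detail
        return d1.squeeze(1)                 # predicted noise, [B, 64]

net = TinyUNet1D()
out = net(torch.randn(4, 64), torch.randint(0, 100, (4,)))
```

The `torch.cat` calls are the skip connections: the decoder sees both the upsampled deep features and the encoder's original high-resolution features, which is exactly the "preserves fine details" point above.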
Must-read:
- Lilian Weng's Diffusion Post - Best explanation on the internet
- The DDPM Paper - Where it all started
Watch:
- What are Diffusion Models? - 15 min video explainer
- Diffusion Models Explained - video explainer (in Chinese)
Build:
- Extend to 2D (MNIST digits)
- Try different data (audio, images)
- Implement DDIM (faster sampling)
Found this helpful? ⭐ Star it!
Found a bug? 🐛 Open an issue
Want to contribute? 🎉 PRs welcome!
"Any sufficiently advanced technology is indistinguishable from magic... until you read the code." - Arthur C. Clarke (probably)




