🌊 1D Diffusion: Teaching AI to Un-Screw Up Noise

TL;DR: Built a complete diffusion model from scratch. It turns random garbage into beautiful sine waves. Magic? Nah, just math and PyTorch.


Status: ✅ Actually works | 🚀 Trains in 2 mins | 🧠 Teaches you diffusion models


🤔 Wait, What Even Is This?

You know how you can "enhance" blurry images in movies? (Total BS, btw)

Well, diffusion models actually do something cooler: create things from pure noise.

This project teaches you how by starting simple - turning random static into smooth sine waves.

The Magic Trick

Random Noise 😵 → [Model does 100 magic steps] → Perfect Sine Wave ✨

(Animation: the forward diffusion process)

Watch how we destroy a perfectly good sine wave by adding noise (training), then reverse it to generate new ones!


🎯 Why Should I Care?

This is literally how Stable Diffusion, Midjourney, and DALL-E work. Just with images instead of 1D signals.

Learn it here first where it's:

  • ⚡ Fast (2 min training on CPU)
  • 👀 Easy to visualize (simple plots, not scary tensors)
  • 🧠 Actually understandable (no PhD required)

Then go build the next DALL-E 4. I believe in you! 🚀


๐Ÿ—๏ธ How It Works (The Actual Magic)

Step 1: Destroy Everything ๐Ÿ’ฅ

Take clean sine waves โ†’ add noise gradually โ†’ get pure chaos

clean_wave + gaussian_noise = total_garbage
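
In code, all of that gradual destruction collapses into a single closed-form jump from x_0 to x_t. A minimal sketch, assuming a linear β schedule with 100 timesteps (diffusion.py may organize this differently):

import torch

T = 100
betas = torch.linspace(1e-4, 0.02, T)        # noise schedule (assumed linear here)
alphas = 1.0 - betas
alpha_bar = torch.cumprod(alphas, dim=0)     # cumulative "how much signal is left" per step

def add_noise(x0, t):
    """Jump straight from clean x0 to noisy x_t in one shot."""
    noise = torch.randn_like(x0)
    ab = alpha_bar[t].reshape(-1, 1)         # [batch, 1] so it broadcasts over the signal
    xt = ab.sqrt() * x0 + (1.0 - ab).sqrt() * noise
    return xt, noise                         # keep the noise: it's the training target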

Step 2: Train a Psychic Neural Network 🔮

Teach it to predict: "What noise was added?"

if model.can_predict_noise():
    model.can_remove_noise()  # big brain time
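
Less cheekily, one real training step boils down to this (a sketch reusing add_noise from the forward-process example above; the model(xt, t) call signature is an assumption, so see train.py for the actual loop):

import torch
import torch.nn.functional as F

def train_step(model, x0, optimizer, T=100):
    t = torch.randint(0, T, (x0.shape[0],))    # a random corruption level per sample
    xt, true_noise = add_noise(x0, t)          # forward process from the sketch above
    pred_noise = model(xt, t)                  # "what noise was added?"
    loss = F.mse_loss(pred_noise, true_noise)  # wrong guess -> gradient -> do better
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()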

Step 3: Generate Like a Boss 😎

Start with pure noise → ask model to remove noise 100 times → profit!

x = random_noise()
for t in reversed(range(100)):
    x = model.denoise(x, t)  # slowly becoming beautiful
return x  # chef's kiss 👌

🚀 Quick Start (Just Do It™)

# Install stuff
pip install torch numpy matplotlib pyyaml

# Run everything
python main.py

# That's it. Seriously.

What happens:

  1. โฐ Trains for 2 minutes
  2. ๐Ÿ’พ Saves model to outputs/
  3. ๐ŸŽจ Generates 16 new sine waves
  4. ๐Ÿ“Š Shows you a pretty plot

Check outputs/sampled_sequences.png to see your AI's artwork!


📊 Actual Results (Receipts Included)

(Plot: training loss curve)

Loss goes down = Model learns = We're cooking! 🔥

Initial: 0.66 → Final: 0.18 (that's a 73% improvement, if you're into stats)

Generated samples - MLP vs UNet:

(Images: generated samples from the MLP and the UNet)

MLP: Simple 3-layer network (fast, ~50K params)
UNet: Multi-scale architecture with skip connections (~500K params)

Both generate beautiful sine waves, but UNet captures finer details! 🎨


โš™๏ธ Customize It (Config Go Brrrr)

Edit config.py:

@dataclass
class DiffusionConfig:
    model_type: str = "mlp"      # "mlp" or "unet" - Switch models!
    num_epochs: int = 10         # More epochs = better (but slower)
    timesteps: int = 100         # More steps = smoother results
    batch_size: int = 64         # GPU go brrr? Increase this
    hidden_dim: int = 128        # Model capacity (bigger = more powerful)
    device: str = "cuda"         # Got GPU? Use it!
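
Because DiffusionConfig is a plain dataclass, you can also override fields in code instead of editing the file (hypothetical usage; check how main.py actually builds its config):

from config import DiffusionConfig

cfg = DiffusionConfig(model_type="unet", device="cpu", batch_size=32)
print(cfg)  # dataclasses give you a readable repr for free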

Model Comparison:

Feature          MLP                 UNet
Parameters       ~50K                ~500K
Training Speed   ⚡ Fast             🐢 Slower (10x)
Sample Quality   ✅ Good             ✨ Excellent
Memory Usage     💚 Low              🟡 Higher
Architecture     Simple feedforward  Multi-scale + skip connections

Pro tips:

  • 🎯 Try UNet first! Set model_type = "unet" for best quality
  • 🐢 CPU only? Set device = "cpu" and batch_size = 32
  • 🏎️ Want it faster? Use model_type = "mlp", timesteps = 50, num_epochs = 5
  • 🎨 Want better quality? Use model_type = "unet", timesteps = 1000, num_epochs = 50
  • 💾 Low memory? Set hidden_dim = 64 to reduce model size

📂 Project Files (What's What)

diffusion-1d/
├── main.py           → Press play here 🎮
├── config.py         → Tweak knobs here 🎛️
├── model.py          → MLP brain 🧠 (simple)
├── model_unet.py     → UNet brain 🧠🔥 (advanced)
├── diffusion.py      → The magic ✨
├── train.py          → The learning 📚
├── data.py           → Sine wave factory 🏭
├── utils.py          → Helper stuff 🔧
└── sampling.py       → Generation station 🎨

Files ranked by importance:

  1. main.py - Start here
  2. config.py - Switch between MLP/UNet here!
  3. diffusion.py - Where magic happens
  4. model.py & model_unet.py - Two different AI architectures
  5. Everything else - Supporting cast
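
If you're wondering what the "sine wave factory" actually hands the model, here's a minimal sketch (hypothetical; the real data.py may choose lengths, frequencies, and normalization differently):

import numpy as np
import torch

def make_sine_batch(batch_size=64, length=64):
    """Random-frequency, random-phase sine waves, one per row."""
    freqs = np.random.uniform(1.0, 3.0, size=(batch_size, 1))
    phases = np.random.uniform(0.0, 2 * np.pi, size=(batch_size, 1))
    x = np.linspace(0.0, 2 * np.pi, length)
    waves = np.sin(freqs * x + phases)        # shape [batch_size, length]
    return torch.tensor(waves, dtype=torch.float32)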

🧠 The Secret Sauce (For Nerds)

🔥 Click if you want the actual math

Forward Diffusion (Breaking Stuff)

x_t = √(ᾱ_t) · x_0 + √(1 - ᾱ_t) · ε

Translation: Mix clean data with noise based on timestep t, where ε is fresh Gaussian noise, α_t = 1 - β_t, and ᾱ_t = α_1 · α_2 · ... · α_t.

Reverse Diffusion (Fixing Stuff)

x_{t-1} = (x_t - (β_t / √(1 - ᾱ_t)) · ε_θ(x_t, t)) / √(α_t) + σ_t · z

Translation: Predict noise, subtract it, repeat 100 times
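
That recipe, unrolled into code (a sketch with an assumed linear noise schedule and an illustrative model(x, t) call; sampling.py will differ in details):

import torch

@torch.no_grad()
def sample(model, shape=(16, 64), T=100):
    betas = torch.linspace(1e-4, 0.02, T)     # same (assumed) schedule as the forward process
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape)                    # start from pure noise
    for t in reversed(range(T)):
        eps = model(x, torch.full((shape[0],), t))          # "what noise is here?"
        coef = betas[t] / (1.0 - alpha_bar[t]).sqrt()
        x = (x - coef * eps) / alphas[t].sqrt()             # remove the predicted noise
        if t > 0:
            x = x + betas[t].sqrt() * torch.randn(shape)    # the sigma_t * z term; skipped at t = 0
    return x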

Training (Teaching the AI)

loss = MSE(predicted_noise, actual_noise)

Translation: "Guess the noise. Wrong? Do better next time."

💡 Wait, so how does this even work?

The Insight:

If you know what noise was added, you can subtract it!

The Process:

  1. Train model to predict noise at any corruption level
  2. Start from pure noise (t=100)
  3. Ask model: "What noise is here?"
  4. Remove predicted noise
  5. Repeat for t=99, 98, 97... down to 0
  6. Boom! Clean signal appears

Why it works:

The model learns the structure of sine waves by seeing them at every corruption level. It knows what "sine wave under noise" looks like, so it can gradually recover it!

๐Ÿ—๏ธ Why UNet Works Better (Architecture Deep Dive)

MLP Architecture (Simple)

Input [64] โ†’ Flatten โ†’ Dense โ†’ Dense โ†’ Output [64]

Problem: Treats every position independently, loses spatial structure.
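
For reference, the MLP version in PyTorch looks roughly like this (illustration only; the repo's model.py is similar in spirit but may condition on the timestep differently):

import torch
import torch.nn as nn

class TinyMLP(nn.Module):
    """Illustrative 3-layer MLP that tacks the timestep onto the input."""
    def __init__(self, length=64, hidden_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(length + 1, hidden_dim),   # +1 input feature for the timestep
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, length),       # predict noise, same shape as the signal
        )

    def forward(self, x, t):
        t = t.float().unsqueeze(-1) / 100.0      # crude timestep normalization
        return self.net(torch.cat([x, t], dim=-1))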

UNet Architecture (Advanced)

Input [64]
    ↓
[Encoder Path - Downsampling]
    64 → 32 → 16 → 8  (learn hierarchical features)
    ↓
[Bottleneck]
    8 (deepest understanding)
    ↓
[Decoder Path - Upsampling + Skip Connections]
    8 → 16 → 32 → 64  (reconstruct with fine details)
    ↓
Output [64]

Key Innovation: Skip Connections

The encoder's high-resolution features jump directly to the decoder:

  • Encoder 64 → Skip → Decoder 64 (preserves fine details!)
  • Encoder 32 → Skip → Decoder 32 (preserves medium features)
  • Encoder 16 → Skip → Decoder 16 (preserves structure)

Why This Matters:

  1. Multi-scale processing - Understands both "big picture" and "fine details"
  2. Skip connections - Preserves information lost in downsampling
  3. Convolutions - Learns position-independent patterns (works on shifted signals)
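
Here's the skip-connection idea boiled down to a toy, single-level 1D U-Net (illustration only; model_unet.py is deeper and also conditions on the timestep):

import torch
import torch.nn as nn

class ToyUNet1D(nn.Module):
    """One encoder level, one bottleneck, one decoder level, one skip connection."""
    def __init__(self, ch=32):
        super().__init__()
        self.enc = nn.Conv1d(1, ch, kernel_size=3, padding=1)                     # length 64, fine details
        self.down = nn.Conv1d(ch, ch, kernel_size=4, stride=2, padding=1)         # 64 -> 32
        self.mid = nn.Conv1d(ch, ch, kernel_size=3, padding=1)                    # bottleneck features
        self.up = nn.ConvTranspose1d(ch, ch, kernel_size=4, stride=2, padding=1)  # 32 -> 64
        self.dec = nn.Conv1d(2 * ch, 1, kernel_size=3, padding=1)                 # 2*ch = upsampled + skip

    def forward(self, x):                        # x: [batch, 1, 64]
        h1 = torch.relu(self.enc(x))             # high-res features, saved for the skip
        h2 = torch.relu(self.mid(self.down(h1)))
        u = self.up(h2)                          # back up to length 64
        return self.dec(torch.cat([u, h1], dim=1))  # skip connection: concat encoder features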

This is why Stable Diffusion, DALL-E, and all top diffusion models use U-Net!


🎓 Learn More (Go Deeper)

Build:

  • Extend to 2D (MNIST digits)
  • Try different data (audio, images)
  • Implement DDIM (faster sampling)

Built with 🧠 and way too much ☕

Found this helpful? ⭐ Star it!

Found a bug? 🐛 Open an issue

Want to contribute? 🎉 PRs welcome!


"Any sufficiently advanced technology is indistinguishable from magic... until you read the code." - Arthur C. Clarke (probably)
