Can We Train AI Without Backprop? Meet NoProp and Diffusion Magic
An easy explanation for the AI curious
If you've ever trained a neural network (or heard someone grumble about it), you've probably come across backpropagation. It's the backbone of most modern AI training: a way for models to learn from their mistakes by passing error signals backward through their layers. But here's the thing: it's memory-hungry, hard to parallelize (every layer has to wait for the error signal from the layers after it), and not how our brains seem to work.
So, what if we could teach AIs without backpropagation? Enter NoProp Diffusion, a radical new idea that’s shaking things up.
What is Diffusion?
To train a diffusion model, you start with real data (e.g., images) and gradually add noise to it over many steps. At each step, the model is trained to predict the noise that was added, given the noisy image and the current step number. From this, the model learns to reverse the process and remove noise step by step.
When such a model is trained with backpropagation, each layer is shaped to produce exactly what the next layer needs in order to give the best overall output, because the error signal flows backward through the whole stack.
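To make that concrete, here is a minimal sketch of the standard noise-prediction training step in PyTorch. The tiny denoiser network, the image size, and the noise-schedule numbers are stand-ins invented for illustration; real systems use a U-Net or transformer here.

```python
# A minimal sketch of ordinary diffusion training (with backprop).
import torch
import torch.nn as nn

T = 1000                                    # number of noise steps
betas = torch.linspace(1e-4, 0.02, T)       # noise schedule (illustrative values)
alphas_bar = torch.cumprod(1.0 - betas, 0)  # cumulative signal-retention factor

denoiser = nn.Sequential(                   # stand-in for a real U-Net / transformer
    nn.Linear(784 + 1, 256), nn.ReLU(), nn.Linear(256, 784)
)
opt = torch.optim.Adam(denoiser.parameters(), lr=1e-3)

def training_step(x):                       # x: batch of flattened images
    t = torch.randint(0, T, (x.shape[0],))              # random step per sample
    noise = torch.randn_like(x)                         # the noise we will add
    a = alphas_bar[t].unsqueeze(1)
    noisy_x = a.sqrt() * x + (1 - a).sqrt() * noise     # forward (noising) process
    inp = torch.cat([noisy_x, t.float().unsqueeze(1) / T], dim=1)
    pred_noise = denoiser(inp)                          # model predicts the added noise
    loss = nn.functional.mse_loss(pred_noise, noise)
    opt.zero_grad()
    loss.backward()                                     # error flows back through ALL layers
    opt.step()
    return loss.item()
```

The key line is `loss.backward()`: the error signal flows backward through every layer at once, and that is exactly the step NoProp wants to avoid.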
What is NoProp?
NoProp is a training method that removes this end-to-end forward/backward propagation between layers. Rather than shaping its output to be exactly what the next layer wants, each layer simply does its best to clean up what it has been given and passes the result on.
In this case each layer is like a single expert in a mixture of experts model. It’s a specialist in dealing with its own stage of the denoising process, and doesn’t worry about what any other part of the model is doing.
Each layer of the AI doesn’t wait for signals from above or below.
Instead, it gets a noisy version of the right answer (like a blurry image or jumbled sentence).
Its job? Clean up the mess a bit.
Then the next layer does the same, a little better.
By the end, the last layer has a pretty polished version—all without the model ever having to "look backward."
It’s like a classroom where each student refines the same homework independently and passes it on.
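As a rough sketch of that idea (not the exact recipe from the NoProp paper), imagine a small image classifier in which every block receives the raw input plus a noisy copy of the correct answer, and is trained with its own local loss and its own optimizer. The names, sizes, and the simple linear noise schedule below are all invented for illustration.

```python
# Layer-wise, "no backprop between layers" training: a toy sketch only.
import torch
import torch.nn as nn

NUM_LAYERS, DIM, NUM_CLASSES = 10, 784, 10
label_embed = nn.Embedding(NUM_CLASSES, NUM_CLASSES)   # clean "right answer" vectors

# One small denoising block per stage, each with its OWN optimizer.
blocks = [nn.Sequential(nn.Linear(DIM + NUM_CLASSES, 256), nn.ReLU(),
                        nn.Linear(256, NUM_CLASSES)) for _ in range(NUM_LAYERS)]
opts = [torch.optim.Adam(b.parameters(), lr=1e-3) for b in blocks]

# Early blocks see a very noisy answer, later blocks an almost-clean one.
noise_levels = torch.linspace(1.0, 0.1, NUM_LAYERS)

def train_all_blocks(x, y):
    """x: batch of flattened images, y: integer class labels."""
    target = label_embed(y).detach()                    # the clean answer vector
    for block, opt, sigma in zip(blocks, opts, noise_levels):
        noisy_target = target + sigma * torch.randn_like(target)
        pred = block(torch.cat([x, noisy_target], dim=1))  # clean up the mess a bit
        loss = nn.functional.mse_loss(pred, target)     # purely local objective
        opt.zero_grad()
        loss.backward()                                 # gradient stays inside this block
        opt.step()
```

No block ever receives a gradient from another block: each `loss.backward()` only touches one block's weights, so there is no end-to-end backward pass through the whole network.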
Why Combine NoProp + Diffusion?
Because both work by cleaning up noise, they’re a natural pair. Diffusion models say: "Let’s generate by removing noise, one step at a time." NoProp says: "Each layer just denoises, no need for backprop!"
Put them together and you could get a new kind of AI that learns and generates without traditional end-to-end training: using less memory, possibly training faster, and maybe even mimicking how human brains learn.
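Continuing the toy sketch from the previous section, generation is just running the independently trained blocks in sequence, starting from pure noise; the nearest-embedding decoding at the end is likewise only an illustrative choice.

```python
# Inference: chain the blocks trained above, denoising step by step.
def generate_answer(x):
    z = torch.randn(x.shape[0], NUM_CLASSES)        # start from pure noise
    with torch.no_grad():
        for block in blocks:                        # each block cleans z up a little
            z = block(torch.cat([x, z], dim=1))
    # z should now be close to one of the clean answer vectors; pick the nearest
    return torch.cdist(z, label_embed.weight).argmin(dim=1)
```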
This isn't necessarily the most efficient way to train (honestly, it's currently far from it), but it has a huge advantage: the encodings each layer learns are less entangled than those in a regular backprop-trained model. That makes it possible to retrain one part of the model without breaking the others, something that is very hard to do with backpropagation-trained models. This opens the way for continual improvement, and with it a possible path toward artificial superintelligence.
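In the same toy setup, retraining a single block on new data never sends a gradient to any other block, which is the modularity the paragraph above is pointing at (whether this holds up in large models is still an open question).

```python
# Retrain only block i on new data; every other block is left untouched.
def retrain_single_block(i, new_x, new_y, steps=100):
    block, opt, sigma = blocks[i], opts[i], noise_levels[i]
    target = label_embed(new_y).detach()
    for _ in range(steps):
        noisy_target = target + sigma * torch.randn_like(target)
        pred = block(torch.cat([new_x, noisy_target], dim=1))
        loss = nn.functional.mse_loss(pred, target)
        opt.zero_grad()
        loss.backward()                 # gradients exist only for blocks[i]
        opt.step()
```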
Can It Generate Text Too?
Not yet at scale—but it’s close.
The challenge? Text is complex and structured, and every word can depend on many others. But researchers are exploring ways to:
- Represent sentences as fuzzy vectors.
- Train each model layer to clean up a messy version of that sentence.
- Eventually, decode the clean result back into human-readable text.
Think of it like: garbled sentence → slightly cleaner → almost there → perfect reply!
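Here is a speculative sketch of what that pipeline could look like in PyTorch. The vocabulary size, embedding size, the hypothetical pre-trained `text_blocks`, and the decode-to-nearest-token step are all assumptions made up for this illustration, not anything from a published system.

```python
# Speculative: denoise a "fuzzy" sentence representation, then decode to tokens.
import torch
import torch.nn as nn

VOCAB, EMB, SEQ_LEN = 5000, 64, 16
token_embed = nn.Embedding(VOCAB, EMB)              # one vector per token

def decode_to_tokens(z):
    # Map each fuzzy word vector back to its closest real token embedding.
    dists = torch.cdist(z.reshape(-1, EMB), token_embed.weight)
    return dists.argmin(dim=1).reshape(-1, SEQ_LEN)

def denoise_text(noisy_sentence_vecs, text_blocks):
    """noisy_sentence_vecs: (batch, SEQ_LEN, EMB); text_blocks: trained denoisers."""
    z = noisy_sentence_vecs                         # garbled sentence as vectors
    with torch.no_grad():
        for block in text_blocks:                   # slightly cleaner each pass
            z = block(z)
    return decode_to_tokens(z)                      # decode the (hopefully) clean reply
```

Whether a stack of denoisers like this can ever match token-by-token generation is exactly the open question.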
What’s the Catch?
It's still early days. Making sure each layer actually improves things (and doesn't make them worse) is hard. Training without a view of the big picture is risky. And decoding structured language from fuzzy vectors is tricky.
But the payoff?
Training that’s faster, cheaper, and maybe even more human-like.
Why It Matters
If this works, it could lead to:
- AI models that train on more devices, faster.
- Systems that don’t rely on traditional error correction, mirroring biological brains.
- New ways to generate, reason, or refine language that aren’t stuck in the token-by-token world.
- Incremental retraining: this is the big one. If we can repeatedly retrain models on new data without them forgetting old information, we’re on our way to ASI.
It’s not replacing ChatGPT tomorrow—but it’s a glimpse into what next-gen AI could look like.
Let us know if you'd like a deeper dive into AI research or want to follow this story as it unfolds.