I just finished a paper and wanted to write it up here in plain language, because the core idea is simple even though the numbers behind it aren’t.
The question
Everyone talks about the energy cost of training large models as if it’s mostly a software problem — better attention mechanisms, lower precision, smarter sparsity. Those help, but they’re all optimizations within the same hardware paradigm: a GPU, where memory and compute are physically separate chips that have to constantly shuttle data back and forth between them.
I wanted to know: how much of the energy cost is actually that architecture, and not the algorithms running on top of it?
The hypothesis
My hypothesis is that the human brain gives us a clean way to measure this, because it’s the only general-purpose intelligence we know of that solves comparable problems, and it does it on about 20 watts.
So I built an energy-equivalent comparison: take the total metabolic energy a human consumes from birth to age 25 (~17,500 kWh), and ask what a commodity GPU cluster could compute if it burned exactly that same amount of energy. Then compare outputs.
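As a quick sanity check on that budget, the sketch below converts the ~17,500 kWh figure into joules and into an average power over 25 years, then compares it with what a 20 W brain alone would draw over the same span. Only the 17,500 kWh, 25-year, and 20 W figures come from the text above; the variable names and the 365.25-day year are assumptions made for illustration.

```python
# Sanity check on the lifetime energy budget.
# Figures from the post: 17,500 kWh, 25 years, ~20 W brain.
# Assumption made here: a 365.25-day year for the unit conversion.
HOURS_PER_YEAR = 365.25 * 24
JOULES_PER_KWH = 3.6e6  # 1 kWh = 3.6 MJ

budget_kwh = 17_500   # total metabolic energy, birth to age 25
years = 25
brain_watts = 20      # approximate power draw of the brain alone

hours = years * HOURS_PER_YEAR
budget_joules = budget_kwh * JOULES_PER_KWH
avg_power_watts = budget_kwh * 1000 / hours       # whole-body average
brain_only_kwh = brain_watts * hours / 1000       # brain's own share
brain_share = brain_only_kwh / budget_kwh

print(f"budget: {budget_joules:.2e} J")
print(f"average whole-body power: {avg_power_watts:.0f} W")
print(f"brain alone over {years} years: {brain_only_kwh:.0f} kWh "
      f"({brain_share:.0%} of the budget)")
```

Under these assumptions the budget averages out to roughly 80 W, so the comparison charges the brain for the whole body's energy use, about four times what the brain itself would draw at 20 W.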
The result: a gap of about 50,000×. The brain delivers roughly 50,000 times more useful computation per unit of energy than the GPU cluster does, over the same energy budget.
What’s interesting is that this shows up as two ratios, one from raw compute output and one from energy efficiency per FLOP, and they land on the same number. They are not independent measurements, though: once you fix the energy budget, the output ratio and the efficiency ratio are mathematically the same quantity. So the agreement is a consistency check on the arithmetic rather than a second line of evidence. The case that the 50,000× figure reflects something structural rests on the decomposition in the next section.
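The reason the two ratios have to agree can be written out in one line. The symbols here are introduced only for this illustration and are not taken from the paper: $E$ is the shared energy budget, $\eta$ is useful computation per unit of energy, and $C$ is total computation delivered.

```latex
\[
C_{\text{brain}} = \eta_{\text{brain}}\,E,
\qquad
C_{\text{GPU}} = \eta_{\text{GPU}}\,E
\quad\Longrightarrow\quad
\frac{C_{\text{brain}}}{C_{\text{GPU}}}
  = \frac{\eta_{\text{brain}}\,E}{\eta_{\text{GPU}}\,E}
  = \frac{\eta_{\text{brain}}}{\eta_{\text{GPU}}}
\]
```

With $E$ held equal on both sides it cancels, so the output ratio and the efficiency ratio are the same number by construction.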
Where the gap comes from
I decomposed it into three independent, multiplicative sources, all rooted in architecture rather than algorithms:
- The memory wall — moving data between separate memory and compute chips burns up to 90% of a GPU’s energy. In the brain, a synapse is both the memory and the compute event. No bus, no fetch cycle. (~100–1,000× gain)
- Dense vs. event-driven activation — GPUs fire every core every cycle. Real neurons fire rarely (under 1% active at any moment) and cost almost nothing when idle. (~100–1,000× gain)
- Digital vs. analog compute — a digital multiply-accumulate costs orders of magnitude more energy than a biological synaptic event. (~10–100× gain)
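Multiply the minimums together and you get ~100,000×; multiply the maximums and you get ~100,000,000×. The observed 50,000× gap sits within a factor of two of the low end, so even the most conservative estimate for each source is enough to account for it, with nothing left unexplained.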
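The arithmetic on the three ranges can be checked directly. The sketch below treats the sources as independent multiplicative factors, as the decomposition assumes, and multiplies the low and high ends of each range listed above. The dictionary keys and variable names are illustrative labels, not terms from the paper.

```python
from math import prod

# (low, high) gain estimates for the three sources, as listed in the post.
gains = {
    "memory wall": (100, 1_000),
    "event-driven activation": (100, 1_000),
    "analog compute": (10, 100),
}
observed_gap = 50_000  # headline figure from the post

# Independent multiplicative factors: the combined range is the
# product of the low ends up to the product of the high ends.
low_product = prod(low for low, _ in gains.values())
high_product = prod(high for _, high in gains.values())

print(f"product of minimums: {low_product:,}x")
print(f"product of maximums: {high_product:,}x")
print(f"observed gap / product of minimums: {observed_gap / low_product:.2f}")
```

The product of the minimums comes to 100,000× and the product of the maximums to 100,000,000×, so the observed 50,000× is half of the lower figure.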
The neuromorphic hardware that already exists backs this up directionally: IBM’s NorthPole, Intel’s Loihi 2, and recent spiking/co-located designs have shown gains from 400× up to 5,600× over the last decade, tracing a clear trajectory toward closing the gap.
Why this matters
If this gap closes, the implications are large: a GPT-5-class training run currently costing roughly $31M in electricity could drop to around $620 at the full 50,000×, and even a 1,000× improvement would bring it down to about $31,000. Inference cost per query could fall enough that frontier models could run on a hearing-aid-sized power budget. The concentration of frontier AI in a handful of companies is, in part, a side effect of this architectural energy tax, not an inherent law of how intelligence has to be built.
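To see how the electricity bill scales with how much of the gap closes, here is a minimal sketch. The $31M baseline and the 50,000× gap come from the text, and the 400× and 5,600× points are the neuromorphic gains quoted earlier. The function name, the 10× sample point, and the assumption that cost scales linearly with energy at a fixed price per kWh are mine.

```python
# Electricity cost of a training run after an efficiency gain.
# Assumption: cost is proportional to energy used, at a fixed price per kWh.
baseline_cost_usd = 31_000_000   # figure from the post
full_gap = 50_000                # headline gap from the post


def cost_after_gain(baseline_usd: float, gain: float) -> float:
    """Return the cost if energy per unit of computation drops by `gain`x."""
    if gain <= 0:
        raise ValueError("gain must be positive")
    return baseline_usd / gain


# 10x is an arbitrary illustrative point; 400x and 5,600x are the
# neuromorphic gains quoted in the post; the last is the full gap.
for gain in (10, 400, 5_600, full_gap):
    cost = cost_after_gain(baseline_cost_usd, gain)
    print(f"{gain:>6,}x gain -> ${cost:,.0f}")
```

On these assumptions, the few-hundred-dollar figure corresponds to closing the full 50,000× gap; the gains already demonstrated would put the same run in the thousands to tens of thousands of dollars.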
None of this requires new physics. It requires building hardware that doesn’t separate memory from compute, doesn’t fire idle units, and doesn’t insist on digital precision everywhere. Nervous systems have been proving this works for roughly 500 million years, and the human brain does it on about 20 watts.
Read the full paper
I go through all the math, the references, and the case against the “it’s just a software problem” counterargument in the full writeup:
Download: The 50,000× Gap (PDF)
Happy to hear pushback on the assumptions, especially around the brain compute estimate, which is the most contested number here.