Megawatts to Outcomes

A little over a year ago, the week DeepSeek made frontier reasoning cheap, one of the big AI infrastructure companies watched its stock fall about forty percent in a few days. The market's logic was simple. If intelligence just got this cheap, who needs all the expensive hardware under it? That same week, the same company booked more sales than in any week of its life.

Both things happened. Nobody was lying. The market and the people buying compute looked at one event and drew opposite conclusions, and that gap is basically the whole bubble argument.

I got into this from the Nebius interview. Roman Chernin, one of the co-founders, talking to a VC for ninety minutes. He runs an infrastructure company fighting hyperscalers who outspend him maybe eight to one, so of course he thinks the spending makes sense. That is his book, and I watched it that way. What stuck with me was not the optimism. It was how he cuts up the industry.

For context, I am on the demand side of this. I do not build data centers. I build things that run on other people's GPUs and dread the inference bill at the end of the month. So treat this less as a verdict on his company and more as me sorting the parts I buy from the parts I do not.

Working Definition

Is it actually a bubble?

A bubble is not a lot of spending. It is a lot of spending against demand that never shows up. So the question is not how many billions go in. It is where we are on the curve. Chernin's answer is that almost every company on earth, even the sophisticated ones, is using AI in maybe the first percent of the cases it eventually will. Coding, the one thing that genuinely works, started working a few months ago.

If that is even close to right, this is not a top. It is the first inch, and most of the money is a bet that the rest of the inches show up.

[Figure: two curves over time. The price of a unit of intelligence falls while total demand for it climbs. The week intelligence got cheap, the market dropped about forty percent while customers bought more than ever.]

The week intelligence got cheap, the price of a unit fell while the appetite for it rose. **Markets price the first line. Builders live on the second.**

Why cheaper meant more

The bubble instinct gets one old thing wrong. In the 1860s the economist William Stanley Jevons noticed that more efficient steam engines made Britain burn more coal, not less. Each ton now did more work, so coal was suddenly worth using for things nobody had bothered with. Make a resource cheaper and you do not save it. You find new reasons to spend it.

Intelligence works the same way. A drop in the price of a token does not get banked. It gets spent on a harder problem that was too expensive last quarter. That is why the cheap-model week was the best sales week, not the worst. One set of teams found that running things in production finally added up. Another set reached for the thing they had been putting off.

This is the part I most want to be true, so I trust it least. It is conveniently shaped like a law that keeps the shovel-sellers in customers forever. But I keep looking for the counterexample and missing it. Every time inference got much cheaper, the bill went up, not down.

Mental Model

Demand = Solvable Tasks / Cost per Task

The mistake is treating these as independent. They are not. Drop the cost per task and the number of tasks worth doing rises faster than the price falls, because a pile of problems that were just barely not worth it all cross the line at once.
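A toy version of that threshold effect, as a sketch only. The backlog below is made up: a handful of high-value tasks and a long tail of marginal ones, with `task_values` and `tasks_worth_doing` as hypothetical names. A task counts as worth doing when its value exceeds what it costs to run.

```python
def tasks_worth_doing(task_values, cost_per_task):
    """Count tasks whose value exceeds the cost of running them."""
    return sum(1 for value in task_values if value > cost_per_task)


# Hypothetical backlog: dollars of benefit per task, illustrative numbers only.
task_values = [50.0] * 5 + [12.0] * 20 + [6.0] * 200 + [3.0] * 2000

for cost in (10.0, 5.0, 2.5):
    count = tasks_worth_doing(task_values, cost)
    spend = count * cost  # total bill if every worthwhile task is run once
    print(f"cost per task ${cost:>5.2f}: {count:>5} tasks, total spend ${spend:,.2f}")
```

With these invented numbers, halving the cost from 10 to 5 takes the count from 25 tasks to 225, and the total bill goes up, not down. The shape of the tail is the assumption doing all the work. A backlog with no marginal tasks would show no such jump.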

And cheaper is not magic. It is engineering. You take a baseline model and squeeze it: distill it into something smaller, let a small draft model guess the next tokens so the big one only has to check them, cache the parts that repeat, drop the weights to 8-bit and then 4-bit, batch the requests together. Do it well and the same answer costs a fraction of what it did. People obsess over the price of the chip. The bigger lever is what you do to the model running on it.

[Figure: a baseline model is squeezed through distillation, speculative decoding, caching, quantization and batching to produce the same answer at a fraction of the cost. A cost bar shows up to roughly seventy percent cheaper.]

Cheaper tokens are not sorcery. You take a vanilla model and squeeze it five ways. **Same answer, sometimes seventy percent less to serve.**
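Of the five squeezes, quantization is the easiest to put numbers on, because the arithmetic is just bits per weight. A minimal sketch, assuming a 16-bit baseline and a hypothetical 70-billion-parameter model. It counts the weights only and ignores the KV cache, activations, and the small overhead real quantization schemes add for scales.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Memory needed to hold the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


PARAMS = 70e9  # hypothetical model size, chosen only for illustration

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: {weight_memory_gb(PARAMS, bits):.0f} GB")
```

That prints 140, 70 and 35 GB. Fewer gigabytes means fewer GPUs per copy of the model, or more room for batching on the same ones, which is where the bill actually moves. Whether the answers stay good enough at 4-bit is a separate question that only an evaluation can settle.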

The ladder

The thing I actually walked away with had nothing to do with bubbles. AI infrastructure is not one business. It is a stack of them, and the tell for which one you are in is the unit on your invoice.

At the bottom is raw capacity. Buildings, racks, power. The unit is the megawatt. Read the giant deals and that is the literal word: someone signed for so many megawatts. The customers here are a tiny club, a few dozen hyperscalers and labs who show up with their whole software stack and want clean power and silicon, nothing else.

One rung up you sell a managed cloud instead of raw power. Provisioned clusters, storage, networking, the plumbing. The unit becomes the GPU-hour. Hundreds of teams would rather rent that than wire a building.

Up again is managed inference. The unit is the token. The customer does not want to know which chip they are on or tune a serving engine. They send text, get text, pay for the result. Thousands of product companies live here, building on open and specialized models.

At the top, barely formed, is the agentic layer. The unit is the outcome. You hand it a task and something underneath decides which model to call, how often, with how much context. Maybe it runs a cheap model twice and lets a third one judge. Tens of thousands of developers will build here without thinking about the machinery below.

[Figure: four stacked layers widening upward. Bare metal in megawatts serving dozens, managed cloud in GPU-hours serving hundreds, managed inference in tokens serving thousands, agentic end-to-end in outcomes serving tens of thousands.]

Megawatts, GPU-hours, tokens, outcomes. Each rung hides the one under it. **Climb, and you add more value and reach more customers at the same time.**

That last point is the whole thing, and it cuts against instinct. People assume the money is in owning the metal and everything above is thin resale margin. It is the other way around. Higher up you are not selling electricity. You are selling the disappearance of complexity, and people pay for that more reliably than almost anything. The capital pools at the bottom, where a few dozen buyers live. The customers pile up at the top, in the tens of thousands.

This is what gets missed when people call compute a commodity and stop. Down at the metal it nearly is, though nothing is a commodity at hyperscale, where the few buyers are brutal. Two rungs up the product is not the chip anymore. It is everything you no longer have to think about.

The unit on your invoice tells you which business you are really in. Megawatts and tokens are different companies, even on the same GPU.

Why cheap open models do not kill the labs

The obvious fear is that open models eat the stack from below. Download weights that benchmark near the frontier, run them for a tenth of the price, and why pay the labs at all.

Chernin's answer is that the frontier is not a place. It is an edge that keeps moving. You start on the best closed models because on day one they are the best thing going. Then you find your use case, watch the data, and usually realize you do not need the smartest model in the world. You need a smaller one bent around your exact problem. That one is cheaper and sometimes better, so you drop to an open model and tune it.

The thing that matters about those models is not that they are open. It is that they are trainable. Open means you can read the weights. Trainable means you can make them yours, and the value is almost all in that. The post-training. The distillation. The specialization for a corner the big model never aimed at.

So why does that not bleed the labs dry? Because every time we make something cheap, we do not stop. We spend the savings on a harder thing. The labs keep climbing toward the unsolved problems, and the supply of those is close to bottomless. The market does not collapse to one model. It splits into three. The smartest one, the fastest one, and the one that is good enough and cheap. Soon you will not pick between them by hand. Something upstream will.

[Figure: an engine on top decides which model to call for a task, fanning out to three tiers: smartest but slow and expensive, fast, and good enough but cheap.]

Not one model. Three, and a router that picks. Hard reasoning goes to the expensive one, a bulk rewrite goes to the cheap one. **The choosing is becoming the product.**
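The simplest possible router is a few lines, sketched here under loud assumptions. The tier names, prices and quality scores are invented, and the caller is assumed to already know how hard the task is, which in practice is the difficult part. The rule is: pick the cheapest tier that meets the requirements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    cost_per_1k_tokens: float  # illustrative dollars, not real prices
    quality: int               # 1 = good enough, 3 = smartest
    fast: bool


# Hypothetical tiers mirroring the three-way split described above.
TIERS = [
    Tier("cheap", 0.10, 1, False),
    Tier("fast", 0.50, 2, True),
    Tier("smartest", 5.00, 3, False),
]


def route(min_quality: int, needs_low_latency: bool = False) -> Tier:
    """Pick the cheapest tier that satisfies the task's requirements."""
    candidates = [
        t for t in TIERS
        if t.quality >= min_quality and (t.fast or not needs_low_latency)
    ]
    if not candidates:
        raise ValueError("no tier satisfies the request")
    return min(candidates, key=lambda t: t.cost_per_1k_tokens)


print(route(min_quality=3).name)                          # hard reasoning
print(route(min_quality=1).name)                          # bulk rewrite
print(route(min_quality=1, needs_low_latency=True).name)  # interactive use
```

Those three calls land on the smartest, cheap and fast tiers in turn. A real router would estimate `min_quality` from the task itself, and that estimate is exactly the thing an evaluation loop has to earn.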

The cold start nobody photographs

The most useful thing in the whole interview had nothing to do with infrastructure. It was about why some companies look stuck right until they look unstoppable.

His example was a fintech that started with almost all of its inference on closed models. Some use cases worked. Some did not pay for themselves. They wanted to move to cheaper open models they could tune, and it went slowly, because first they had to build something with zero demo value. An evaluation harness. A way to tell, when you swap one model for another, whether you just made the product better or quietly broke it. Metrics. Gates. CI/CD, except for intelligence instead of code.

From outside, that quarter looks like nothing. No features. No growth. A team spinning its wheels. Then the harness is done, and the same team starts swapping models every week without fear, because they can finally see what better means for them. The line goes vertical.

[Figure: a loop of four nodes. Run inference, generate data, evaluate and gate, tune and specialize the model, and back to inference.]

Run inference, keep the data it throws off, evaluate it honestly, specialize the model, run it again. The boring part is building the loop. **The teams that pay for it early are the ones that later look like overnight successes.**

This is the cleanest line I have found between teams playing with AI and teams compounding on it. The compounders paid the boring tax first and built the loop. Inference makes data. Data feeds the evals. The evals let you specialize without fear. The better model makes better inference. Then they let it turn. Shipping fast is not a personality trait. It is proof you already built the thing that lets you decide quickly.
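The gate at the center of that loop can be very small. This is a sketch, not anything the interview described in detail. Models are stand-in functions from prompt to text, scoring is exact match, and `passes_gate` and `make_stub` are names I made up. Real harnesses use richer metrics, but the shape is the same: score both models on the same cases and refuse the swap if the candidate regresses.

```python
from typing import Callable, Dict, Sequence, Tuple

Model = Callable[[str], str]
EvalCase = Tuple[str, str]  # (prompt, expected output)


def accuracy(model: Model, cases: Sequence[EvalCase]) -> float:
    """Fraction of cases where the model output matches the expected text exactly."""
    if not cases:
        raise ValueError("eval set is empty")
    hits = sum(1 for prompt, expected in cases if model(prompt).strip() == expected)
    return hits / len(cases)


def passes_gate(baseline: Model, candidate: Model,
                cases: Sequence[EvalCase], max_regression: float = 0.0) -> bool:
    """Allow the swap only if the candidate scores at least baseline minus the tolerance."""
    return accuracy(candidate, cases) >= accuracy(baseline, cases) - max_regression


def make_stub(answers: Dict[str, str]) -> Model:
    """Build a fake model that looks answers up in a dict (for illustration only)."""
    return lambda prompt: answers.get(prompt, "")


cases = [("2+2", "4"), ("capital of France", "Paris"), ("3*3", "9")]
baseline = make_stub({"2+2": "4", "capital of France": "Paris", "3*3": "6"})
better = make_stub({"2+2": "4", "capital of France": "Paris", "3*3": "9"})
worse = make_stub({"2+2": "4"})

print(passes_gate(baseline, better, cases))  # candidate fixes a miss
print(passes_gate(baseline, worse, cases))   # candidate quietly breaks things
```

The first check passes and the second fails. Wire that boolean into whatever decides what ships and a model swap stops being a coin flip.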

Just do your job

There is an attitude under all of this that stuck with me, mostly because it is so unglamorous in a moment that loves to narrate itself.

Infrastructure at this scale is a capital fight against the best-funded companies alive, and capital has a cruel property. It cannot help you fast. In the next six months more money changes nothing. You have what you have, and you ship with it. Over a year it buys a little speed. Only past two years does it really unlock anything, because you are not building one data center. You are building a pipeline of them in stages. Secure power and land. Put up the shell. Fill it with GPUs. Every stage is a bet placed long before it pays.

[Figure: a data center built in phases over time. Secure power and land, build the shell, fill with GPUs. More capital buys almost nothing at six months, a little at twelve, a real unlock at twenty-four.]

Money cannot compress the next six months. You ship with what you already built. **It only buys real speed two years out, across a pipeline of half-finished buildings.**

So the business is just delivering on promises you already made. Cloud, he says, is a post-sales business. You sell a promise, then you have to become someone worth having believed. Every signed deal, every investor, every cheaper model that grows your market is the same thing in different clothes. A line of credit. A chance to deliver, and nothing more until you do.

He kept landing on three words I have not shaken. Just do your job. It sounds like a fridge magnet until you notice how much of what passes for strategy is an elaborate way to avoid exactly that. He used the shark line too, the one about being alive only while you move. I would normally roll my eyes. But clichés survive because they keep being true when you are tired.

The thing that actually scares them

When I tried to work out what could actually break a business built this carefully, the answer was not a competitor. It was a shape the world might take.

Every rung of the ladder, every tier of models, every thousand-customer business above the metal needs a plural world. Lots of builders. Lots of models. Lots of specialized needs. Demand alive on every rung. If instead the future consolidates into three or five empires that own the models and the products and the distribution, the ladder collapses to one rung. Raw power, sold to the empires, to use as they like. The thing that flattens it is not competition. It is consolidation.

[Figure: on the left, a consolidated world of three to five empires where demand funnels through them and infrastructure is one rung. On the right, a plural world of many builders where demand lives on every layer.]

On the left, everything routes through a few empires and infrastructure is needed for one thing. On the right, a crowd of builders keeps demand alive up and down the stack. **The whole business is a bet on the picture on the right.**

This is a rare case where the business interest and the human one line up, which makes me a little suspicious of how nice it sounds. But I think it holds. A world where more people can turn an idea into something that runs is a world with more demand in it, and a better one to live in.

It also fixes the sovereign-AI conversation, which is stuck on the wrong word. Everyone argues about megawatts and power, as if the bottleneck to a country having its own AI were electricity. The power comes if the demand is there, and the demand comes from builders, not substations. A place that wants an AI future should worry less about gigawatts and more about how many real model and product companies it is growing. A Mistral. A Black Forest Labs. A Lovable. Those are what make the power worth pouring.

Where it goes wrong

Once the ladder is in your head, the failure modes name themselves.

Pricing the metal, ignoring the system. People fixate on the sticker price of a GPU, three dollars an hour or four or five, when the number that matters is total cost of ownership. The same chip throws off wildly different economics depending on how long it runs without falling over and how many tokens you get out of it. Good optimization moves the real price by an order of magnitude. The sticker is close to the least interesting number on the page.

Mistaking cheaper for smaller. Reading every efficiency gain as less demand, when the record says the opposite almost every time.

Building megawatts without builders. The sovereign-AI mistake. Pour money into power and capacity while starving the layer that creates the demand for it.

Skipping the cold start. Chase visible features before building the dull evaluation loop, then wonder why every model change feels like a coin flip.

Betting on a consolidated world. Build a company, or a career, that only pays off if three players win everything, and call it realism.

Failure Modes

Real Cost = (Sticker Price / Utilization) × Optimization Factor, where a well-tuned stack pushes the factor well below 1

Compounding = Cold-Start Work × Time

Fragility = Demand stacked on too few rungs

The first line is why platform quality beats sticker price. The second is why a patient team passes a fast-looking one. The third is the one that should keep an infrastructure founder, or anyone building on top of them, up at night.
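One way to read the real-cost line is as dollars per million tokens actually served. A rough sketch, with every number invented and `cost_per_million_tokens` a hypothetical helper. Utilization is the share of paid hours spent doing useful work, and optimization shows up as throughput while the GPU is serving.

```python
def cost_per_million_tokens(price_per_gpu_hour, utilization, tokens_per_second):
    """Effective dollars per million tokens served.

    utilization: fraction of paid hours spent on useful serving, in (0, 1].
    tokens_per_second: throughput while the GPU is actually serving.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_paid_hour = tokens_per_second * 3600 * utilization
    return price_per_gpu_hour / tokens_per_paid_hour * 1_000_000


# Illustrative comparison: a cheap sticker on a flaky, untuned setup
# against a pricier sticker on a reliable, well-optimized one.
cheap_sticker = cost_per_million_tokens(3.0, utilization=0.5, tokens_per_second=500)
tuned_platform = cost_per_million_tokens(5.0, utilization=0.9, tokens_per_second=2500)

print(f"cheap sticker:  ${cheap_sticker:.2f} per million tokens")
print(f"tuned platform: ${tuned_platform:.2f} per million tokens")
```

With these made-up inputs the three-dollar chip comes out around $3.33 per million tokens and the five-dollar one around $0.62. The sticker went up by two thirds and the real price fell by more than five times, which is the whole argument for looking past the hourly rate.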

What I am taking from it

A few things I am keeping, in no order.

Know the unit you are billed in. It quietly decides which business you are running and which layer's economics make or break you. Confusing megawatts with tokens is how you misprice your own work.

Spend what you save on something harder. When a thing gets cheaper, do not do the same thing for less. Go after the problem that was out of reach last quarter.

Build the eval loop before you scale. The dull foundation is the part that later gets called an overnight success. Pay that tax early.

Treat the win as the bill. Funding, a signed customer, a flattering benchmark. None of it is the prize. Each one is a promise you now owe.

· · ·

Alive when you move

I keep coming back to that week, a stock falling while the phones rang, because it is a small lesson in how two careful people read the same fact into opposite futures. Cheaper intelligence is a shrinking market or a pile of newly affordable problems, depending on where you stand. The metal looks like where the money is, until you climb and find it sitting at the top, inside the dull work of making complexity vanish for someone else. The run of empires looks inevitable until you remember how stubbornly people want to build their own thing.

The shark line is a cliché, but there is an older version I trust more. In the tradition I read in, reality is not a fixed place you reach. It is closer to a pulse, a constant becoming, and you are most alive where you are still moving. The infrastructure people got there the hard way. Capital cannot save you in six months. The credit is only ever a debt. You wake up, it is a new day, nothing is owed to you, and you go do the work.

So the bubble question is the wrong one. The better one is quieter, and it is the same on both sides of the table.

Are you building for a world that consolidates, or one that stays plural? The megawatts and the tokens and the price of a unit of intelligence are mostly detail under that one bet.

For what it is worth, I am betting on the plural one. Not because I am sure it wins. I am not. Because it is the only version with room left in it for the rest of us to keep building.

If this resonated, the adjacent essays are "Where AI Hardware and Ambient AI Actually Stand in 2026" and "The AI and ML Stack in 2026, and the Agent Road to 2027".