Every bearish case in this cluster shares one hidden assumption: that a token stays expensive. It has not. The cost to reach a fixed capability level has fallen roughly 10x per year — by Stanford's measure, 280x for GPT-3.5-class quality in under two years, from $20 to $0.07 per million tokens.[1][2] Mixture-of-experts models activate ~5% of their parameters per token; distilled 32B models now beat last year's frontier reasoning models; compute-to-fixed-capability halves about every 8 months.[3] Satya Nadella named the mechanism: Jevons paradox — cheaper AI means more of it, not less.[4] If that curve holds, a demand forecast that looks insane at today's cost is rational at next year's, and the overbuild thesis is the fragile number on the page. This case argues the escape hatch in full — and then, honestly, names where it does not hold.
This case exists to argue against the rest of its own cluster. The reckoning thesis — that the ~$725B AI buildout will not earn its return — rests on an assumption almost nobody states out loud: that the cost of running the models stays roughly where it is. It has not stayed. It has collapsed.
The numbers are not marketing. a16z documented that the cost to run a model of equivalent performance has fallen about 10x per year — roughly 1,000x over three years, from $60 to $0.06 per million tokens for GPT-3-class capability.[1] Stanford's AI Index put it more conservatively and more precisely: the cost to query a model at GPT-3.5 quality fell 280x in under two years, from $20.00 to $0.07 per million tokens.[2] Epoch AI, studying it independently, found compute-to-reach-a-fixed-capability halves roughly every 8 months — far faster than Moore's Law.[3]
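To put the three series on a common footing, the arithmetic can be sketched as below. This is a minimal illustration, not any source's own method: the function names are hypothetical, a constant compounding rate is assumed throughout, and the Stanford window is assumed to be a full two years (the text says only "under two years", so the result is a lower bound on the yearly rate).

```python
def annualized_factor(total_factor: float, years: float) -> float:
    """Constant yearly decline factor that compounds to total_factor over `years`."""
    return total_factor ** (1.0 / years)


def annual_factor_from_halving(halving_months: float) -> float:
    """Yearly improvement factor implied by a fixed halving time in months."""
    return 2.0 ** (12.0 / halving_months)


# a16z series: $60 -> $0.06 per million tokens over three years.
a16z_per_year = annualized_factor(60.0 / 0.06, 3.0)

# Stanford series: $20.00 -> $0.07 per million tokens.
# ASSUMPTION: a full two-year window, so this understates the yearly rate.
stanford_per_year = annualized_factor(20.00 / 0.07, 2.0)

# Epoch AI: compute to reach a fixed capability halves about every 8 months.
epoch_per_year = annual_factor_from_halving(8.0)

print(f"a16z:     ~{a16z_per_year:.1f}x cheaper per year")
print(f"Stanford: at least ~{stanford_per_year:.1f}x cheaper per year")
print(f"Epoch:    ~{epoch_per_year:.1f}x less compute per year")
```

Under these assumptions the a16z series works out to about 10x per year, the Stanford series to at least roughly 17x per year, and the Epoch halving time to about 2.8x per year. The Epoch figure measures compute efficiency rather than price, so it is not directly comparable to the two price series.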
The mechanism is real engineering, not hope. Mixture-of-experts architectures activate only ~5% of a model's parameters per token (DeepSeek: 671B total, 37B active). Speculative decoding delivers 2–3x throughput in production. Distillation now yields 32B models that beat last year's frontier reasoning models on hard benchmarks.[3] Satya Nadella gave the demand side its name the day DeepSeek crashed Nvidia: "Jevons paradox strikes again." As AI gets more efficient, its use skyrockets.[4] If a token keeps getting cheaper and demand keeps expanding to fill it, the capex is not an overbuild; it is a floor.
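The mixture-of-experts figure can be checked directly from the parameter counts quoted above. The sketch below is illustrative only: the function names are hypothetical, and the estimate of roughly 2 FLOPs per active parameter per generated token is a common rule of thumb assumed here, not a number from the sources.

```python
def active_fraction(active_params: float, total_params: float) -> float:
    """Share of parameters that participate in computing one token."""
    return active_params / total_params


def forward_flops_per_token(active_params: float) -> float:
    """Rough forward-pass cost per token.

    ASSUMPTION: ~2 FLOPs per active parameter per token (rule of thumb).
    """
    return 2.0 * active_params


TOTAL_PARAMS = 671e9   # DeepSeek total parameters, as quoted in the text
ACTIVE_PARAMS = 37e9   # parameters active per token, as quoted in the text

frac = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)

# Compare against a hypothetical dense model with the same total size.
moe_flops = forward_flops_per_token(ACTIVE_PARAMS)
dense_flops = forward_flops_per_token(TOTAL_PARAMS)
compute_ratio = dense_flops / moe_flops

print(f"active fraction: {frac:.1%}")
print(f"dense / MoE per-token compute: ~{compute_ratio:.0f}x")
```

About 5.5% of parameters are active per token, so under this rule of thumb a hypothetical dense model of the same total size would need roughly 18x the per-token compute. The saving is in compute, not storage: all experts still have to be held in memory.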
Now the honest weakness: the seam a smart critic attacks, named before they do. There are two parts. First, what fell was the cost to reach a fixed capability; the frontier price floor barely moved. GPT-3's 2021 launch price and a late-2024 frontier model both sat near $60 per million output tokens.[1] Second, Jevons rescues provider revenue only if the price elasticity of demand exceeds 1, so that usage grows faster than price falls, and the economists asked (Northeastern's Hanser and Venkatesan) explicitly decline to assert that it does. Sequoia's David Cahn, no bear, put the risk in his own words: GPU compute is turning into a commodity, competed down to marginal cost.[6] Cheaper tokens can grow the buyer's bill while compressing the seller's margin toward zero. Efficiency is the escape hatch, but it can deflate revenue as surely as it expands demand. It cuts both ways, and this case says so.
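The elasticity condition can be made concrete with a toy model. The sketch below assumes a constant-elasticity demand curve, Q = k * P^(-e), which is a textbook simplification and not something the sources assert; the function name and the sample elasticities are hypothetical.

```python
def revenue_multiplier(price_drop_factor: float, elasticity: float) -> float:
    """Change in total spend (price x quantity) when price falls by price_drop_factor.

    ASSUMPTION: constant-elasticity demand, Q = k * P**(-elasticity).
    Then R = P * Q = k * P**(1 - elasticity), so dividing P by a factor f
    multiplies R by f**(elasticity - 1).
    """
    return price_drop_factor ** (elasticity - 1.0)


# A 10x price drop under three illustrative elasticities.
for e in (0.5, 1.0, 1.5):
    m = revenue_multiplier(10.0, e)
    direction = "grows" if m > 1 else "shrinks" if m < 1 else "is flat"
    print(f"elasticity {e}: total spend x{m:.2f} ({direction})")
```

In this model total spend rises after a price cut only when elasticity is above 1, stays flat at exactly 1, and falls below it. Total spend here is the buyers' aggregate bill; the model says nothing about how that spend divides into seller margin, which is the commoditization risk raised above.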
$20.00 to $0.07 per million tokens for the same capability, in under two years.[2] The bear case assumes this stops. Its most honest counter is that the frontier price floor did not fall at all — the collapse is in reaching yesterday's bar, not today's.[1]
The efficiency curve that the bear case has to assume away — and the shock that proved it moves markets.
Stanford's AI Index measures the cost to query a GPT-3.5-class model falling from $20.00 to $0.07 per million tokens — a 280x reduction — as capability diffuses down to smaller, cheaper models.[2]
The Curve

a16z documents that inference cost for equivalent performance falls about 10x per year, or roughly 1,000x over three years for GPT-3-class capability. The efficiency curve gets a name and a slope.[1]
DeepSeek R1 matches frontier reasoning at a fraction of the cost. Nvidia falls ~17% in a day — the largest one-day market-cap loss in history — and Nadella posts: Jevons paradox strikes again. Efficiency is now a market force, not a footnote.[4]
The Shock

Epoch AI finds the compute needed to reach a fixed performance level halves roughly every 8 months, faster than Moore's Law. But its own authors note 60–95% of gains came from scaling, only 5–40% from algorithms.[3]
The honest asterisk holds through 2026: the frontier price floor stayed near $60 per million output tokens, and named economists decline to assert the demand elasticity the bull case needs. The escape hatch is real — and it cuts both ways.[1]
The Caveat

"Jevons paradox strikes again! As AI gets more efficient and accessible, we will see its use skyrocket, turning it into a commodity we just can't get enough of." (Satya Nadella, Microsoft CEO, January 27, 2025)
| Dimension | Layer | Score | Evidence | Label |
|---|---|---|---|---|
| Quality (D5) | Origin | 88 | The lever is capability-per-dollar collapsing: the same GPT-3.5-class quality that cost $20 per million tokens in 2022 cost $0.07 by late 2024, a 280x fall.[2] a16z's ~10x/year and Epoch's 8-month halving corroborate it independently.[1][3] D5 is the origin because the entire counterexample rests on one measured fact, that quality is getting radically cheaper to deliver, which, if it continues, resets every downstream economic assumption in the cluster. | Capability Per Dollar |
| Operational (D6) | L1 | 84 | The fall is produced by real, shipped engineering: mixture-of-experts activating ~5% of parameters (DeepSeek 671B total / 37B active), speculative decoding at 2–3x production throughput, distillation yielding 32B models that beat last year's frontier reasoning models, and quantization (FP8/4-bit) as the mainstream serving default.[3][5] D6 amplifies from D5 because efficiency is not a market mood; it is a stack of techniques that keep compounding, which is why the curve has held across multiple independent measurements. | The Engineering |
| Revenue (D2) | L1 | 80 | The revenue dimension is where the escape hatch becomes two-sided. Jevons (Nadella) says cheaper compute expands total use and thus the market.[4] But Jevons only grows *provider* revenue if demand elasticity exceeds 1, which named economists decline to assert, and Cahn warns compute is commoditizing to marginal cost.[6] D2 carries both the bull mechanism and its honest refutation: falling cost can grow the buyer's bill while compressing the seller's margin. The single most contested dimension in the case. | The Double Edge |
| Customer (D1) | L2 | 78 | As inference approaches free, access broadens and use expands: the demand-pull half of Jevons. Enterprise GenAI spend rose from $1.7B to $37B across 2023–2025 even as per-token price fell more than 90%.[4] D1 shows the mechanism working on the buyer side: usage genuinely skyrockets. The unresolved question this dimension inherits from D2 is whether that expanding usage lands as provider revenue or as commodity throughput. | |
| Employee (D3) | L2 | 66 | Cheaper inference expands the set of workloads and developers that can afford to run AI. The agentic and reasoning workflows that multiply token consumption exist because the token got cheap enough to burn.[3] D3 is where efficiency turns into new demand rather than saved cost: the same fall that could deflate a provider's price is what lets a developer run 50–500x more tokens per task, which is precisely the behavior the bull case needs and the bear case fears. | |
| Regulatory (D4) | | 60 | D4 is the longest-lag dimension: whether a commoditized inference market sustains the capital that built it. Open-source token share rose from 34% to 65% in the first half of 2026, and Chinese models undercut frontier pricing by an order of magnitude.[7] If inference becomes a true commodity with no pricing power, the market structure that funds the buildout is the thing at risk: the slowest-moving and most decisive question the escape hatch raises about itself. | Watch: Market Structure |
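The D3 row's 50–500x token figure can be set against the price decline with one line of arithmetic. The sketch below is illustrative: the function name is hypothetical, and pairing the token multipliers with a single 10x price drop is an assumption made here for scale, since the sources do not say the two changes occurred over the same period.

```python
def per_task_bill_multiplier(token_multiplier: float, price_drop_factor: float) -> float:
    """Change in the cost of one task when it uses more tokens at a lower price."""
    return token_multiplier / price_drop_factor


# ASSUMPTION: one 10x per-token price drop, paired with the 50-500x range.
PRICE_DROP = 10.0
low = per_task_bill_multiplier(50.0, PRICE_DROP)
high = per_task_bill_multiplier(500.0, PRICE_DROP)

print(f"per-task bill changes by x{low:.0f} to x{high:.0f}")
```

Under that pairing, a task's bill rises 5x to 50x even though each token is 10x cheaper: the buyer-side half of Jevons in miniature.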
The cascade originates in D5 — Quality — because the lever is capability-per-dollar: the same output delivered at a collapsing cost.[1][2] From D5 it amplifies into D6 (the operational engineering — mixture-of-experts, speculative decoding, distillation, quantization that make the fall real) and D2 (the demand economics — Jevons, cheaper compute pulling more use) together, then D1 (broader access as inference approaches free) and D3 (the developers and workloads that expand to fill it). D4 (the regulatory/structural question of whether commoditized inference sustains a market) is the longest-lag dimension. This case is deliberately the counter-cascade to the cluster: [UC-251] documents the market pricing an overbuild — UC-254 is the case that, if the efficiency curve holds, breaks that read. [UC-044] is the sibling efficiency case it amplifies; [UC-220] is the buildout whose demand curve efficiency would rationalize. The honest hedge is stated in the analysis, not hidden: the frontier floor did not fall, and falling cost can deflate provider revenue.
-- UC-254: The Efficiency Escape Hatch: 6D Amplifying Cascade (COUNTEREXAMPLE)
-- The fragile number in the bear case (counters UC-251/255; amplifies UC-044)

FORAGE efficiency_escape_hatch
WHERE cost_per_token_collapsing = true
  AND demand_expands_to_fill = true
  AND bear_case_assumes_high_cost = true
ACROSS D5, D6, D2, D1, D3, D4
DEPTH 3
SURFACE efficiency_escape_hatch

DIVE INTO capability_per_dollar
WHEN fixed_capability_cost_falls_10x_yr = true
  AND frontier_floor_holds = true
TRACE efficiency_counter_cascade
EMIT efficiency_escape_hatch_signal

DRIFT efficiency_escape_hatch
METHODOLOGY 84
PERFORMANCE 42

FETCH efficiency_escape_hatch
THRESHOLD 1000
ON EXECUTE CHIRP high 'Cost to run a fixed-capability model fell ~10x a year - 280x for GPT-3.5-class in under two years - so a demand curve that looks insane today is rational tomorrow and the overbuild thesis is the fragile number; but the frontier floor did not fall and cheaper tokens can deflate provider revenue - the escape hatch cuts both ways'

SURFACE analysis AS json
Runtime: @stratiqx/cal-runtime · Spec: cal.semanticintent.dev · DOI: 10.5281/zenodo.18905193
The unstated assumption is that the unit cost of the thing stays where it is. For AI, the unit is a token, and its cost has fallen ~10x a year.[1] Before you call the buildout an overbuild, price in the token it runs on falling by an order of magnitude annually, then re-run the demand curve.
The levers are mixture-of-experts (~5% of parameters active), speculative decoding (2–3x throughput), and distillation (32B models that beat last year's frontier reasoning models). These are shipped, measured techniques.[3] The cost fall is not a forecast; it already happened, twice over, and Epoch confirmed it independently.
What collapsed was the cost to reach a fixed capability. The frontier price floor stayed near $60/M output from 2021 to late 2024.[1] The escape hatch works for commodity capability, not for the newest frontier — a distinction the loudest bull cases skip and this one does not.
Jevons grows the buyer's bill; it does not automatically grow the seller's margin. If inference commoditizes to marginal cost — Cahn's own warning — cheaper tokens can deflate provider revenue faster than they expand demand.[6][7] The escape hatch is real and double-edged; a counterexample that hid the second edge would not be worth citing.
Seven sources: Stanford's AI Index and a16z for the primary cost-decline series, Epoch AI for the independent efficiency rate, the architectural levers (mixture-of-experts, distillation, speculative decoding), Nadella's Jevons framing — and, for the honest counter, the frontier-floor caveat and the economists who decline to assert the elasticity the bull case needs.
Every overbuild thesis has one assumption it never states. Find it. Here, it is that a token stays expensive.