Standalone Report

AI Deployment Capacity Stack

Purpose

This report is a physical-capacity map, not a vibes document. The question is simple: if frontier AI stays centralized and demand stays effectively insatiable, what actually limits how fast intelligence can be deployed into the world?

I am treating training, reasoning, and inference as competing for the same scarce stack: accelerators, HBM, advanced packaging, optics, powered sites, substations, switchgear, transformers, and generation. The report is intentionally quantitative so you can build a mental picture rather than just inherit my conclusion.

Horizon: 2026-2031. Bias: centralization first. Style: public anchors plus explicit estimates. Question: what binds first?
Scope And Assumptions
  • Frontier intelligence remains centralized and tightly controlled for the next several years.
  • The report focuses on deployed capacity that is actually energized and populated, not just announced capex.
  • Most clean public data is US and Taiwan heavy, so global estimates are extrapolated from those anchors.
  • Where the public record stops, I make the estimate explicit rather than pretending the number is exact.
Continuous Load
1 GW = 8.76 TWh/yr

A single always-on 1 GW AI campus is not a rounding error. It is utility-scale electricity demand.
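The conversion is worth making explicit, since the whole report leans on it. A minimal Python sketch (the function name is mine):

```python
# Constant load to annual energy: 1 GW running all year is 8.76 TWh.
HOURS_PER_YEAR = 24 * 365  # 8,760 h, ignoring leap years

def annual_twh(avg_load_gw: float) -> float:
    """Annual electricity in TWh for a constant average load in GW."""
    return avg_load_gw * HOURS_PER_YEAR / 1000  # GWh -> TWh

print(annual_twh(1.0))  # 8.76
```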

Frontier Cluster
32,768 GPUs ~= 67 MW

Using NVIDIA's 32,768-GPU cluster scale and DGX B200 system power, one real frontier cluster is already a small power plant.
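Under the report's own anchors (~14.3 kW per 8-GPU DGX B200 system, PUE 1.15), the ~67 MW figure falls out directly. A sketch in Python (names are mine; constants are the report's Conventions values):

```python
# Report anchors (see the Conventions section): DGX B200 system power and PUE.
DGX_B200_KW = 14.3             # ~max system power for one 8-GPU DGX B200
KW_PER_GPU = DGX_B200_KW / 8   # ~1.79 kW per GPU at the system level
PUE = 1.15                     # facility overhead multiplier

def facility_mw(gpus: int) -> float:
    """Facility draw in MW for a fleet of Blackwell-class GPUs."""
    return gpus * KW_PER_GPU * PUE / 1000

print(round(facility_mw(32_768), 1))  # 67.4
```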

100k GPU Complex
~206 MW facility

That is the right order of magnitude for a serious frontier deployment, not a toy lab.

1.2 GW Campus
~584k GPUs

At a 1.15 PUE baseline and DGX B200 system power, this is roughly the scale implied by Crusoe Abilene.

Bottom Line

My current view in one page

The short version: if you want a serious deployment model, think from the fence line backward. Start with energized MW, then turn that into racks, GPUs, HBM stacks, and optics. Do not start with model demand and assume the rest will appear.
Stack Map

The layers that matter

Demand Layer

Frontier training, reasoning, and low-latency inference all want the same accelerators and power.

Compute Layer

Blackwell and MI300-class systems convert dollars into kW, HBM, and networking demand.

Semiconductor Layer

HBM and advanced packaging bind before raw logic wafers in my base case.

Campus Layer

Land, shells, cooling loops, and dense rack integration matter, but mainly after power is secured.

Power Layer

Substations, transformers, switchgear, interconnection, generation, and political permission decide the pace.

Conventions

Unit assumptions used throughout

| Input | Value used | Why I used it |
| --- | --- | --- |
| DGX B200 system power | ~14.3 kW max for 8 GPUs | Cleanest public system-level power anchor from NVIDIA. |
| Blackwell-class GPU system power | ~1.79 kW per GPU | 14.3 kW / 8 GPUs. This is better than using chip TDP alone because it includes full system overhead. |
| PUE baseline | 1.15 | Reasonable for new liquid-cooled AI facilities. Reality can be better or worse. |
| HBM per Blackwell-class GPU | ~180-186 GB | DGX B200 and GB200 public memory specs converge to this range. |
| HBM stacks per accelerator | ~8 stacks | Good working assumption for current high-end AI packages. |
| Frontier cluster reference | 32,768 GPUs | NVIDIA uses this scale in Blackwell and DGX B200 cluster comparisons. |

Important: these are modeling anchors. If a future generation materially changes GPU-to-kW or HBM-to-GPU ratios, the ceilings move.

Mental Picture

What different scales actually look like

| Scale | GPUs | DGX B200 systems | GB200 NVL72 racks | Facility draw | Annual electricity |
| --- | --- | --- | --- | --- | --- |
| One frontier cluster | 32,768 | 4,096 | ~456 | ~67 MW | ~0.59 TWh |
| 100k GPU complex | 100,000 | 12,500 | ~1,390 | ~206 MW | ~1.80 TWh |
| 1.0 GW AI facility | ~486,000 | ~60,800 | ~6,760 | 1.0 GW | 8.76 TWh |
| 1.2 GW AI campus | ~584,000 | ~73,000 | ~8,110 | 1.2 GW | 10.51 TWh |

The NVL72 rack count is derived by scaling the DGX B200 per-GPU system power. It lands in the same rough 120-140 kW rack class generally discussed for Blackwell rack-scale deployments.
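Every row of that table can be regenerated from the same three anchors. A small Python sketch (function names are mine), using the 100k-GPU row as a check:

```python
# Regenerate a "Mental Picture" row from the report's anchors:
# ~1.79 kW/GPU (14.3 kW / 8), 72 GPUs per NVL72 rack, PUE 1.15, 8,760 h/yr.
KW_PER_GPU = 14.3 / 8
PUE = 1.15

def picture(gpus: float) -> dict:
    facility_mw = gpus * KW_PER_GPU * PUE / 1000
    return {
        "dgx_b200_systems": gpus / 8,
        "nvl72_racks": gpus / 72,
        "facility_mw": facility_mw,
        "annual_twh": facility_mw * 8760 / 1e6,  # MW -> MWh/yr -> TWh
    }

row = picture(100_000)
print(round(row["facility_mw"]))    # 206
print(round(row["annual_twh"], 2))  # 1.8

# Inverting: how many GPUs fit under a fixed facility-power budget?
def gpus_for_mw(facility_mw: float) -> float:
    return facility_mw * 1000 / (KW_PER_GPU * PUE)

print(round(gpus_for_mw(1000) / 1000))  # ~486 (thousand GPUs at 1.0 GW)
```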

Layer 1

Accelerator node economics and power density

The cleanest public anchor here is NVIDIA DGX B200 because NVIDIA publishes both system power and memory. I use that as the conversion engine for everything else.

| System | Published spec | Why it matters |
| --- | --- | --- |
| DGX B200 | 8 Blackwell GPUs, 1,440 GB total HBM3e, 64 TB/s HBM bandwidth, ~14.3 kW max | This gives a system-level per-GPU power anchor of ~1.79 kW and ~180 GB HBM per GPU. |
| GB200 NVL72 | 72 Blackwell GPUs, 36 Grace CPUs, 13.4 TB HBM3e, 130 TB/s NVLink, liquid-cooled rack-scale design | This is the public picture of what high-density frontier deployment actually wants to look like. |
| NVIDIA cluster reference | 32,768-GPU scale in B200 and Blackwell comparison footnotes | This is a useful reference for a serious frontier training cluster rather than a single box. |

Applying those anchors:
  • 100,000 Blackwell-class GPUs -> ~179 MW IT load -> ~206 MW facility draw at PUE 1.15.
  • 1,000,000 Blackwell-class GPUs -> ~1.79 GW IT load -> ~2.06 GW facility draw at PUE 1.15.
  • One 32,768-GPU frontier cluster -> ~58.6 MW IT load -> ~67.4 MW facility draw.
Why I trust this scaling more than chip TDP headlines

Chip TDP by itself understates reality because real deployments pay for CPUs, DPUs, fans or pumps, memory, board power delivery, and rack-level integration. The DGX B200 system power figure captures far more of the real burden than just quoting a single GPU chip number.

Layer 2

HBM and advanced packaging are the semiconductor bottlenecks that actually matter

People reach for "TSMC" first because it is famous. I think that is too coarse. The tighter near-term constraints are usually HBM stacks and CoWoS-class packaging. Raw leading-edge wafer starts matter, but they are not the cleanest first bind.

| Packaging anchor | Public number | What it means |
| --- | --- | --- |
| End-2024 CoWoS capacity | >35,000 wafers per month | Useful baseline for just how constrained advanced packaging still was entering 2025. |
| 2025 CoWoS target | 70,000-80,000 wafers per month | Big ramp, but still a bottleneck because AI demand is ramping at the same time. |
| End-2026 CoWoS target | ~90,000 wafers per month | Capacity keeps rising, but not instantly. |
| 2028-2029 CoWoS target | ~150,000 wafers per month | This is the scale at which packaging stops being tiny and starts becoming industrial. |
| CoWoS fab build time | Cut from 3-5 years to ~1.5-2 years | Very important: even the bottleneck itself is being industrialized. |

The exact package count supported by a given CoWoS wafer number is mix-sensitive because Blackwell-class packages consume very large interposer area. I use the CoWoS figures mainly to show ramp speed, not as a fake-precise unit forecast.

HBM stack demand is easier to reason about

| Deployment target | Blackwell-class GPUs | HBM stacks needed at ~8 per GPU |
| --- | --- | --- |
| One frontier cluster | 32,768 | ~0.26 million stacks |
| 100k GPU complex | 100,000 | ~0.80 million stacks |
| 500k GPU fleet | 500,000 | ~4.00 million stacks |
| 1.2 GW campus | ~584,000 | ~4.67 million stacks |
| 1 million GPU fleet | 1,000,000 | ~8.00 million stacks |

Working HBM ceiling

My rough working estimate is that 2025-2026 industry HBM output is on the order of 24-40 million stack equivalents per year, based on public market-revenue trackers divided by plausible stack ASPs. That implies something like 3-5 million Blackwell-class accelerators per year, before accounting for scrap, yield loss, and non-frontier demand.
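The stack arithmetic and the implied accelerator ceiling, as a sketch (the 24-40 million stack range is the report's own estimate, not a published figure; function names are mine):

```python
STACKS_PER_GPU = 8  # working assumption from the Conventions table

def hbm_stacks(gpus: float) -> float:
    """HBM stacks needed for a GPU fleet at ~8 stacks per accelerator."""
    return gpus * STACKS_PER_GPU

print(hbm_stacks(584_000) / 1e6)  # ~4.67 million stacks for a 1.2 GW campus

# Rough industry ceiling: 24-40M stack equivalents/yr (report's estimate).
lo, hi = 24e6 / STACKS_PER_GPU, 40e6 / STACKS_PER_GPU
print(lo / 1e6, hi / 1e6)  # ~3.0 to ~5.0 million accelerators per year
```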

Important sanity check on vendor marketing language

Crusoe's Abilene expansion release says each building is designed to operate "up to 50,000 NVIDIA GB200 NVL72s." Taken literally, that cannot fit the stated site power. A 100 MW-class building supports roughly 50,000 Blackwell GPUs, not 50,000 72-GPU racks. Read that statement as roughly 50,000 GPUs or equivalent compute modules per building, not 50,000 rack-scale NVL72 systems.

Why I still include raw logic wafers, but rank them lower

If a large AI accelerator needs 1-2 leading-edge logic dies and a 300 mm wafer yields roughly 35-60 good large dies, then 1 million accelerators needs on the order of 17,000-57,000 advanced-node wafers. That is large, but it is not obviously the tightest system bottleneck if AI gets priority allocation. HBM and packaging are tighter because they are harder to substitute around.
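Carrying both ranges through that back-of-envelope (names are mine; the inputs are the report's stated ranges):

```python
# Accelerators -> advanced-node logic wafers, carrying the ranges through.
def wafers_needed(accelerators: float, dies_per_accel: float,
                  good_dies_per_wafer: float) -> float:
    return accelerators * dies_per_accel / good_dies_per_wafer

low = wafers_needed(1e6, 1, 60)   # best case: 1 die per package, high yield
high = wafers_needed(1e6, 2, 35)  # worst case: 2 dies per package, low yield
print(round(low), round(high))    # ~16,667 to ~57,143 wafers
```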

Layer 3

Networking and optics are huge execution problems, but usually not the first hard ceiling

Once clusters get large, the network turns into a physical manufacturing problem: NICs, switch ASICs, optics, copper, fiber, and installation labor. The reason I rank it below power and HBM is not that it is small. It is that rich buyers can usually brute-force it more effectively than they can brute-force energized utility capacity.

| Scale | DGX B200 systems | 400G NIC ports at 8 per system | What that implies |
| --- | --- | --- | --- |
| 32,768 GPUs | 4,096 | 32,768 ports | Already a serious fabric, not a normal enterprise cluster. |
| 100,000 GPUs | 12,500 | 100,000 ports | Tens of thousands of optical connections and major switch-port demand. |
| 1,000,000 GPUs | 125,000 | 1,000,000 ports | Optics and fabric deployment become a major industrial operation. |

DGX B200 publishes up to 8 single-port ConnectX-7 VPI connections at up to 400 Gb/s each, plus BlueField-3 DPUs. GB200 NVL72-class deployments then layer very high-bandwidth NVLink inside the rack and high-speed InfiniBand or Ethernet across racks.
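Port counts scale linearly with system count; a sketch at 8 x 400G ports per system (scale-out fabric only, ignoring NVLink inside the rack; function name is mine):

```python
GPUS_PER_SYSTEM = 8
NIC_PORTS_PER_SYSTEM = 8  # up to 8 x 400 Gb/s ConnectX-7 per DGX B200

def scale_out_ports(gpus: int) -> int:
    """400G NIC ports for the inter-system fabric (one port per GPU, in effect)."""
    return (gpus // GPUS_PER_SYSTEM) * NIC_PORTS_PER_SYSTEM

print(scale_out_ports(32_768))     # 32768
print(scale_out_ports(1_000_000))  # 1000000
```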

Layer 4

Data center shells, cooling, and land: fast once the hard stuff is solved

The campus layer matters, but it is downstream of power. If I had to summarize this section in one sentence: shells are a critical path item, but not the primary economic governor.

| Project | Published figures | Why it matters |
| --- | --- | --- |
| Crusoe Abilene phase 1 | 2 buildings, 980,000 sq ft, 200+ MW, construction started June 2024, expected energized in H1 2025 | Shows that once a site and interconnect are in hand, 200 MW-class AI capacity can appear quickly. |
| Crusoe Abilene full campus | 8 buildings, ~4 million sq ft, 1.2 GW total power capacity, mid-2026 target for the second phase | This is the best public hard-data anchor for a modern giga-scale AI campus. |
| Crusoe construction scale | ~2,000 workers daily at announcement, expected to approach ~5,000 with expansion; a later blog cites 5,600+ workers on site | Labor and on-site execution are large, but they scale if money and power are there. |
| Meta Hyperion JV | ~$27 billion total development cost for buildings plus long-lived power, cooling, and connectivity infrastructure | Confirms the capital intensity of frontier campuses even before the accelerator payload is fully counted. |

Derived from those anchors:
  • Abilene phase 1 density: ~4.9 sq ft per kW of facility power.
  • Abilene full campus density: ~3.3 sq ft per kW of facility power.
  • 100 MW facility building -> roughly 48,600 Blackwell-class GPUs at the DGX B200 power anchor.
  • 150 MW facility building -> roughly 73,000 Blackwell-class GPUs at the same anchor.
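The density ratios and building-level GPU counts are straightforward divisions; a sketch using the report's anchors (function names are mine):

```python
# Floor-area density and GPUs-per-building from the Abilene public figures.
KW_PER_GPU = 14.3 / 8  # DGX B200 system-power anchor
PUE = 1.15

def sqft_per_kw(sq_ft: float, facility_mw: float) -> float:
    return sq_ft / (facility_mw * 1000)

def gpus_per_building(facility_mw: float) -> float:
    return facility_mw * 1000 / (KW_PER_GPU * PUE)

print(round(sqft_per_kw(980_000, 200), 1))     # 4.9  (phase 1: 2 buildings, 200 MW)
print(round(sqft_per_kw(4_000_000, 1200), 1))  # 3.3  (full campus at 1.2 GW)
print(round(gpus_per_building(100), -2))       # ~48,600 GPUs in a 100 MW building
```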

Cooling is not automatically a water disaster

Crusoe's Abilene blog says the closed-loop cooling system needs only about 12,625 gallons per building per year for maintenance and water-quality management. That is tiny relative to what many people intuitively imagine. Water can still become a local political issue, but modern closed-loop direct-to-chip systems can make it much less important than power.

Layer 5

Power, interconnection, substations, and generation are the real governing layer

A modern AI campus does not care about abstract national electricity supply. It cares about firm, deliverable MW at a specific fence line, with the right interconnection study, substation design, switchgear, transformers, and backup power in place.

| Average AI load | Annual electricity | Solar nameplate at 25% CF | Wind nameplate at 35% CF | Gas nameplate at 85% CF |
| --- | --- | --- | --- | --- |
| 1.0 GW | 8.76 TWh/yr | 4.0 GW | 2.9 GW | 1.18 GW |
| 1.2 GW | 10.51 TWh/yr | 4.8 GW | 3.4 GW | 1.41 GW |
| 5.0 GW | 43.8 TWh/yr | 20.0 GW | 14.3 GW | 5.88 GW |
| 10.0 GW | 87.6 TWh/yr | 40.0 GW | 28.6 GW | 11.76 GW |
| 20.0 GW | 175.2 TWh/yr | 80.0 GW | 57.1 GW | 23.53 GW |

These nameplate figures are not saying solar, wind, and gas are interchangeable in practice. They are a simple way to show how quickly continuous AI load becomes a generation-scale problem.
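The nameplate columns are just load divided by capacity factor; a sketch that deliberately ignores firming, storage, and curtailment (function name is mine):

```python
def nameplate_gw(avg_load_gw: float, capacity_factor: float) -> float:
    """Nameplate GW whose average output matches a constant load, at a given CF."""
    return avg_load_gw / capacity_factor

# The 1.2 GW row of the table above:
for label, cf in (("solar", 0.25), ("wind", 0.35), ("gas", 0.85)):
    print(label, round(nameplate_gw(1.2, cf), 2))
# solar 4.8, wind 3.43, gas 1.41
```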

Nameplate headlines can be misleading

The EIA expected 62.8 GW of new US utility-scale generating capacity in 2024, with 81% of it coming from solar plus battery storage. That is a lot of nameplate, but much less than 62.8 GW of firm always-on power for AI. Frontier AI cares about dependable delivered MW, not celebratory national aggregate capacity numbers.

Abilene is a useful template

Crusoe says the site pairs a 1.2 GW grid interconnection with behind-the-meter battery storage, solar, nearby wind, and natural-gas-turbine backup. That is what a serious AI campus increasingly looks like: not just grid draw, but a negotiated campus power architecture.

What I think the true electrical bottlenecks are
  • Utility willingness to reserve and deliver large blocks of power.
  • Interconnection studies and substation buildouts.
  • Large transformers, switchgear, busway, and other high-voltage equipment that routinely run on multi-quarter to multi-year lead times.
  • Gas-turbine and backup-generation availability for fast-track energization.
  • Political permission if communities decide data centers are crowding out other users.
Layer 6

Logic wafers and lithography matter, but they are not my first bind

The popular version of this story is "TSMC decides everything." I think that is too simplistic. TSMC does matter. ASML and EUV matter. But for the next several years, my base-case deployment pace is more tightly governed by HBM, packaging, and energized power than by raw logic-wafer starts alone.

Back-of-envelope: if one high-end AI accelerator consumes 1-2 leading-edge logic dies and each 300 mm wafer yields roughly 35-60 good large dies, then 1 million accelerators need something like 17,000-57,000 advanced-node wafers. That is meaningful, but still a narrower bottleneck than the same million accelerators' need for ~8 million HBM stacks and ~2 GW of facility power.
Why I still keep an eye on EUV and leading-edge foundry tools

Even if raw wafer starts are not the first bind, they are still the speed limit on how fast the industry can expand leading-edge output over a multi-year horizon. EUV tool output is measured in dozens per year, not hundreds. That caps how quickly foundries can add true cutting-edge capacity. I just do not think it bites before power, HBM, and packaging in the 2026-2031 window unless those other problems get solved unusually well.

Scenario Ceiling

My rough annual build ceiling for new frontier AI deployment

These are not forecasts of what demand wants. They are estimates of what the stack can plausibly absorb as annual new deployment if centralized labs and hyperscalers keep spending aggressively.

| Year | Conservative | Base | Aggressive | What has to go right |
| --- | --- | --- | --- | --- |
| 2027 | ~1.5M accelerators / ~3.1 GW new facility power | ~2.2M accelerators / ~4.5 GW | ~3.0M accelerators / ~6.2 GW | HBM and packaging ramp mostly on plan; multiple 100-500 MW sites energized on time. |
| 2028 | ~2.2M accelerators / ~4.5 GW | ~3.3M accelerators / ~6.8 GW | ~4.5M accelerators / ~9.3 GW | Electrical equipment and utility coordination stop being the main brake on several giga-scale campuses at once. |
| 2030 | ~3.0M accelerators / ~6.2 GW | ~5.0M accelerators / ~10.3 GW | ~7.5M accelerators / ~15.4 GW | Generation additions, transmission, and politics all have to cooperate rather than lag. |
| 2031 | ~4.0M accelerators / ~8.2 GW | ~6.0M accelerators / ~12.3 GW | ~9.0M accelerators / ~18.5 GW | Aggressive case effectively requires industrial policy in everything but name. |

These are annual new deployments. Cumulative installed base compounds on top of this. By 2030, even a base case implies many tens of GW of AI-dedicated power online globally.
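A consistency check on the scenario table: converting accelerators per year back into new facility GW with the same anchors used throughout (function name is mine):

```python
KW_PER_GPU = 14.3 / 8  # ~1.79 kW system-level per GPU
PUE = 1.15

def new_facility_gw(accelerators_per_year: float) -> float:
    """New facility power (GW) implied by an annual accelerator deployment."""
    return accelerators_per_year * KW_PER_GPU * PUE / 1e6

print(round(new_facility_gw(2.2e6), 1))  # 4.5  (2027 base case)
print(round(new_facility_gw(6.0e6), 1))  # 12.3 (2031 base case)
```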

My strongest caution

If someone shows you a story where frontier AI deployment jumps by dozens of firm GW per year before 2030 without discussing HBM, CoWoS, interconnection, switchgear, and generation, they are not doing real capacity analysis.

Bottleneck Order

What binds first by horizon

| Horizon | Rank 1 | Rank 2 | Rank 3 |
| --- | --- | --- | --- |
| 2026-2027 | Powered sites and interconnection | HBM and advanced packaging | Transformers, switchgear, and backup generation |
| 2028 | Electrical equipment plus utility allocation | HBM | Networking and optical deployment at extreme scale |
| 2030 | Generation, transmission, and local politics | HBM and packaging if demand remains insane | Specialized campus labor and construction sequencing |
| 2031+ | Political allocation of energy and rents | Grid expansion speed | Residual semiconductor packaging limits |
Watch List

The numbers I would monitor every quarter

Campus reality, not slides
  • Energized MW actually online
  • How many buildings are live, not announced
  • Average time from land control to first energized building
Semiconductor reality
  • CoWoS wafers per month
  • HBM sold-through status by supplier
  • Whether packaging or memory slips push major launches
Electrical chain reality
  • Large transformer and switchgear lead times
  • Gas turbine availability
  • Utility interconnection approvals for 100 MW+ and 1 GW+ loads
Policy reality
  • Who gets priority access to constrained power
  • Whether states start attaching conditions to AI campus buildouts
  • Whether rent shaving extends from labs into power, land, and data center infrastructure
Source Notes

Public anchors used in this report

  1. NVIDIA DGX B200 product page and markdown spec for 8 GPUs, 1,440 GB HBM3e, and ~14.3 kW max system power.
  2. NVIDIA GB200 NVL72 product page and markdown spec for 72 Blackwell GPUs, 13.4 TB HBM3e, 130 TB/s NVLink, and rack-scale liquid-cooled architecture.
  3. TrendForce and SemiMedia reporting on TSMC CoWoS capacity: >35k wpm in 2024, ~70-80k in 2025, ~90k by end-2026, and ~150k by 2028-2029.
  4. TrendForce reporting that CoWoS facility build times have compressed from 3-5 years to roughly 1.5-2 years.
  5. Crusoe March 2025 Abilene expansion release for the 1.2 GW, 8-building, 4 million sq ft campus and its mid-2026 phase-two target.
  6. Crusoe September 2025 Abilene live-campus release for the June 2024 start date, first NVIDIA GB200 rack deliveries in June 2025, and the statement that the planned campus supports hundreds of thousands of GPUs.
  7. Crusoe August 2025 Abilene blog for the 12,625 gallons per building per year cooling-maintenance figure and 5,600+ construction workers on site.
  8. Meta October 2025 Hyperion joint-venture announcement for the roughly $27 billion development-cost figure for buildings plus long-lived power, cooling, and connectivity infrastructure.
  9. EIA February 15, 2024 Today in Energy note that 62.8 GW of new US utility-scale electric-generating capacity was expected in 2024, with 81% coming from solar and battery storage.
Where the rough estimates start

The hardest public numbers to get cleanly are exact HBM stack output and exact per-rack Blackwell power in deployed configurations. For HBM, I use a rough industry-output range derived from public market-revenue trackers and plausible stack ASPs. For Blackwell rack power, I scale from the public DGX B200 system-power figure, which lands in the same general density class as public Blackwell rack integration discussions.