A single always-on 1 GW AI campus is not a rounding error. It is utility-scale electricity demand.
This report is a physical-capacity map, not a vibes document. The question is simple: if frontier AI stays centralized and demand stays effectively insatiable, what actually limits how fast intelligence can be deployed into the world?
I am treating training, reasoning, and inference as competing for the same scarce stack: accelerators, HBM, advanced packaging, optics, powered sites, substations, switchgear, transformers, and generation. The report is intentionally quantitative so you can build a mental picture rather than just inherit my conclusion.
Framing assumptions:
- Frontier intelligence remains centralized and tightly controlled for the next several years.
- The report focuses on deployed capacity that is actually energized and populated, not just announced capex.
- Most clean public data is US- and Taiwan-heavy, so global estimates are extrapolated from those anchors.
- Where the public record stops, I make the estimate explicit rather than pretending the number is exact.
Using NVIDIA's 32,768-GPU cluster scale and DGX B200 system power, one real frontier cluster is already a small power plant: at a 1.15 PUE baseline it draws roughly 67 MW. That is the right order of magnitude for a serious frontier deployment, not a toy lab, and a giga-scale campus like Crusoe Abilene is sized to hold many multiples of it.
My current view in one page
- The first hard bind is not software, land, or generic server manufacturing. It is powered sites plus the electrical chain, then HBM and advanced packaging.
- Once interconnect, equipment, and permits are in hand, shells can rise surprisingly fast. Abilene shows roughly how quickly giga-scale AI campuses can materialize after the hard prerequisites are solved.
- Raw logic wafers matter, but they are not my base-case first bind. The tighter semiconductor bottlenecks are HBM and CoWoS-class packaging.
- Networking and optics are large execution problems, but if the buyer has money and priority they usually sit behind power and memory in the queue of things that actually stop deployment.
- The practical annual ceiling for net new frontier AI-dedicated energized capacity is probably single-digit to low-teens GW before 2030 unless governments actively force power, permitting, and equipment allocation to move faster.
The layers that matter
- Demand: frontier training, reasoning, and low-latency inference all want the same accelerators and power.
- Accelerators: Blackwell and MI300-class systems convert dollars into kW, HBM, and networking demand.
- Memory and packaging: HBM and advanced packaging bind before raw logic wafers in my base case.
- Shells and land: land, shells, cooling loops, and dense rack integration matter, but mainly after power is secured.
- Power and grid: substations, transformers, switchgear, interconnection, generation, and political permission decide the pace.
Unit assumptions used throughout
| Input | Value used | Why I used it |
|---|---|---|
| DGX B200 system power | ~14.3 kW max for 8 GPUs | Cleanest public system-level power anchor from NVIDIA. |
| Blackwell-class GPU system power | ~1.79 kW per GPU | 14.3 / 8. This is better than using chip TDP alone because it includes full system overhead. |
| PUE baseline | 1.15 | Reasonable for new liquid-cooled AI facilities. Reality can be better or worse. |
| HBM per Blackwell-class GPU | ~180-186 GB | DGX B200 and GB200 public memory specs converge to this range. |
| HBM stacks per accelerator | ~8 stacks | Good working assumption for current high-end AI packages. |
| Frontier cluster reference | 32,768 GPUs | NVIDIA uses this scale in Blackwell and DGX B200 cluster comparisons. |
Important: these are modeling anchors. If a future generation materially changes GPU-to-kW or HBM-to-GPU ratios, the ceilings move.
What different scales actually look like
| Scale | GPUs | DGX B200 systems | GB200 NVL72 racks | Facility draw | Annual electricity |
|---|---|---|---|---|---|
| One frontier cluster | 32,768 | 4,096 | ~456 | ~67 MW | ~0.59 TWh |
| 100k GPU complex | 100,000 | 12,500 | ~1,390 | ~206 MW | ~1.80 TWh |
| 1.0 GW AI facility | ~486,000 | ~60,800 | ~6,760 | 1.0 GW | 8.76 TWh |
| 1.2 GW AI campus | ~584,000 | ~73,000 | ~8,110 | 1.2 GW | 10.51 TWh |
The NVL72 rack count is derived by scaling the DGX B200 per-GPU system power. It lands in the same rough 120-140 kW rack class generally discussed for Blackwell rack-scale deployments.
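As a worked check on the table above, here is a minimal Python sketch of the conversions. It uses only the unit anchors from the previous table; the function names, rounding, and structure are mine, not anything a vendor publishes.

```python
# Conversions behind the scale table, using the modeling anchors above.
KW_PER_GPU = 14.3 / 8      # ~1.79 kW system-level power per Blackwell-class GPU (DGX B200 anchor)
PUE = 1.15                 # assumed facility overhead
HOURS_PER_YEAR = 8760

def facility_mw(gpus: int) -> float:
    """Facility draw in MW: IT load times PUE."""
    return gpus * KW_PER_GPU * PUE / 1000

def annual_twh(mw: float) -> float:
    """Annual electricity in TWh for an always-on load."""
    return mw * HOURS_PER_YEAR / 1e6

def gpus_for_facility(mw: float) -> int:
    """GPUs a facility of a given draw can host under the same anchors."""
    return int(mw * 1000 / (KW_PER_GPU * PUE))

# One frontier cluster: ~67 MW and ~0.59 TWh/yr
print(round(facility_mw(32_768)), round(annual_twh(facility_mw(32_768)), 2))

# A 1.0 GW facility: ~486,000 GPUs, ~60,800 DGX-class systems, ~6,760 NVL72-class racks
gpus = gpus_for_facility(1_000)
print(gpus, gpus // 8, gpus // 72)
```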
Accelerator node economics and power density
The cleanest public anchor here is NVIDIA DGX B200 because NVIDIA publishes both system power and memory. I use that as the conversion engine for everything else.
| System | Published spec | Why it matters |
|---|---|---|
| DGX B200 | 8 Blackwell GPUs, 1,440 GB total HBM3e, 64 TB/s HBM bandwidth, ~14.3 kW max | This gives a system-level per-GPU power anchor of ~1.79 kW and ~180 GB HBM per GPU. |
| GB200 NVL72 | 72 Blackwell GPUs, 36 Grace CPUs, 13.4 TB HBM3e, 130 TB/s NVLink, liquid-cooled rack-scale design | This is the public picture of what high-density frontier deployment actually wants to look like. |
| NVIDIA cluster reference | 32,768 GPU scale in B200 and Blackwell comparison footnotes | This is a useful reference for a serious frontier training cluster rather than a single box. |
Why I trust this scaling more than chip TDP headlines
Chip TDP by itself understates reality because real deployments pay for CPUs, DPUs, fans or pumps, memory, board power delivery, and rack-level integration. The DGX B200 system power figure captures far more of the real burden than just quoting a single GPU chip number.
HBM and advanced packaging are the semiconductor bottlenecks that actually matter
People reach for "TSMC" first because it is famous. I think that is too coarse. The tighter near-term constraints are usually HBM stacks and CoWoS-class packaging. Raw leading-edge wafer starts matter, but they are not the cleanest first bind.
| Packaging anchor | Public number | What it means |
|---|---|---|
| End-2024 CoWoS capacity | >35,000 wafers per month | Useful baseline for just how constrained advanced packaging still was entering 2025. |
| 2025 CoWoS target | 70,000-80,000 wafers per month | Big ramp, but still a bottleneck because AI demand is ramping at the same time. |
| End-2026 CoWoS target | ~90,000 wafers per month | Capacity keeps rising, but not instantly. |
| 2028-2029 CoWoS target | ~150,000 wafers per month | This is the scale at which packaging stops being tiny and starts becoming industrial. |
| CoWoS fab build time | Cut from 3-5 years to ~1.5-2 years | Very important: even the bottleneck itself is being industrialized. |
The exact package count supported by a given CoWoS wafer number is mix-sensitive because Blackwell-class packages consume very large interposer area. I use the CoWoS figures mainly to show ramp speed, not as a fake-precise unit forecast.
HBM stack demand is easier to reason about
| Deployment target | Blackwell-class GPUs | HBM stacks needed at ~8 per GPU |
|---|---|---|
| One frontier cluster | 32,768 | ~0.26 million stacks |
| 100k GPU complex | 100,000 | ~0.80 million stacks |
| 500k GPU fleet | 500,000 | ~4.00 million stacks |
| 1.2 GW campus | ~584,000 | ~4.67 million stacks |
| 1 million GPU fleet | 1,000,000 | ~8.00 million stacks |
Working HBM ceiling
My rough working estimate is that 2025-2026 industry HBM output is on the order of 24-40 million stack equivalents per year, based on public market-revenue trackers divided by plausible stack ASPs. That implies something like 3-5 million Blackwell-class accelerators per year before scrap, yield loss, and non-frontier demand.
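A minimal sketch of that ceiling logic, assuming the ~8 stacks per accelerator figure from the unit table. The 24-40 million stack range is the rough output estimate described above; the function name and structure are mine.

```python
# Working HBM ceiling: annual stack-equivalents -> Blackwell-class accelerators per year.
STACKS_PER_ACCELERATOR = 8     # assumed stacks per high-end AI package

def accelerators_supported(stacks_per_year: float) -> float:
    """Ceiling before scrap, yield loss, and non-frontier HBM demand."""
    return stacks_per_year / STACKS_PER_ACCELERATOR

for stacks in (24e6, 40e6):    # rough 2025-2026 industry output range used in this report
    print(f"{stacks/1e6:.0f}M stacks -> {accelerators_supported(stacks)/1e6:.1f}M accelerators")
# 24M stacks -> 3.0M accelerators; 40M stacks -> 5.0M accelerators
```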
Important sanity check on vendor marketing language
Crusoe's Abilene expansion release says each building is designed to operate "up to 50,000 NVIDIA GB200 NVL72s." Taken literally, that cannot fit the stated site power. A 100 MW-class building supports roughly 50,000 Blackwell GPUs, not 50,000 72-GPU racks. Read that statement as roughly 50,000 GPUs or equivalent compute modules per building, not 50,000 rack-scale NVL72 systems.
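A quick arithmetic check on why the literal reading does not work, using the same per-GPU power and PUE anchors; the 100 MW building size is an assumption for illustration.

```python
# Why "50,000 NVL72 racks per building" cannot be literal, under this report's power anchors.
KW_PER_GPU = 14.3 / 8
PUE = 1.15

building_mw = 100                                        # assumed 100 MW-class building
gpus_supported = building_mw * 1000 / (KW_PER_GPU * PUE)
print(round(gpus_supported))                             # ~48,600 GPUs: consistent with ~50,000 GPUs

gpus_if_50k_racks = 50_000 * 72                          # literal reading: 3.6 million GPUs
implied_gw = gpus_if_50k_racks * KW_PER_GPU * PUE / 1e6
print(round(implied_gw, 1))                              # ~7.4 GW per building: not physically plausible
```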
Why I still include raw logic wafers, but rank them lower
If a large AI accelerator needs 1-2 leading-edge logic dies and a 300 mm wafer yields roughly 35-60 good large dies, then 1 million accelerators needs on the order of 17,000-57,000 advanced-node wafers. That is large, but it is not obviously the tightest system bottleneck if AI gets priority allocation. HBM and packaging are tighter because they are harder to substitute around.
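The wafer range above is just two multiplications; here is the sketch, with the die-per-accelerator and good-die-per-wafer figures carried as the stated working assumptions rather than foundry data.

```python
# Advanced-node wafer demand for 1 million accelerators under the stated assumptions.
accelerators = 1_000_000
cases = {
    "best case":  (1, 60),   # 1 logic die per accelerator, 60 good large dies per 300 mm wafer
    "worst case": (2, 35),   # 2 logic dies per accelerator, 35 good large dies per wafer
}
for label, (dies_per_accel, good_dies_per_wafer) in cases.items():
    wafers = accelerators * dies_per_accel / good_dies_per_wafer
    print(f"{label}: ~{wafers:,.0f} wafers")   # ~16,667 and ~57,143
```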
Networking and optics are huge execution problems, but usually not the first hard ceiling
Once clusters get large, the network turns into a physical manufacturing problem: NICs, switch ASICs, optics, copper, fiber, and installation labor. The reason I rank it below power and HBM is not that it is small. It is that rich buyers can usually brute-force it more effectively than they can brute-force energized utility capacity.
| Scale | DGX B200 systems | 400G NIC ports at 8 per system | What that implies |
|---|---|---|---|
| 32,768 GPUs | 4,096 | 32,768 ports | Already a serious fabric, not a normal enterprise cluster. |
| 100,000 GPUs | 12,500 | 100,000 ports | Tens of thousands of optical connections and major switch-port demand. |
| 1,000,000 GPUs | 125,000 | 1,000,000 ports | Optics and fabric deployment become a major industrial operation. |
DGX B200 publishes up to 8 single-port ConnectX-7 VPI connections at up to 400 Gb/s each, plus BlueField-3 DPUs. GB200 NVL72-class deployments then layer very high-bandwidth NVLink inside the rack and high-speed InfiniBand or Ethernet across racks.
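The port counts in the table are straight multiplication; here is the sketch, assuming the 8 published 400G NIC ports per DGX B200-class system and leaving switch-tier and transceiver multipliers out because they depend on topology choices not covered here.

```python
# 400G NIC port demand by fleet size, before any switch or optics multipliers.
GPUS_PER_SYSTEM = 8
NIC_PORTS_PER_SYSTEM = 8   # published DGX B200 ConnectX-7 configuration

for gpus in (32_768, 100_000, 1_000_000):
    systems = gpus / GPUS_PER_SYSTEM
    ports = systems * NIC_PORTS_PER_SYSTEM
    print(f"{gpus:>9,} GPUs -> {systems:>9,.0f} systems -> {ports:>11,.0f} NIC ports")
```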
Data center shells, cooling, and land: fast once the hard stuff is solved
The campus layer matters, but it is downstream of power. If I had to summarize this section in one sentence: shells are a critical path item, but not the primary economic governor.
| Project | Published figures | Why it matters |
|---|---|---|
| Crusoe Abilene phase 1 | 2 buildings, 980,000 sq ft, 200+ MW, construction started June 2024, first energization expected in H1 2025 | Shows that once a site and interconnect are in hand, 200 MW-class AI capacity can appear quickly. |
| Crusoe Abilene full campus | 8 buildings, ~4 million sq ft, 1.2 GW total power capacity, mid-2026 target for the second phase | This is the best public hard-data anchor for a modern giga-scale AI campus. |
| Crusoe construction scale | ~2,000 workers daily at announcement, expected to approach ~5,000 with expansion; later blog cites 5,600+ workers on site | Labor and on-site execution are large, but they scale if money and power are there. |
| Meta Hyperion JV | ~$27 billion total development cost for buildings plus long-lived power, cooling, and connectivity infrastructure | Confirms the capital intensity of frontier campuses even before the accelerator payload is fully counted. |
Cooling is not automatically a water disaster
Crusoe's Abilene blog says the closed-loop cooling system needs only about 12,625 gallons per building per year for maintenance and water-quality management. That is tiny relative to what many people intuitively imagine. Water can still become a local political issue, but modern closed-loop direct-to-chip systems can make it much less important than power.
Power, interconnection, substations, and generation are the real governing layer
A modern AI campus does not care about abstract national electricity supply. It cares about firm, deliverable MW at a specific fence line, with the right interconnection study, substation design, switchgear, transformers, and backup power in place.
| Average AI load | Annual electricity | Solar nameplate at 25% CF | Wind nameplate at 35% CF | Gas nameplate at 85% CF |
|---|---|---|---|---|
| 1.0 GW | 8.76 TWh/yr | 4.0 GW | 2.9 GW | 1.18 GW |
| 1.2 GW | 10.51 TWh/yr | 4.8 GW | 3.4 GW | 1.41 GW |
| 5.0 GW | 43.8 TWh/yr | 20.0 GW | 14.3 GW | 5.88 GW |
| 10.0 GW | 87.6 TWh/yr | 40.0 GW | 28.6 GW | 11.76 GW |
| 20.0 GW | 175.2 TWh/yr | 80.0 GW | 57.1 GW | 23.53 GW |
These nameplate figures are not saying solar, wind, and gas are interchangeable in practice. They are a simple way to show how quickly continuous AI load becomes a generation-scale problem.
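The nameplate columns follow from a single energy-balance identity: average load divided by capacity factor. A minimal sketch, with the capacity factors taken from the table headers; it deliberately ignores firmness, storage, and curtailment.

```python
# Nameplate generation needed to cover a continuous AI load at a given capacity factor.
HOURS_PER_YEAR = 8760
CAPACITY_FACTORS = {"solar": 0.25, "wind": 0.35, "gas": 0.85}

def nameplate_gw(avg_load_gw: float, capacity_factor: float) -> float:
    """Energy-balance only: average load / capacity factor."""
    return avg_load_gw / capacity_factor

for load_gw in (1.0, 1.2, 5.0, 10.0, 20.0):
    annual_twh = load_gw * HOURS_PER_YEAR / 1000
    row = {src: round(nameplate_gw(load_gw, cf), 2) for src, cf in CAPACITY_FACTORS.items()}
    print(f"{load_gw:>4} GW load, {annual_twh:.2f} TWh/yr -> {row}")
```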
Nameplate headlines can be misleading
EIA expected 62.8 GW of new US utility-scale generating capacity in 2024, and 81% of that was solar plus battery storage. That is a lot of nameplate, but much less than 62.8 GW of firm always-on power for AI. Frontier AI cares about dependable delivered MW, not celebratory national aggregate capacity numbers.
Abilene is a useful template
Crusoe says the site pairs a 1.2 GW grid interconnection with behind-the-meter battery storage, solar, nearby wind, and natural-gas-turbine backup. That is what a serious AI campus increasingly looks like: not just grid draw, but a negotiated campus power architecture.
What I think the true electrical bottlenecks are
- Utility willingness to reserve and deliver large blocks of power.
- Interconnection studies and substation buildouts.
- Large transformers, switchgear, busway, and other high-voltage equipment that routinely run on multi-quarter to multi-year lead times.
- Gas-turbine and backup-generation availability for fast-track energization.
- Political permission if communities decide data centers are crowding out other users.
Logic wafers and lithography matter, but they are not my first bind
The popular version of this story is "TSMC decides everything." I think that is too simplistic. TSMC does matter. ASML and EUV matter. But for the next several years, my base-case deployment pace is more tightly governed by HBM, packaging, and energized power than by raw logic-wafer starts alone.
Why I still keep an eye on EUV and leading-edge foundry tools
Even if raw wafer starts are not the first bind, they are still the speed limit on how fast the industry can expand leading-edge output over a multi-year horizon. EUV tool output is measured in dozens per year, not hundreds. That caps how quickly foundries can add true cutting-edge capacity. I just do not think it bites before power, HBM, and packaging in the 2026-2031 window unless those other problems get solved unusually well.
My rough annual build ceiling for new frontier AI deployment
These are not forecasts of what demand wants. They are estimates of what the stack can plausibly absorb as annual new deployment if centralized labs and hyperscalers keep spending aggressively.
| Year | Conservative | Base | Aggressive | What has to go right |
|---|---|---|---|---|
| 2027 | ~1.5M accelerators / ~3.1 GW new facility power | ~2.2M accelerators / ~4.5 GW | ~3.0M accelerators / ~6.2 GW | HBM and packaging ramp mostly on plan; multiple 100-500 MW sites energized on time. |
| 2028 | ~2.2M accelerators / ~4.5 GW | ~3.3M accelerators / ~6.8 GW | ~4.5M accelerators / ~9.3 GW | Electrical equipment and utility coordination stop being the main brake on several giga-scale campuses at once. |
| 2030 | ~3.0M accelerators / ~6.2 GW | ~5.0M accelerators / ~10.3 GW | ~7.5M accelerators / ~15.4 GW | Generation additions, transmission, and politics all have to cooperate rather than lag. |
| 2031 | ~4.0M accelerators / ~8.2 GW | ~6.0M accelerators / ~12.3 GW | ~9.0M accelerators / ~18.5 GW | Aggressive case effectively requires industrial policy in everything but name. |
These are annual new deployments. Cumulative installed base compounds on top of this. By 2030, even a base case implies many tens of GW of AI-dedicated power online globally.
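The accelerator and facility-power columns are tied together by the same per-GPU power and PUE anchors used throughout; a minimal sketch of that conversion, with the accelerator counts taken from the rows above.

```python
# New facility power implied by a given annual accelerator deployment, under the report's anchors.
KW_PER_GPU = 14.3 / 8
PUE = 1.15

def new_facility_gw(accelerators: float) -> float:
    return accelerators * KW_PER_GPU * PUE / 1e6

for accels in (1.5e6, 2.2e6, 3.3e6, 5.0e6, 9.0e6):
    print(f"{accels/1e6:.1f}M accelerators -> ~{new_facility_gw(accels):.1f} GW of new facility power")
# 1.5M -> ~3.1 GW, 2.2M -> ~4.5 GW, 3.3M -> ~6.8 GW, 5.0M -> ~10.3 GW, 9.0M -> ~18.5 GW
```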
My strongest caution
If someone shows you a story where frontier AI deployment jumps by dozens of firm GW per year before 2030 without discussing HBM, CoWoS, interconnection, switchgear, and generation, they are not doing real capacity analysis.
What binds first by horizon
| Horizon | Rank 1 | Rank 2 | Rank 3 |
|---|---|---|---|
| 2026-2027 | Powered sites and interconnection | HBM and advanced packaging | Transformers, switchgear, and backup generation |
| 2028 | Electrical equipment plus utility allocation | HBM | Networking and optical deployment at extreme scale |
| 2030 | Generation, transmission, and local politics | HBM and packaging if demand remains insane | Specialized campus labor and construction sequencing |
| 2031+ | Political allocation of energy and rents | Grid expansion speed | Residual semiconductor packaging limits |
The numbers I would monitor every quarter
- Energized MW actually online
- How many buildings are live, not announced
- Average time from land control to first energized building
- CoWoS wafers per month
- HBM sold-through status by supplier
- Whether packaging or memory slips push major launches
- Large transformer and switchgear lead times
- Gas turbine availability
- Utility interconnection approvals for 100 MW+ and 1 GW+ loads
- Who gets priority access to constrained power
- Whether states start attaching conditions to AI campus buildouts
- Whether rent shaving extends from labs into power, land, and data center infrastructure
Public anchors used in this report
- NVIDIA DGX B200 product page and markdown spec for 8 GPUs, 1,440 GB HBM3e, and ~14.3 kW max system power.
- NVIDIA GB200 NVL72 product page and markdown spec for 72 Blackwell GPUs, 13.4 TB HBM3e, 130 TB/s NVLink, and rack-scale liquid-cooled architecture.
- TrendForce and SemiMedia reporting on TSMC CoWoS capacity: >35k wpm in 2024, ~70-80k in 2025, ~90k by end-2026, and ~150k by 2028-2029.
- TrendForce reporting that CoWoS facility build times have compressed from 3-5 years to roughly 1.5-2 years.
- Crusoe March 2025 Abilene expansion release for the 1.2 GW, 8-building, 4 million sq ft campus and its mid-2026 phase-two target.
- Crusoe September 2025 Abilene live-campus release for the June 2024 start date, first NVIDIA GB200 rack deliveries in June 2025, and the statement that the planned campus supports hundreds of thousands of GPUs.
- Crusoe August 2025 Abilene blog for the 12,625 gallons per building per year cooling-maintenance figure and 5,600+ construction workers on site.
- Meta October 2025 Hyperion joint-venture announcement for the roughly $27 billion development-cost figure for buildings plus long-lived power, cooling, and connectivity infrastructure.
- EIA February 15, 2024 Today in Energy note that 62.8 GW of new US utility-scale electric-generating capacity was expected in 2024, with 81% coming from solar and battery storage.
Where the rough estimates start
The hardest public numbers to get cleanly are exact HBM stack output and exact per-rack Blackwell power in deployed configurations. For HBM, I use a rough industry-output range derived from public market-revenue trackers and plausible stack ASPs. For Blackwell rack power, I scale from the public DGX B200 system-power figure, which lands in the same general density class as public Blackwell rack integration discussions.