Ch 11 — Power, Cooling & Energy: The Physical Limits of AI

PUE, liquid cooling, rack density, water usage, grid constraints, and sustainability
The Power Budget: From Chip to Data Center
Every watt consumed by a GPU must be delivered, converted, and removed as heat
Power at Every Level
AI infrastructure power consumption cascades from individual chips to entire data centers:

Chip level: An H100 GPU draws 700W at full load. A B200 draws 1,000W. These are the highest-power chips ever mass-produced — comparable to a microwave oven per chip.

Node level: A DGX H100 (8 GPUs + CPUs + networking + storage) draws 10.2 kW. A GB200 NVL72 rack draws 120 kW — enough to power 40 homes.

Cluster level: A 1,000-GPU cluster (125 nodes) draws ~1.3 MW of IT load. With cooling and power distribution overhead (PUE ~1.3), total facility draw is ~1.7 MW.

Data center level: A hyperscale AI data center runs 50–500 MW. For context, a nuclear power plant produces ~1,000 MW. The largest planned AI campuses will require their own power plants.
Power Consumption by GPU Generation
GPU              TDP       Node (8×)   FLOPS/W
──────────────────────────────────────────────
A100 (2020)      400W      6.5 kW      0.78 TF/W
H100 (2022)      700W      10.2 kW     1.32 TF/W
H200 (2024)      700W      10.2 kW     1.32 TF/W
B200 (2025)      1,000W    14.3 kW     1.80 TF/W
GB200 NVL72      ~1,200W   120 kW*     ~2.0 TF/W

* GB200 NVL72 is a full rack, not a single node

Trend: power per chip keeps climbing each generation (400W → 700W → 1,000W), while FLOPS/watt improves ~1.4× per generation. Net result: more total power, better efficiency.

1,000 H100 cluster power budget:
  GPU IT load:         1,000 × 700W = 700 kW
  CPU/NIC/storage:     ~40% overhead = 280 kW
  Networking:          ~100 kW
  Total IT:            ~1,080 kW
  Cooling (PUE 1.3):   +324 kW
  Power dist. losses:  +~100 kW
  Total facility:      ~1,500 kW (1.5 MW)
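The same budget can be written as a small, parameterized sketch. This is a minimal illustration using the overhead fractions and PUE assumed above; cluster_power_budget and its default values are illustrative, not a vendor formula.

# Rough cluster power budget in kW, using the overhead ratios assumed above
def cluster_power_budget(gpus, gpu_watts=700, host_overhead=0.40,
                         network_kw=100, pue=1.3, dist_loss_kw=100):
    gpu_kw = gpus * gpu_watts / 1000                      # GPU IT load
    it_kw = gpu_kw * (1 + host_overhead) + network_kw     # add CPU/NIC/storage + networking
    cooling_kw = it_kw * (pue - 1)                        # cooling overhead implied by PUE
    return it_kw + cooling_kw + dist_loss_kw              # total facility draw

print(cluster_power_budget(1000))   # ~1,500 kW (1.5 MW) for 1,000 H100s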
Key insight: Every watt that enters a data center leaves as heat. A GPU is essentially a very expensive space heater that does math on the side. The engineering challenge isn’t just delivering power — it’s removing the heat fast enough to prevent the chips from throttling or failing.
PUE: Measuring Data Center Efficiency
How much energy goes to computing vs overhead
What Is PUE?
Power Usage Effectiveness (PUE) is the ratio of total facility power to IT equipment power:

PUE = Total Facility Power ÷ IT Equipment Power

A PUE of 1.0 means every watt goes to computing (physically impossible). A PUE of 2.0 means half the power goes to cooling and overhead. Lower is better.

Industry averages:
• Legacy data centers: PUE 1.8–2.5 (40–60% overhead)
• Modern air-cooled: PUE 1.2–1.4 (17–29% overhead)
• Google/Meta best: PUE 1.10–1.12
• Liquid-cooled AI: PUE 1.03–1.15 (3–13% overhead)

The difference matters enormously at scale. For an AI data center drawing 100 MW from the grid at PUE 1.3, improving to PUE 1.1 cuts total draw by ~15.4 MW (enough to power 12,000 homes), saving ~$13.5M/year in electricity.
PUE Impact at Scale
Scenario: 100 MW IT load, $0.10/kWh

PUE 1.50 (legacy air-cooled):
  Total power: 100 × 1.50 = 150 MW
  Overhead: 50 MW (cooling + distribution)
  Annual cost: 150,000 × 8,760 × $0.10 = $131.4M
  Overhead cost: $43.8M/yr

PUE 1.20 (modern air-cooled):
  Total power: 120 MW
  Overhead: 20 MW
  Annual cost: $105.1M
  Savings vs 1.50: $26.3M/yr

PUE 1.06 (liquid-cooled):
  Total power: 106 MW
  Overhead: 6 MW
  Annual cost: $92.9M
  Savings vs 1.50: $38.5M/yr
  Savings vs 1.20: $12.3M/yr

# At 100 MW scale, every 0.01 PUE improvement
# saves ~$876K/year in electricity.
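A minimal sketch of the same arithmetic, parameterized by IT load, PUE, and electricity price; the $0.10/kWh rate is the assumption from the scenario above.

# Annual electricity cost ($M/yr) for a given IT load and PUE
def annual_cost_musd(it_mw, pue, usd_per_kwh=0.10):
    total_kw = it_mw * 1000 * pue
    return total_kw * 8760 * usd_per_kwh / 1e6

for pue in (1.50, 1.20, 1.06):
    print(pue, round(annual_cost_musd(100, pue), 1))   # 131.4, 105.1, 92.9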
Key insight: PUE is like the fuel efficiency of a car. A PUE of 1.5 means for every mile of “computing” you drive, you waste half a mile’s worth of fuel on air conditioning. Liquid cooling is like switching from a gas guzzler to a hybrid — same destination, 30% less fuel.
Air Cooling: The Traditional Approach
Fans, CRAC units, hot/cold aisles — and why they’re hitting their limits
How Air Cooling Works
Traditional data center cooling uses air as the heat transport medium:

1. Cold aisle: Chilled air (18–27°C) enters the front of server racks from a raised floor or overhead ducts.
2. Server fans: Pull cold air through the server, absorbing heat from CPUs, GPUs, and other components.
3. Hot aisle: Heated air (35–45°C) exits the back of racks into a contained hot aisle.
4. CRAC/CRAH units: Computer Room Air Conditioning/Handling units cool the hot air using chilled water or refrigerant, then recirculate it.

Hot/cold aisle containment prevents mixing of cold supply air with hot exhaust, improving efficiency by 20–30%.
Air Cooling Limits
Air has poor thermal properties compared to liquids:

Heat capacity: Air carries ~1 kJ/kg/°C vs water at ~4.2 kJ/kg/°C. Water absorbs 4× more heat per unit mass.

Practical limit: Air cooling maxes out at ~40–50 kW per rack. A DGX H100 at 10.2 kW fits, but a GB200 NVL72 at 120 kW is physically impossible to air-cool — you’d need hurricane-force airflow.

Fan power: At high densities, server fans consume 10–15% of total server power. CRAC units add another 15–25%. Combined, cooling overhead reaches 30–40% of IT power (PUE 1.3–1.4).
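To see why air tops out, here is a back-of-the-envelope sketch of the airflow needed to carry away a rack's heat at a given inlet-to-exhaust temperature rise. The 15°C ΔT and the air properties are assumed textbook values, not vendor figures.

# Volumetric airflow from Q = rho * V * cp * dT  =>  V = Q / (rho * cp * dT)
RHO_AIR = 1.2      # kg/m^3
CP_AIR = 1005.0    # J/(kg*K)

def airflow_m3s(heat_w, delta_t_c=15.0):
    return heat_w / (RHO_AIR * CP_AIR * delta_t_c)

for rack_kw in (10, 50, 120):
    v = airflow_m3s(rack_kw * 1000)
    print(f"{rack_kw} kW rack: {v:.1f} m^3/s (~{v * 2119:.0f} CFM)")
# 10 kW -> ~0.6 m^3/s; 120 kW -> ~6.6 m^3/s (~14,000 CFM through a single rack)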
Air vs Liquid: The Physics
Property            Air            Water           Ratio
──────────────────────────────────────────────────────────
Density (kg/m³)     1.2            997             831×
Specific heat       1.0 kJ/kg°C    4.2 kJ/kg°C     4.2×
Thermal cond.       0.026 W/mK     0.6 W/mK        23×
Volumetric heat     1.2 kJ/m³°C    4,187 kJ/m³°C   3,489×

# Water carries 3,489× more heat per unit volume!

Max rack density by cooling method:
  Air (traditional):         ~15-25 kW
  Air (high-density):        ~40-50 kW
  Rear-door heat exchanger:  ~50-80 kW
  Direct-to-chip liquid:     ~100-140 kW
  Immersion cooling:         ~200-350 kW

GB200 NVL72 rack: 120 kW
  Air cooling: IMPOSSIBLE
  Direct-to-chip: Required (NVIDIA spec)
Key insight: Air cooling for AI GPUs is like trying to cool a blast furnace with a desk fan. It worked when servers drew 5–10 kW per rack, but at 120 kW per rack, you’d need a wind tunnel. The physics simply don’t allow it — which is why NVIDIA’s GB200 NVL72 requires liquid cooling. There’s no air-cooled option.
Liquid Cooling: Direct-to-Chip & Immersion
The technologies enabling 100+ kW racks and PUE below 1.1
Direct-to-Chip (Cold Plate)
The mainstream production solution for AI data centers. Cold plates are metal blocks with internal channels mounted directly on GPU/CPU packages. Chilled water (or coolant) flows through the channels, absorbing heat at the source.

How it works:
1. Cold plates bolted to GPU/CPU packages
2. Coolant loops connect cold plates to a Coolant Distribution Unit (CDU)
3. CDU exchanges heat with the building’s chilled water loop
4. Remaining components (memory, VRMs) still air-cooled

Performance: Removes 60–80% of total heat via liquid. Enables 100–140 kW per rack. PUE of 1.05–1.15. CAPEX premium: 15–25% over air cooling, but operational savings recover the investment in 12–18 months at scale.

This is what NVIDIA requires for GB200 NVL72 and recommends for H100/H200 at high density.
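The same heat-balance arithmetic with water shows why direct-to-chip works at these densities. This is a minimal sketch: the 10°C coolant temperature rise and the 70% liquid-capture fraction are illustrative assumptions within the ranges quoted above, not NVIDIA specifications.

# Water flow needed for the liquid-cooled share of a 120 kW rack
CP_WATER = 4186.0   # J/(kg*K)

def water_flow_lpm(heat_w, delta_t_c=10.0):
    kg_per_s = heat_w / (CP_WATER * delta_t_c)   # mass flow from Q = m * cp * dT
    return kg_per_s * 60                         # 1 kg of water is ~1 liter

liquid_share = 0.70                              # 60-80% of heat removed via cold plates
print(water_flow_lpm(120_000 * liquid_share))    # ~120 L/min for a 120 kW rack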
Immersion Cooling
Single-phase immersion: Entire servers submerged in a dielectric (non-conductive) fluid. The fluid absorbs heat from all components. Pumped to a heat exchanger for cooling. Enables 200–350+ kW per rack. PUE of 1.02–1.08.

Two-phase immersion: Uses a fluid that boils at low temperature (~49°C). Components are submerged; the fluid boils on hot surfaces, carrying heat as vapor. Vapor condenses on a cold plate above, dripping back down. Extremely efficient but complex to manage.

Trade-offs: 40–60% CAPEX premium over air. Requires specialized tanks, fluid management, and maintenance training. Fluid costs $50–200 per liter. Reduces water consumption by 95–98% vs evaporative cooling.
Cooling Technology Comparison
Method            PUE         Max kW/rack   CAPEX     Water
─────────────────────────────────────────────────────────────
Air (CRAC)        1.3-1.8     40-50         Low       High
Direct-to-chip    1.05-1.15   100-140       +15-25%   Low
Immersion (1φ)    1.02-1.08   200-350       +40-60%   Minimal
Immersion (2φ)    1.02-1.05   250-350+      +50-70%   Minimal

Market projection (2030):
  Liquid cooling market: $15-20B
  50% of new hyperscale capacity liquid-cooled by 2027
  Direct-to-chip: mainstream production
  Immersion: emerging, growing fast
Key insight: The shift from air to liquid cooling is like the shift from propellers to jet engines in aviation. Propellers worked fine at 300 mph, but to go faster, you needed fundamentally different physics. Air cooling works at 40 kW/rack, but at 120+ kW, you need liquid — there’s no incremental improvement that bridges the gap.
Rack Density: The Space Crunch
How AI is compressing an entire data center into a single row of racks
The Density Explosion
Traditional data centers were designed for 5–8 kW per rack. A standard 42U rack held a mix of 1U servers, switches, and storage. The entire facility was designed around this density: floor loading, power distribution, cooling capacity, cable pathways.

AI has shattered these assumptions:

2020: A100 DGX — 6.5 kW/node, ~30 kW/rack
2022: H100 DGX — 10.2 kW/node, ~50 kW/rack
2025: GB200 NVL72 — 120 kW/rack
2026+: Next-gen — 200–350 kW/rack projected

This means a single AI rack now draws more power than an entire row of traditional servers. Existing data centers physically cannot support these densities without major retrofits to power distribution, cooling, and structural support (a 120 kW rack with liquid cooling weighs 1,500+ kg).
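A minimal sketch of the footprint math behind the comparison below; the square-feet-per-rack figures are illustrative assumptions (roughly 50 sq ft per traditional rack, ~70 sq ft per liquid-cooled AI rack), not a design rule.

import math

# Racks and gross floor area needed for a target IT load
def footprint(it_mw, kw_per_rack, sqft_per_rack):
    racks = math.ceil(it_mw * 1000 / kw_per_rack)
    return racks, racks * sqft_per_rack

print(footprint(5, 5, 50))     # traditional: (1000 racks, 50,000 sq ft)
print(footprint(5, 120, 71))   # AI-optimized: (42 racks, ~3,000 sq ft)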
Density Impact on Data Center Design
Traditional DC (5 kW/rack):
  1,000 racks × 5 kW = 5 MW IT load
  Floor space: ~50,000 sq ft
  Cooling: Air (CRAC units)
  Power: Standard PDUs

AI DC (120 kW/rack):
  42 racks × 120 kW = 5 MW IT load
  Floor space: ~3,000 sq ft (16× less!)
  Cooling: Liquid (CDUs, piping)
  Power: High-density busway, 480V

Same 5 MW compute in:
  Traditional: 1,000 racks, 50K sq ft
  AI-optimized: 42 racks, 3K sq ft

# But the 42 racks need:
# - 24× the power per rack
# - Liquid cooling infrastructure
# - Reinforced floors (1,500+ kg/rack)
# - High-density power distribution
Key insight: AI rack density is like going from a suburban neighborhood to a skyscraper. Same number of people (compute), fraction of the land (floor space), but you need elevators (liquid cooling), steel beams (reinforced floors), and high-voltage power risers. You can’t just stack houses on top of each other — you need purpose-built infrastructure.
Water Usage: AI’s Thirst
How much water AI consumes, and the push toward water-free cooling
Where the Water Goes
Data centers use water primarily for evaporative cooling. Cooling towers evaporate water to reject heat from the building’s chilled water loop. This is energy-efficient (lower PUE) but water-intensive.

Water Usage Effectiveness (WUE) measures liters of water per kWh of IT energy:
• Evaporative cooling: WUE 1.0–2.0 L/kWh
• Dry coolers (no evaporation): WUE ~0 L/kWh (but higher PUE)
• Liquid cooling (closed loop): WUE ~0 L/kWh

U.S. data centers: ~17.5 billion gallons in 2023, projected to double by 2028. AI represents ~15–20% of data center energy, translating to ~10 million gallons daily for AI workloads.
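WUE translates directly into annual water volumes. A minimal sketch, assuming a 100 MW IT load running year-round and the WUE range quoted above:

# Annual cooling water: liters = IT energy (kWh) × WUE (L/kWh)
def annual_water_gal(it_mw, wue_l_per_kwh):
    liters = it_mw * 1000 * 8760 * wue_l_per_kwh
    return liters / 3.785                       # liters -> US gallons

print(annual_water_gal(100, 1.0))   # ~231M gallons/yr (evaporative, low end)
print(annual_water_gal(100, 2.0))   # ~463M gallons/yr (evaporative, high end)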
Per-Query Water Usage
ChatGPT query: ~0.32 mL (roughly 0.000085 gallons, or about 11,765 queries per gallon). A glass of water covers ~750 queries.

AI image generation: 15–60 mL per image. The viral claim of “10 gallons per image” is a myth — actual usage is 1,000× less.

Training GPT-3: ~700,000 liters total. Training Grok 4: ~750 million liters (including all data center cooling).

The per-query numbers seem small, but at billions of queries per day, they add up. OpenAI alone likely consumes millions of gallons per month for inference.
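The "adds up at scale" point is just multiplication. A minimal sketch, where the 1 billion queries/day figure is an assumption for illustration, not a reported number:

# Aggregate inference water from a tiny per-query figure (query volume assumed)
ML_PER_QUERY = 0.32          # ~0.3 mL per query, per the figure above
queries_per_day = 1e9        # assumed volume for illustration

liters_per_day = queries_per_day * ML_PER_QUERY / 1000
gallons_per_month = liters_per_day * 30 / 3.785
print(f"{liters_per_day:,.0f} L/day, ~{gallons_per_month / 1e6:.1f}M gallons/month")
# ~320,000 L/day, ~2.5M gallons/month at 1B queries/day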
Water Reduction Strategies
Strategy              Water Savings   Trade-off
──────────────────────────────────────────────────────
Dry coolers           95-100%         Higher PUE (+0.1-0.3)
Closed-loop liquid    95-98%          Higher CAPEX
Immersion cooling     98-100%         Highest CAPEX
Adiabatic (hybrid)    50-70%          Moderate CAPEX
Cold climate siting   60-80%          Limited locations

Industry commitments:
  Microsoft: Zero-water evaporative cooling in all new DCs (Aug 2024)
  Google: 120% water replenishment by 2030
  Meta: Net-positive water by 2030

Cost of water vs electricity:
  Water: ~$0.005/gallon (municipal)
  Electricity saved by evaporative: ~$0.02/kWh
  At 100 MW: evaporative saves $1.5M/yr in electricity
  but uses ~200M gallons/yr of water

# The economics favor evaporative cooling,
# but social/regulatory pressure is shifting
# the industry toward water-free solutions.
Key insight: Water usage in AI data centers is a real concern but often exaggerated in headlines. A single ChatGPT query uses less water than a sip from a glass. The real issue is aggregate scale: billions of queries per day at thousands of data centers. The industry is moving toward closed-loop liquid cooling that uses zero water — solving both the density and sustainability problems simultaneously.
Grid Constraints: The Power Wall
Why electricity availability is becoming AI’s biggest bottleneck
The Grid Bottleneck
Data centers consumed ~4.4% of U.S. electricity in 2023, projected to reach up to 12% by 2028. U.S. data center spending exceeded $500 billion in 2025. The problem: the electrical grid wasn't built for this.

Grid connection lead times: Getting a new 100+ MW power connection takes 3–7 years in most U.S. markets. This includes permitting, substation construction, transmission line upgrades, and utility interconnection agreements. AI companies need power now, but the grid moves at the speed of bureaucracy.

Geographic concentration: Northern Virginia (Loudoun County's "Data Center Alley") hosts the largest concentration of data center capacity in the world. The local grid is maxed out. New projects face 5+ year wait times for power. This is driving expansion to Ohio, Texas, and international markets.
Power Sourcing Strategies
Grid power: Cheapest ($0.04–0.10/kWh) but limited availability and long lead times.

On-site generation: Natural gas turbines provide 10–100 MW per site. Fast to deploy (12–18 months) but carbon-intensive.

Renewable PPAs: Power Purchase Agreements for solar/wind. $0.03–0.06/kWh but intermittent. Need battery storage or grid backup.

Nuclear (SMRs): Small Modular Reactors (50–300 MW) promise carbon-free baseload power. Microsoft signed a deal to restart Three Mile Island. Amazon invested in SMR startups. Timeline: 2028–2032 for first deployments.
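Annual cost per sourcing option is the same kWh arithmetic expanded in the block below. A minimal sketch using the $/kWh figures quoted above (nuclear uses the low end of its projected range; actual rates vary by market):

# Annual power cost ($M/yr) for a 100 MW facility under different $/kWh assumptions
def annual_power_cost_musd(mw, usd_per_kwh):
    return mw * 1000 * 8760 * usd_per_kwh / 1e6

sources = {
    "grid": 0.07,
    "renewable PPA + storage": 0.06,
    "on-site gas": 0.06,
    "nuclear SMR (low end, projected)": 0.05,
}
for name, rate in sources.items():
    print(f"{name}: ${annual_power_cost_musd(100, rate):.1f}M/yr")
# Each ±$0.01/kWh swings the bill by ~$8.8M/yr at 100 MW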
Power Economics at Scale
100 MW AI data center, annual power cost:

Grid power ($0.07/kWh):
  100,000 kW × 8,760 hrs × $0.07 = $61.3M/yr

Renewable PPA ($0.04/kWh) + battery storage ($0.02/kWh):
  $52.6M/yr

On-site gas ($0.06/kWh):
  $52.6M/yr + carbon costs

Nuclear SMR ($0.05-0.08/kWh, projected):
  $43.8-70.1M/yr (carbon-free)

Power as % of total DC cost:
  Traditional DC: ~30-40% of opex
  AI DC (GPU-heavy): ~15-25% of opex
  (GPUs dominate cost; power is secondary)

Electricity price sensitivity:
  ±$0.01/kWh on 100 MW = ±$8.76M/yr
  → Location choice matters enormously
Key insight: Power availability is becoming the primary constraint on AI growth — more than GPU supply, more than talent, more than capital. You can buy GPUs with money, but you can’t buy grid capacity that doesn’t exist. The companies that secure power first will have a structural advantage for the next decade.
Sustainability: The Path Forward
Balancing AI’s growing energy appetite with environmental responsibility
The Carbon Footprint
AI’s carbon footprint depends on the energy source:

Training Llama 3 405B: roughly 16,000 H100 GPUs running for weeks. At 700W per GPU plus overhead, total energy is on the order of 15,000 MWh. With the U.S. average grid mix (0.4 kg CO2/kWh): ~6,000 tonnes CO2, equivalent to about 1,300 cars driven for a year.

With renewable energy: Same training, near-zero direct emissions. Meta, Google, and Microsoft all claim 100% renewable energy matching (though not 24/7 carbon-free in all locations).
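The training footprint is simply energy times grid carbon intensity. A minimal sketch using the same figures as above; the 0.02 kg CO2/kWh value for a largely renewable mix is an illustrative assumption, not a reported number.

# Training emissions = energy × grid carbon intensity
def training_co2_tonnes(energy_mwh, kg_co2_per_kwh):
    return energy_mwh * 1000 * kg_co2_per_kwh / 1000   # kg -> tonnes

print(training_co2_tonnes(15_000, 0.40))   # ~6,000 t on the average U.S. grid
print(training_co2_tonnes(15_000, 0.02))   # ~300 t on a largely renewable mix (assumed)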

Inference dominates long-term: Training happens once; inference runs forever. A popular model serving millions of users daily consumes far more total energy over its lifetime than the training run. Inference efficiency improvements have outsized climate impact.
Efficiency as Sustainability
The most impactful sustainability strategy is doing more with less:

Model efficiency: Smaller models (7–13B) can match larger models on many tasks. Llama 3 8B matches GPT-3.5 on most benchmarks at 1/20th the compute.

Quantization: FP8 inference halves energy per token vs FP16.

Hardware efficiency: Each GPU generation improves FLOPS/watt by ~1.4×. B200 delivers 2× the performance of H100 at 1.4× the power.

Inference optimization: Continuous batching, PagedAttention, and speculative decoding can reduce energy per token by 3–5× vs naive serving.
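These levers multiply rather than add. A rough sketch of how two of the per-token reductions above stack; the factors are taken from the ranges in this section and are illustrative, not measured end-to-end.

# Energy per token relative to an unoptimized FP16 baseline (illustrative factors)
factors = {
    "FP8 quantization": 0.5,        # roughly halves energy per token vs FP16
    "serving optimizations": 0.3,   # batching, PagedAttention, speculative decoding (3-5×)
}

relative = 1.0
for f in factors.values():
    relative *= f
print(f"~{relative:.2f}x baseline energy per token")   # ~0.15x, i.e. ~85% lower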
Sustainability Scorecard
Strategy                 CO2 Reduction   Feasibility
──────────────────────────────────────────────────────
Renewable energy         80-100%         Available now
Liquid cooling           15-30%          Available now
Model distillation       50-90%          Available now
Quantization (FP8)       40-50%          Available now
Inference optimization   60-80%          Available now
Nuclear (SMRs)           90-100%         2028-2032
Waste heat reuse         10-20%          Emerging

Combined potential:
  Unoptimized AI (2023): 1.0× baseline
  Optimized AI (2026): ~0.15× baseline
  → ~85% reduction through available techniques

Waste heat reuse examples:
  Meta: Heating nearby buildings in Luleå, Sweden
  Microsoft: District heating in Finland
  Potential: 60-70% of DC heat is recoverable at 40-60°C
  (useful for heating, not electricity)
Key insight: AI’s energy problem is real but solvable. The combination of renewable energy, efficient hardware, optimized software, and liquid cooling can reduce AI’s carbon footprint by 85% compared to naive 2023 deployments. The question isn’t whether AI can be sustainable — it’s whether the industry will invest in sustainability before regulation forces it.