The current consensus in Silicon Valley is simple: bigger is better. Bigger models, bigger datasets, and—most critically—bigger data centers. We are witnessing a capital expenditure boom that rivals the build-out of the early internet, with hyperscalers pouring hundreds of billions into a single bet: that the path to Artificial General Intelligence (AGI) is paved with more GPUs and more megawatts.
But this consensus is hitting a physical wall. We are entering the “decentralization phase” of AI, driven not just by architectural logic, but by the hard physics of power delivery. The next trillion dollars of value won’t be created by training the largest model in a fortress; it will be created by the infrastructure that allows AI to run efficiently, securely, and locally everywhere else.
The Physics of Panic
The inciting incident for this shift is happening in the boiler rooms of the world’s data centers. For the better part of a decade, a standard server rack consumed about 10 to 20 kilowatts (kW) of power. Today, with the arrival of NVIDIA’s Blackwell architecture and similar high-performance silicon, we are seeing rack power densities jump to over 100 kW, with liquid cooling becoming a requirement rather than a luxury.
This isn’t just an engineering challenge – it’s a grid crisis. While hyperscalers can buy all the GPUs they want, they cannot buy the physics of electricity transmission. In the United States, grid interconnect queues—the waiting line to plug new power generation into the grid—have stretched to over four years. You literally cannot build transmission lines fast enough to match the scaling laws of transformers.
We are seeing the symptoms of this bottleneck in the desperate, almost panic-driven moves by major tech companies to acquire nuclear power assets. When software companies start signing deals to restart Three Mile Island, it’s a signal that the traditional path of scaling is fracturing. They are trying to brute-force a solution to a problem that requires a fundamental architectural rethink. When a resource becomes this constrained, the market invariably shifts value from brute force to efficiency.
Small Language Models: Compressible Infrastructure
The first beneficiary of this shift is the Small Language Model (SLM). For the past two years, the industry has been obsessed with “General Purpose Gods”—models like GPT-4 that can do everything from writing poetry to coding Python. But for 90% of enterprise use cases, you don’t need a god; you need a specialized worker.
Running a 175-billion-parameter model to summarize a meeting or route a customer support ticket is economically ruinous at scale. It’s like using a Ferrari to deliver the mail. SLMs—models with 7 billion parameters or fewer—are rapidly proving they can deliver 90% of the performance for 1% of the inference cost when fine-tuned for specific tasks.
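To make the economics concrete, here is a rough back-of-the-envelope sketch. It leans on the standard approximation of about two FLOPs per parameter per generated token and ignores quantization, batching, and hardware pricing, so the numbers are illustrative rather than benchmarks:

```python
# Back-of-the-envelope inference comparison (illustrative only).
# Assumption: a dense transformer spends roughly 2 FLOPs per parameter
# to generate each token during decoding.

def flops_per_token(num_params: float) -> float:
    """Approximate forward-pass FLOPs to generate one token."""
    return 2.0 * num_params

giant = flops_per_token(175e9)  # 175B general-purpose model
slm = flops_per_token(7e9)      # 7B specialized model

print(f"175B model: {giant / 1e9:,.0f} GFLOPs per token")
print(f"  7B model: {slm / 1e9:,.0f} GFLOPs per token")
print(f"Raw compute ratio: {giant / slm:.0f}x")
```

The raw compute gap alone is roughly 25x per token; the rest of the cost difference comes from quantization, better batch efficiency, and the move from data-center accelerators to commodity hardware.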
However, the VC argument for SLMs isn’t just about cost savings; it’s about infrastructure compressibility. Because SLMs can run on consumer-grade hardware or modest edge servers, they bypass the hyperscale energy stranglehold. They treat compute as an abundant resource available at the edge, rather than a scarce resource hoarded in a Virginia data center.
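What does “consumer-grade hardware” mean in practice? The sketch below assumes the Hugging Face transformers library and a 7B instruction-tuned checkpoint (Mistral-7B-Instruct is used purely as a stand-in); on a single workstation GPU, this is essentially the entire deployment:

```python
# Minimal sketch: serving a 7B-class SLM on a single workstation GPU.
# The specific checkpoint is an illustrative choice, not a recommendation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision: roughly 14 GB of weights
    device_map="auto",          # spill to CPU RAM if the GPU is small
)

prompt = "Summarize this support ticket: the user cannot reset their password."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=120)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Swap in a task-specific fine-tune and that same footprint handles the ticket-routing and summarization workloads described above, with no interconnect queue and no new megawatts.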
The Edge is No Longer Optional
This compressibility unlocks the venue where the real physical economy operates: the Edge.
Consider a modern factory floor. If you want an AI agent to control a robotic arm or monitor a high-speed assembly line, the speed of light becomes a legitimate adversary. You cannot afford the latency of sending video data to the cloud, processing it, and sending a command back.
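The arithmetic is unforgiving. The sketch below uses illustrative numbers (a regional cloud roughly 1,500 km away, light traveling at about 200,000 km/s in fiber, and notional processing times) to show how quickly the budget disappears:

```python
# Rough latency budget for a cloud round trip vs. an on-floor edge node.
# Distances and processing times are illustrative assumptions.

FIBER_SPEED_KM_PER_MS = 200.0  # roughly two-thirds the speed of light in vacuum

def round_trip_ms(distance_km: float, processing_ms: float) -> float:
    """Propagation there and back, plus time spent in inference."""
    return 2 * distance_km / FIBER_SPEED_KM_PER_MS + processing_ms

cloud = round_trip_ms(distance_km=1500, processing_ms=40)  # regional cloud
edge = round_trip_ms(distance_km=0.1, processing_ms=8)     # on-prem gateway

print(f"Cloud round trip: ~{cloud:.0f} ms")
print(f"Edge round trip:  ~{edge:.1f} ms")
# A vision-based control loop often has only a few tens of milliseconds
# to react; at cloud distances, propagation and queuing can consume the
# budget before the model has done any work.
```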
Furthermore, the data gravity argument is undeniable. Gartner estimates that by 2025, 75% of enterprise data will be created and processed outside the traditional data center or cloud. Yet, we currently spend billions moving this heavy, bandwidth-intensive data to centralized clouds just to process it. This is an artifact of the “old stack.”
The investment thesis here targets the “interconnect layer”—startups building the orchestration software that allows a swarm of heterogeneous devices (gateways, on-prem servers, industrial PCs) to act as a coherent, distributed data center.
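In its simplest form, that orchestration layer is a placement problem: match each inference job to whichever node can satisfy its latency, memory, and trust constraints. The sketch below is a hypothetical, deliberately simplified scheduler (all names and fields are invented for illustration), not a description of any shipping product:

```python
# Hypothetical sketch of the "interconnect layer": a scheduler that treats
# a mixed fleet of gateways, industrial PCs, and on-prem servers as one pool.
from dataclasses import dataclass

@dataclass
class EdgeNode:
    name: str
    free_vram_gb: float  # memory available for model weights
    latency_ms: float    # network distance from the data source
    has_enclave: bool    # supports confidential computing

@dataclass
class InferenceJob:
    model_vram_gb: float
    max_latency_ms: float
    needs_enclave: bool

def place(job: InferenceJob, fleet: list[EdgeNode]) -> EdgeNode | None:
    """Pick the closest node that satisfies the job's constraints."""
    candidates = [
        n for n in fleet
        if n.free_vram_gb >= job.model_vram_gb
        and n.latency_ms <= job.max_latency_ms
        and (n.has_enclave or not job.needs_enclave)
    ]
    return min(candidates, key=lambda n: n.latency_ms, default=None)

fleet = [
    EdgeNode("line-3-gateway", free_vram_gb=8, latency_ms=2, has_enclave=False),
    EdgeNode("plant-server", free_vram_gb=48, latency_ms=6, has_enclave=True),
    EdgeNode("regional-cloud", free_vram_gb=80, latency_ms=55, has_enclave=True),
]
job = InferenceJob(model_vram_gb=14, max_latency_ms=20, needs_enclave=True)
print(place(job, fleet))  # picks plant-server: close enough, big enough, trusted
```

Everything that makes this hard in production (node churn, model distribution, metering, and the trust problem discussed next) is where the real product surface lives.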
The Missing Trust Layer: Confidential Computing
There is, however, a massive catch. When you move high-value AI models from a secure Google fortress to a server closet in a hospital or a gateway in a smart city, you lose physical control. This creates a paralyzing “trust gap.”
Enterprises are rightfully terrified of two things: their proprietary model weights being stolen (IP theft) and their sensitive data being exposed to the host hardware owner (privacy breach). You cannot build a decentralized AI economy on “trust me.”
This is why Confidential Computing is the most undervalued technology in the deep tech stack today. Confidential Computing allows data to remain encrypted while it is being processed (in use), not just when it is at rest on a hard drive or in transit over a network. It uses hardware-based “enclaves” (like Intel SGX, AMD SEV, or NVIDIA’s confidential computing modes) to create a hardware-enforced, cryptographically verifiable cleanroom within an otherwise untrusted machine.
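Conceptually, the handshake looks like the runnable mock below. Every class and function here is a simplified stand-in for a vendor attestation flow (SGX, SEV-SNP, or GPU confidential computing), not a real SDK; the point is the pattern: measure the code, verify the measurement against a root of trust, and only then release keys.

```python
# Runnable mock of attestation-gated key release. Everything here is a
# simplified stand-in for real attestation hardware and services.
import hashlib
import os
from dataclasses import dataclass

@dataclass
class Quote:
    """Stand-in for a hardware-signed attestation quote."""
    code_measurement: str  # hash of the code actually loaded in the enclave

class MockEnclave:
    """Pretend enclave; real ones are isolated by the CPU/GPU itself."""
    def __init__(self, runtime_code: bytes):
        self._code = runtime_code
        self._key: bytes | None = None

    def attest(self) -> Quote:
        # Real hardware signs this measurement with a vendor-rooted key.
        return Quote(hashlib.sha256(self._code).hexdigest())

    def receive_key(self, key: bytes) -> None:
        # In real systems the key is wrapped so only this enclave can unwrap it.
        self._key = key

APPROVED_RUNTIME = b"inference-runtime-v1"
TRUSTED_MEASUREMENTS = {hashlib.sha256(APPROVED_RUNTIME).hexdigest()}

def release_model_key(enclave: MockEnclave) -> None:
    """Ship the model decryption key only after attestation succeeds."""
    quote = enclave.attest()
    if quote.code_measurement not in TRUSTED_MEASUREMENTS:
        raise RuntimeError("untrusted environment: refusing to release weights")
    enclave.receive_key(os.urandom(32))  # host OS and operator never see this

release_model_key(MockEnclave(APPROVED_RUNTIME))  # attested: key released
try:
    release_model_key(MockEnclave(b"tampered-runtime"))
except RuntimeError as err:
    print(err)  # tampered code: the weights stay encrypted
```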
Think of it as the “SSL for the AI era.” Just as e-commerce was impossible before SSL encryption allowed us to send credit card numbers securely over the open web, distributed AI is impossible without Confidential Computing. It is the boring, unsexy plumbing that will enable banks to run fraud detection models on edge nodes and hospitals to process patient data on shared infrastructure without ever exposing the raw information.
The New Infrastructure Stack
The AI trade is rapidly shifting from the “Training Phase” to the “Inference Phase,” and with it, the infrastructure stack is inverting.
The Old Stack was defined by massive centralized compute, reliance on the public grid, general-purpose giant models, and perimeter-based security. It was a philosophy of “bring the data to the compute.”
The New Stack is defined by distributed edge compute, on-prem power efficiency, specialized SLMs, and cryptographic security via Confidential Computing. Its philosophy is “bring the compute to the data.”
Founders and investors need to stop building for the infinite-resource mindset of 2023. The physical reality of 2025 is constrained, efficient, and distributed. The future of AI isn’t growing larger – it’s growing closer.