
When OpenAI launched ChatGPT in 2022, investors rushed in, imagining another software gold rush. Startups like OpenAI and Anthropic quickly reached sky-high valuations. The logic seemed sound: AI was software. Like Salesforce or Microsoft, these tools promised global scale with minimal marginal cost per user.
But this belief is now falling apart. Building and running large language models involves expensive hardware, constant power demands, and infrastructure like massive data centers. The AI boom, it turns out, is infrastructure pretending to be software.
Why Every AI Query Costs More Than You Think
Training AI models already costs hundreds of millions of dollars. But generating answers, a process called inference, is not cheap either. Serving users at scale keeps thousands of chips running and burns significant energy.
Mid-sized models run on two to four chips. Advanced outputs may use eight or more. Generating a million-token response can take 30 to 40 minutes and cost $40 to $55 using Nvidia’s H100 chips. This isn’t a one-time fee. Every query repeats this cost.
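The per-response arithmetic above can be sketched as a back-of-envelope calculation. The GPU count, generation time, and hourly rental rate below are illustrative assumptions chosen to land in the article's $40 to $55 range, not measured figures:

```python
# Back-of-envelope cost of one long inference response.
# All inputs are illustrative assumptions, not measured figures.

def response_cost(num_gpus: int, minutes: float, gpu_hourly_rate: float) -> float:
    """Cost of one generation = GPUs held * hours held * rental rate per GPU-hour."""
    return num_gpus * (minutes / 60) * gpu_hourly_rate

# Assume 8 H100s held for ~35 minutes at ~$10 per GPU-hour (on-demand cloud pricing).
cost = response_cost(num_gpus=8, minutes=35, gpu_hourly_rate=10.0)
print(f"~${cost:.2f} per million-token response")
```

The key point the sketch makes is structural: the cost scales with how long the chips are held, so every repeated query re-incurs it.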
Power consumption adds to the problem. One H100 chip draws around 700 watts. For AI startups serving billions of tokens daily, annual electricity bills run into the millions. When cooling systems and inefficiencies are factored in, the burden doubles.
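The electricity bill can be estimated the same way. The 700-watt figure is from the text; fleet size, utilization, PUE (the data-center overhead multiplier covering cooling and losses), and price per kWh are illustrative assumptions:

```python
# Rough annual electricity bill for a fleet of inference GPUs.
# Chip wattage is from the article; the other inputs are illustrative assumptions.

def annual_power_cost(num_chips: int, watts_per_chip: float = 700,
                      utilization: float = 0.7, pue: float = 2.0,
                      price_per_kwh: float = 0.10) -> float:
    # kWh per year = kW draw * hours per year * utilization * fleet size,
    # scaled by PUE (power usage effectiveness) for cooling and overhead.
    kwh = (watts_per_chip / 1000) * 8760 * utilization * num_chips * pue
    return kwh * price_per_kwh

# A hypothetical fleet of 10,000 chips:
print(f"${annual_power_cost(10_000):,.0f} per year")
```

Under these assumptions a 10,000-chip fleet draws an eight-figure power bill, and the PUE factor alone accounts for the doubling the article describes.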
The Global Race for Compute Is Heating Up
Meeting global demand for inference will require an estimated $3.7 trillion in data center investment, according to McKinsey. These centers will consume 733 terawatt-hours of electricity, enough to power 68 million homes.
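A quick sanity check ties those two figures together (assuming a typical household uses on the order of 10,000 kWh per year, roughly the US average):

```python
# Cross-check: 733 TWh spread across 68 million homes.
total_twh = 733
homes = 68_000_000
kwh_per_home = total_twh * 1e9 / homes  # 1 TWh = 1e9 kWh
print(f"{kwh_per_home:,.0f} kWh per home per year")
```

The result is close to typical annual household consumption, so the two numbers are consistent with each other.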
Big tech firms know this. That’s why Amazon, Google, and Microsoft are locking in power through long-term contracts and developing their own chips. Owning the full stack (chips, energy, and cloud) lets them reduce compute costs and protect margins. Many already report gross margins near 70%.
Smaller players like Anthropic or Perplexity struggle in comparison. They often rent infrastructure and offer steep discounts to attract users. Each query eats into profits, especially when served through cloud platforms.
Falling Token Prices Are Squeezing Margins
Since 2022, token prices have dropped up to 280-fold. This price war aims to capture early users, but it undermines profitability. Stanford researchers say larger clients now pay half the listed token price due to volume deals.
The economics are simple: every query costs money, and each dollar earned is shrinking. For firms that don’t own infrastructure, profits disappear quickly.
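The squeeze can be made concrete with a toy unit-economics calculation. The list price and serving cost below are illustrative assumptions; the 50% volume discount is the figure the Stanford researchers report:

```python
# Toy unit economics for a provider that rents its compute.
# List price and serving cost are illustrative assumptions, not reported figures.
list_price = 10.0      # $ per million output tokens, at list
volume_discount = 0.5  # large clients pay half the listed price
serving_cost = 6.0     # $ per million tokens paid to a cloud provider

revenue = list_price * volume_discount  # effective revenue per million tokens
margin = revenue - serving_cost         # negative: each query loses money
print(f"effective revenue ${revenue:.2f}, margin ${margin:.2f}")
```

With these numbers the effective revenue falls below the rented serving cost, which is exactly the trap the article describes for firms that don’t own infrastructure.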
OpenAI and Anthropic sell directly to enterprises to stay afloat. Higher prices help cover their massive compute costs. However, most usage still flows through cloud platforms, where margins collapse.
Big Tech Doubles Down on Infrastructure
Microsoft, despite its partnership with OpenAI, is investing in its own AI data centers. Even OpenAI is developing custom chips and building dedicated AI infrastructure. The logic is clear: control costs or lose the game.
Optimists argue that better engineering will save the day. Developers now optimize code to reuse calculations, reduce power, and serve multiple users at once. Nvidia’s new Blackwell chips promise double the efficiency of current models.
Yet these improvements often fuel more demand. Bigger, smarter models require more compute, cancelling out the gains.
Future Profits May Depend on Smarter Business Models
Some believe pricing models must change. AI should not waste resources on low-value tasks like writing tweets. Instead, it should focus on reviewing code or analyzing contracts, services users will pay for.
OpenAI is already branching out. It’s working on a browser, a payment system, and enterprise consulting. These moves aim to reduce reliance on AI chat alone and embed OpenAI deeper into business workflows.
Still, giants like Amazon and Google hold the upper hand. Their massive scale, chip design, and energy contracts give them control over AI infrastructure costs. They can reinvest profits and expand faster than startups.
Betting on AI Requires Betting on Infrastructure
Despite the excitement, history warns us. Railroads and telecoms once promised big returns but burned investors when demand fell short. AI could follow a similar path.
Ubiquity does not guarantee profit. Even the biggest AI firms are gambling billions on a future where AI queries make more money than they cost. If that future doesn’t arrive fast enough, only the firms with full-stack control may survive.