Energy consumption and AI


Artificial intelligence (AI) has quickly become one of the most transformative technologies of our time. It's used in everything from self-driving cars and facial recognition to medical diagnosis and language translation. But this revolution has a hidden downside: a large and rapidly growing energy footprint.

Training burns calories!

AI promises to revolutionize our lives and become an integral part of our daily routines. However, this technological surge carries a hefty energy bill. Using an AI application might seem innocuous, but the real energy consumption lies in its creation.

Data is the fuel for AI. Massive datasets are collected and stored in sprawling, always-on distributed systems. This data then feeds powerful, energy-hungry hardware that trains colossal neural networks.

Precise figures remain unknown, but consider this: GPT-4's training is estimated to have consumed approximately 60 GWh of energy. To put that in perspective, that's roughly a small city's monthly consumption. And remember, this is just a single model version.

Each subsequent model often grows exponentially in size. GPT-5, for example, could consume up to 600 GWh during training. Keep in mind that major tech companies such as Google, Anthropic, Midjourney, and countless others are each developing their own AI models.

Assuming there are around 50 such major models globally, the combined energy consumption for training alone could fall in the range of 30,000 to 50,000 GWh, as the back-of-envelope sketch below shows. This staggering figure underscores the urgency of finding more sustainable approaches to AI development.
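
A minimal sanity check in Python. The model count and per-model energy range are this article's assumptions, not measured data:

```python
# Back-of-envelope estimate of global training energy.
# Every input below is an assumption from this article, not measured data.

num_major_models = 50                 # assumed count of frontier-scale models
gwh_per_model = (600, 1_000)          # assumed energy range per training run, GWh

low = num_major_models * gwh_per_model[0]    # 30,000 GWh
high = num_major_models * gwh_per_model[1]   # 50,000 GWh

household_gwh_per_year = 0.01         # ~10,000 kWh/year, a rough household figure
print(f"Training energy: {low:,} - {high:,} GWh")
print(f"Roughly {low / household_gwh_per_year:,.0f} households powered for a year")
```

Even the low end of that range equals a year of electricity for about three million homes.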

Cheaper models and better models

The training data and cost calculations are not public, so we can only estimate. But when OpenAI chief Sam Altman said he was seeking $7 trillion, he was not completely exaggerating. The cost of GPT models has grown roughly 30x with each version: GPT-4 is estimated to have cost around $2 billion, GPT-5 is expected to cost about $70 billion, and GPT-6 could run around $3.5 trillion.
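
Compounding that growth factor makes the trajectory easy to see. A minimal sketch, assuming this article's ~30x multiplier and $2 billion starting point (the projected figures are speculation, not confirmed budgets):

```python
# Compounding the article's assumed ~30x cost growth per model generation.
# These are speculative projections, not confirmed budgets.

growth_factor = 30        # assumed cost multiplier per generation
base_cost = 2e9           # ~$2B, the article's estimate for GPT-4

for gen, name in enumerate(["GPT-4", "GPT-5", "GPT-6"]):
    cost = base_cost * growth_factor ** gen
    print(f"{name}: ~${cost / 1e9:,.0f}B")
# GPT-4: ~$2B   GPT-5: ~$60B   GPT-6: ~$1,800B
```

Strict 30x compounding yields roughly $60 billion and $1.8 trillion, the same ballpark as the estimates above.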

The training cost of a model today is primarily the cost of the hardware involved and the energy needed to keep that hardware running. Compared to this, the cost of the researchers involved is almost negligible.

Future models will not just be bigger; they will also be trained on much larger data. Large language models, as the name suggests, are trained on language, which is represented as text. Diffusion models generate images, and they are trained on images. "A picture is worth a thousand words" is an apt adage here, because an image really is far larger in storage size than text. This makes the training data for image models much larger than text data.
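
A minimal sketch of the size gap. The byte counts are rough assumptions, and compression changes the exact numbers, but the ratio stays large:

```python
# Ballpark storage comparison: 1,000 words of text vs. one image.
# Byte counts are rough assumptions; compression shifts the numbers,
# but the gap stays large.

words = 1_000
bytes_per_word = 6                       # ~5 characters plus a space, UTF-8
text_bytes = words * bytes_per_word      # ~6 KB

width, height, channels = 1024, 1024, 3  # one uncompressed RGB image
image_bytes = width * height * channels  # ~3 MB

print(f"1,000 words : ~{text_bytes / 1024:.0f} KB")
print(f"one image   : ~{image_bytes / 1024**2:.0f} MB")
print(f"ratio       : ~{image_bytes / text_bytes:.0f}x larger")
```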

Multi-modal models can be trained on a wide variety of data: text, images, audio, video, and real-world sensory inputs. Such models can play games and handle more open-ended tasks, though we are still in the very early stages. Their training data is expected to be so huge that it will completely dwarf the current training data for LLMs.

Innovation in one area has spillover effects that are only accelerated by market competition. Apple and Google discovered this long ago and designed their own proprietary hardware to train at lower cost, while the rest of the world relied on Nvidia's chips. Nvidia has invested heavily and released new chips that promise dramatic efficiency gains over the previous generation.

💡
Nvidia’s must-have H100 AI chip made it a multitrillion-dollar company, one that may be worth more than Alphabet and Amazon, and competitors have been fighting to catch up. But perhaps Nvidia is about to extend its lead — with the new Blackwell B200 GPU and GB200 “superchip.”

We expect massive innovation over the next few years to reduce training costs as well as energy consumption. But beyond that, you will also see the world focusing on cheaper and more abundant energy.

Companies that can train faster, cheaper, and on larger data are going to win the AI race, and in the process they will contribute heavily to the world's energy infrastructure.

The race for cheaper energy

Two main strategies will greatly reduce costs in AI development:

  • Algorithmic Efficiency and Pre-trained Models: Rather than training AI models entirely from scratch, companies will increasingly leverage existing models and adapt them through incremental training (see the sketch after this list). This approach is already underway, and we anticipate algorithmic breakthroughs that will make it even more efficient.

  • Advances in Sustainable Energy: The emergence of companies specializing in affordable, sustainable energy production will be transformative.
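
To see why adaptation is so much cheaper than training from scratch, here is a minimal sketch using the widely cited approximation that training compute is about 6 × parameters × tokens; the parameter and token counts are illustrative assumptions:

```python
# Why adapting an existing model beats training from scratch, using the
# common approximation: training compute ≈ 6 × parameters × tokens (FLOPs).
# All figures below are illustrative assumptions.

params = 1e12             # hypothetical 1-trillion-parameter model
pretrain_tokens = 1e13    # assumed full pre-training corpus
finetune_tokens = 1e9     # assumed task-specific adaptation set

pretrain_flops = 6 * params * pretrain_tokens
finetune_flops = 6 * params * finetune_tokens

print(f"Pre-training: ~{pretrain_flops:.1e} FLOPs")
print(f"Fine-tuning : ~{finetune_flops:.1e} FLOPs")
print(f"Savings     : ~{pretrain_flops / finetune_flops:,.0f}x less compute")
```

So why is energy usage so high in AI in the first place? Let's break it down: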

The Two-Fold Energy Challenge

  1. Hardware Power Consumption: Each AI chip contains billions of transistors. Electrical signals passing through them generate heat, even with minimal resistance, and at scale this heat output becomes a significant energy cost (see the sketch after this list).

  2. Cooling Demands: To prevent chip damage, sophisticated cooling systems are essential, and these systems themselves consume substantial energy.
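
Virtually every watt a chip draws ends up as heat that must be removed. A minimal sketch of the scale, assuming a hypothetical 10,000-GPU cluster at ~700 W per accelerator (figures are illustrative, loosely based on the power ratings published for high-end AI accelerators):

```python
# Rough heat output of a hypothetical training cluster.
# GPU count and per-GPU draw are illustrative assumptions.

gpus = 10_000
watts_per_gpu = 700

heat_mw = gpus * watts_per_gpu / 1e6   # nearly all drawn power becomes heat
print(f"Cluster heat output: ~{heat_mw:.0f} MW")  # ~7 MW, before cooling overhead
```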

Cooling often uses more energy than the hardware itself

While both power consumption for hardware and cooling are significant contributors to a data center's energy costs, cooling often edges out the cost of powering the hardware itself. Here's why:

  • Inefficiency of Cooling: Traditional air-cooling systems lose a considerable amount of energy in moving and conditioning air. They often fight against the natural tendency of hot air to rise, leading to wasted effort.

  • Hardware Advances: Over time, servers and other hardware have generally become more power-efficient. Their power consumption per unit of work done has decreased.

  • Overprovisioning: Data centers often overprovision cooling capacity to ensure redundancy and prevent equipment failure. This leads to continuous energy use, even when not strictly necessary.

Approximate Cost Breakdown:

  • Cooling: Data centers can spend between 30% to 50% of their energy budget on cooling, with the average hovering around 40%.

  • Hardware: The cost of powering the actual servers, storage, and networking equipment usually falls within a similar range (the sketch below converts these fractions into the industry's standard PUE metric).
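
Data-center operators express this split with Power Usage Effectiveness (PUE): total facility energy divided by the energy that reaches the IT hardware. A minimal sketch, reusing this article's approximate fractions:

```python
# Power Usage Effectiveness (PUE): total facility energy / IT equipment energy.
# A PUE of 1.0 would mean every watt reaches the compute hardware.
# The fractions below reuse this article's approximate breakdown.

def pue(it_fraction: float) -> float:
    """PUE from the fraction of total energy that reaches IT hardware."""
    return 1.0 / it_fraction

# If ~40% goes to cooling and ~10% to power delivery, lighting, and the rest,
# about 50% reaches the hardware:
print(f"PUE at 50% IT share: {pue(0.50):.2f}")   # 2.00
print(f"PUE at 80% IT share: {pue(0.80):.2f}")   # 1.25, an efficient facility
```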

Changing Trends:

Innovative technologies are changing this equation somewhat:

  • Liquid Cooling: Direct liquid cooling and immersion cooling systems are far more efficient at removing heat, reducing the energy used for cooling (quantified in the sketch after this list).

  • Specialized Chips: AI-specific chips are increasingly power efficient, lowering the relative cost of running the hardware.
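
How much does a better PUE matter? A minimal sketch, assuming a hypothetical 10 MW IT load and typical-range PUE values for air versus liquid cooling (all inputs are assumptions):

```python
# Illustrative annual savings from a lower PUE, e.g. after moving from
# air cooling to liquid/immersion cooling. All inputs are assumptions.

it_load_mw = 10           # hypothetical constant IT load
hours_per_year = 8_760

def annual_gwh(pue: float) -> float:
    return it_load_mw * pue * hours_per_year / 1_000   # MWh -> GWh

air_pue, liquid_pue = 1.8, 1.1   # assumed typical-range values
print(f"Air cooled   : ~{annual_gwh(air_pue):.0f} GWh/yr")     # ~158
print(f"Liquid cooled: ~{annual_gwh(liquid_pue):.0f} GWh/yr")  # ~96
print(f"Saved        : ~{annual_gwh(air_pue) - annual_gwh(liquid_pue):.0f} GWh/yr")
```

At that scale, the annual savings of roughly 61 GWh are on the order of this article's estimate for GPT-4's entire training run.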

💡
AI training need not happen at a specific geographic location; it can happen wherever cooling costs are lower, such as northern Canada, Greenland, Antarctica, or even space.

The AI race will make energy cheaper, more abundant, and cleaner

While there have been many controversies about AI's energy footprint and its supposed impact on the climate, we now have a compelling reason to concentrate on energy generation. The largest and most innovative companies in the world are now going to invest more in energy research, which should produce better results in the coming years.

This will significantly benefit other sectors too: cheaper energy benefits everyone, making life easier across the board.