How much money does it take to train an LLM?

GPT-1 was trained at a negligible cost. GPT-2, on the other hand, cost roughly $40,000 to train, a sum generally beyond what a hobbyist programmer could afford. GPT-3, however, took over $100 million to train, roughly 2,500 times the cost of GPT-2.

GPT-4 took things to a different level: I am told it cost around $1.5 billion to train, though the real number remains a closely guarded secret.

By that logic, GPT-5 could cost tens of billions of dollars to train, perhaps as much as $100 billion.
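
To make the trend concrete, here is a quick back-of-envelope calculation in Python, using the rough dollar figures quoted above (none of them are official numbers):

```python
# Growth in reported/rumored training costs per GPT generation.
# These are the rough estimates quoted above, not official figures.
costs = {
    "GPT-2": 40_000,          # ~$40K
    "GPT-3": 100_000_000,     # ~$100M+
    "GPT-4": 1_500_000_000,   # ~$1.5B (rumored)
}

generations = list(costs)
for prev, curr in zip(generations, generations[1:]):
    factor = costs[curr] / costs[prev]
    print(f"{prev} -> {curr}: ~{factor:,.0f}x")

# Naive extrapolation: if costs keep multiplying at the last observed
# rate (~15x), a GPT-5-class run lands in the tens of billions; picking
# a steeper growth factor pushes the estimate toward $100B.
gpt5_estimate = costs["GPT-4"] * (costs["GPT-4"] / costs["GPT-3"])
print(f"Extrapolated GPT-5 cost: ~${gpt5_estimate / 1e9:.0f}B")
```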

Where does the money go?

Rarely in software do you see expenditure at this scale. Even startups seeing explosive growth won't spend $100M on the first iteration of a product and then jump directly to $2B.

For LLMs, however, most of the training cost goes not into people but into hardware and electricity. Training such massive models requires thousands of machines running continuously in specialized data centers.
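
As a rough illustration of that split, here is a minimal cost sketch. The cluster size, training duration, hourly rate, and power figures are all assumed for illustration, not real numbers from any lab:

```python
# A minimal sketch of where the money goes, under assumed numbers.
NUM_GPUS = 25_000       # assumed cluster size for a frontier-scale run
TRAINING_DAYS = 100     # assumed wall-clock training time
GPU_HOURLY_RATE = 2.50  # assumed $/GPU-hour (cloud list prices vary widely)
GPU_POWER_KW = 0.7      # assumed draw per GPU, incl. cooling overhead
POWER_PRICE = 0.10      # assumed $/kWh industrial electricity rate

gpu_hours = NUM_GPUS * TRAINING_DAYS * 24
hardware_cost = gpu_hours * GPU_HOURLY_RATE
energy_cost = gpu_hours * GPU_POWER_KW * POWER_PRICE

print(f"GPU-hours:     {gpu_hours:,.0f}")
print(f"Hardware cost: ${hardware_cost / 1e6:,.0f}M")
print(f"Electricity:   ${energy_cost / 1e6:,.0f}M")
```

At these assumed rates, the hardware bill dwarfs the electricity bill, which is why so much of the race is about chips.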

Many companies have understood this: Apple, Google, Meta, and others have built their own hardware, and even custom chips, to support this work instead of relying on generic hardware.

What does the future look like?

The next battle between AI models will be about keeping these costs manageable. Any company that bets $50B on training a model is taking an existential risk: either you end up with an exceptional model, or with something only marginally better than the one you trained for $2B.

So in this arms race of AI models, the company with deep pockets that can extract more computing power from less money is likely to do better. If you build a model that is 10x better than your competition, you can quickly capture the market.

I expect Google, Microsoft, Apple, and others to invest heavily in cheaper alternative energy, more efficient and superior hardware, and data centers in cooler climates.

Chances are that 1-10% of all the computing hardware ever built will be put to work on this training in the near future.

Where does it end?

When Sam Altman claimed he needs $7 trillion for an AI future, he was not exaggerating. He might need that much money to build, say, GPT-6. But then why not GPT-7 or GPT-8? Where does this arms race end?

You cannot milk a cow beyond a certain limit. We have made tremendous progress in deep learning and related techniques, but it is the availability of hardware that has made this AI revolution possible.

In our attempts to train models more cheaply, we will see drastically new training techniques. In some cases this might involve training with the help of previous models. In a sense, we will be trying to invent the technological singularity.
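
One established technique in this spirit is knowledge distillation, where a smaller new model learns from the soft outputs of a larger, already-trained one. Below is a minimal toy sketch in PyTorch; the model sizes, temperature, and random stand-in data are all illustrative assumptions, not anyone's actual training recipe:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins: a frozen "teacher" (the previous model) and a smaller "student".
teacher = nn.Sequential(nn.Linear(128, 512), nn.ReLU(), nn.Linear(512, 10))
student = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
teacher.eval()  # the previous-generation model is not updated

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
temperature = 2.0  # softens the teacher's output distribution

for step in range(100):
    x = torch.randn(32, 128)  # stand-in for real training data
    with torch.no_grad():
        teacher_logits = teacher(x)
    student_logits = student(x)
    # KL divergence between softened teacher and student distributions
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature**2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```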

The idea of the traditionally trained model might reach its end sooner than we expect.

An unpredictable future

The future, I feel, is highly unpredictable. The growth is going to be unprecedented. What remains to be seen is how far we can push the limits with our existing understanding of AI models.