DeepSeek recently made waves in the technology world with their new AI model, R1. This model showcases a reasoning capability comparable to OpenAI’s o1, but with a notable distinction: DeepSeek claims that their model was trained at a significantly lower cost.
While there has been debate over whether DeepSeek is the real deal or a DeepFake, it’s clear that this is a wake-up call: ever-larger LLMs that rely on ever-growing GPU clusters and massive amounts of energy are not the only path forward. In fact, it’s become obvious that there is limited advantage to that approach, for a few reasons:
First, pure scaling of LLMs at training time has reached the point of diminishing returns or perhaps even near-zero returns. Bigger models trained with more data are not resulting in meaningful improvements.
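As a rough illustration, empirical scaling-law studies (for example, the widely cited Chinchilla analysis by Hoffmann et al., 2022) fit pretraining loss as a power law in parameter count N and training tokens D, roughly

L(N, D) ≈ E + A / N^α + B / D^β

with exponents α and β well below 1. In that regime, each successive doubling of model size or data buys a smaller absolute reduction in loss than the one before it, which is exactly the flattening curve described above.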
Further, enterprises don’t need massive, ask-me-anything LLMs for most use cases. Even before DeepSeek, there was a noticeable shift toward smaller, more specialized models tailored to specific business needs. As more enterprise AI use cases emerge, the emphasis moves to inference: actually running the models to drive value. In many cases, that will happen at the edge of the network, close to end users. Smaller models optimized to run on widely available hardware will create more long-term value than oversized LLMs.
Finally, the LLM space is entering an era of optimization. The models we have seen so far have focused on innovation through scaling at any cost. Efficiency, specialization, and resource optimization are once again taking center stage, a signal that AI’s future isn’t about brute force alone but about how strategically and efficiently compute is deployed.
DeepSeek highlights this point very well in their technical papers, which showcase a tour de force of engineering optimization. Their advancements include modifications to the transformer architecture and techniques for allocating compute more efficiently during training. While these innovations move the field forward, they are incremental steps rather than a radical revolution in AI technology.
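To give a sense of what one such architectural efficiency technique looks like, here is a minimal, illustrative sketch of sparse mixture-of-experts (MoE) routing, the general idea behind MoE transformers like DeepSeek’s: each token activates only a small subset of the model’s expert networks, so compute per token grows with the number of active experts rather than with total model size. This toy NumPy version is a simplified stand-in, not DeepSeek’s implementation.

```python
# Illustrative sketch only: a toy top-k mixture-of-experts routing layer.
# Each token is routed to its top_k experts, so per-token compute scales with
# top_k rather than with the total number of experts.
import numpy as np

def moe_forward(x, gate_w, expert_ws, top_k=2):
    """x: (tokens, d_model), gate_w: (d_model, n_experts),
    expert_ws: list of (d_model, d_model) per-expert weight matrices."""
    logits = x @ gate_w                                   # (tokens, n_experts)
    # Softmax over experts gives routing probabilities per token.
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)

    out = np.zeros_like(x)
    top_idx = np.argsort(-probs, axis=-1)[:, :top_k]      # top_k experts per token
    for t in range(x.shape[0]):
        for e in top_idx[t]:
            # Only the selected experts run for this token.
            out[t] += probs[t, e] * (x[t] @ expert_ws[e])
    return out

# Toy usage: 4 tokens, 8-dim model, 4 experts, 2 active per token.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
gate_w = rng.normal(size=(8, 4))
expert_ws = [rng.normal(size=(8, 8)) for _ in range(4)]
print(moe_forward(x, gate_w, expert_ws).shape)  # (4, 8)
```

In this toy setup only half of the expert compute runs per token; production MoE models push that ratio much further, which is one way training and inference costs come down without shrinking total model capacity.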
And while the media is making a big deal of these advancements, which are indeed noteworthy, it is generally missing a key point: if DeepSeek hadn’t done this, someone else would have. DeepSeek is likely only the first in a new wave of AI models that deliver significant efficiency gains in both training cost and model size.
It’s important to put DeepSeek’s accomplishments in context. The company’s advancements are the latest step in a steady march that has been raising the state of the art in LLM architecture and training for years. This is not a disruptive breakthrough. While the news was a wake-up call for many, it should have been expected by those paying close attention to industry trends. The reality is that in the two years since OpenAI trained GPT-4, the state of the art in training efficiency has advanced considerably. And it’s not just about hardware (GPUs); it’s about algorithms and software. So it should be no surprise that a company, even one like DeepSeek without access to the latest and greatest GPUs, can now train models that are as good as GPT-4 at a much lower cost.
DeepSeek deserves credit for taking this step and for disclosing it so thoroughly, but it’s just another expected milestone in the technical evolution of AI that will be followed by many more.