Lessons from DeepSeek
Happy February! January was fast and furious, and now it’s a blur. It’s hard to believe that one-twelfth of the year is over. While we were all locking down our 2025 plans in January, the most significant technological development came from DeepSeek.
Last week, we watched DeepSeek’s mobile app shoot to the top of Apple’s App Store, tech CEOs field questions about DeepSeek on their earnings calls, defensive claims fly about whether DeepSeek circumvented export controls or stole IP, and shares of chipmakers and other AI companies shed up to 17% of their market cap. Silicon Valley went into a tizzy and cycled through the stages of grief.
However, the denial phase of grief started well before last week. DeepSeek V2 was released in May 2024 as an alternative to Llama 3 70B, with comparable performance. DeepSeek V3 followed in December 2024, improving performance and delivering three times faster inference, sending chills through the AI community. Many benchmarks showcase DeepSeek V3’s strength, and if you believe programming has sizable economic value, note that DeepSeek V3 outperforms Claude 3.5 Sonnet on Codeforces. Claims that training DeepSeek V3 cost around $5.58 million over two months garnered the most denial and mental gymnastics. The signs of DeepSeek’s rise were clear well before the release of its R1 model on January 20, 2025.
In one week, Silicon Valley went through the remaining phases of grief. We got angry, insisting that DeepSeek must have circumvented US export controls, stolen IP, and used distillation to train its models. We bargained with ourselves, reasoning that the $6m training cost couldn’t be accurate. We got depressed when R1, which was even better than V3, was released; we sold off public companies with AI exposure, and $1T of market cap evaporated. By the end of the week, we accepted the results and downloaded DeepSeek’s models and mobile apps.
Fareed Zakaria wrote a Washington Post opinion column arguing that DeepSeek has created a 21st-century Sputnik moment. Whether or not you agree with the headline, there are several important lessons.
First, many “shocking” events are entirely predictable rather than black swans. Had we been paying attention, we would have seen that DeepSeek’s impressive rate of improvement would lead to extraordinary results. The beautiful thing is that consensus around new ideas forms gradually: if you had paid attention and gained conviction earlier, you could have positioned yourself and your company before the news reached critical mass.
Second, step-function and order-of-magnitude improvements are often dismissed out of hand as cheating or simply impossible. There may have been some circumvention of export controls, some bending of terms of service, and the training cost may have been higher than $6m, but there was also real innovation. What DeepSeek has achieved is impressive whether the training cost was $6m, $60m, or even $600m. Perhaps DeepSeek simply beat us, so let’s learn from them.
Third, constraints breed creativity, a truth the AI community had forgotten. As Fareed Zakaria wrote, “constraints, such as export controls, may have turbocharged innovation…Forced to use second-tier chips, Chinese engineers produced creative workarounds.” When we struggle to innovate, we should frame our constraints as challenges that demand an entirely new solution. Rather than adding resources and removing obstacles, try the opposite: cut budgets, impose shorter deadlines, or set more demanding performance objectives.
DeepSeek’s innovation is good for startups. Models are becoming more capable and cheaper to produce, which will let us build more creative applications on top of them. The future is undoubtedly bright, and we are limited only by our imaginations. All of that is true, but let’s not forget the lessons above.