DeepSeek AI Development Costs $1.6 Billion, Debunking Affordability Myth
The new chatbot from DeepSeek introduced itself with a captivating promise: "Hi, I was created so you can ask anything and get an answer that might even surprise you." This statement encapsulates what DeepSeek aims to achieve in a competitive AI market, one where its debut recently triggered one of the largest drops in NVIDIA's stock price.
Image: ensigame.com
DeepSeek's AI model stands out due to its innovative architecture and training methods. Here are the key technologies that differentiate it:
Multi-token Prediction (MTP): Unlike traditional models that predict one token at a time, DeepSeek's MTP forecasts multiple tokens simultaneously by analyzing different parts of a sentence. This method not only boosts accuracy but also enhances the model's training efficiency.
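The idea can be pictured as several prediction heads sharing one hidden state, each forecasting a different upcoming token. The sketch below is a toy illustration under that assumption; the sizes, weights, and head structure are invented for demonstration and are not DeepSeek's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB, HIDDEN, K = 50, 16, 3  # toy sizes; real models are vastly larger

# One output head per future position: head i predicts token t+1+i.
heads = [rng.normal(size=(HIDDEN, VOCAB)) for _ in range(K)]

def predict_next_k(hidden_state: np.ndarray) -> list[int]:
    """Return an argmax token id for each of the next K positions,
    computed from a single shared hidden state."""
    return [int(np.argmax(hidden_state @ W)) for W in heads]

state = rng.normal(size=HIDDEN)
print(predict_next_k(state))  # K token ids from one forward pass
```

The point of the sketch is that one pass yields K predictions instead of one, which is where the extra training signal comes from.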
Mixture of Experts (MoE): DeepSeek V3 employs an MoE architecture with 256 expert networks, of which eight are activated for each token processed. Because only a small fraction of the model runs per token, this approach accelerates AI training and significantly improves performance.
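A minimal sketch of top-k expert routing illustrates the mechanism. It reuses the 256-expert / 8-active counts cited above, but the dimensions, random weights, and gating details are toy assumptions, not DeepSeek's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

N_EXPERTS, TOP_K, DIM = 256, 8, 32  # expert counts as cited; DIM is a toy size

router = rng.normal(size=(DIM, N_EXPERTS))        # gating network
experts = rng.normal(size=(N_EXPERTS, DIM, DIM))  # one weight matrix per expert

def moe_forward(token: np.ndarray) -> np.ndarray:
    """Route a token through only TOP_K of N_EXPERTS experts."""
    scores = token @ router
    top = np.argsort(scores)[-TOP_K:]             # only 8 of 256 experts run
    w = np.exp(scores[top] - scores[top].max())   # stable softmax over winners
    w /= w.sum()
    return sum(wi * (token @ experts[i]) for wi, i in zip(w, top))

out = moe_forward(rng.normal(size=DIM))
print(out.shape)  # (32,)
```

The speedup comes from the routing step: per token, 248 of the 256 experts are never evaluated.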
Multi-head Latent Attention (MLA): MLA compresses the attention mechanism's keys and values into a compact latent representation, shrinking memory use during inference while still letting the model repeatedly focus on the most significant parts of the text. This reduces the chance of missing crucial information, allowing the AI to capture important nuances effectively.
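One way to picture latent attention, assuming (as in published descriptions of MLA) that keys and values are reconstructed from a small shared latent, is the single-head toy sketch below; all dimensions and weights are illustrative, and real MLA has further details (multiple heads, positional handling) omitted here:

```python
import numpy as np

rng = np.random.default_rng(0)

SEQ, DIM, LATENT = 10, 32, 4  # latent dim much smaller than model dim

W_down = rng.normal(size=(DIM, LATENT))  # compress into a shared latent
W_uk = rng.normal(size=(LATENT, DIM))    # re-expand latent into keys
W_uv = rng.normal(size=(LATENT, DIM))    # re-expand latent into values

def mla_attend(x: np.ndarray) -> np.ndarray:
    """Toy single-head attention whose K and V come from a low-rank latent."""
    latent = x @ W_down                  # (SEQ, LATENT): all that needs caching
    k, v = latent @ W_uk, latent @ W_uv
    scores = x @ k.T / np.sqrt(DIM)
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ v

y = mla_attend(rng.normal(size=(SEQ, DIM)))
print(y.shape)  # (10, 32)
```

The memory win is visible in the shapes: the cache per position is LATENT values instead of 2 × DIM for conventional key-value caching.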
DeepSeek claims to have developed a competitive AI model on a minimal budget, training DeepSeek V3 for $6 million using just 2,048 graphics processors, but further investigation reveals a more complex picture.
Analysts from SemiAnalysis discovered that DeepSeek operates a vast computational infrastructure, comprising around 50,000 Nvidia Hopper GPUs. This includes 10,000 H800 units, another 10,000 H100s, and additional H20 GPUs, spread across multiple data centers for AI training, research, and financial modeling. The total investment in servers is approximately $1.6 billion, with operational expenses estimated at $944 million.
DeepSeek is a subsidiary of the Chinese hedge fund High-Flyer, which spun off the startup in 2023 to focus on AI technologies. Unlike many startups that rely on cloud providers, DeepSeek owns its data centers, giving it full control over AI model optimization and enabling rapid innovation. The company is self-funded, which enhances its flexibility and decision-making speed.
DeepSeek also attracts top talent, recruiting primarily from leading Chinese universities and paying some researchers over $1.3 million annually. Against this backdrop, the company's claim of training its latest model for just $6 million seems unrealistic: the figure accounts only for GPU usage during pre-training and excludes research expenses, model refinement, data processing, and overall infrastructure costs.
Since its inception, DeepSeek has invested over $500 million in AI development. Its compact structure allows for active and effective implementation of AI innovations, unlike larger, more bureaucratic companies.
DeepSeek's journey illustrates that a well-funded independent AI company can indeed compete with industry giants. However, experts note that its success is due to substantial investments, technical breakthroughs, and a strong team, rather than a "revolutionary budget" for AI development. Even so, DeepSeek's costs remain significantly lower than those of its competitors: while DeepSeek spent around $5 million on R1, training OpenAI's GPT-4o reportedly cost about $100 million.