30x More Efficient Than OpenAI’s GPT-4!

At the end of January, the little-known Chinese startup DeepSeek found itself in the spotlight of global media. A modest investment of $5.6 million in the development of a new model dealt a devastating blow to the market: American tech giants collectively lost nearly $1 trillion in capitalization.

Nasdaq Index on January 27. Source: Bloomberg

The emergence of an accessible alternative to ChatGPT, claiming to be the “Silicon Valley killer,” has caused a real stir in the industry.

The Rise of DeepSeek

DeepSeek began its independent journey in May 2023 in Hangzhou, the capital of Zhejiang province. This city is considered China’s largest e-commerce hub, home to the headquarters of giants like Alibaba Group, Geely, Hikvision, and Ant Group.

The founder of DeepSeek, Liang Wenfeng

Behind the project is Liang Wenfeng — a businessman and co-founder of the hedge fund High-Flyer, which manages assets worth $8 billion. Founded in 2015, the company has long shown interest in machine learning, investing significant resources in creating its own computing infrastructure as well as research in artificial intelligence. DeepSeek emerged from this structure.

In 2020, High-Flyer introduced the Fire-Flyer I supercomputer, costing 200 million yuan ($27.6 million) and specializing in deep learning for AI. A year later came Fire-Flyer II, a system costing 1 billion yuan ($138 million) and equipped with over 10,000 Nvidia A100 GPUs.

DeepSeek’s debut model, released in November 2023, immediately demonstrated performance on par with GPT-4 and was made freely available for research and commercial use.

By May 2024, DeepSeek-V2 was launched, with the company’s competitive pricing policy forcing even giants like ByteDance, Tencent, Baidu, and Alibaba to lower their prices for AI solutions. As a result, DeepSeek managed to maintain profitability while its competitors incurred losses.

In December 2024, the DeepSeek-V3 model was introduced, outperforming the latest developments from OpenAI and Anthropic in tests. Based on this model, the company created DeepSeek-R1 and its derivatives, which formed the basis of the much-talked-about service.

Performance comparison of DeepSeek models with OpenAI models in different tests. Source: DeepSeek.

The new model’s main advantage is its unprecedentedly low cost of use: for processing one million tokens, DeepSeek charges only $2.19, while OpenAI charges $60 for a similar volume.
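A quick back-of-the-envelope check in Python, using only the two prices quoted above, shows where the headline ratio comes from:

```python
# Sanity check of the pricing gap quoted above.
deepseek_per_mtok = 2.19  # USD per 1M tokens (DeepSeek, as cited)
openai_per_mtok = 60.00   # USD per 1M tokens (OpenAI, as cited)

ratio = openai_per_mtok / deepseek_per_mtok
print(f"OpenAI is {ratio:.1f}x more expensive per token")  # ~27.4x
```

The ratio works out to roughly 27x, which is the basis for the rounded “30x more efficient” claim in the headline.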

Behind the Breakthrough: The Structure of DeepSeek-R1

According to a published study, DeepSeek-R1 is based on reinforcement learning methods and “cold start” techniques. This has allowed it to achieve exceptional performance in areas such as mathematical calculations, programming, and logical reasoning.

A key feature of the model is the Chain of Thought approach, which breaks complex tasks down into sequential steps, mimicking human reasoning. The system analyzes a task, divides it into stages, and checks each step for errors before forming a final answer.
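As an illustration, here is a minimal sketch of querying a reasoning model through an OpenAI-compatible Python client. The endpoint, model name, and the `reasoning_content` attribute are assumptions for illustration, not details from the article:

```python
# A minimal sketch of eliciting chain-of-thought output from a
# reasoning model via an OpenAI-compatible client. The base_url,
# model name, and reasoning attribute below are assumptions.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",               # placeholder
    base_url="https://api.deepseek.com",  # assumed endpoint
)

response = client.chat.completions.create(
    model="deepseek-reasoner",  # assumed model identifier
    messages=[{
        "role": "user",
        "content": "A train travels 120 km in 1.5 hours. "
                   "What is its average speed in km/h?",
    }],
)

# Reasoning models typically expose intermediate steps separately
# from the final answer; the attribute name here is an assumption.
message = response.choices[0].message
print(getattr(message, "reasoning_content", None))  # step-by-step chain
print(message.content)                              # final answer
```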

The technical implementation impresses with its efficiency. DeepSeek-R1 was trained on a system of 2,048 Nvidia H800 accelerators, consuming approximately 2.788 million GPU-hours. Process optimization is achieved through FP8 mixed-precision training and Multi-Token Prediction technology, significantly reducing hardware requirements.
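Those two numbers imply a plausible wall-clock time and compute bill. A small sanity check, where the $2-per-GPU-hour rental rate is an assumption rather than a figure from the article:

```python
# Sanity check of the training-compute figures quoted above.
gpu_hours = 2_788_000  # total H800 GPU-hours, per the article
n_gpus = 2_048         # cluster size, per the article

days = gpu_hours / n_gpus / 24
print(f"~{days:.0f} days of wall-clock training")  # ~57 days

# An assumed rental rate of $2/GPU-hour (an assumption, not from
# the article) lands close to the widely cited ~$5.6M figure:
print(f"~${gpu_hours * 2.0 / 1e6:.2f}M")  # ~$5.58M
```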

The model architecture includes 671 billion parameters; however, only 37 billion are activated during a single pass. The use of a Mixture of Experts architecture ensures scalability without a proportional increase in computational cost.
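The idea is easiest to see in a toy routing loop. The sketch below (arbitrary sizes, not DeepSeek’s actual code) shows how a router activates only the top-k experts per token, so the active parameters stay a small fraction of the total:

```python
# Toy Mixture-of-Experts routing: each token runs through only its
# top-k experts, keeping active parameters far below the total count.
import torch
import torch.nn as nn

class ToyMoE(nn.Module):
    def __init__(self, dim=64, n_experts=16, k=2):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))
        self.gate = nn.Linear(dim, n_experts)  # the router
        self.k = k

    def forward(self, x):  # x: (n_tokens, dim)
        weights, idx = self.gate(x).topk(self.k, dim=-1)  # top-k experts per token
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e  # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

moe = ToyMoE()
tokens = torch.randn(8, 64)
print(moe(tokens).shape)  # torch.Size([8, 64]); only 2 of 16 experts ran per token
```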

An innovative method called Group Relative Policy Optimization (GRPO) deserves special attention: it allows models to be trained without a separate critic network, significantly improving the efficiency of the process. As noted by Jim Fan, senior research manager at Nvidia, this resembles Google DeepMind’s AlphaZero breakthrough, which learned to play Go and chess “without prior imitation of human grandmaster moves.”

He stated that this is “the most important takeaway from the research paper.”
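The critic-free part of GRPO is easiest to show directly. In this minimal sketch (the reward values are invented for illustration), each sampled answer’s advantage is computed relative to its own group’s mean and standard deviation, so no learned value network is needed:

```python
# A minimal sketch of the group-relative advantage at the heart of
# GRPO: rewards for a group of sampled answers to the same prompt
# are normalized by the group's own mean and standard deviation,
# replacing the learned critic of classic actor-critic methods.
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """Advantage of each sampled answer relative to its own group."""
    rewards = np.asarray(rewards, dtype=float)
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# One prompt, a group of 6 sampled answers scored by a rule-based
# reward (e.g. 1.0 if the final answer is correct, else 0.0):
rewards = [1.0, 0.0, 1.0, 1.0, 0.0, 0.0]
print(group_relative_advantages(rewards).round(2))
# Correct answers get positive advantage, wrong ones negative.
```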

A New Approach to Training Language Models

DeepSeek’s approach to training is particularly interesting. Unlike other leading LLMs, R1 did not undergo traditional supervised fine-tuning on human-labeled examples: researchers found a way for the model to develop its own reasoning abilities almost from scratch.

“Instead of explicitly teaching the model how to solve problems, we simply provide it with the right incentives and it autonomously develops advanced strategies,” states the research.

The model also represents a new paradigm in AI development: rather than simply scaling up computing power for training, emphasis is placed on how much time and compute the model spends contemplating an answer before generating it. This scaling of “computation at test time” distinguishes the new class of “reasoning models,” such as DeepSeek-R1 and OpenAI’s o1, from their predecessors.
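One generic way to trade test-time compute for reliability is majority voting over repeated samples (often called self-consistency). The sketch below uses a stand-in random “model” to illustrate the effect; it shows the general idea, not DeepSeek’s specific mechanism:

```python
# Toy illustration of scaling computation at test time: sample
# several candidate answers and keep the most common one. The
# sampler is a stand-in for a real model call.
import random
from collections import Counter

def sample_answer():
    # Stand-in for one stochastic reasoning pass of a model:
    # assume it answers "8" 70% of the time and errs otherwise.
    return random.choices(["8", "6", "9"], weights=[0.7, 0.2, 0.1])[0]

def answer_with_test_time_compute(n_samples):
    votes = Counter(sample_answer() for _ in range(n_samples))
    return votes.most_common(1)[0][0]

# More samples -> more compute per query -> more reliable answers.
for n in (1, 5, 25):
    print(n, answer_with_test_time_compute(n))
```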

Telegram CEO’s Reaction to the Success of DeepSeek’s Model

In a congratulatory message for the Chinese New Year, Telegram founder Pavel Durov highlighted the success of the buzzworthy AI model DeepSeek and identified the reasons behind such a breakthrough.

“China’s progress in algorithmic efficiency didn’t arise from nowhere. Chinese students have long outperformed others in mathematics and programming at international Olympiads,” he noted.

According to him, China’s education system surpasses that of the West. It encourages fierce competition among students, a principle “borrowed from the highly efficient Soviet model.”

In most Western schools, public announcements of grades and student rankings are prohibited to prevent pressure and ridicule. Durov believes such measures demotivate the best students.

“Victory and defeat are two sides of the same coin. Eliminate the losers, and you eliminate the winners,” he emphasized.

As a result, many gifted children find competitive games more engaging than studying—there they see each player’s ranking.

Praising students regardless of their performance may seem like a good thing, but reality will shatter this illusion after graduation.

“AI benchmarks demonstrating DeepSeek’s superiority are one such public ranking, and they are increasing. If the U.S. secondary education system does not undergo radical reform, China’s growing dominance in technology seems inevitable,” Durov concluded.

Critical Perspective on DeepSeek’s Breakthrough

DeepSeek’s success raises many questions within the professional community. Scale AI CEO Alexandr Wang claims that the company possesses 50,000 Nvidia H100 chips, which would put it in direct violation of U.S. export restrictions.

“As far as I understand, DeepSeek has installed 50 thousand H100s […]. They cannot talk about them [publicly] because it contradicts U.S. export control,” Wang said.

After the restrictions were imposed, the price of smuggled H100s in China soared to $23,000–30,000 each; at those prices, such a cluster would cost between $1 billion and $1.5 billion.
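The arithmetic behind that estimate is straightforward:

```python
# Rough cluster-cost estimate from the figures quoted above.
n_chips = 50_000
price_low, price_high = 23_000, 30_000  # USD per smuggled H100, as cited

print(f"${n_chips * price_low / 1e9:.2f}B to ${n_chips * price_high / 1e9:.1f}B")
# -> $1.15B to $1.5B
```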

Analysts at Bernstein question the claimed training cost of model V3 at $5.6 million and note a lack of data regarding R1’s development expenses. According to Peel Hunt expert Damindu Jayavira, public figures only reflect GPU-hour costs while ignoring other significant expenses.

“It was trained in less than 3 million GPU hours, which corresponds to a training cost of just over $5 million. In comparison, analysts estimate that training Meta’s latest large AI model cost $60–70 million,” Jayavira stated.

Political aspects also raise concerns. Founder Liang Wenfeng’s participation in a closed symposium chaired by Chinese Premier Li Qiang may indicate a strategic role for the company in overcoming export restrictions and achieving technological independence for China.

“There is a high likelihood that DeepSeek and many other large Chinese companies are supported by the Chinese government, and not just financially,” stated Edward Harris, CTO of Gladstone AI, a firm that works closely with the U.S. government.

It should also be noted that the API version of R1 has built-in censorship mechanisms, especially around politically sensitive topics for China. The model refuses to discuss the events at Tiananmen Square, human rights issues in China, or Taiwan’s status, replacing generated responses with standard evasive phrases.

Concerns about data privacy are also significant.

DeepSeek Privacy Policy Excerpt. Source: DeepSeek.

According to DeepSeek’s policy, users’ personal information is stored on servers in China. This could create problems similar to those faced by TikTok, particularly acute in the American market, where regulators have already shown increased scrutiny toward Chinese tech companies over personal data protection.

The Future of Language Models After DeepSeek

Despite the controversies surrounding it, DeepSeek’s achievements should not be underestimated. Testing results show that R1 indeed surpasses its American counterparts on many parameters. As Alexandr Wang noted, this is “a wake-up call for America,” demanding accelerated innovation and tighter export controls over critical components.

While OpenAI maintains industry leadership for now, DeepSeek’s emergence significantly alters the balance of power in the markets for AI models and infrastructure. If the official figures are accurate, the Chinese company has managed to create a competitive solution at substantially lower cost through innovative approaches and optimization, calling into question the strategy, adopted by many market players, of simply increasing computational power.

Interest in DeepSeek’s technologies is growing: Meta has already established four “war rooms” to analyze the Chinese models, aiming to apply the acquired knowledge to the development of its open-source Llama ecosystem.

Some experts see DeepSeek’s success not so much as a threat to U.S. technological dominance but rather as an indication of forming a multipolar world in AI development. As former OpenAI policy department employee Miles Brundage stated:

“China will have its own superintelligence no more than a year after the U.S., unless war breaks out—so if you don’t want (literally) war you need to have a vision for how to navigate multipolar outcomes in AI development.”

It seems we are witnessing the beginning of a new era in artificial intelligence development where efficiency and optimization may prove more important than sheer computational power.

Source: https://coinpaper.com/7219/deep-seek-has-disrupted-the-market-why-chinese-ai-turned-out-to-be-30-times-more-efficient-than-gpt-4