Chinese artificial intelligence (AI) company DeepSeek has restricted new registrations following a cyber attack.
The company’s new AI assistant and large language model (LLM), R1, was launched earlier this month, quickly reaching the top spot for free applications on the Apple App Store in the US.
Now, DeepSeek has announced that it has suffered major cyber attacks on its services.
“Due to large-scale malicious attacks on DeepSeek’s services, we are temporarily limiting registrations to ensure continued service. Existing users can log in as usual. Thanks for your understanding and support,” it said.
While the company has not shared any details of the cyber attacks, media reports suggest that its API and Web Chat are suffering from distributed denial-of-service (DDoS) attacks.
The attack followed a recent outage affecting user logins and the company’s application programming interface (API), although there is no indication that the earlier outage was the result of a cyber incident.
However, much of the attention on DeepSeek is focused not on the recent cyber attack, but on the impact the company has had on the existing AI market.
What is DeepSeek?
DeepSeek’s new AI assistant is powered by the DeepSeek-V3 model, which the company said “tops the leaderboard among open-source models and rivals the most advanced closed-source models globally”.
The Chinese firm’s success has come as a shock to US AI and technology firms, largely because of the way the model was developed.
For context, the administration of former US president Joe Biden introduced a series of export bans restricting the sale of advanced chips, which are used to train these AI models, to China in an effort to limit AI training there and keep the US ahead in the race.
As a result, previous Chinese attempts at advanced generative AI models, such as the first LLM from search engine Baidu, were underwhelming and unable to compete with US offerings.
However, DeepSeek said its V3 model was trained using NVIDIA H800 GPUs, a less powerful chip than the hardware used by OpenAI.
The H800 is a modified version of the NVIDIA H100, the chip typically used by US AI developers for its superior power. The H800 has a lower chip-to-chip transfer rate, among other changes, and was designed to comply with export rules for China, allowing NVIDIA to keep the market without giving China the power to compete.
Not only has DeepSeek managed to train its AI model on the less powerful chip, it has also done so cheaply. The company claims that V3 was trained on about 2.788 million H800 GPU hours. At a rate of US$2 per GPU hour, the total comes out to only US$5.58 million.
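As a sanity check, the claimed figure is straightforward arithmetic. The snippet below simply reproduces it, assuming the US$2-per-GPU-hour rate quoted above; the variable names are illustrative only.

```python
# Back-of-the-envelope check of DeepSeek's stated V3 training cost,
# using the figures quoted above.
gpu_hours = 2_788_000        # ~2.788 million H800 GPU hours (company claim)
rate_usd_per_hour = 2.0      # assumed rental rate of US$2 per GPU hour
total_usd = gpu_hours * rate_usd_per_hour
print(f"Estimated cost of the final training run: US${total_usd / 1e6:.2f} million")
# -> Estimated cost of the final training run: US$5.58 million
```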
Admittedly, the costs are only for V3’s final training run, but they still represent a “win” over US technology giants.
The key to keeping costs down while using the less capable chips lies in the model’s clever design. While models like GPT-3.5 activate the entire network during training and when answering queries, DeepSeek activates only the parts of the model needed to complete a task.
Introduced with the V2 model, DeepSeekMoE (the “MoE” referring to “mixture of experts”) splits the model into a number of “experts” that can be activated as needed.
OpenAI’s GPT-4 also makes use of the mixture-of-experts approach.
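For readers curious what “activating only some experts” looks like in practice, here is a minimal, illustrative sketch of top-k expert routing in Python. It is not DeepSeek’s code; the dimensions, gate and expert layers are invented for the example.

```python
import numpy as np

# Toy mixture-of-experts (MoE) routing: a gate scores every expert for a token,
# only the top-k experts actually run, and their outputs are blended together.
rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 8, 2

gate_w = rng.normal(size=(d_model, n_experts))                 # router weights
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

def moe_forward(x):
    scores = x @ gate_w                                        # one score per expert
    chosen = np.argsort(scores)[-top_k:]                       # pick the top-k experts
    weights = np.exp(scores[chosen]) / np.exp(scores[chosen]).sum()
    # Only the chosen experts do any work; the rest stay idle, which is why
    # far fewer parameters are active for any given token.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, chosen))

print(moe_forward(rng.normal(size=d_model)).shape)             # (16,)
```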
Additionally, DeepSeek introduced DeepSeekMLA, where “MLA” refers to “multi-head latent attention”, which reduces the amount of memory required during inference by compressing the key-value (KV) cache.
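The memory saving is easiest to see with a rough example: rather than caching full keys and values for every attention head, a much smaller latent vector is cached per token and expanded back when attention is computed. The sketch below is a simplified illustration of that idea with made-up dimensions, not DeepSeek’s implementation.

```python
import numpy as np

# Simplified illustration of the latent-compression idea: cache one small
# latent per token instead of full per-head keys and values.
rng = np.random.default_rng(1)
n_tokens, d_model, n_heads, d_head, d_latent = 100, 512, 8, 64, 64

w_down = rng.normal(size=(d_model, d_latent)) * 0.02            # token  -> latent
w_up_k = rng.normal(size=(d_latent, n_heads * d_head)) * 0.02   # latent -> keys
w_up_v = rng.normal(size=(d_latent, n_heads * d_head)) * 0.02   # latent -> values

tokens = rng.normal(size=(n_tokens, d_model))
latent_cache = tokens @ w_down                      # this is all that gets stored

naive_kv_floats = n_tokens * 2 * n_heads * d_head   # full keys + values per token
print(f"naive KV cache: {naive_kv_floats} floats, latent cache: {latent_cache.size} floats")
# naive KV cache: 102400 floats, latent cache: 6400 floats

# At attention time the keys and values are reconstructed from the latent.
k = (latent_cache @ w_up_k).reshape(n_tokens, n_heads, d_head)
v = (latent_cache @ w_up_v).reshape(n_tokens, n_heads, d_head)
```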
While these two innovations arrived with the V2 model, it was V3, which also reduced communications overhead through new load-balancing techniques and added multi-token prediction to training, that allowed DeepSeek to develop and train the new model so cheaply.
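Multi-token prediction simply means the model is trained, at each position, to predict the next few tokens rather than just the next one, extracting more learning signal from the same data. The toy function below shows how such targets might be constructed; it is a conceptual sketch, not DeepSeek’s training code.

```python
# Build multi-token prediction targets: at position i the model would be asked
# to predict the next `horizon` tokens instead of just one.
def multi_token_targets(token_ids, horizon=2):
    return [token_ids[i + 1 : i + 1 + horizon]
            for i in range(len(token_ids) - horizon)]

print(multi_token_targets([10, 11, 12, 13, 14], horizon=2))
# -> [[11, 12], [12, 13], [13, 14]]
```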
Now, with the launch of R1 and R1-Zero earlier this month, the company is directly competing with OpenAI’s o1, which was until recently the only reasoning model on the market and a large part of why OpenAI was considered to be on top.
DeepSeek’s impact on the current market
The Chinese company has begun to change the narrative on how much AI investment is needed for the development of new AI models, a stark contrast to last year when OpenAI CEO Sam Altman asked for US$7 trillion to fund the AI GPU revolution.
Arguably, DeepSeek’s biggest win over OpenAI is the fact that its AI model is open source, allowing companies to develop their own AI models more easily and cheaply.
In fact, DeepSeek’s new model has already caused a major shake-up of the existing market. NVIDIA’s stock price fell almost 18 per cent yesterday, wiping around US$600 billion (roughly A$1 trillion) off its market capitalisation, the largest single-day loss in Wall Street history.
Additionally, former Intel CEO Pat Gelsinger has commended DeepSeek for creating a lower-cost model with performance rivalling OpenAI.
“Wisdom is learning the lessons we thought we already knew. DeepSeek reminds us of three important learnings from computing history,” said Gelsinger.
“1) Computing obeys the gas law. Making it dramatically cheaper will expand the market for it. The markets are getting it wrong, this will make AI much more broadly deployed.
“2) Engineering is about constraints. The Chinese engineers had limited resources, and they had to find creative solutions.
“3) Open Wins. DeepSeek will help reset the increasingly closed world of foundational AI model work. Thank you DeepSeek team.”
Gelsinger also said that his start-up, Gloo, is already making use of DeepSeek R1.
“My Gloo engineers are running R1 today,” he said. “They could’ve run o1 – well, they can only access o1 through the APIs.”
With DeepSeek, Gloo said it will rebuild its Kallm AI from scratch “with our own foundational model that’s all open source” within two weeks, rather than paying for and using OpenAI’s models.
Furthermore, when asked whether he would be excited about DeepSeek if he were still the CEO of Intel, Gelsinger answered “yes”.
A glass cannon: DeepSeek’s security issues
Despite its appeal, it seems DeepSeek still has its flaws, most notably with security. Cyber threat intelligence firm KELA said that while DeepSeek’s new model may be able to compete with OpenAI, and even outperform it in some scenarios, it is still lacking when it comes to security.
“KELA has observed that while DeepSeek R1 bears similarities to ChatGPT, it is significantly more vulnerable,” the firm said.
“KELA’s AI Red Team was able to jailbreak the model across a wide range of scenarios, enabling it to generate malicious outputs, such as ransomware development, fabrication of sensitive content, and detailed instructions for creating toxins and explosive devices.”
Similarly, Adrianus Warmenhoven, a cyber security expert at NordVPN, has warned of the privacy concerns a Chinese AI model presents.
“DeepSeek, being a Chinese AI start-up, operates within a regulatory environment where government oversight of data is stringent. This raises potential risks concerning data collection, storage, and usage,” said Warmenhoven.
“Users need to be aware that any data shared with the platform could be subject to government access under China’s cyber security laws, which mandate that companies provide access to data upon request by authorities.”
“Another key concern lies in the lack of transparency that often surrounds how AI models are trained and how they operate. Users should consider whether their interactions or uploaded data might inadvertently contribute to machine learning processes, potentially leading to data misuse or the development of tools that could be exploited maliciously.”
“Moreover, there is always the risk of cyber attacks. As AI platforms become more sophisticated, they also become prime targets for hackers looking to exploit user data or the AI itself. With the rise of deepfakes and other AI-driven tools, the stakes are higher than ever.”