Artificial intelligence (AI) giant OpenAI has alleged that DeepSeek used its data to train its new open-source AI model, R1.
DeepSeek’s AI assistant and large language model (LLM), R1, was launched earlier this month, quickly reaching the top spot for free applications on the Apple App Store in the US.
The launch is significant because, despite US-imposed restrictions on exporting to China the powerful chips used to train AI, DeepSeek has created a competitor to the world’s leading generative AI, OpenAI’s ChatGPT. The model also reportedly cost only US$5.58 million to train.
The impact of the new AI was so significant that it wiped almost US$600 billion off NVIDIA’s market value in a single day, as part of a broader tech sell-off of roughly US$1 trillion. R1 is also open source, unlike ChatGPT, allowing companies to build their own AI tools on top of it.
However, OpenAI is suspicious of its new competitor, claiming that it improperly sourced data to train the new model.
In a statement to The New York Times, the US AI giant claimed that DeepSeek used data generated by OpenAI’s services to train its own model through a method known as distillation.
In basic terms, distillation refers to transferring the knowledge of a larger “teacher” model to a smaller “student” model to allow it to perform at a similar level while being more efficient computationally.
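As a rough illustration of the idea, the Python sketch below shows the classic "soft label" form of distillation, in which a student is scored on how closely its output probabilities match a teacher's. It is a minimal sketch under stated assumptions, not a description of how DeepSeek or OpenAI trains models: the function names, the example scores and the temperature value are all hypothetical, and distilling through an API would more likely mean training on the text a teacher generates rather than on its raw scores.

```python
import math

def softmax(logits, temperature=1.0):
    # Convert raw scores into probabilities; a higher temperature
    # flattens the distribution so smaller preferences stay visible.
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL divergence between the teacher's and the student's softened
    # distributions: zero when they agree, larger as they diverge.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical scores over three possible answers (illustrative only).
teacher = [4.0, 1.0, 0.5]
close_student = [3.8, 1.1, 0.4]  # mostly agrees with the teacher
far_student = [0.5, 1.0, 4.0]    # prefers a different answer

print(distillation_loss(teacher, close_student))
print(distillation_loss(teacher, far_student))
```

Training the student to reduce this loss pushes its answers towards the teacher's, which is why a smaller model can end up performing at a similar level.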
While distillation is common practice in the AI industry, OpenAI’s terms of service forbid using the output of its services to develop competing models.
“We know that groups in the [People’s Republic of China] are actively working to use methods, including what’s known as distillation, to replicate advanced US AI models,” wrote OpenAI spokeswoman Liz Bourgeois in a statement seen by The New York Times.
“We are aware of and reviewing indications that DeepSeek may have inappropriately distilled our models, and will share information as we know more.
“We take aggressive, proactive countermeasures to protect our technology and will continue working closely with the US government to protect the most capable models being built here.”
Microsoft’s security researchers are also currently investigating whether DeepSeek used OpenAI’s application programming interface (API) to collect data to train R1.
The company’s researchers said they observed individuals they believe have connections to DeepSeek collecting large amounts of data through the API.