Share this article on:
A new potential cyber attack that tricks AI coding assistants into recommending malicious code has been discovered by cyber experts.
Researchers from Microsoft and the Universities of California and Virginia have called the new attack “Trojan Puzzle”.
AI coding assistants are trained by using public online code repositories such as those found on GitHub. Normally, these AI assistants feature static detection and signature-based dataset cleansing models to identify harmful programming to prevent them from learning and reproducing it.
Trojan Puzzle bypasses them, allowing AI assistance to learn and suggest a dangerous code.
AI coding assistants are typically used by software developers to increase the speed of product development. Having these assistants learn dangerous programming could have detrimental consequences, including supply chain attacks if a popular AI assistant is compromised.
This is not the first time researchers have tested the idea of infecting AI assistants and causing them to suggest malicious code, however, the new method is much more covert and less likely to be detected.
“Schuster et al.’s poisoning attack explicitly injects the insecure payload into the training data,” said researchers in their report TROJANPUZZLE: Covertly Poisoning Code-Suggestion Models.
“This means the poisoning data is detectable by static analysis tools that can remove such malicious inputs from the training set.
“In this work, we remove this limitation of Schuster et al.’s work and propose novel data poisoning attacks in which the malicious payload never appears in the training data.”
The report states that it instead hides malicious code in the docstrings rather than the actual code, and then uses a “trigger” word to activate it. However, signature-based detection will still catch this.
Trojan Puzzle gets around this by keeping the malicious programming out of the code and hiding it during certain parts of the training process.
The machine learning model is presented with what the researchers call a template token instead of the dangerous payload, which puts in a random word. The trigger phase collects a list of these words while the machine learning model learns the code with them in place.
Then, once the trigger is launched, the malicious payload will be recreated.
The topic of AI assisting hackers in writing malicious programs has been raised of late, after researchers discovered that the popular ChatGPT AI could be told to assist hackers in writing phishing emails and dangerous code.
Researchers from Check Point Research managed to have OpenAI’s ChatGPT write them a phishing email, as well as a code that “when written in an Excel workbook, will download an executable from a URL and run it”.