Common Method to Test AI Language Model Leaks Could Be Flawed, Researchers Find

A new study reveals that a widely used method for assessing data leaks in large language models may be flawed, raising concerns over AI data privacy.

Researchers from the University of Virginia School of Engineering and Applied Science and the University of Washington have identified a significant flaw in a common method used to test for potential data leaks in large language models (LLMs).

The team detailed these findings in a paper released for peer review on July 10 and presented last month at the Conference on Language Modeling at the University of Pennsylvania.

The study challenges the reliability of membership inference attacks (MIAs), a primary tool for measuring information exposure risks in AI systems.

“We find that MIAs barely outperform random guessing for most settings across varying LLM sizes and domains,” the paper’s abstract states, exposing a critical weakness in how the field currently audits these risks.

MIAs are supposed to act as a privacy audit, determining how much information a model is leaking about specific training data.

“It is a way to measure how much information the model is leaking about specific training data,” co-author David Evans, a professor of computer science who runs the Security Research Group at UVA, said in a news release.

However, the study’s results suggest these methods fail to measure membership inference risk accurately.
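To see what such an audit looks like in practice, consider the simplest family of attacks, which flags a text as a likely training member when the model assigns it an unusually low loss. The sketch below illustrates that general idea only and is not the authors’ implementation; the model name and threshold are placeholders.

```python
# Minimal sketch of a loss-threshold membership inference attack.
# Illustration of the general idea only; model name and threshold are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "EleutherAI/pythia-160m"  # placeholder; any open causal LM works
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def sequence_loss(text: str) -> float:
    """Average per-token negative log-likelihood of `text` under the model."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)
    return out.loss.item()

def looks_like_member(text: str, threshold: float = 3.0) -> bool:
    """Flag `text` as a suspected training member if its loss is unusually low.
    The threshold here is arbitrary; real attacks calibrate it on reference data."""
    return sequence_loss(text) < threshold
```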

When creating LLMs, developers collect vast amounts of text and images from the internet and other sources, so protecting that training data is paramount. If an attacker can infer from a model’s generated content which records it was trained on, private data could be exposed, a problem for developers and users alike.

The team assessed five common MIAs using the open-source data set “the Pile,” which encompasses diverse collections of text data from 22 sources like Wikipedia and PubMed. The research revealed that the dynamic nature of language data complicates the definition of what qualifies as a member of a training set.

“We found that the current methods for conducting membership inference attacks on LLMs are not actually measuring membership inference well, since they suffer from difficulty defining a good representative set of non-member candidates for the experiments,” Evans added.

This fluidity in language makes it hard to identify true data leaks. As the paper states, past research may have inadvertently demonstrated distribution inference, detecting a shift between the member and non-member data, rather than genuine membership inference.
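The confound is easiest to see in how these attacks are scored. Evaluations typically mix known training members with supposed non-members and measure how well an attack’s per-example scores separate the two, often with ROC AUC. The sketch below, with made-up scores and placeholder names, shows that setup and why a distribution gap between the two pools can masquerade as a successful attack.

```python
# Sketch of how MIA evaluations are typically scored, and where the confound
# arises. The scores below are made up for illustration only.
import numpy as np
from sklearn.metrics import roc_auc_score

def evaluate_mia(member_scores, nonmember_scores):
    """Score an attack by how well its per-example scores separate known
    training members from non-member candidates, measured with ROC AUC."""
    labels = np.concatenate([np.ones(len(member_scores)),
                             np.zeros(len(nonmember_scores))])
    scores = np.concatenate([member_scores, nonmember_scores])
    return roc_auc_score(labels, scores)

# Toy example with fabricated attack scores (higher = "more likely a member"):
members = np.array([0.81, 0.77, 0.74, 0.69])
nonmembers = np.array([0.52, 0.58, 0.49, 0.61])
print(evaluate_mia(members, nonmembers))  # high AUC on this toy data

# The catch: if the non-member pool comes from a different distribution than
# the members (for example, newer documents on different topics), a high AUC
# may only mean the attack detects that distribution shift ("distribution
# inference"), not that it recognizes specific memorized training records.
```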

The research team made their findings accessible through a project called MIMIR, advocating for more rigorous and accurate assessments of LLMs’ privacy risks. Although current evidence suggests low inference risks for individual records in pre-training data, the interactive nature of open-source LLMs presents new challenges.

Co-first author Anshuman Suri, a former UVA doctoral student who is now a postdoctoral researcher at Northeastern University, pointed out that fine-tuning existing LLMs on new data leaves that data far more exposed to membership inference.

“We do know, however, that if an adversary uses existing LLMs to train on their own data, known as fine-tuning, their own data is way more susceptible to error than the data seen during the model’s original training phase,” Suri said in the news release.
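One way to picture that heightened exposure is the gap between a model’s loss on its fine-tuning examples and on comparable held-out text: the wider the gap, the stronger the signal an attacker has to work with. The sketch below is illustrative only and assumes a per-example loss function such as the sequence_loss sketched earlier.

```python
# Illustrative sketch: compare a model's average loss on its fine-tuning
# examples against comparable held-out text. All names are placeholders.
import numpy as np

def loss_gap(loss_fn, finetune_texts, heldout_texts):
    """`loss_fn` is any callable returning a per-example loss, such as the
    `sequence_loss` sketched above. A strongly negative gap means the
    fine-tuning data fits noticeably better, which is the kind of signal
    an attacker can exploit."""
    ft = float(np.mean([loss_fn(t) for t in finetune_texts]))
    ho = float(np.mean([loss_fn(t) for t in heldout_texts]))
    return ft - ho
```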

The researchers emphasize the need for better methods to evaluate AI systems’ privacy concerns, a challenge to be tackled by the broader AI community in the years to come.