MIT Researchers Unveil Technique to Reduce Bias in AI Models, Enhancing Fairness and Accuracy

MIT researchers have introduced a new technique that reduces bias in AI models while preserving or enhancing their accuracy, demonstrating significant improvements in performance for underrepresented groups.

Researchers at the Massachusetts Institute of Technology (MIT) have unveiled a new technique that reduces bias in machine-learning models while maintaining or even enhancing their overall accuracy. This breakthrough holds the potential to mitigate the risks posed by AI systems that often underperform for underrepresented groups.

Machine-learning models tend to falter when making predictions for individuals who were underrepresented in their training datasets. For example, a model trained predominantly with data from male patients may inaccurately predict treatment outcomes for female patients.

Traditional methods that attempt to balance datasets by removing data points from overrepresented groups can severely hamper a model’s overall performance due to significant data loss.

MIT’s new approach is more targeted: it identifies and removes only the specific data points that contribute most to a model’s failures on minority subgroups. Because far fewer points are removed than with existing methods, the model’s overall accuracy and integrity are preserved.
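As a rough illustration of why conventional balancing is so costly, the sketch below compares how many points a simple group-balanced subsampling step would discard against a small, fixed removal budget of targeted points. The group sizes and the removal budget are hypothetical placeholders, not figures from the MIT study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training set: 90,000 majority-group and 10,000 minority-group examples.
groups = np.array([0] * 90_000 + [1] * 10_000)

# Conventional balancing: subsample the majority group down to the minority group's size.
minority_size = int((groups == 1).sum())
majority_idx = np.flatnonzero(groups == 0)
kept_majority = rng.choice(majority_idx, size=minority_size, replace=False)
removed_by_balancing = len(majority_idx) - minority_size
print(f"Group balancing discards {removed_by_balancing} training points")  # 80,000

# Targeted removal: drop only a small budget of the most harmful points
# (a hypothetical budget; the actual points would be chosen by attribution scores).
removal_budget = 500
print(f"Targeted removal discards {removal_budget} training points")
```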

“Many other algorithms that try to address this issue assume each datapoint matters as much as every other datapoint. In this paper, we are showing that assumption is not true. There are specific points in our dataset that are contributing to this bias, and we can find those data points, remove them and get better performance,” co-lead author Kimia Hamidieh, an electrical engineering and computer science (EECS) graduate student at MIT, said in a news release.

The team outlined their method in a paper and will present the research at the Conference on Neural Information Processing Systems (NeurIPS).

Contextualizing the Breakthrough

The new technique builds on the insight that individual data points in a massive training dataset can disproportionately harm a model’s accuracy on particular downstream tasks. In previous work, the MIT researchers introduced a method called TRAK, which identifies the training examples most responsible for a given model output.

In the current study, the team used incorrect predictions made by the model about minority subgroups and applied TRAK to pinpoint which training examples were responsible for those errors.

“By aggregating this information across bad test predictions in the right way, we are able to find the specific parts of the training that are driving worst-group accuracy down overall,” co-author Andrew Ilyas, a former doctoral student at MIT and now a Stein Fellow at Stanford University, said in the news release.

The team then removed these specific samples and retrained the model.
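The overall pipeline described above (attribute errors to training points, aggregate, remove, retrain) can be sketched roughly as follows. The sketch assumes the per-example attribution scores (for instance, TRAK-style scores) have already been computed and stored in a matrix; the matrix, the worst-group error mask, the removal budget, and the `retrain` helper are hypothetical stand-ins, not the researchers’ released code.

```python
import numpy as np

def select_points_to_remove(attributions, is_worst_group_error, budget):
    """Rank training points by their aggregate contribution to worst-group errors.

    attributions: array of shape (n_test, n_train); entry [i, j] estimates how much
        training example j pushed the model toward its prediction on test example i
        (e.g., a TRAK-style attribution score).
    is_worst_group_error: boolean array of shape (n_test,) marking misclassified
        test examples from the worst-performing subgroup.
    budget: number of training points to drop (a hypothetical hyperparameter).
    """
    # Aggregate attribution scores over the bad worst-group predictions only.
    harm = attributions[is_worst_group_error].sum(axis=0)
    # Training points with the largest aggregate contribution to those errors
    # are the candidates for removal.
    return np.argsort(harm)[-budget:]

# Hypothetical usage:
# attributions = compute_attribution_scores(model, train_set, test_set)  # e.g., via TRAK
# mask = (test_preds != test_labels) & (test_groups == worst_group)
# to_drop = select_points_to_remove(attributions, mask, budget=500)
# keep = np.setdiff1d(np.arange(len(train_set)), to_drop)
# model = retrain(train_set, keep)  # retrain on the filtered training set
```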

The results showed substantial improvements in worst-group accuracy while removing far fewer training samples than conventional balancing methods. This makes the technique simpler to apply across different models and, notably, it does not require the sources of bias, such as subgroup labels, to be annotated in the training data.

The researchers tested their method on three machine-learning datasets, consistently finding that it outperformed existing techniques. In one notable instance, their approach improved worst-group accuracy while removing about 20,000 fewer training samples than a traditional data-balancing method.
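Worst-group accuracy, the metric referenced throughout, is simply the model’s accuracy on whichever subgroup it handles worst. A minimal way to compute it, assuming group labels are available for the evaluation set, is:

```python
import numpy as np

def worst_group_accuracy(preds, labels, groups):
    """Return the lowest per-subgroup accuracy (the worst-group accuracy)."""
    accuracies = [
        np.mean(preds[groups == g] == labels[groups == g])
        for g in np.unique(groups)
    ]
    return min(accuracies)
```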

“This is a tool anyone can use when they are training a machine-learning model. They can look at those datapoints and see whether they are aligned with the capability they are trying to teach the model,” Hamidieh added.

The technique’s potential to uncover hidden sources of bias in unlabeled training datasets is particularly compelling. It promises future applications in high-stakes scenarios, such as healthcare, to ensure AI models are fair and reliable.

“When you have tools that let you critically look at the data and figure out which datapoints are going to lead to bias or other undesirable behavior, it gives you a first step toward building models that are going to be more fair and more reliable,” Ilyas added.