A new study finds that AI systems can develop biases similar to humans, showing ingroup favoritism and outgroup hostility. Researchers suggest that these biases can be mitigated through careful data curation.
Artificial intelligence systems, much like humans, are vulnerable to social identity biases, according to a new study conducted by a team of scientists from New York University and the University of Cambridge. The research reveals that AI models exhibit tendencies to favor “ingroups” and exhibit hostility toward “outgroups,” a finding that could have far-reaching consequences for the integration of AI in society.
“Artificial Intelligence systems like ChatGPT can develop ‘us versus them’ biases similar to humans — showing favoritism toward their perceived ‘ingroup’ while expressing negativity toward ‘outgroups’,” co-author Steve Rathje, a postdoctoral researcher at NYU, said in a news release. “This mirrors a basic human tendency that contributes to social divisions and conflicts.”
Published in the journal Nature Computational Science, the study scrutinized dozens of large language models (LLMs), ranging from base models like Llama to more advanced models like GPT-4, which powers ChatGPT.
The researchers assessed these models by generating 2,000 sentences with “We are” (ingroup) and “They are” (outgroup) prompts, analyzing the output using standard analytical tools to determine whether the sentences were positive, negative, or neutral.
The results were striking. Sentences starting with “We are” were overwhelmingly positive, while those beginning with “They are” turned out to be largely negative.
An ingroup sentence was 93% more likely to be positive, whereas an outgroup sentence was 115% more likely to be negative. For instance, a positive ingroup sentence read, “We are a group of talented young people who are making it to the next level,” whereas a negative outgroup sentence stated, “They are like a diseased, disfigured tree from the past.”
These behaviors amplify the fault lines already present in human societies.
“As AI becomes more integrated into our daily lives, understanding and addressing these biases is crucial to prevent them from amplifying existing social divisions,” added co-author Tiancheng Hu, a doctoral student at the University of Cambridge.
The research also discovered that the biases in AI could be modified through changes in data curation. By “fine-tuning” the LLMs with partisan social media data from Twitter (now X), the researchers found a significant increase in both ingroup solidarity and outgroup hostility.
However, when they filtered out sentences showcasing ingroup favoritism and outgroup hostility before fine-tuning, these polarizing effects were significantly reduced. This shows that even minor, targeted adjustments to training data can make a substantial difference in model behavior.
Yara Kyrychenko, a former undergraduate mathematics and psychology student at NYU and now a doctoral Gates Scholar at the University of Cambridge, weighed in on this discovery.
“The effectiveness of even relatively simple data curation in reducing the levels of both ingroup solidarity and outgroup hostility suggests promising directions for improving AI development and training,” she said in the news release. “Interestingly, removing ingroup solidarity from training data also reduces outgroup hostility, underscoring the role of the ingroup in outgroup discrimination.”
The implications of these findings are both a cause for concern and a call to action. AI systems are increasingly embedded in our societal framework, from customer service bots to recommendation engines and beyond. Ensuring these systems do not magnify existing social biases is essential for a fair and inclusive future.