Research from Penn State demonstrates that transparency about the racial diversity in AI training data can improve user trust and perceived fairness of AI systems, fostering more ethical use.
A new study from Penn State has found that increasing transparency about the diversity of training data used in artificial intelligence (AI) systems can significantly enhance user trust and perceived fairness. The research suggests that disclosing the racial composition of AI training data and the backgrounds of the data labelers helps users make more informed decisions about whether and how to use these technologies.
AI systems are ubiquitous in modern technology, from home assistants to advanced search engines and large language models like ChatGPT. While these systems may appear omniscient, their outputs are only as robust as the data they are trained on. Despite the critical role of training data, users often remain unaware of potential biases it may contain.
The research, conducted by S. Shyam Sundar, director of the Center for Socially Responsible Artificial Intelligence at Penn State, and Cheng “Chris” Chen, a former doctoral student at Penn State who is now an assistant professor of communication design at Elon University, explores the impact of revealing this information.
Understanding Algorithmic Bias
“Users may not realize that they could be perpetuating biased human decision-making by using certain AI systems,” Sundar said in a news release.
Chen, the lead author, emphasized that this bias becomes evident only after a user has completed a task, making it difficult to determine trustworthiness before use.
“This bias presents itself after the user has completed their task, meaning the harm has already been inflicted, so users don’t have enough information to decide if they trust the AI before they use it,” Chen said in the news release.
To test their hypothesis, the researchers created two experimental scenarios: one showcasing a diverse dataset and the other a non-diverse dataset.
In the diverse condition, participants were shown training data and labelers with an equal racial distribution; in the non-diverse condition, a large majority came from a single racial group. Participants then evaluated an AI-powered hiring tool called HireMe, which assessed job candidates based on their facial expressions and vocal tone during automated interviews.
Findings and Implications
The study found that showing users racial diversity in the training data increased their trust in the AI system.
“We found that showing racial diversity in training data and labelers’ backgrounds increased users’ trust in the AI,” Chen added.
Furthermore, providing opportunities to give feedback enhanced participants' sense of agency and their intention to use the AI system in the future, Chen explained.
Interestingly, the study also revealed that soliciting too much feedback might reduce usability, particularly for users who already perceive the system as fair and accurate. The diversity cues operate independently rather than reinforcing one another, but both the data diversity cue and the labeler diversity cue are effective in shaping users' perceptions of an AI system's fairness.
Sundar highlighted the importance of representation in training data to avoid misinterpretation of emotions across different racial groups.
“If AI is just learning expressions labeled mostly by people of one race, the system may misinterpret emotions of other races,” he added.
Promoting Transparent and Ethical AI
The researchers underscored that for AI systems to garner user trust, the origins of their training data should be transparent.
“Making this information accessible promotes transparency and accountability of AI systems,” Sundar added. “Even if users don’t access this information, its availability signals ethical practice and fosters fairness and trust in these systems.”
As the role of AI continues to expand in various sectors, ensuring that these systems are fair, transparent and trustworthy is crucial. This study by Penn State contributes a significant step toward achieving that goal.
The full findings of this research can be found in the journal Human-Computer Interaction.