The age of the internet troll may soon come to an end, as researchers at Caltech and Stanford have laid the groundwork for an effective AI tool that social media networks could eventually use to flag online hate speech and harassment.
This is particularly important today, as many believe a rise in hateful rhetoric online has contributed to mass shootings and other violent crimes.
Although many social media sites have already turned to AI to monitor hate speech and harassment on their platforms, existing methods are far from flawless.
In a recent study, researchers from the University of Washington found that a leading AI tool designed to pick up on hate speech exhibited a racial bias against African Americans. Tweets written by black people were one-and-a-half times more likely to be labeled “offensive,” and tweets written in what the researchers called “African American English” were about twice as likely to be labeled that way.
A separate study conducted by Cornell researchers revealed similar findings.
Primarily, this is because many existing AI tools rely on keyword matching and aren’t effective at gauging context. The “n-word,” for example, can be either hateful or endearing depending on who uses it and in what setting. So flagging a post strictly on the words it contains, without context, is not an effective solution.
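To see why, consider a deliberately naive sketch of keyword matching. This is not any platform’s actual system; the word list and example posts below are hypothetical stand-ins:

```python
# A deliberately naive keyword filter, illustrating why matching words
# without context misfires. The word list and posts are hypothetical.
FLAGGED_KEYWORDS = {"slur_a", "slur_b"}  # stand-ins for real slurs

def is_offensive(post: str) -> bool:
    # Flags any post containing a listed keyword, regardless of intent.
    words = {w.strip(".,!?").lower() for w in post.split()}
    return not words.isdisjoint(FLAGGED_KEYWORDS)

# Both posts trip the filter, even though only the first is hostile;
# the filter has no way to tell hateful and reclaimed usage apart.
print(is_offensive("you are a slur_a"))           # True
print(is_offensive("my friends call me slur_a"))  # True
```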
“These systems are being developed to identify language that’s used to target marginalized populations online,” Thomas Davidson, a doctoral candidate in sociology and lead author of the Cornell study, said in a news release. “It’s extremely concerning if the same systems are themselves discriminating against the population they’re designed to protect.”
To supplement existing AI tools, many social media sites also depend on their users to report abusive, hateful speech, though this is clearly not an effective long-term solution.
“It isn’t scalable to have humans try to do this work by hand, and those humans are potentially biased,” Maya Srikanth, a junior at Caltech and co-author of the Caltech/Stanford study, said in a news release. “On the other hand, keyword searching suffers from the speed at which online conversations evolve. New terms crop up and old terms change meaning, so a keyword that was used sincerely one day might be meant sarcastically the next.”
So instead, her team used a GloVe (Global Vectors for Word Representation) model to discover new and relevant keywords.
GloVe is a word-embedding model. Starting with one word, it can be used to find clusters of terms that are linguistically and semantically related to the original word. For example, when the researchers used the GloVe model to search Twitter for uses of “MeToo,” clusters of related hashtags popped up, including “SupportSurvivors,” “Not Silent” and “I’m With Her.” This model gives researchers the ability to track keywords and phrases as they evolve over time and lets them find new ones as they emerge.
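For readers who want a feel for how this works, the general idea can be reproduced with off-the-shelf tools. Here is a minimal sketch using the publicly available gensim library and its pretrained Twitter GloVe vectors; this is not the study’s own pipeline, the seed word is illustrative, and the pretrained corpus predates the MeToo hashtags:

```python
# Minimal sketch of the keyword-expansion idea using pretrained GloVe
# vectors via gensim. The model name and seed word are illustrative;
# the study's own pipeline and corpus are not shown here.
import gensim.downloader as api

# 25-dimensional GloVe vectors trained on ~2 billion tweets
# (roughly a 100 MB download on first use).
vectors = api.load("glove-twitter-25")

# Starting from one seed word, pull the cluster of nearest neighbors
# in embedding space; these are candidates for new, related keywords.
seed = "harassment"
for word, cosine_sim in vectors.most_similar(seed, topn=10):
    print(f"{word:20s} {cosine_sim:.3f}")
```

Because the neighbors are computed from how words are actually used in the corpus, rerunning the query as language shifts surfaces new terms without anyone hand-curating a keyword list.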
GloVe also captures context, showing how strongly specific keywords are related and offering insight into how they are being used. For example, in a Reddit forum used by an anti-feminist men’s rights group, the model found that the word “female” most often appeared alongside words like “intercourse,” “negative” and “sexual.” In Twitter posts about the “MeToo” movement, by contrast, “female” most often appeared alongside words like “companies,” “desire” and “capitalist.”
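That corpus-by-corpus contrast can be sketched by training a small embedding model on each community’s text and comparing the neighborhoods of the same term. In the illustration below, gensim’s Word2Vec stands in for GloVe as a comparable word-embedding method, and the tiny corpora are placeholders for real scraped posts:

```python
# Sketch of the corpus-contrast idea: train one small embedding model
# per community, then compare which words sit closest to the same term.
# Word2Vec stands in for GloVe here; the corpora are toy placeholders.
from gensim.models import Word2Vec

# Stand-ins for the two communities' text; real input would be
# thousands of tokenized posts scraped from each platform.
reddit_posts = [
    ["female", "negative", "intercourse"],
    ["female", "sexual", "negative"],
] * 50
twitter_posts = [
    ["female", "companies", "capitalist"],
    ["female", "desire", "companies"],
] * 50

def neighbors(corpus, term, topn=3):
    # Train a small embedding model on one corpus and return the words
    # closest to `term` in that corpus's vector space.
    model = Word2Vec(sentences=corpus, vector_size=25, window=3,
                     min_count=1, epochs=30, seed=0)
    return model.wv.most_similar(term, topn=topn)

print("reddit :", neighbors(reddit_posts, "female"))
print("twitter:", neighbors(twitter_posts, "female"))
```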
For now, the study serves as a proof of concept, demonstrating that a more powerful and effective way to spot online harassment exists. With further testing, the team hopes its approach can be used to rid the internet of trolls, making social media sites places for supportive and productive conversations rather than abusive ones.
Anima Anandkumar, a co-author of the study and the Bren Professor of Computing and Mathematical Sciences at Caltech, explained that in 2018, she found herself a victim of online harassment. “It was an eye-opening experience about just how ugly trolling can get,” she said in a news release. “Hopefully, the tools we’re developing now will help fight all kinds of harassment in the future.”