Speech Recognition Algorithms Struggle with Black Users’ Voices
25 Mar, 2020 / 04:47 pm / omnes

Stanford University researchers have found that speech recognition systems have more trouble understanding black users’ voices than those of white users.

The researchers used speech recognition tools from Apple, Amazon, Google, IBM, and Microsoft to transcribe interviews with 42 white people and 73 black people, all recorded in the US. The tools misidentified words about 19 percent of the time in the interviews with white people and 35 percent of the time in the interviews with black people. The systems found 2 percent of audio snippets from white people to be unreadable, compared to 20 percent of those from black people. The errors were particularly large for black men, with an error rate of 41 percent compared to 30 percent for black women.
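
For readers curious how a figure like "19 percent of words misidentified" can be computed, here is a minimal sketch in Python. It assumes the percentages correspond to the standard word error rate (word-level edit distance divided by the number of words in the human transcript); the article does not spell out the exact metric. The function name and the sample sentences are hypothetical illustrations, not data or code from the study.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: simple lowercase, whitespace-based tokenization. Real
    evaluations usually normalize punctuation and spelling more carefully.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one wrong word out of six.
human = "the cat sat on the mat"
machine = "the cat sat on a mat"
print(f"{word_error_rate(human, machine):.0%}")
```

Under this definition, an error rate of 35 percent means roughly one in three words in the human transcript was substituted, dropped, or accompanied by a spurious extra word in the machine transcript.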

A separate study conducted by MIT found that an Amazon facial recognition service made no mistakes when identifying the gender of men with light skin, but performed worse when identifying the gender of people who were female or had darker skin.

In the Stanford study, Microsoft’s system achieved the best result, while Apple’s performed the worst. It’s important to note that these aren’t necessarily the tools used to build Cortana and Siri, though they may be governed by similar company practices and philosophies.

“Fairness is one of our core AI principles, and we’re committed to making progress in this area,” said a Google spokesperson in a statement to The Verge. “We’ve been working on the challenge of accurately recognizing variations of speech for several years, and will continue to do so.”

The Stanford paper posits that the racial gap is likely the product of bias in the datasets used to train the systems. Recognition algorithms learn by analyzing large amounts of data; a system trained mostly on audio clips from white people may have difficulty transcribing a more diverse set of user voices.