October 25, 2021

What is Machine Bias

Machine Learning

What is machine learning? It powers much of the technology we use every day. Machine learning helps us get from place to place, suggests translations, and even understands what we say to it. How does it work? In traditional programming, people hand-code the solution to a problem step by step; in machine learning, computers find patterns in data to reach the solution.

Take language as an example. The idea of teaching machines English comes from cognitive science and its understanding of how children learn language. Children learn meanings by interacting with people and copying their behavior. How frequently two words appear together is the first clue we get to deciphering their meaning.

To achieve this, developers have used a common machine learning program to crawl through the internet, look at 840 billion words, and teach itself the definitions of those words. The program accomplishes this by looking at how often certain words appear in the same sentence. Take the word “bottle.” The computer understands what the word means by noticing it occurs more frequently alongside the word “container,” and also near words that connote liquids like “water” or “milk.” 
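
To make the counting idea concrete, here is a minimal sketch in Python. The four-sentence corpus and the function names are made up for illustration; the real program worked on billions of words and used a far more elaborate method than raw counts.

```python
from collections import Counter
from itertools import combinations

# Tiny invented corpus; a real system would read billions of words.
sentences = [
    "the bottle is a container for water",
    "she filled the bottle with milk",
    "a bottle is a small container",
    "the pilot landed the plane",
]

def cooccurrence_counts(sentences):
    """Count how often each unordered pair of distinct words shares a sentence."""
    counts = Counter()
    for sentence in sentences:
        words = sorted(set(sentence.split()))
        for a, b in combinations(words, 2):
            counts[(a, b)] += 1
    return counts

def neighbors(word, counts):
    """Return a Counter of the words that share a sentence with `word`."""
    result = Counter()
    for (a, b), n in counts.items():
        if a == word:
            result[b] += n
        elif b == word:
            result[a] += n
    return result

counts = cooccurrence_counts(sentences)
bottle_neighbors = neighbors("bottle", counts)

print(bottle_neighbors["container"])  # 2
print(bottle_neighbors["water"])      # 1
print(bottle_neighbors["plane"])      # 0
```

Even in this toy corpus, "bottle" shares sentences with "container," "water," and "milk" but never with "plane." That pattern is the only thing the program has to go on.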

It’s easy to think that in a system like this everything is objective and there’s no human bias. However, just because something is based on data doesn’t automatically make it neutral. Machine bias exists because these algorithms have been created by biased humans, trained with biased historical data, and refined with biased human feedback.

How does machine bias happen?

Since machines have no emotions, they must be objective, right? Not exactly. There has been a lot of recent talk about the issues with algorithms. The ethics of AI has hit the news as more and more cases of questionable uses come to light. As a result, machine bias has come into greater focus.

Biased datasets

One example of bias coming from datasets can be seen in how algorithms interpret gender roles. Machine learning programs will pick up on the fact that most nurses throughout history have been women. They’ll realize most computer programmers are male. Researchers from Boston University and Microsoft showed that software trained on text collected from Google News reproduced gender biases well documented in humans. When they asked software to complete the statement “Man is to computer programmer as woman is to X,” it replied, “homemaker.” 
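
Analogy questions like this are typically answered with simple arithmetic on word vectors. The sketch below is not the researchers' code. The two-dimensional vectors are invented by hand to mimic a gender skew, whereas real embeddings have hundreds of dimensions learned from text. It only shows how a skew baked into the numbers surfaces in the answer.

```python
import math

# Hand-made 2-D "embeddings", invented for illustration only.
# Axis 0: lean toward male (+) or female (-) contexts in the training text.
# Axis 1: how strongly the word behaves like an occupation.
vectors = {
    "man":        [1.0, 0.0],
    "woman":      [-1.0, 0.0],
    "programmer": [0.8, 1.0],
    "homemaker":  [-0.8, 1.0],
    "doctor":     [0.3, 1.0],
    "teacher":    [-0.2, 1.0],
}

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def complete_analogy(a, b, c, vectors):
    """Answer 'a is to b as c is to ?' with the word nearest to b - a + c."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # Exclude the three input words, as is usual for this kind of query.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

answer = complete_analogy("man", "programmer", "woman", vectors)
print(answer)  # homemaker
```

Nothing in the code mentions stereotypes. The answer falls out of the numbers, and in a real system the numbers come from the training text.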

“People pictured in kitchens, for example, became even more likely to be labeled ‘woman’ than reflected the training data.”

Ordoñez, V. et al. (2017) University of Washington

This example shows how historical data carries distortions that come from our own biases as a society. As sophisticated machine-learning programs proliferate, such distortions matter. The researchers’ paper includes a photo of a man at a stove labeled “woman.”

Biased creators

Even with the best intentions, it’s impossible to separate ourselves from our own human biases, so they become part of the technology we create.

“Computers learn how to be racist, sexist, and prejudiced in a similar way that a child does, from their creators.”

Aylin Caliskan, Princeton Computer Scientist

This is one big reason, among many, why the diversity problem in Silicon Valley is so important to acknowledge and correct. When the people creating these technologies, training them, and computing their output share the same background, they are likely to perpetuate systemic biases and oppression through blind spots in their own worldview. In the same way that able-bodied people often forget to design for people with disabilities, or cisgender developers overlook different personal pronouns, when we train and program the calculations the algorithms make, we might miss the bigger picture and the diversity of experiences.

Biased peers

User feedback is important in the training of algorithms. It helps refine the associations and solutions the algorithm provides as a variety of people use it in real life. But this feedback can be a double-edged sword. A couple of years ago, a Harvard study found that when someone searched Google for a name commonly associated with African Americans, an ad for a company that finds criminal records was more likely to turn up. The algorithm may initially have done this for both black and white people, but over time the biases of the people who did the search probably got factored in, says Christian Sandvig, a professor at the University of Michigan’s School of Information.

Princeton computer scientist Aylin Caliskan found that as a machine teaches itself English, it becomes prejudiced against black Americans and women. Like a child, a computer builds its vocabulary through how often terms appear together. On the internet, African-American names are more likely to be surrounded by words that connote unpleasantness. That’s not because African Americans are unpleasant. It’s because people on the internet say awful things. And it leaves an impression on an AI in training.
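
One way to put a number on that impression is to compare how close a word sits to pleasant words versus unpleasant ones. The sketch below is only similar in spirit to the association tests used in this kind of research. All vectors are invented by hand, and `name_a` and `name_b` are hypothetical placeholders rather than real names or real measurements.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def association(word_vec, pleasant, unpleasant):
    """Mean similarity to pleasant words minus mean similarity to unpleasant words."""
    mean_pleasant = sum(cosine(word_vec, p) for p in pleasant) / len(pleasant)
    mean_unpleasant = sum(cosine(word_vec, u) for u in unpleasant) / len(unpleasant)
    return mean_pleasant - mean_unpleasant

# Invented 2-D vectors standing in for learned embeddings.
pleasant = [[1.0, 0.2], [0.9, 0.1]]      # stand-ins for words like "joy", "love"
unpleasant = [[-1.0, 0.2], [-0.9, 0.3]]  # stand-ins for words like "awful", "hatred"

name_a = [0.4, 1.0]   # hypothetical name mostly seen near pleasant words
name_b = [-0.4, 1.0]  # hypothetical name mostly seen near unpleasant words

score_a = association(name_a, pleasant, unpleasant)
score_b = association(name_b, pleasant, unpleasant)

print(score_a > 0)  # True: name_a leans pleasant
print(score_b < 0)  # True: name_b leans unpleasant
```

A positive score means the word sits closer to the pleasant group, a negative score closer to the unpleasant one. The score says nothing about the people behind the names, only about the text the vectors were built from.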

Machine bias in action

If all this sounds a bit too abstract to you, don’t worry, I’ve been there too. When I found this little example of machine bias in action, it made the problem very obvious, and I could see it at work in my day-to-day life. The experiment suggested I use my regular texting app and its predictive text: first I would type “The pilot said…” and see what predictive text suggested as the next word, then I would type “The nurse said…” and do the same. You can see my screen captures here and then try it on your own phone. What did you find?

Video Description: Two iPhone screens showing the message interface while I type. On the left-hand side I’m typing “the pilot said” and the predictive text suggests “he” as the next word; on the right-hand side I’m typing “the nurse said” and the predictive text suggests “she”.