Let’s Understand the Sources of Bias in AI

Shambhavi Siddhida
Jul 14, 2022

Summary: Bias in the Training Data, Feedback Loop & Correlation, Data Manipulation Attacks

To bias is to “cause to feel or show inclination or prejudice for or against someone or something.” Human beings rely on many kinds of bias when making decisions and judgments, and the term isn’t inherently negative. Some biases are positive and helpful: they may help us survive tough situations or make quicker decisions. Others are implicit. These are cognitive shortcuts that operate unconsciously and may result in favoritism for or against a particular person, group, or entity. Paying attention to helpful biases while keeping negative, prejudicial, or accidental biases in check requires a delicate balance between self-protection and empathy for others.

Today, many institutions make decisions using artificial intelligence (AI) systems built on machine learning (ML), in which algorithms learn from massive amounts of data to find patterns and make predictions. These systems can make decisions that significantly affect our lives. Because humans tend to be implicitly biased, it is very easy for the biases that already exist in our society to be transferred to algorithms. Let’s look at the types of bias that seep into these algorithms, along with some real-life cases of algorithmic bias across corporations and organizations.

Sources of Bias in AI

Bias in the Training Data

The data used to build an AI model is usually divided into two sets: training data and test data. The training data teaches the model what the expected output looks like. The model passes over this data repeatedly, picking up its patterns and adjusting itself to perform better. The test data is held back and used to give an unbiased evaluation of the final model’s performance and accuracy.

If the training data is biased, the model can lean towards certain outcomes and disregard others, leading to prejudiced predictions. Training data can be biased in several ways. One is sampling bias, a statistical problem in which some members of a population are systematically more likely to be selected for a sample than others. Another is temporal bias, in which a machine-learning model works well at one point in time but fails later. This happens when the data is collected over an unrepresentative time window, so the model overlooks changes that may come in the future.

An example of data-driven bias is Amazon’s AI recruiting tool, which was trained to vet candidates by observing patterns in resumes submitted to the company over a 10-year period. Most of those resumes came from men, a reflection of male dominance across the tech industry. In effect, Amazon’s system taught itself that male candidates were preferable, and it penalized resumes that included the word “women’s”.
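One simple way to spot the sampling bias described in this section is to compare each group’s share of the training data with its share of the population the model is meant to serve. The sketch below is only an illustration: the function name `representation_gap`, the group labels, and all of the numbers are assumptions made up for this example and are not Amazon’s actual figures.

```python
from collections import Counter


def representation_gap(sample_groups, population_share):
    """Return (share in sample - share in population) for each group.

    A positive value means the group is over-represented in the sample,
    a negative value means it is under-represented.
    """
    total = len(sample_groups)
    if total == 0:
        raise ValueError("sample_groups must not be empty")
    counts = Counter(sample_groups)
    return {
        group: counts.get(group, 0) / total - share
        for group, share in population_share.items()
    }


# Hypothetical training set: 80 resumes from men, 20 from women.
resumes = ["men"] * 80 + ["women"] * 20
# Hypothetical reference population: an evenly split applicant pool.
applicant_pool = {"men": 0.5, "women": 0.5}

for group, gap in representation_gap(resumes, applicant_pool).items():
    print(f"{group}: {gap:+.2f}")
```

A large gap does not prove that the resulting model will be unfair, but it is a cheap early warning that the training data may not look like the population.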

Influencing the Data by the Model

A Positive Feedback Loop & Correlation

AI algorithms can also influence the data they receive after making predictions. Data added to an AI system after it makes a prejudiced prediction can create a positive feedback loop that reinforces the past trend of discriminatory decisions. Take the COMPAS algorithm, used in U.S. courts to assess a person’s risk of recidivism. COMPAS was trained on criminal records from a justice system in which policing and sentencing practices are already explicitly or implicitly biased against black people. Based on that training data, it predicted that a black person was more likely to re-offend than a white person with otherwise similar background factors. The biased prediction leads to biased decisions, and those decisions in turn become part of the model’s training data, reinforcing the decisions taken in the past.
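The feedback loop can be illustrated with a toy simulation. This is a deliberately simplified sketch and not how COMPAS works: the function `simulate_feedback_loop`, the two groups, and every number are assumptions. Both groups have the same true incident rate, but scrutiny is allocated in proportion to past records, and new records can only appear where scrutiny is applied.

```python
def simulate_feedback_loop(recorded, true_rate, rounds, checks_per_round=1000):
    """Simulate records that feed back into how scrutiny is allocated.

    recorded: dict mapping group -> number of recorded incidents so far.
    true_rate: the real incident rate, identical for every group.
    Returns a list with the recorded counts after each round.
    """
    recorded = dict(recorded)
    history = [dict(recorded)]
    for _ in range(rounds):
        total = sum(recorded.values())
        updated = {}
        for group, count in recorded.items():
            # The "model" directs scrutiny in proportion to past records.
            checks = checks_per_round * count / total
            # Incidents are only recorded where someone is looking.
            updated[group] = count + checks * true_rate
        recorded = updated
        history.append(dict(recorded))
    return history


# Hypothetical starting point: records are already skewed 60/40.
history = simulate_feedback_loop(
    {"group_a": 60.0, "group_b": 40.0}, true_rate=0.1, rounds=5
)
for round_number, counts in enumerate(history):
    share_a = counts["group_a"] / sum(counts.values())
    gap = counts["group_a"] - counts["group_b"]
    print(f"round {round_number}: group_a share = {share_a:.2f}, gap = {gap:.0f}")
```

In this sketch the initial skew is never corrected: group_a keeps its inflated share of the records and the absolute gap widens every round, even though both groups behave identically.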

Correlation

This brings us to how these systems make predictions based on correlation, without understanding the context behind the data. Correlation, however, doesn’t always imply causation. For example, let’s say a company rejects a few candidates from a particular college, not because they are incompetent but because of a hiring freeze meant to cut costs. The fact that they weren’t hired still gets added to the training data of the AI system the company uses for hiring recommendations. The AI would start to associate that college with bad candidates and might stop recommending candidates from there, even those with great potential, because it doesn’t know why the earlier candidates weren’t selected.
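The hiring example can be sketched as a tiny recommender that learns nothing but the historical hire rate per college. Everything here is hypothetical: the college names, the 0.3 threshold, and the functions `hire_rates` and `recommend` are assumptions chosen to show the effect, not a real hiring system.

```python
from collections import defaultdict


def hire_rates(history):
    """Compute the fraction of past candidates hired from each college."""
    totals = defaultdict(int)
    hires = defaultdict(int)
    for college, hired in history:
        totals[college] += 1
        hires[college] += int(hired)
    return {college: hires[college] / totals[college] for college in totals}


def recommend(college, rates, threshold=0.3):
    """Recommend a candidate only if their college's past hire rate is high enough.

    In this sketch, colleges with no history are treated as a rate of 0.0.
    """
    return rates.get(college, 0.0) >= threshold


# Hypothetical history. The college_b rejections were caused by a hiring
# freeze, but that reason is not recorded anywhere in the data.
history = [
    ("college_a", True), ("college_a", False), ("college_a", True),
    ("college_b", False), ("college_b", False), ("college_b", False),
]

rates = hire_rates(history)
print(rates)
print("recommend college_a:", recommend("college_a", rates))
print("recommend college_b:", recommend("college_b", rates))
```

The model sees only the outcome column, so it cannot tell a rejection caused by a budget decision from one caused by a weak application. Recording the reason for each decision, or excluding freeze-period data, would be one way to avoid teaching it the wrong lesson.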

Data Manipulation Attacks

Data manipulation attacks can be defined in two ways. The first is when a hacker gains unauthorized access to computers, systems, or networks, not to steal data but to make covert changes to it in pursuit of some motive. As a hypothetical example, an attacker who managed to alter critical data in a hospital could cause harm to patients and even loss of life. The second is when the attack isn’t carried out by a hacker at all, but is still an organized effort by a group of people to tamper with the training data. Such an incident occurred in 2016 with Microsoft’s chatbot Tay, which began posting offensive messages after users deliberately fed it such content. We’ve explored this further in one of our previous interview articles with an AI Ethicist. Be sure to check it out here!
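The second kind of attack, an organized effort to tamper with training data, can be shown with a toy model that simply learns the most common label users have attached to each message. This is a minimal sketch and not how Tay was built: the functions `train` and `predict`, the labels, and the message counts are all assumptions.

```python
from collections import Counter, defaultdict


def train(examples):
    """Count how often each label was attached to each message."""
    model = defaultdict(Counter)
    for text, label in examples:
        model[text][label] += 1
    return model


def predict(model, text):
    """Return the most frequent label seen for this message, or None if unseen."""
    if text not in model:
        return None
    return model[text].most_common(1)[0][0]


# Hypothetical clean data from ordinary users.
clean = [("hello", "friendly")] * 3
# Hypothetical coordinated group submitting the opposite label many times.
poisoned = clean + [("hello", "hostile")] * 5

print("clean model:   ", predict(train(clean), "hello"))
print("poisoned model:", predict(train(poisoned), "hello"))
```

Nobody broke into the system here. A model that trusts every submission equally can be steered by whoever submits the most, which is why systems that learn from user input typically need filtering, rate limits, or human review.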

To what extent should we let AI decide for us? Should it act as a recommendation system or as the sole decision-maker in certain areas? One thing is certain: before any AI system is allowed to act as a decision-maker, corporate organizations urgently need to be more proactive in ensuring fairness and non-discrimination as they leverage AI to improve productivity and performance. AI models are only as good as the humans who make them, and their biased predictions reflect the biases and prejudices of our society.

Want to make a change in the field of AI as a woman? WAI is a do-tank that works with more than 30,000 community members to improve the visibility of women in the field of STEM and AI. Click here to get involved today!
