Bias is a tendency to prefer one person or thing over another, and then to favour that person or thing. Bias matters because anyone on the receiving end of negative bias, bias against them, is being treated unfairly. There are numerous examples of how bias can creep into our machine learning (ML) models, and a few are provided below. As professionals we should do what we can to ensure that our work helps people and doesn’t harm them. Unfortunately, there will always be trade-offs, such as a reduction in privacy for an increase in security, but each trade-off should be made intelligently and with the consent of the people affected.
Although it’s a mathematical impossibility to create a completely fair ML model (common fairness criteria generally cannot all be satisfied at once), there are several strategies that we can employ to reduce bias in our models. These are:
- Understand that bias exists and how it gets into our models. Bias may exist in our training data, within our team members, within our process, and in how people use the models that we create. Although we know the potential sources of bias, we may not be able to immediately (if ever) identify what those biases are. But we can and must try.
- Address bias in the source data. There are many ways that bias can exist in our training data, and strategies exist to address each challenge. One such challenge is minority group bias, typically caused by “gappy data sets” where some groups are underrepresented in the data. One example was the Amazon recruitment system: women were underrepresented amongst the company’s previous hires, so the new ML-based system that was trained on that data continued to favour men over women. One approach to this problem is to rebalance the data with synthetic examples that fill the gaps, in this case records of female Amazon employees. Another strategy is to diversify our sampling: the more data and the more variation we train our model with the better, albeit at a higher cost. Another challenge is the existence of sensitive/protected attributes, such as gender, race or country of birth, in the source data. During training our system could discover relationships in the data tied to these sensitive attributes and thus carry any corresponding biases into the model. One solution is to debias the feature space by locating the bias in the data as we are training the model, and then updating or removing data as appropriate.
- Build an interdisciplinary team. The wider the range of experience and backgrounds of the people on our development team, the greater the chance that we will be able to detect and understand any biases in our model(s).
- Follow a robust development process. We must make an effort to understand how the model will be used, perhaps via common design strategies such as personas and usage-driven modeling. The team behind the UK A-level grading fiasco very likely would have benefitted from doing this, as it seems to me that they produced a solution that didn’t take the needs of the individual students into account. In short, it’s not just about the data. Furthermore, add checks, such as reviews by external experts, into the process to explicitly consider potential biases and ethical lapses.
- Rework the model to counteract known biases. When we discover biases in the model, we must consider updating the model to counteract them. For example, had Amazon discovered the gender bias in their model sooner, they might have been able to tweak it to improve the rating of female candidates.
- Evolve the model based on feedback from applying it in practice. We mustn’t assume that the model is perfect. We must accept that there may be biases in it that we are not yet aware of, so we must be vigilant. The act of rebalancing a model after training, based on our new knowledge of where biases are, is often called “fairness through awareness”. Furthermore, we must recognize that the model only knows what it has seen so far, and is likely to run into new inputs that it doesn’t know how to handle. As a result, we will want to continue training the model to help it improve its output.
- Change the way that people use the output of the model. I believe a significant challenge with machine learning is that people tend to place too much faith in AI-based systems. Instead of designing systems to replace people in the decision process, design them to help people make better decisions. Provide people with better information, enabling them to make more informed decisions.
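To make the "address bias in the source data" strategy a little more concrete, here is a minimal Python sketch of two small steps: measuring how outcomes differ between groups, and rebalancing an underrepresented group. Everything in it is an assumption made for illustration. The toy hiring records, the field names `group` and `hired`, and the helper functions `selection_rates` and `oversample_minority` are all hypothetical and are not taken from Amazon or any real system. It also swaps in plain random oversampling (duplicating existing records) for true synthetic data generation, simply because that is the easiest form of rebalancing to show.

```python
import random
from collections import Counter, defaultdict


def selection_rates(records, group_key="group", label_key="hired"):
    """Return the fraction of positive outcomes for each group."""
    totals = Counter()
    positives = Counter()
    for record in records:
        totals[record[group_key]] += 1
        if record[label_key]:
            positives[record[group_key]] += 1
    return {group: positives[group] / totals[group] for group in totals}


def oversample_minority(records, group_key="group", seed=0):
    """Duplicate randomly chosen records from the smaller groups until
    every group has as many records as the largest group."""
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for record in records:
        by_group[record[group_key]].append(record)
    target = max(len(rows) for rows in by_group.values())
    balanced = []
    for rows in by_group.values():
        balanced.extend(rows)
        # Top up smaller groups; k is 0 for the largest group.
        balanced.extend(rng.choices(rows, k=target - len(rows)))
    return balanced


# Hypothetical toy data: 8 records for men, 2 for women.
records = (
    [{"group": "men", "hired": True}] * 6
    + [{"group": "men", "hired": False}] * 2
    + [{"group": "women", "hired": True}] * 1
    + [{"group": "women", "hired": False}] * 1
)

rates = selection_rates(records)
# Difference between the highest and lowest group selection rate.
gap = max(rates.values()) - min(rates.values())

balanced = oversample_minority(records)
counts = Counter(record["group"] for record in balanced)

print("selection rates:", rates)
print("selection-rate gap:", gap)
print("records per group after rebalancing:", dict(counts))
```

Note the limitation: oversampling only repairs representation, so each group is seen equally often during training. It does nothing about biased labels, so if the historical hiring decisions were themselves unfair, the duplicated records carry that unfairness with them. That is why measuring the gap, before and after any change, matters as much as the rebalancing itself.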
I have many other writings about the practical implications of artificial intelligence (AI) that you may find interesting.