Summary: This is a discussion of social injustice, real or perceived, promulgated or perpetuated by machine learning models. We propose a simple solution based on a widespread misunderstanding of what ML models can do.
This is a discussion of social injustice, real or perceived, promulgated or perpetuated by machine learning models. Yes, there are many different types of bias, some technically subtle and some overt. If you’d like a deeper dive, try this article on the topic we wrote two years ago. But unless you are living on another planet, you, data scientist, are expected to know that this is a uniquely social justice issue and that you just aren’t doing enough about it.
- Biased AI Perpetuates Racial Injustice
- AI Experts Say Research Into Algorithms That Claim To Predict Criminality Must End
- Federal Study Of Top Facial Recognition Algorithms Finds ‘Empirical Evidence’ Of Bias – Error Rates Were Affected By Ethnicity, Age, And Gender
- The Age of Secrecy and Unfairness in Recidivism Prediction
- A New Study Found Many Clinical Algorithms Are Still Subjected To Racial Biases
- AI Researchers Say Scientific Publishers Help Perpetuate Racist Algorithms
- California City Bans Predictive Policing, Algorithms Can Reinforce Racial Disparities In Policing, Critics Say
So we know that there are ML models in use that do in fact discriminate against certain populations in high-impact situations such as who gets bail, who is incarcerated, and who gets a loan, insurance, a house, or a job.
Many of these occur in highly regulated industries where modeling is restricted from using certain types of information, typically race, gender, sex, age, and religion. This extends to data, like geography, that could serve as proxies for these variables.
Modeling techniques are also restricted to simple, completely transparent and explainable techniques like generalized linear models (GLMs) and simple decision trees. And state agencies like the Department of Insurance can and do question the variables chosen in modeling.
Even so, the models have too many false positives or false negatives that turn out to disadvantage one or more minorities. This seems to be especially true for those used in the unregulated public sector.
In the past we would have said the solution is simply more data to include representative populations of these minorities. But this isn’t actually the answer.
Before giving you the answer, let me ask you a question. If you encountered a data set like the one at right, and had the wisdom to explore and segment before modeling, what might you have concluded?
It appears that some of the blues might behave as reds and some of the reds might behave as blues, but the best course of action would be to divide the project into one model for each segment.
At the very least, if the predictive features for the two are the same, then their weights are probably quite different. It may also be that different features are highly predictive in each of the two sets.
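To make the point about weights concrete, here is a minimal sketch in Python. The data is made up for illustration (it is not the data set pictured in the article), and the names `fit_line`, `fit_by_segment`, and `rows` are hypothetical. It fits a one-feature least-squares line to the pooled data and then to each segment separately.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def fit_by_segment(rows):
    """Group (segment, feature, outcome) rows by segment and fit one line each."""
    by_segment = {}
    for segment, x, y in rows:
        xs, ys = by_segment.setdefault(segment, ([], []))
        xs.append(x)
        ys.append(y)
    return {segment: fit_line(xs, ys) for segment, (xs, ys) in by_segment.items()}


# Hypothetical toy data, constructed so the two segments respond to the
# same feature in opposite directions. Not taken from any real study.
rows = [
    ("blue", 1.0, 2.0), ("blue", 2.0, 4.0), ("blue", 3.0, 6.0), ("blue", 4.0, 8.0),
    ("red", 1.0, 8.0), ("red", 2.0, 6.0), ("red", 3.0, 4.0), ("red", 4.0, 2.0),
]

# One model for everyone versus one model per segment.
pooled = fit_line([x for _, x, _ in rows], [y for _, _, y in rows])
per_segment = fit_by_segment(rows)

print("pooled (slope, intercept):", pooled)
for segment, params in per_segment.items():
    print(segment, "(slope, intercept):", params)
```

On this toy data the pooled fit has a slope of zero, while the per-segment fits have slopes of 2 and -2. The single model concludes the feature predicts nothing; each segment model finds it strongly predictive, in opposite directions. Real data will rarely be this clean, but the effect is the same in kind.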
Behold a Unicorn: A Single Model that Predicts Both
There is no such thing as a unicorn. Building a single model to predict both these segments is asking machine learning models to do something they are not designed to do.
Properly built ML models are designed to replicate the classifications found in their training sets. If the training sets are sufficiently ‘good’ then the models can be used with measurable confidence to predict outcomes in the near future.
Since predicting the outcome for different sets (black, Hispanic, or Caucasian) is the objective, use different training sets showing who reoffends, who is rearrested, and who defaults, and allow the models to differentiate based on race, gender, or any other social construct for which a blinded model otherwise fails.
Two Models? Isn’t that a Double Standard?
We are taught from our earliest civics and ethics education that, particularly in America, double standards are not allowed. If all men (and women) are created equal then they must be judged by the same standards.
But having two models is not the same as having different standards. The common standard by which they are separately judged is who in each group is likely to reoffend, be rearrested, default, or whatever outcome the model is meant to predict.
The models should be considered fair and impartial if the Type I and Type II error rates (false positives and false negatives) for each are similar.
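One way to check this is to compute the false positive rate and false negative rate separately for each group and compare them. The sketch below assumes binary 0/1 labels. The function names, the toy records, and the choice to report the largest gap between groups are illustrative assumptions; how small a gap counts as "similar" is a policy decision this sketch does not settle.

```python
from collections import defaultdict


def error_rates_by_group(records):
    """records: iterable of (group, actual, predicted) with 0/1 labels.

    Returns {group: {"fpr": false positive rate, "fnr": false negative rate}}.
    A rate is None when the group has no cases to compute it from.
    """
    counts = defaultdict(lambda: {"fp": 0, "tn": 0, "fn": 0, "tp": 0})
    for group, actual, predicted in records:
        if actual == 1:
            key = "tp" if predicted == 1 else "fn"
        else:
            key = "fp" if predicted == 1 else "tn"
        counts[group][key] += 1

    rates = {}
    for group, c in counts.items():
        negatives = c["fp"] + c["tn"]  # actual 0s: where Type I errors can occur
        positives = c["fn"] + c["tp"]  # actual 1s: where Type II errors can occur
        rates[group] = {
            "fpr": c["fp"] / negatives if negatives else None,
            "fnr": c["fn"] / positives if positives else None,
        }
    return rates


def max_gap(rates, metric):
    """Largest difference in one metric ("fpr" or "fnr") across groups."""
    values = [r[metric] for r in rates.values() if r[metric] is not None]
    return max(values) - min(values)


# Hypothetical toy predictions for two groups, (group, actual, predicted).
records = (
    [("A", 0, 1)] * 1 + [("A", 0, 0)] * 3 + [("A", 1, 0)] * 1 + [("A", 1, 1)] * 3
    + [("B", 0, 1)] * 2 + [("B", 0, 0)] * 2 + [("B", 1, 0)] * 1 + [("B", 1, 1)] * 3
)

rates = error_rates_by_group(records)
print(rates)
print("FPR gap:", max_gap(rates, "fpr"))
print("FNR gap:", max_gap(rates, "fnr"))
```

In this made-up example the two groups have the same false negative rate, but group B's false positive rate is twice group A's. By the test proposed here, that pair of models would not yet be considered fair.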
There will be those who want to burrow into the issues of causality, and that’s an interesting question, but not one that machine learning is completely ready to answer. Despite advances in this field, there is still too much work to do. Besides, do we really care if the reasons for the behavior are the same? We are only trying to predict the outcome.
It may require taking much of the legal and administrative apparatus in this country back to school on what ML can and cannot do. And that’s admittedly no small task. But using two or three or more models that address racial, gender, or other social disparities is a quick fix. Why haven’t we already done it?
Other articles by Bill Vorhies.
About the author: Bill is Contributing Editor for Data Science Central. Bill is also President & Chief Data Scientist at Data-Magnum and has practiced as a data scientist since 2001. His articles have been read more than 2.1 million times.