Back in high school in Banff, in the early 1980s, I had a math and computer science teacher named John P. Stutz. He was quite the character (still is; in the years since, he has been mayor of Banff, and these days he performs weddings for a living). One of his lessons I remember best is the classic computer science axiom: GIGO – garbage in, garbage out. In other words, no matter how good a program is, if you feed it garbage data, you will get garbage results. The fact that it’s a well-designed program on a good computer doesn’t automatically transmute rubbish into gold.
Back then, machine learning and artificial intelligence were still in their early days. The sophistication I now get from my iPhone – or Google, for that matter – would have blown my head clean off back then. Stutz’s too, probably. But these days, you can seriously propose and produce things that would have been pie in the sky at the time: say, having a computer read job applications and automatically filter candidates, or getting writing advice from a computer application that has learned from a large number of published articles, or having software use security cameras to scan faces and flag people whose faces resemble those of people with a higher-than-average likelihood of interaction with police. And when you train that kind of thing, GIGO matters. But something else matters even more: BIBBO.
BIBBO? That stands for bias in, bigger bias out.
Why not just BIBO? Well, here’s the thing. Garbage is garbage, but bias is scalable, and repeated bias compounds. When computers learn biased things from biased data and then put those biased things into the real world, that has real-world effects that feed back and increase the bias in the model. What’s more, if bias is going in, it’s because that bias can be found in the real world, and the machine’s biased output will confirm and strengthen that real-world bias. Not just the machine but the people who use the machine will have bigger bias.
Consider the job application filtering program I mentioned. The machine learning will look and see that certain kinds of applicants have been more likely to get hired, and will filter those kinds of applicants in and other applicants out. Seems fine? Not if the hiring choices in the original set were influenced by factors such as race, gender, religion, prestige address, prestige school, and the like. For most jobs, none of these factors has a direct bearing on ability to do the job. But the machine will see a certain kind of name and address and so on and downgrade the applicant on the basis of other similar people not getting hired previously. And then as such applicants are hired less and less, the machine’s bias is reinforced – as is the bias of those doing the hiring.
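If you want to see the loop in miniature, here’s a toy sketch in Python – purely illustrative, with invented groups and invented numbers, not drawn from any real hiring system. Both groups of applicants are equally skilled by construction; the only difference is a small tilt in the seed data, plus a “model” that gives a bonus to applicants who resemble past hires and then trains on its own decisions:

```python
import random

random.seed(0)

# A toy sketch of the hiring feedback loop: everything here is invented.
# Two groups of equally skilled applicants; a seed history in which group A
# happened to be hired a bit more often; and a "model" whose only learning
# is to give each group a score bonus equal to its historical hire rate.

def learned_hire_rates(history):
    """'Train' the model: its bonus per group is simply the observed hire
    rate for that group in the historical decisions."""
    rates = {}
    for group in ("A", "B"):
        outcomes = [hired for g, hired in history if g == group]
        rates[group] = sum(outcomes) / len(outcomes)
    return rates

# Seed history: 200 past applicants per group; A hired slightly more often.
history = [("A", random.random() < 0.30) for _ in range(200)]
history += [("B", random.random() < 0.25) for _ in range(200)]

for rnd in range(1, 7):
    rates = learned_hire_rates(history)
    print(f"round {rnd}: learned hire rate A={rates['A']:.3f}  B={rates['B']:.3f}")

    # 100 new applicants per group, with identical skill distributions.
    applicants = [(g, random.random()) for g in ("A", "B") for _ in range(100)]

    # The filter: score = actual skill + 2 x "resemblance to past hires".
    ranked = sorted(applicants, key=lambda a: a[1] + 2 * rates[a[0]], reverse=True)

    # The top 50 get hired, and the decisions are fed back into the training
    # data, so the next round's model learns from its own biased output.
    for i, (group, _skill) in enumerate(ranked):
        history.append((group, i < 50))
```

Whichever group happens to start with the higher rate pulls further ahead round after round: the filter manufactures the very evidence it then learns from.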
Consider the writing advice idea. An application that has looked at thousands of academic journal articles will have identified various stylistic features of academic writing. What it won’t know is that many of those features are actually functionally bad: they obscure key details and bury essential points in circumlocution and uncommon terms. Many editors are working to undo these practices, but it’s an uphill struggle, as academic authors often think that if text sounds too clear it’s not erudite enough, and if it takes into account the author’s specific role and position it’s not objective. Throw in software that rewards these habits and you get academic authors having their prejudices reinforced and being told to do more of what editors want them to do less of.
And consider the security cameras. If the data from the cameras is used to advise police on whom to stop and check, and each stop-and-check is counted as an interaction with the police, then obviously that produces a feedback effect. If in one month people with red hair happen to have interactions with police at twice the rate of people with brown hair – statistical anomalies do happen – and that data is fed back into the system, in the next month people with red hair may be more likely to be stopped and checked, which will increase their interaction statistics. And even if the system only counts actual arrests, not interactions, anything that increases the likelihood of someone being stopped by police also increases the likelihood of their being arrested, assuming they have the same likelihood as anyone else of happening to be doing or possessing something illegal, and the same likelihood of not responding well to being stopped by police for no obvious reason. And then the system becomes more biased, and so may the police officers – and perhaps society in general.
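Again, a toy sketch makes the loop visible – invented numbers, invented hair colours, nothing from any real policing system. Both groups carry something illegal at exactly the same low rate; the only asymmetry is one fluke month of lopsided records, which then drives where the stops go:

```python
import random

random.seed(1)

# A toy sketch of the stop-and-check loop. All numbers are invented.
# Both hair colours carry something illegal at the same low rate, but
# month 0's records happen to show red-haired people interacting with
# police twice as often. Each later month, stops are allocated in
# proportion to the previous month's recorded interactions, and every
# stop (and any resulting arrest) goes back into the records.

TRUE_OFFENCE_RATE = 0.05      # identical for both groups by construction
STOPS_PER_MONTH = 1000

records = {"red": 100, "brown": 50}   # month 0: a statistical fluke

for month in range(1, 7):
    share_red = records["red"] / (records["red"] + records["brown"])
    stops = {"red": 0, "brown": 0}
    arrests = {"red": 0, "brown": 0}

    for _ in range(STOPS_PER_MONTH):
        # Whom to stop is driven entirely by last month's statistics.
        group = "red" if random.random() < share_red else "brown"
        stops[group] += 1
        # Arrests track who gets stopped, not who offends more.
        if random.random() < TRUE_OFFENCE_RATE:
            arrests[group] += 1

    records = stops   # this month's stops become next month's "data"
    print(f"month {month}: stops red={stops['red']} brown={stops['brown']}, "
          f"arrests red={arrests['red']} brown={arrests['brown']}")
```

A one-month blip never washes out: the two-to-one gap persists in the recorded stops and shows up in the arrest numbers too, and a greedier rule (send most of the attention wherever the numbers are highest) makes the gap grow rather than merely persist.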
These aren’t made-up examples, either. They’re taken from the real world, from products and applications promoted by software companies. I’m not naming names just because it’s tiring (and occasionally expensive) to deal with angry techbros.
There are ways to correct for these biases, of course. You can work on evening out the training data; you can correct for biases in the data and the output. Above all, though, you need to know what to watch out for, and how to deal with it. You need to know BIBBO. Because if there’s bias in the system, bias is the system.
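For what it’s worth, one of the simpler corrections can also be sketched in a few lines: reweight the training examples so that group membership and outcome come out statistically independent in the weighted data (an old preprocessing idea, often called reweighing). This is just an illustration on toy hiring numbers like the ones above; real systems need more than this, but it shows the general shape of “evening out the training data”:

```python
from collections import Counter

# A sketch of one simple correction ("reweighing"): give each training
# example a weight so that, in the weighted data, group membership and
# hiring outcome are statistically independent. Toy numbers, as above.

def reweigh(history):
    """history: list of (group, hired) pairs -> weight for each cell."""
    n = len(history)
    group_counts = Counter(group for group, _ in history)
    outcome_counts = Counter(hired for _, hired in history)
    cell_counts = Counter(history)

    weights = {}
    for (group, hired), count in cell_counts.items():
        # Weight = (count expected if group and outcome were independent)
        #          / (count actually observed).
        expected = group_counts[group] * outcome_counts[hired] / n
        weights[(group, hired)] = expected / count
    return weights

# Example: group A was hired at 30%, group B at 20% in the raw data.
history = ([("A", True)] * 30 + [("A", False)] * 70
           + [("B", True)] * 20 + [("B", False)] * 80)

for cell, weight in sorted(reweigh(history).items()):
    print(cell, round(weight, 3))
# Hired B examples get weight 1.25, hired A examples 0.833, and so on:
# the weighted hire rate works out the same (25%) for both groups.
```

Any learner that accepts sample weights can then be trained on the reweighted data. The point is not that this solves the problem; it’s that you can only apply it once you’ve admitted the bias is there to begin with.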
It will naturally help if you yourself, as a designer of the machine learning, do not also have uninspected and uncorrected biases. A problem we face today with many of these applications is the idea that if something is built from large amounts of real-world data processed by sophisticated programs, then it is objective and not subject to human biases. This is false, of course – it’s programmed by humans and the data is taken from humans in a society with its own biases – but there are people in the field who do not seem to see that it is false, because they have an ideology that science (including math and computer science) is hard and strong and intelligent and objective, while things that study humans – sociology, philosophy, etc. – are soft and weak and wishy-washy and tendentious. They come through education with this bias, and they use it to filter the information they get, and they design computer applications with that bias. And so you get these things that reinforce bias. All because they thought they could avoid bias by avoiding inspecting bias. But BIBBO.