This article is a good complement to the previous post, providing some pragmatic rigour on the risk of bias in machine learning and ways of countering it. Perhaps the most important point is one of the simplest:
It is safe to assume that bias exists in all data. The question is how to identify it and remove it from the model.
There is some good practical advice on how to do just that. But there is an obvious corollary: if human bias is endemic in data, it risks being no less endemic in attempts to remove it. That’s not a counsel of despair: this is an area where good intentions really do count for something. But it does underline the importance of being alert to the opposite risk, that unless it is clear that bias has been thought about and countered, the probability is high that it still remains. And it will be hard to calibrate the residual risk, whatever its level might be, particularly for the individual on the receiving end of the computer saying ‘no’.
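The first step the quoted advice calls for, identifying bias, can be made concrete. One of the simplest checks is to compare positive-outcome rates across groups in the data, sometimes called the demographic parity gap. The sketch below is illustrative only: the toy loan dataset, the group labels, and the choice of metric are my own assumptions, not drawn from the article.

```python
# A minimal sketch of one simple bias check: compare the rate of
# positive outcomes between two groups in the training data.
# Dataset and groups here are invented for illustration.

def positive_rate(records, group):
    """Share of records in `group` with a positive label."""
    rows = [r for r in records if r["group"] == group]
    return sum(r["label"] for r in rows) / len(rows)

def parity_gap(records, group_a, group_b):
    """Absolute difference in positive-outcome rates between two groups."""
    return abs(positive_rate(records, group_a) - positive_rate(records, group_b))

# Toy dataset: recorded loan decisions (label 1 = approved) per applicant group.
data = (
    [{"group": "A", "label": 1}] * 70 + [{"group": "A", "label": 0}] * 30 +
    [{"group": "B", "label": 1}] * 40 + [{"group": "B", "label": 0}] * 60
)

gap = parity_gap(data, "A", "B")
print(f"demographic parity gap: {gap:.2f}")  # prints 0.30
```

A gap this size does not prove unfair treatment on its own, but it is exactly the kind of signal that should prompt the scrutiny the article argues for, rather than being silently baked into a model.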