
Government agencies throughout Canada are investigating how they might adopt and deploy ‘artificial intelligence’ programs to enhance how they provide services. In the case of national security and law enforcement agencies, these programs might be used to analyze and exploit datasets, surface threats, identify risky travellers, or automatically respond to criminal or threat activities.
However, the predictive software systems being deployed under the banner of ‘artificial intelligence’ are routinely shown to be biased. These biases are serious in the commercial sphere but there, at least, researchers have some ability to detect and surface them. In the secretive domain of national security, however, the likelihood that bias in agencies’ software will be detected or surfaced by non-government parties is considerably lower.
I know that organizations such as the Canadian Security Intelligence Service (CSIS) have an interest in understanding how to use big data in ways that mitigate bias. The Canadian government does have a policy on the “Responsible use of artificial intelligence (AI)” and, at the municipal policing level, the Toronto Police Service has also published a policy on its use of artificial intelligence. Furthermore, the Office of the Privacy Commissioner of Canada has published a proposed regulatory framework for AI as part of potential reforms to federal privacy law.
Timnit Gebru, in conversation with Julia Angwin, suggests that there should be ‘datasheets for algorithms’ that would outline how predictive software systems have been tested for bias in different use cases prior to being deployed. Linking this to traditional circuit-based datasheets, she says (emphasis added):
As a circuit designer, you design certain components into your system, and these components are really idealized tools that you learn about in school that are always supposed to work perfectly. Of course, that’s not how they work in real life.
To account for this, there are standards that say, “You can use this component for railroads, because of x, y, and z,” and “You cannot use this component for life support systems, because it has all these qualities we’ve tested.” Before you design something into your system, you look at what’s called a datasheet for the component to inform your decision. In the world of AI, there is no information on what testing or auditing you did. You build the model and you just send it out into the world. This paper proposed that datasheets be published alongside datasets. The sheets are intended to help people make an informed decision about whether that dataset would work for a specific use case. There was also a follow-up paper called Model Cards for Model Reporting that I wrote with Meg Mitchell, my former co-lead at Google, which proposed that when you design a model, you need to specify the different tests you’ve conducted and the characteristics it has.
What I’ve realized is that when you’re in an institution, and you’re recommending that instead of hiring one person, you need five people to create the model card and the datasheet, and instead of putting out a product in a month, you should actually do it in three years, it’s not going to happen. I can write all the papers I want, but it’s just not going to happen. I’m constantly grappling with the incentive structure of this industry. We can write all the papers we want, but if we don’t change the incentives of the tech industry, nothing is going to change. That is why we need regulation.
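To make the datasheet and model-card idea a little more concrete for the government context discussed below, here is a minimal sketch of the kind of structured record an agency might require before approving a predictive system for a particular use. The field names, example values, and the ModelCard structure are illustrative assumptions on my part, not the schema proposed in the Datasheets for Datasets or Model Cards for Model Reporting papers.

```python
# A minimal, hypothetical sketch of a "model card"-style record for a predictive
# system under consideration. The fields and example values are illustrative
# assumptions, not the schema from the Model Cards for Model Reporting paper.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ModelCard:
    model_name: str
    intended_uses: List[str]         # use cases the system was actually tested for
    out_of_scope_uses: List[str]     # uses the developers explicitly caution against
    training_data_summary: str       # provenance and known gaps in the training data
    evaluation_data_summary: str     # what the system was evaluated against
    # Disaggregated performance (e.g. error rates broken out by group), so a
    # reviewer can see where the system performs poorly before approving a use.
    disaggregated_metrics: Dict[str, float] = field(default_factory=dict)
    known_limitations: List[str] = field(default_factory=list)


if __name__ == "__main__":
    card = ModelCard(
        model_name="traveller-risk-triage-v0 (hypothetical)",
        intended_uses=["triaging referrals for secondary screening by a human officer"],
        out_of_scope_uses=["automated denial of entry without human review"],
        training_data_summary="Historical referral records; known under-representation of some regions.",
        evaluation_data_summary="Held-out recent referrals, audited for label quality.",
        disaggregated_metrics={
            "false_positive_rate_group_a": 0.04,
            "false_positive_rate_group_b": 0.11,
        },
        known_limitations=["Performance degrades on travel patterns not seen in training data."],
    )
    print(card)
```

The specific fields matter less than what producing such a record forces an organization to do: conduct disaggregated testing and state, in writing, which uses a system has and has not been validated for before it is deployed.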
Government is one of those areas where regulation or law can work well to discipline its behaviours, and where the relatively large volume of resources combined with a law-abiding bureaucracy might mean that formally required assessments would actually be conducted. While such assessments matter, generally, they are of particular importance where state agencies might be involved in making decisions that significantly or permanently alter the life chances of residents of Canada, visitors who are passing through our borders, or foreign nationals who are interacting with our government agencies.
As it stands today, many Canadian government efforts at the federal, provincial, or municipal level seem to be significantly focused on how predictive software might be used or the effects it may have. These are important things to attend to! But it is just as, if not more, important for agencies to undertake baseline assessments of how and when different predictive software engines are permissible or not, based on robust testing and evaluation of their features and flaws.
Having spoken with people at different levels of government, I find the recurring complaint around assessing training data, and predictive software systems more generally, is that it is hard to hire the right people for these assessment jobs: such specialists are relatively rare and often exceedingly expensive. Thus, mid-level and senior members of government tend to focus on things that government is perceived as actually able to do: figure out and track how predictive systems would be used and to what effect.
However, the regular focus on the resource-related challenges of assessing predictive software raises a very real question: should these constraints simply compel agencies to forgo such technologies when they cannot determine and assess their prospective harms? In the firearms space, as an example, government agencies are extremely rigorous in assessing how a weapon operates to ensure that it functions precisely as intended, given that the weapon might be used in life-changing scenarios. Such assessments require significant sums of money from agency budgets.
If we can make significant budgetary allocations for firearms, on the grounds they can have life-altering consequences for all involved in their use, then why can’t we do the same for predictive software systems? If anything, such allocations would compel agencies to make a strong(er) business case for testing the predictive systems in question and spur further accountability: Does the system work? At a reasonable cost? With acceptable outcomes?
Imposing cost discipline on organizations is an important way of ensuring that technologies, and other business processes, aren’t adopted haphazardly by externalizing their full costs. By internalizing those costs up front, organizations may need to be much more careful about what they choose to adopt, when, and for what purpose. The outcome of this introspection and assessment would, hopefully, be that the harmful effects of predictive software systems in the national security space are mitigated and that the systems which are adopted actually fulfill the purposes they were acquired to address.