Mitigating AI-Based Harms in National Security


Government agencies throughout Canada are investigating how they might adopt and deploy ‘artificial intelligence’ programs to enhance how they provide services. In the case of national security and law enforcement agencies these programs might be used to analyze and exploit datasets, surface threats, identify risky travellers, or automatically respond to criminal or threat activities.

However, the predictive software systems that are being deployed–‘artificial intelligence’–are routinely shown to be biased. These biases are serious in the commercial sphere but there, at least, it is somewhat possible for researchers to detect and surface biases. In the secretive domain of national security, however, the likelihood of bias in agencies’ software being detected or surfaced by non-government parties is considerably lower.

I know that organizations such as the Canadian Security Intelligence Service (CSIS) have an interest in understanding how to use big data in ways that mitigate bias. The Canadian government does have a policy on the “Responsible use of artificial intelligence (AI)” and, at the municipal policing level, the Toronto Police Service has also published a policy on its use of artificial intelligence. Furthermore, the Office of the Privacy Commissioner of Canada has published a proposed regulatory framework for AI as part of potential reforms to federal privacy law.

Timnit Gebru, in conversation with Julia Angwin, suggests that there should be ‘datasheets’ for datasets and models that would outline how predictive software systems have been tested for bias in different use cases prior to being deployed. Linking this to traditional circuit-component datasheets, she says (emphasis added):

As a circuit designer, you design certain components into your system, and these components are really idealized tools that you learn about in school that are always supposed to work perfectly. Of course, that’s not how they work in real life.

To account for this, there are standards that say, “You can use this component for railroads, because of x, y, and z,” and “You cannot use this component for life support systems, because it has all these qualities we’ve tested.” Before you design something into your system, you look at what’s called a datasheet for the component to inform your decision. In the world of AI, there is no information on what testing or auditing you did. You build the model and you just send it out into the world. This paper proposed that datasheets be published alongside datasets. The sheets are intended to help people make an informed decision about whether that dataset would work for a specific use case. There was also a follow-up paper called Model Cards for Model Reporting that I wrote with Meg Mitchell, my former co-lead at Google, which proposed that when you design a model, you need to specify the different tests you’ve conducted and the characteristics it has.

What I’ve realized is that when you’re in an institution, and you’re recommending that instead of hiring one person, you need five people to create the model card and the datasheet, and instead of putting out a product in a month, you should actually do it in three years, it’s not going to happen. I can write all the papers I want, but it’s just not going to happen. I’m constantly grappling with the incentive structure of this industry. We can write all the papers we want, but if we don’t change the incentives of the tech industry, nothing is going to change. That is why we need regulation.
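To make the model-card idea concrete, here is a minimal sketch of what a machine-readable card might contain, in the spirit of Mitchell et al.’s Model Cards for Model Reporting. The field names, model name, and values are hypothetical illustrations of my own, not the paper’s actual schema:

```python
# A minimal, hypothetical "model card" sketch. Fields and values are
# illustrative only; they are not drawn from the Model Cards paper.
from dataclasses import dataclass

@dataclass
class ModelCard:
    model_name: str
    intended_uses: list[str]
    out_of_scope_uses: list[str]   # uses the model was NOT tested for
    evaluation_groups: list[str]   # subgroups evaluated for bias
    metrics: dict[str, float]      # e.g. per-group false-positive-rate gaps
    caveats: str = ""

card = ModelCard(
    model_name="traveller-risk-screener-v0",
    intended_uses=["triage for secondary human review"],
    out_of_scope_uses=["automated denial of entry"],
    evaluation_groups=["age", "gender", "country of origin"],
    metrics={"fpr_gap_gender": 0.04, "fpr_gap_origin": 0.11},
    caveats="Untested on travellers under 18; do not deploy for minors.",
)
```

The point of such a structure is that an agency procuring the model could check, before deployment, whether its intended use appears in `intended_uses` and whether the reported per-group metric gaps are acceptable for that use.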

Government is one of those areas where regulation or law can work well to discipline its behaviours, and where the relatively large volume of resources combined with a law-abiding bureaucracy might mean that formally required assessments would actually be conducted. While such assessments matter, generally, they are of particular importance where state agencies might be involved in making decisions that significantly or permanently alter the life chances of residents of Canada, visitors who are passing through our borders, or foreign nationals who are interacting with our government agencies.

As it stands, today, many Canadian government efforts at the federal, provincial, or municipal level seem to be significantly focused on how predictive software might be used or the effects it may have. These are important things to attend to! But it is just as, if not more, important for agencies to undertake baseline assessments of how and when different predictive software engines are permissible or not, as based on robust testing and evaluation of their features and flaws.

Having spoken with people at different levels of government, I find the recurring complaint around assessing training data, and predictive software systems more generally, is that it is hard to hire the right people for these assessment jobs: such specialists are relatively rare and often exceedingly expensive. Thus, mid-level and senior members of government tend to focus on things that government is perceived as actually able to do: figure out and track how predictive systems would be used and to what effect.

However, the regular focus on the resource-related challenges of assessing predictive software raises the very real question of whether these constraints should simply compel agencies to forgo technologies whose prospective harms they cannot determine and assess. In the firearms space, as an example, government agencies are extremely rigorous in assessing how a weapon operates to ensure that it functions precisely as intended, given that the weapon might be used in life-changing scenarios. Such assessments require significant sums of money from agency budgets.

If we can make significant budgetary allocations for firearms, on the grounds they can have life-altering consequences for all involved in their use, then why can’t we do the same for predictive software systems? If anything, such allocations would compel agencies to make a strong(er) business case for testing the predictive systems in question and spur further accountability: Does the system work? At a reasonable cost? With acceptable outcomes?

Imposing cost discipline on organizations is an important way of ensuring that technologies, and other business processes, aren’t randomly adopted on the basis of externalizing their full costs. By internalizing those costs, up front, organizations may need to be much more careful in what they choose to adopt, when, and for what purpose. The outcome of this introspection and assessment would, hopefully, be that the harmful effects of predictive software systems in the national security space were mitigated and the systems which were adopted actually fulfilled the purposes they were acquired to address.

Quote

How we measure changes not only what is being measured but also the moral scaffolding that compels us to live toward those standards. Innovations like assembly-line factories would further extend this demand that human beings work at the same relentlessly monotonous rate of a machine, as immortalized in Charlie Chaplin’s film Modern Times. Today, the control creep of self-tracking technologies into workplaces and institutions follows a similar path. In a “smart” or “AI-driven” workplace, the productive worker is someone who emits the desired kind of data — and does so in an inhumanly consistent way.


Sun-ha Hong, “Control Creep: When the Data Always Travels, So Do the Harms”
Link

Facebook Prioritizes Growth Over Social Responsibility

Karen Hao writing at MIT Technology Review:

But testing algorithms for fairness is still largely optional at Facebook. None of the teams that work directly on Facebook’s news feed, ad service, or other products are required to do it. Pay incentives are still tied to engagement and growth metrics. And while there are guidelines about which fairness definition to use in any given situation, they aren’t enforced.

The Fairness Flow documentation, which the Responsible AI team wrote later, includes a case study on how to use the tool in such a situation. When deciding whether a misinformation model is fair with respect to political ideology, the team wrote, “fairness” does not mean the model should affect conservative and liberal users equally. If conservatives are posting a greater fraction of misinformation, as judged by public consensus, then the model should flag a greater fraction of conservative content. If liberals are posting more misinformation, it should flag their content more often too.

But members of Kaplan’s team followed exactly the opposite approach: they took “fairness” to mean that these models should not affect conservatives more than liberals. When a model did so, they would stop its deployment and demand a change. Once, they blocked a medical-misinformation detector that had noticeably reduced the reach of anti-vaccine campaigns, the former researcher told me. They told the researchers that the model could not be deployed until the team fixed this discrepancy. But that effectively made the model meaningless. “There’s no point, then,” the researcher says. A model modified in that way “would have literally no impact on the actual problem” of misinformation.

[Kaplan’s] claims about political bias also weakened a proposal to edit the ranking models for the news feed that Facebook’s data scientists believed would strengthen the platform against the manipulation tactics Russia had used during the 2016 US election.

The whole thing with ethics is that they have to be integrated such that they underlie everything an organization does; they cannot function as public relations add-ons. Sadly, at Facebook the only ethic is growth at all costs, the social implications be damned.

When someone or some organization is responsible for causing significant civil unrest, deaths, or genocide, we expect those who are even partly responsible to be called to account, not just in the public domain but in courts of law and international justice. And when those someones happen to be leading executives at one of the biggest companies in the world, the solution isn’t to berate them in Congressional hearings and hear their weak apologies, but to take real action against them and their companies.

Link

Links for November 23-December 4, 2020

  • When AI sees a man, it thinks “official.” A woman? “Smile” | “The AI services generally saw things human reviewers could also see in the photos. But they tended to notice different things about women and men, with women much more likely to be characterized by their appearance. Women lawmakers were often tagged with “girl” and “beauty.” The services had a tendency not to see women at all, failing to detect them more often than they failed to see men.” // Studies like this help to reveal the bias baked deep into algorithms that are meant to be ‘impartial’, with this impartiality truly constituting a mathwashing of existing biases that are pernicious to 50% of society.
  • The ungentle joy of spider sex | “Spectacular though all this is, extreme sexual size dimorphism is rare even in spiders. “It’s an aberration,” Kuntner says. Even so, as he and Coddington describe in the Annual Review of Entomology, close examination of the evolutionary history of spiders indicates that eSSD has evolved at least 16 times, and in one major group some lineages have repeatedly lost it and regained it. The phenomenon is so intriguing it’s kept evolutionary biologists busy for decades. How and why did something so weird evolve?” // This is a truly wild, and detailed, discussion of the characteristics of spider evolution and intercourse.
  • Miley Cyrus-Plastic Hearts // Consider me shocked, but I’m really liking Cyrus’ newest album.
Link

The implausibility of intelligence explosion

The intelligence of the AIs we build today is hyper specialized in extremely narrow tasks — like playing Go, or classifying images into 10,000 known categories. The intelligence of an octopus is specialized in the problem of being an octopus. The intelligence of a human is specialized in the problem of being human.

What would happen if we were to put a freshly-created human brain in the body of an octopus, and let it live at the bottom of the ocean? Would it even learn to use its eight-legged body? Would it survive past a few days? We cannot perform this experiment, but we do know that cognitive development in humans and animals is driven by hardcoded, innate dynamics.

Chollet’s long-form consideration of the ‘intelligence explosion’ is exactly the kind of long, deep-dive assessment of artificial intelligence I wish we had more of. In particular, his appreciation for the relationship between ‘intelligence’, ‘mind’, and ‘socio-situationality’ struck me as meaningful and helpful, insofar as it recognizes the philosophical dimensions of intelligence that are often disregarded, forgotten about, or simply not appreciated by those who talk generally about strong AI systems.