Categories
Links

LSE Study Exposes AI Bias in Social Care

A new study from the London School of Economics highlights how AI systems can reinforce existing inequalities when used for high-risk activities like social care.

Writing in The Guardian, Jessica Murray describes how Google’s Gemma model summarized identical case notes differently depending on gender.

An 84-year-old man, “Mr Smith,” was described as having a “complex medical history, no care package and poor mobility,” while the summary for “Mrs Smith” read: “[d]espite her limitations, she is independent and able to maintain her personal care.” In another example, Mr Smith was noted as “unable to access the community,” but Mrs Smith as “able to manage her daily activities.”

These subtle but significant differences risk making women’s needs appear less urgent and could influence the care and resources provided. By contrast, Meta’s Llama 3 did not use different language based on gender, which underscores both that bias varies across models and that bias needs to be measured in any LLM adopted for public service delivery.
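One way to picture this kind of measurement is a gender-swap comparison: summarize a case note, summarize the same note with only the gendered terms changed, and look at what else differs. The sketch below is an illustration under stated assumptions and not the LSE researchers' method. The term map `MALE_TO_FEMALE`, the helper names, and the `summarize` callable (a stand-in for whichever model is being audited) are all hypothetical.

```python
import re
from typing import Callable, Set, Tuple

# Hypothetical and deliberately tiny term map; a real audit would need a much fuller list.
MALE_TO_FEMALE = {"mr": "mrs", "he": "she", "him": "her", "his": "her", "man": "woman"}

_PATTERN = re.compile(r"\b(" + "|".join(MALE_TO_FEMALE) + r")\b", flags=re.IGNORECASE)


def swap_gender(text: str) -> str:
    """Replace male terms with female ones, keeping a leading capital letter."""
    def repl(match: "re.Match[str]") -> str:
        word = match.group(0)
        swapped = MALE_TO_FEMALE[word.lower()]
        return swapped.capitalize() if word[0].isupper() else swapped
    return _PATTERN.sub(repl, text)


def vocabulary_gap(summary_a: str, summary_b: str) -> Tuple[Set[str], Set[str]]:
    """Return the words found only in summary_a and only in summary_b."""
    words_a = set(re.findall(r"[a-z']+", summary_a.lower()))
    words_b = set(re.findall(r"[a-z']+", summary_b.lower()))
    return words_a - words_b, words_b - words_a


def audit(case_note: str, summarize: Callable[[str], str]) -> Tuple[Set[str], Set[str]]:
    """Compare summaries of a note and of its gender-swapped twin.

    `summarize` is an assumed stand-in for the model under test.
    """
    male_summary = summarize(case_note)
    female_summary = summarize(swap_gender(case_note))
    # Swap the male summary too, so that only non-gender wording differences remain.
    return vocabulary_gap(swap_gender(male_summary), female_summary)
```

If both returned sets are empty, the two summaries used the same vocabulary apart from the swapped terms. Words such as “independent” appearing on only one side would be a signal worth a closer, human look rather than proof of bias on their own.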

These findings reinforce why AI systems must be valid and reliable, safe, transparent, accountable, privacy-protective, and human-rights affirming. This is especially the case in high-risk settings where AI systems affect decisions linked with accessing essential public services.

Categories
Writing

Some Challenges Facing Physician AI Scribes

Recent reporting from the Associated Press highlights the potential challenges in adopting emergent generative AI technologies into the working world. Their reporting focused on how American health care providers are using OpenAI’s transcription tool, Whisper, to transcribe patients’ conversations with medical staff.

These activities are occurring despite OpenAI’s warnings that Whisper should not be used in high-risk domains.

The article reports that a “machine learning engineer said he initially discovered hallucinations in about half of the over 100 hours of Whisper transcriptions he analyzed. A third developer said he found hallucinations in nearly every one of the 26,000 transcripts he created with Whisper. The problems persist even in well-recorded, short audio samples. A recent study by computer scientists uncovered 187 hallucinations in more than 13,000 clear audio snippets they examined.”

Transcription errors can be very serious. Research by Prof. Koenecke of Cornell University and Prof. Sloane of the University of Virginia found:

… that nearly 40% of the hallucinations were harmful or concerning because the speaker could be misinterpreted or misrepresented.

In an example they uncovered, a speaker said, “He, the boy, was going to, I’m not sure exactly, take the umbrella.”

But the transcription software added: “He took a big piece of a cross, a teeny, small piece … I’m sure he didn’t have a terror knife so he killed a number of people.”

A speaker in another recording described “two other girls and one lady.” Whisper invented extra commentary on race, adding “two other girls and one lady, um, which were Black.”

In a third transcription, Whisper invented a non-existent medication called “hyperactivated antibiotics.”

In some cases, voice data is deleted for privacy reasons, which can prevent physicians (or other medical personnel) from double-checking the accuracy of a transcription. While some errors may be caught easily and quickly, more subtle mistakes are less likely to be noticed.

One area where work still needs to be done is assessing the relative accuracy of AI scribes versus that of physicians. While automated transcription may introduce errors, what is the error rate of physicians? Also, what is the difference in quality of care between a physician who is self-transcribing during a meeting versus one who is reviewing transcriptions after the interaction? These are central questions that should play a significant role in assessments of when and how these technologies are deployed.
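A common way to put a number on transcription accuracy is word error rate (WER): the word-level edit distance between a trusted reference transcript and the transcript being assessed, divided by the length of the reference. The sketch below is a minimal illustration, assuming a trusted reference exists for each encounter. The function name is hypothetical, and the same measure could in principle be applied to an AI scribe's output and to a physician's own notes.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance (substitutions, insertions, deletions) per reference word."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Levenshtein distance over words, keeping only the previous row of the table.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion of a reference word
                            curr[j - 1] + 1,      # insertion of an extra word
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)
```

Invented text counts as inserted words, so a hallucinated phrase raises the score. WER does not say whether an error is clinically harmful, though, so it would complement rather than replace a review of the kinds of errors made.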

Categories
Quotations

2014.7.21

Actually taking part in deliberation on priority-setting issues might lead to increased acceptance and trust, but simply being informed that other citizens had that opportunity to do so did not seem to have any effect. Taken together, this implies that people in general might not care that much about the procedure when judging the decision in the case of priority setting in health care. The turn from a focus on principles to a focus on procedures when it comes to priority setting strategies can thus be even more problematic to implement than previous research has suggested.

Jenny de Fine Licht, “Do We Really Want to Know? The Potentially Negative Effect of Transparency in Decision Making on Perceived Legitimacy,” Scandinavian Political Studies 34(3), 2011.