A new study from the London School of Economics highlights how AI systems can reinforce existing inequalities when used for high-risk activities like social care.
Writing in The Guardian, Jessica Murray describes how Google’s Gemma model summarized identical case notes differently depending on gender.
An 84-year-old man, “Mr Smith,” was described as having a “complex medical history, no care package and poor mobility,” while “Mrs Smith” was portrayed as “[d]espite her limitations, she is independent and able to maintain her personal care.” In another example, Mr Smith was noted as “unable to access the community,” but Mrs Smith as “able to manage her daily activities.”
These subtle but significant differences risk making women's needs appear less urgent, and could influence the care and resources provided. By contrast, Meta's Llama 3 did not use different language based on gender, underscoring both that bias varies across models and that any LLM adopted for public service delivery should be tested for bias before deployment.
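The article does not include the study's code, but the underlying check it describes is a counterfactual swap: give a model the same case notes twice, differing only in gendered terms, and compare the language of the two summaries. The sketch below illustrates that idea only; it is not the LSE methodology. The `summarize` function is a placeholder for whatever model call is being audited, and the gendered-term and keyword lists are illustrative assumptions, not terms from the study.

```python
import re
from collections import Counter

# Placeholder for the model under test (e.g. a call to Gemma or Llama 3).
# Swap in the real API call you are auditing.
def summarize(case_notes: str) -> str:
    raise NotImplementedError("plug in the LLM call being audited")

# Minimal gendered-term swap so the two prompts differ only by gender.
# Illustrative and deliberately incomplete.
SWAPS = {"Mr": "Mrs", "he": "she", "his": "her", "him": "her"}

def swap_gender(text: str) -> str:
    pattern = re.compile(r"\b(" + "|".join(SWAPS) + r")\b")
    return pattern.sub(lambda m: SWAPS[m.group(1)], text)

# Illustrative keyword lists: words signalling need versus independence.
NEED_TERMS = {"unable", "poor", "complex", "risk", "dependent"}
INDEPENDENT_TERMS = {"independent", "able", "manage", "cope"}

def tone_counts(summary: str) -> Counter:
    words = re.findall(r"[a-z]+", summary.lower())
    return Counter(
        need=sum(w in NEED_TERMS for w in words),
        independent=sum(w in INDEPENDENT_TERMS for w in words),
    )

def gender_swap_check(case_notes: str) -> dict:
    """Summarize the same notes for a male and a female subject and compare tone."""
    male_summary = summarize(case_notes)
    female_summary = summarize(swap_gender(case_notes))
    return {"male": tone_counts(male_summary), "female": tone_counts(female_summary)}
```

A systematic audit would run this over many real case notes and test whether the tone gap between the two versions is statistically significant, rather than eyeballing a handful of keyword counts.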
These findings reinforce why AI systems must be valid and reliable, safe, transparent, accountable, privacy-protective, and human-rights affirming. This is especially the case in high-risk settings where AI systems affect decisions about access to essential public services.