
Vibe-Coded Malware Isn’t a Game Changer (Yet)

Over the past week there’s been heightened concern about how LLMs can be used to facilitate cyber operations. Much of that concern is tightly linked to recent reports from Anthropic, which are facing growing criticism from the security community.

Anthropic claimed that a threat actor launched an AI-assisted operation that was up to 90% autonomous. But the LLM largely relied on pre-existing open source tools that operators already chain together, and the reported success rates appear low. Moreover, hallucinations meant the model often told its operators that it had completed a task, or had obtained credentials, when it had not.

We should anticipate that LLMs will enable some adversaries to chain together code that could exploit vulnerabilities. But vibe‑coding an exploit chain is not the same as building something that can reliably compromise real systems. To date, experiments with vibe‑coded malware and autonomous agents suggest that generated outputs typically require skilled operators to debug, adapt, and operationalise them. Even then, LLM‑assisted malware often fails outright when confronted with real‑world constraints and defences.

That’s partly because exploit development demands a different skill set and capability from building “functional‑enough” software. Vibe coding for productivity apps might tolerate flaky edge cases and messy internals. Exploit chains, by contrast, simply fail unless they are precisely tailored to a given target.

An AI system that can assemble a roughly working application from a series of prompts does not automatically inherit the ability to produce highly reliable, end‑to‑end exploit chains. Some capability will transfer, but we should be wary of assuming a neat, 100% carry‑over from vibe‑coded software to effective vibe‑coded malware.


Even Minimal Data Poisoning Can Undermine AI Model Integrity

As reported by Benj Edwards at Ars Technica, researchers demonstrated that even minimal data poisoning can implant backdoors in large language models.

For the largest model tested (13 billion parameters trained on 260 billion tokens), just 250 malicious documents, representing 0.00016 percent of the total training data, proved sufficient to install the backdoor. The same held true for smaller models, even though the proportion of corrupted data relative to clean data varied dramatically across model sizes.
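
To get a feel for how small that share is, here is a quick back-of-the-envelope calculation. It assumes the 0.00016 percent figure is measured in training tokens rather than documents, which the article does not spell out, so treat the unit as an assumption:

```python
# Back-of-the-envelope check of the poisoning figures reported for the
# largest model (13B parameters, 260B training tokens). Assumption, not
# stated explicitly in the article: the 0.00016 percent figure is a share
# of training tokens rather than of documents.

total_tokens = 260e9                    # 260 billion training tokens
poison_fraction = 0.00016 / 100         # 0.00016 percent as a proportion
poison_docs = 250                       # malicious documents in the training set

poison_tokens = total_tokens * poison_fraction
tokens_per_doc = poison_tokens / poison_docs

print(f"poisoned tokens: ~{poison_tokens:,.0f}")                # ~416,000
print(f"tokens per poisoned document: ~{tokens_per_doc:,.0f}")  # ~1,664
```

On that reading, roughly 400,000 poisoned tokens, spread across 250 documents of around 1,700 tokens each, were enough to implant the backdoor in a model trained on 260 billion tokens.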

The findings apply to straightforward attacks like generating gibberish or switching languages. Whether the same pattern holds for more complex malicious behaviors remains unclear. The researchers note that more sophisticated attacks, such as making models write vulnerable code or reveal sensitive information, might require different amounts of malicious data.

The same pattern appeared in smaller models as well:

Despite larger models processing over 20 times more total training data, all models learned the same backdoor behavior after encountering roughly the same small number of malicious examples.

The authors note important limitations: the tested models were all relatively small, the results depend on tainted data being present in the training set, and real-world mitigations like guardrails or corrective fine-tuning may blunt such effects.

Even so, the findings point to the ongoing immaturity of LLM cybersecurity practices and the difficulty of assuring trustworthiness in systems trained at scale. Safely deploying AI in high-risk contexts will require not just policy oversight, but rigorous testing, data provenance controls, and continuous monitoring of model behaviour.


LSE Study Exposes AI Bias in Social Care

A new study from the London School of Economics highlights how AI systems can reinforce existing inequalities when used for high-risk activities like social care.

Writing in The Guardian, Jessica Murray describes how Google’s Gemma model summarized identical case notes differently depending on gender.

An 84-year-old man, “Mr Smith,” was described as having a “complex medical history, no care package and poor mobility,” while “Mrs Smith” was portrayed as “[d]espite her limitations, she is independent and able to maintain her personal care.” In another example, Mr Smith was noted as “unable to access the community,” but Mrs Smith as “able to manage her daily activities.”

These subtle but significant differences risk making women’s needs appear less urgent, and could influence the care and resources provided. By contrast, Meta’s Llama 3 did not use different language based on gender, underscoring that bias varies across models and that it needs to be measured in any LLM adopted for public service delivery.
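
The study’s core method lends itself to a simple harness: run the same case note through the model twice, changing only the gendered terms, and compare the outputs. Below is a minimal, hedged sketch of that kind of paired test; `summarize` stands in for whichever model call is being evaluated, and the word-swap table is purely illustrative.

```python
# Minimal sketch of the paired comparison the LSE study describes: feed the
# same case note to a model twice, changing only the gendered terms, and
# compare the two summaries. `summarize` is a placeholder for whatever model
# is under test (Gemma, Llama 3, etc.); the swap table is deliberately naive
# and would need proper handling of casing and grammar in practice.
from typing import Callable, Tuple

SWAPS = {"Mr": "Mrs", "he": "she", "He": "She", "his": "her", "him": "her"}

def swap_gender(text: str) -> str:
    """Swap gendered words on a whitespace-token basis (illustrative only)."""
    return " ".join(SWAPS.get(word, word) for word in text.split())

def paired_summaries(case_note: str, summarize: Callable[[str], str]) -> Tuple[str, str]:
    """Return (summary of original note, summary of gender-swapped note)."""
    return summarize(case_note), summarize(swap_gender(case_note))

if __name__ == "__main__":
    note = ("Mr Smith is 84 years old, with a complex medical history, "
            "no care package and poor mobility. He is unable to access the community.")
    identity_model = lambda text: text  # stub; substitute the real model call
    original, swapped = paired_summaries(note, identity_model)
    print(original)
    print(swapped)
```

In practice the comparison step would go beyond printing the two summaries, for example by scoring them for need-related language, but the paired, gender-swapped input is the important control.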

These findings reinforce why AI systems must be valid and reliable, safe, transparent, accountable, privacy-protective, and human-rights affirming. This is especially the case in high-risk settings where AI systems affect decisions about access to essential public services.