LLMs as a New Attack Surface: what does it mean for AI governance?
A recent demonstration showed how an LLM could be "radicalized" over eight hours, bypassing safety guardrails to generate malware at scale. This wasn't a highly technical code-written software exploit; it was achieved through manipulation and persuasion, taking advantage of the model’s contextual learning to make it unlearn its security protocols, revealing a critical gap in AI security.
The paper highlights that AI's attack surface is broader than code. It includes the model, prompts, user interfaces, policies, and even the organizational context. When LLMs are integrated into workflows with access to tools, APIs, and sensitive data, the risks multiply, ranging from generating malicious content to enabling large-scale cyberattacks. AI systems are dynamic, made up of interconnected components that evolve rapidly. As a result, traditional governance can’t keep up. Static checklists and one-time audits aren’t enough (if they ever were). AI management must be continuous, automated, and evidence-based.
[....]