The State of AI Security

I have some thoughts on Sander Schulhoff’s appearance on Lenny’s Podcast. The episode, entitled The coming AI security crisis is a deep dive into the state of AI security (concerning) and what application developers can do about it (less than you’d think).

In terms of threat modeling, the easiest way to think about it to think about the LLMs as a person, and the inherent threats being very similar to social engineering. With enough tenacity, an LLM can be convinced to say anything you want it to say, divulge any information it has access to, and perform any task it has the ability to perform. This was a known, relatively minor risk before agents took off. It was fairly obvious that you could trick LLMs into doing things that their creators didn’t want them to do.

With agents, the risks are greatly magnified. LLMs are going to be given access to perform lots of tasks on behalf of the user, which means that an attacker will most likely be able to coerce the agent into doing their bidding.

The big problem is that that while the mitigations are conceptually simple (resrict access, restrict capabilities) implementing them effectively is quite hard. Throughout the episode Sander talks a lot about how different AI security is than classical security, but actually it’s not so different if you think of agents as users who can be manipulated by attackers. You can secure them by either restricting who can talk to the agents, or by restricting what the agents can do. The principles are quite similar, but the attack surface has the potential to be much more complex. These concerns will inevitably make agents harder to use and riskier to adopt, but I expect that a few high profile attacks will cement in people’s minds that deploying agents requires great care.

One really practical takeaway is that all users should be very careful about adopting agentic browsers. If an LLM has access to your browsing session, it has access to any data you have access to in your browsing session, and the ability to exfiltrate data by making web requests. All that’s left is for an attacker to manipulate the agent in the browser. Brave has published a lot about agentic browser vulnerabilities if you want an example of what I’m talking about.

Overall I think this podcast is an essential listen if you’re building with AI.