


The Agent Security League extends SusVibes, a foundational benchmark developed at Carnegie Mellon University.
The benchmark consists of 200 tasks drawn from 108 open-source Python projects and covers 77 CWE vulnerability classes.
Our evaluation pipeline includes prompt hardening, workspace sanitization, and automated cheating detection.
AI coding agents can write code, but they lack security context. AURI is the security harness your coding agent is missing.
Always free for developers.