AIRTBench: Measuring Autonomous AI Red Teaming Capabilities in Language Models
We introduce AIRTBench, an AI red teaming benchmark for evaluating language models' ability to autonomously discover and exploit Artificial Intelligence and Machine Learning AI/ML security vulnerabilities. The benchmark consists of 70 realistic black-box capture-the-flag CTF challenges from the...