3 matches found
CyberGym-E2E: Scalable Real-World Benchmark for AI Agents' End-To-End Cybersecurity Capabilities
AI has the potential to transform cybersecurity by enabling systems that can autonomously detect, analyze, and remediate software vulnerabilities. However, existing cybersecurity evaluations of AI systems are limited in scale or scope, and fail to capture the end-to-end lifecycle of real-world...
PQC-LEO: An Evaluation Framework for Post-Quantum Cryptographic Algorithms
Advances in quantum computing threaten digital communication security by undermining the foundations of current public-key cryptography through Shor's quantum algorithm. This has driven the development of Post-Quantum Cryptography PQC, a new set of algorithms resistant to quantum attacks. While...
PentestEval: Benchmarking LLM-Based Penetration Testing with Modular and Stage-Level Design
Penetration testing is essential for assessing and strengthening system security against real-world threats, yet traditional workflows remain highly manual, expertise-intensive, and difficult to scale. Although recent advances in Large Language Models LLMs offer promising opportunities for...