Towards Effective Offensive Security LLM Agents: Hyperparameter Tuning, LLM As a Judge, and a Lightweight CTF Benchmark
Recent advances in LLM agentic systems have improved the automation of offensive security tasks, particularly for Capture the Flag (CTF) challenges. We systematically investigate the key factors that drive agent success and provide a detailed recipe for building effective LLM-based offensive securi...