Cyber Defense Benchmark: Agentic Threat Hunting Evaluation for LLMs in SecOps
We introduce the Cyber Defense Benchmark, a benchmark for measuring how well large language model LLM agents perform the core SOC analyst task of threat hunting: given a database of raw Windows event logs with no guided questions or hints, identify the exact timestamps of malicious events. The...