CVE Search Engine - Security Vulnerabilities and Exploits Search Tool

4 matches found

Packet Storm News • added 2026/04/21 12:0 a.m. • 9 views

Do Agents Dream of Root Shells? Partial-Credit Evaluation of LLM Agents in Capture the Flag Challenges

Large Language Model (LLM) agents are increasingly proposed for autonomous cybersecurity tasks, but their capabilities in realistic offensive settings remain poorly understood. We present DeepRed, an open-source benchmark for evaluating LLM-based agents on realistic Capture The Flag (CTF) challenges ...

AI score: 6

Exploits: 0
