CVE Search Engine - Security Vulnerabilities and Exploits Search Tool

29 matches found

Packet Storm News • added 2026/06/07 12:0 a.m. • 3 views

Hardening Agent Benchmarks with Adversarial Hacker-Fixer Loops

Agent benchmarks score submissions with outcome verifiers that are typically hand-written and brittle, leaving them open to reward hacking. We audit 1,968 tasks across five terminal-agent benchmarks and find 323 (16%) hackable by frontier models given only the task description. This corrupts both...

AI score: 5.5

Exploits: 0
