CVE Search Engine - Security Vulnerabilities and Exploits Search Tool

show all

51 matches found

Packet Storm News•added 2026/05/12 12:0 a.m.•13 views

Do Androids Dream of Breaking the Game? Systematically Auditing AI Agent Benchmarks with BenchJack

Agent benchmarks have become the de facto measure of frontier AI competence, guiding model selection, investment, and deployment. However, reward hacking, where agents maximize a score without performing the intended task, emerges spontaneously in frontier models without overfitting. We argue tha...

5.8AI score

Exploits0

Rows per page

Query Builder

Family

Bulletin Type

Min CVSS Score

Date

Order by