3 matches found
Are Frontier LLMs Ready for Cybersecurity? Evidence for Vertical Foundation Models from Dual-Mode Vulnerability Benchmarks
We evaluate whether frontier LLMs are ready for cybersecurity through a dual-mode benchmark: white-box function-level vulnerability detection VulnLLM-R, across C/Java/Python and black-box web application security testing five production-style applications with 118 ground-truth vulnerabilities...
Perceptual Gaps: ASCII Art and Overlapping Audio As CAPTCHA
As multimodal large language models LLMs advance, traditional CAPTCHAs have become obsolete at distinguishing humans from bots. To address this shift, this paper aims to investigate the possibility of using tasks for which humans have evolved highly specialised neural processing. We introduce two...
Can AI Models Be Jailbroken to Phish Elderly Victims? an End-To-End Evaluation
We present an end-to-end demonstration of how attackers can exploit AI safety failures to harm vulnerable populations: from jailbreaking LLMs to generate phishing content, to deploying those messages against real targets, to successfully compromising elderly victims. We systematically evaluated...