What One Predator Case Can Reveal About an Online Platform’s Safety Gaps
When a predator contacts a child through an online platform, the details of how it happened often expose…
Babel: Jailbreaking Safety Attention Via Obfuscation Distribution Optimized Sampling
Despite rigorous safety alignment, Large Language Models (LLMs) remain vulnerable to jailbreak attacks. Existing black-box methods often rely on heuristic templates or exhaustive trials, lacking mechanistic interpretability and query efficiency. In this study, we investigate an intrinsic...
Can AI Models Be Jailbroken to Phish Elderly Victims? An End-to-End Evaluation
We present an end-to-end demonstration of how attackers can exploit AI safety failures to harm vulnerable populations: from jailbreaking LLMs to generate phishing content, to deploying those messages against real targets, to successfully compromising elderly victims. We systematically evaluated...
What Really Matters in Many-Shot Attacks? An Empirical Study of Long-Context Vulnerabilities in LLMs
We investigate long-context vulnerabilities in Large Language Models (LLMs) through Many-Shot Jailbreaking (MSJ). Our experiments utilize context lengths of up to 128K tokens. Through comprehensive analysis with various many-shot attack settings with different instruction styles, shot density, topic,...
Dark LLMs: The Growing Threat of Unaligned AI Models
Large Language Models (LLMs) rapidly reshape modern life, advancing fields from healthcare to education and beyond. However, alongside their remarkable capabilities lies a significant threat: the susceptibility of these models to jailbreaking. The fundamental vulnerability of LLMs to jailbreak...