Babel: Jailbreaking Safety Attention Via Obfuscation Distribution Optimized Sampling
Despite rigorous safety alignment, Large Language Models LLMs remain vulnerable to jailbreak attacks. Existing black-box methods often rely on heuristic templates or exhaustive trials, lacking mechanistic interpretability and query efficiency. In this study, we investigate an intrinsic...