Universal Jailbreak Suffixes Are Strong Attention Hijackers
We study suffix-based jailbreaks, a powerful family of attacks against large language models (LLMs) that optimize adversarial suffixes to circumvent safety alignment. Focusing on the widely used foundational GCG attack (Zou et al., 2023), we observe that suffixes vary in efficacy: some...