3 matches found
Jailbreaking in the Haystack
Recent advances in long-context language models LMs have enabled million-token inputs, expanding their capabilities across complex tasks like computer-use agents. Yet, the safety implications of these extended contexts remain unclear. To bridge this gap, we introduce NINJA short for...
Hijacking Large Language Models Via Adversarial In-Context Learning
In-context learning ICL has emerged as a powerful paradigm leveraging LLMs for specific downstream tasks by utilizing labeled examples as demonstrations demos in the preconditioned prompts. Despite its promising performance, crafted adversarial attacks pose a notable threat to the robustness of...
One Surrogate to Fool Them All: Universal, Transferable, and Targeted Adversarial Attacks with CLIP
Deep Neural Networks DNNs have achieved widespread success yet remain prone to adversarial attacks. Typically, such attacks either involve frequent queries to the target model or rely on surrogate models closely mirroring the target model -- often trained with subsets of the target model's traini...