2 matches found
Mitigating Jailbreaks with Intent-Aware LLMs
Despite extensive safety-tuning, large language models LLMs remain vulnerable to jailbreak attacks via adversarially crafted instructions, reflecting a persistent trade-off between safety and task performance. In this work, we propose Intent-FT, a simple and lightweight fine-tuning approach that...
AdInject: Real-World Black-Box Attacks on Web Agents Via Advertising Delivery
Vision-Language Model VLM based Web Agents represent a significant step towards automating complex tasks by simulating human-like interaction with websites. However, their deployment in uncontrolled web environments introduces significant security vulnerabilities. Existing research on adversarial...