5 matches found

Packet Storm News
added 2026/01/07 12:00 a.m., 7 views

Jailbreaking LLMs and VLMs: Mechanisms, Evaluation, and Unified Defense

This paper provides a systematic survey of jailbreak attacks and defenses on Large Language Models (LLMs) and Vision-Language Models (VLMs), emphasizing that jailbreak vulnerabilities stem from structural factors such as incomplete training data, linguistic ambiguity, and generative uncertainty. It...

AI score: 7.3
Exploits: 0
Packet Storm News
added 2025/10/23 12:00 a.m., 10 views

Beyond Text: Multimodal Jailbreaking of Vision-Language and Audio Models through Perceptually Simple Transformations

Multimodal large language models (MLLMs) have achieved remarkable progress, yet remain critically vulnerable to adversarial attacks that exploit weaknesses in cross-modal processing. We present a systematic study of multimodal jailbreaks targeting both vision-language and audio-language models,...

AI score: 7.3
Exploits: 0
Packet Storm News
added 2025/10/20 12:00 a.m., 10 views

Multimodal Safety Is Asymmetric: Cross-Modal Exploits Unlock Black-Box MLLMs Jailbreaks

Multimodal large language models (MLLMs) have demonstrated significant utility across diverse real-world applications, but they remain vulnerable to jailbreaks, where adversarial inputs can collapse their safety constraints and trigger unethical responses. In this work, we investigate jailbreaks i...

AI score: 7.2
Exploits: 0
Packet Storm News
added 2025/07/29 12:00 a.m., 2 views

Secure Tug-Of-War (SecTOW): Iterative Defense-Attack Training with Reinforcement Learning for Multimodal Model Security

The rapid advancement of multimodal large language models (MLLMs) has led to breakthroughs in various applications, yet their security remains a critical challenge. One pressing issue involves unsafe image-query pairs--jailbreak inputs specifically designed to bypass security constraints and elicit...

AI score: 7.2
Exploits: 0
Packet Storm News
added 2025/06/22 12:00 a.m., 2 views

Pushing the Limits of Safety: a Technical Report on the ATLAS Challenge 2025

Multimodal Large Language Models (MLLMs) have enabled transformative advancements across diverse applications but remain susceptible to safety threats, especially jailbreak attacks that induce harmful outputs. To systematically evaluate and improve their safety, we organized the Adversarial Testing...

AI score: 7.6
Exploits: 0