BAIT: Boundary-Guided Disclosure Escalation Via Self-Conditioned Reasoning
In this work, we propose BAIT (Boundary-Aware Iterative Trap), a three-step jailbreak framework that approaches malicious goals through internal disclosure. BAIT first asks the model to identify the protection boundary, then requires it to refine that boundary, and finally requests a detailed...
Reasoning As an Attack Surface: Adaptive Evolutionary CoT Jailbreaks for LLMs
Large Reasoning Models (LRMs) have demonstrated remarkable capabilities in reasoning and generation tasks and are increasingly deployed in real-world applications. However, their explicit chain-of-thought (CoT) mechanism introduces new security risks, making them particularly vulnerable to jailbreak...
Babel: Jailbreaking Safety Attention Via Obfuscation Distribution Optimized Sampling
Despite rigorous safety alignment, Large Language Models (LLMs) remain vulnerable to jailbreak attacks. Existing black-box methods often rely on heuristic templates or exhaustive trials, lacking mechanistic interpretability and query efficiency. In this study, we investigate an intrinsic...
OrchJail: Jailbreaking Tool-Calling Text-To-Image Agents by Orchestration-Guided Fuzzing
Tool-calling text-to-image (T2I) agents can plan and execute multi-step tool chains to accomplish complex generation and editing queries. However, this capability introduces a new safety attack surface: harmful outputs may arise from tool orchestration, where individually benign steps combine into...
ArtPerception: ASCII Art-Based Jailbreak on LLMs with Recognition Pre-Test
The integration of Large Language Models (LLMs) into computer applications has introduced transformative capabilities but also significant security challenges. Existing safety alignments, which primarily focus on semantic interpretation, leave LLMs vulnerable to attacks that use non-standard data...