3 matches found
How Reliable Are AI Attackers against a Fixed Vulnerable Target? A 400-Run Empirical Study of LLM Penetration Testing Consistency
Large language models (LLMs) can autonomously conduct multi-stage cyber attacks, but the consistency of their offensive behavior under repeated trials remains unstudied. This work presents the first large-scale empirical measurement of LLM attack consistency: 400 autonomous penetration testing runs...
When Safety Detectors Aren't Enough: A Stealthy and Effective Jailbreak Attack on LLMs via Steganographic Techniques
Jailbreak attacks pose a serious threat to large language models (LLMs) by bypassing built-in safety mechanisms and leading to harmful outputs. Studying these attacks is crucial for identifying vulnerabilities and improving model security. This paper presents a systematic survey of jailbreak method...
Amplified Vulnerabilities: Structured Jailbreak Attacks on LLM-Based Multi-Agent Debate
Multi-Agent Debate (MAD), which leverages collaborative interactions among Large Language Models (LLMs), aims to enhance reasoning capabilities in complex tasks. However, the security implications of its iterative dialogues and role-playing characteristics, particularly susceptibility to jailbreak attack...