5 matches found
Now You Hear Me: Audio Narrative Attacks against Large Audio-Language Models
Large audio-language models increasingly operate on raw speech inputs, enabling more seamless integration across domains such as voice assistants, education, and clinical triage. This transition, however, introduces a distinct class of vulnerabilities that remain largely uncharacterized. We exami...
Breaking Audio Large Language Models by Attacking Only the Encoder: A Universal Targeted Latent-Space Audio Attack
Audio-language models combine audio encoders with large language models to enable multimodal reasoning, but they also introduce new security vulnerabilities. We propose a universal targeted latent space attack, an encoder-level adversarial attack that manipulates audio latent representations to...
StyleBreak: Revealing Alignment Vulnerabilities in Large Audio-Language Models Via Style-Aware Audio Jailbreak
Large Audio-language Models LAMs have recently enabled powerful speech-based interactions by coupling audio encoders with Large Language Models LLMs. However, the security of LAMs under adversarial attacks remains underexplored, especially through audio jailbreaks that craft malicious audio promp...
When Good Sounds Go Adversarial: Jailbreaking Audio-Language Models with Benign Inputs
As large language models become increasingly integrated into daily life, audio has emerged as a key interface for human-AI interaction. However, this convenience also introduces new vulnerabilities, making audio a potential attack surface for adversaries. Our research introduces WhisperInject, a...
The Man behind the Sound: Demystifying Audio Private Attribute Profiling Via Multimodal Large Language Model Agents
Our research uncovers a novel privacy risk associated with multimodal large language models MLLMs: the ability to infer sensitive personal attributes from audio data -- a technique we term audio private attribute profiling. This capability poses a significant threat, as audio can be covertly...