9 matches found
SlowBA: An Efficiency Backdoor Attack Towards VLM-Based GUI Agents
Modern vision-language-model VLM based graphical user interface GUI agents are expected not only to execute actions accurately but also to respond to user instructions with low latency. While existing research on GUI-agent security mainly focuses on manipulating action correctness, the security...
Pattern Enhanced Multi-Turn Jailbreaking: Exploiting Structural Vulnerabilities in Large Language Models
Large language models LLMs remain vulnerable to multi-turn jailbreaking attacks that exploit conversational context to bypass safety constraints gradually. These attacks target different harm categories like malware generation, harassment, or fraud through distinct conversational approaches...
Multilingual Source Tracing of Speech Deepfakes: a First Benchmark
Recent progress in generative AI has made it increasingly easy to create natural-sounding deepfake speech from just a few seconds of audio. While these tools support helpful applications, they also raise serious concerns by making it possible to generate convincing fake speech in many languages...
SmartHome-Bench: a Comprehensive Benchmark for Video Anomaly Detection in Smart Homes Using Multi-Modal Large Language Models
Video anomaly detection VAD is essential for enhancing safety and security by identifying unusual events across different environments. Existing VAD benchmarks, however, are primarily designed for general-purpose scenarios, neglecting the specific characteristics of smart home applications. To...
IF-GUIDE: Influence Function-Guided Detoxification of LLMs
We study how training data contributes to the emergence of toxic behaviors in large-language models. Most prior work on reducing model toxicity adopts $reactive$ approaches, such as fine-tuning pre-trained and potentially toxic models to align them with human values. In contrast, we propose a...
Lifelong Safety Alignment for Language Models
LLMs have made impressive progress, but their growing capabilities also expose them to highly flexible jailbreaking attacks designed to bypass safety alignment. While many existing defenses focus on known types of attacks, it is more critical to prepare LLMs for unseen attacks that may arise duri...
LAMDA: a Longitudinal Android Malware Benchmark for Concept Drift Analysis
Machine learning ML-based malware detection systems often fail to account for the dynamic nature of real-world training and test data distributions. In practice, these distributions evolve due to frequent changes in the Android ecosystem, adversarial development of new malware families, and the...
AutoRAN: Weak-To-Strong Jailbreaking of Large Reasoning Models
This paper presents AutoRAN, the first automated, weak-to-strong jailbreak attack framework targeting large reasoning models LRMs. At its core, AutoRAN leverages a weak, less-aligned reasoning model to simulate the target model's high-level reasoning structures, generates narrative prompts, and...
Set You Straight: Auto-Steering Denoising Trajectories to Sidestep Unwanted Concepts
Ensuring the ethical deployment of text-to-image models requires effective techniques to prevent the generation of harmful or inappropriate content. While concept erasure methods offer a promising solution, existing finetuning-based approaches suffer from notable limitations. Anchor-free methods...