Follow My Eyes: Backdoor Attacks on VLM-Based Scanpath Prediction
Scanpath prediction models forecast the sequence and timing of human fixations during visual search, driving foveated rendering and attention-based interaction in mobile systems where their integrity is a first-class security concern. We present the first study of backdoor attacks against VLM-bas...
From SFT to RL: Demystifying the Post-Training Pipeline for LLM-Based Vulnerability Detection
The integration of LLMs into vulnerability detection (VD) has shifted the field toward interpretable and context-aware analysis. While post-training methods have shown promise in general coding tasks, their systematic application to VD remains underexplored. In this paper, we present the first...
A one-prompt attack that breaks LLM safety alignment
Large language models (LLMs) and diffusion models now power a wide range of applications, from document assistance to text-to-image generation, and users increasingly expect these systems to be safety-aligned by default. Yet safety alignment is only as robust as its weakest failure mode. Despite...
Causal-Guided Detoxify Backdoor Attack of Open-Weight LoRA Models
Low-Rank Adaptation (LoRA) has emerged as an efficient method for fine-tuning large language models (LLMs) and is widely adopted within the open-source community. However, the decentralized dissemination of LoRA adapters through platforms such as Hugging Face introduces novel security vulnerabilities...
The Landscape of Memorization in LLMs: Mechanisms, Measurement, and Mitigation
Large Language Models (LLMs) have demonstrated remarkable capabilities across a wide range of tasks, yet they also exhibit memorization of their training data. This phenomenon raises critical questions about model behavior, privacy risks, and the boundary between learning and memorization. Addressi...
Beyond Training-Time Poisoning: Component-Level and Post-Training Backdoors in Deep Reinforcement Learning
Deep Reinforcement Learning (DRL) systems are increasingly used in safety-critical applications, yet their security remains severely underexplored. This work investigates backdoor attacks, which implant hidden triggers that cause malicious actions only when specific inputs appear in the observation...
MergeGuard: Efficient Thwarting of Trojan Attacks in Machine Learning Models
This paper proposes MergeGuard, a novel methodology for mitigating AI Trojan attacks. Trojan attacks on AI models cause inputs embedded with triggers to be misclassified to an adversary's target class, posing a significant threat to the usability of models trained by an untrusted third party. The core...
Enhancing Variational Autoencoders with Smooth Robust Latent Encoding
Variational Autoencoders (VAEs) have played a key role in scaling up diffusion-based generative models, as in Stable Diffusion, yet questions regarding their robustness remain largely underexplored. Although adversarial training has been an established technique for enhancing robustness in predicti...