11 matches found
BadBone: Backdoor Attacks against Backbone Models in Visual Prompt Learning
Prompt learning is a new machine learning paradigm that has attracted ample attention due to its simplicity and proven efficacy. Despite its growing adoption, the security vulnerabilities associated with this paradigm remain underexplored. In this work, we take the first step to propose BadBone, ...
Evaluation of Prompt Injection Defenses in Large Language Models
LLM-powered applications routinely embed secrets in system prompts, yet models can be tricked into revealing them. We built an adaptive attacker that evolves its strategies over hundreds of rounds and tested it against nine defense configurations across more than 20,000 attacks. Every defense tha...
Poisoning the Pixels: Revisiting Backdoor Attacks on Semantic Segmentation
Semantic segmentation models are widely deployed in safety-critical applications such as autonomous driving, yet their vulnerability to backdoor attacks remains largely underexplored. Prior segmentation backdoor studies transfer threat settings from existing image classification tasks, focusing...
Multi-Turn Jailbreaking Attack in Multi-Modal Large Language Models
In recent years, the security vulnerabilities of Multi-modal Large Language Models (MLLMs) have become a serious concern in Generative Artificial Intelligence (GenAI) research. These highly intelligent models, capable of performing multi-modal tasks with high accuracy, are also severely susceptib...
Memory Poisoning Attack and Defense on Memory Based LLM-Agents
Large language model agents equipped with persistent memory are vulnerable to memory poisoning attacks, where adversaries inject malicious instructions through query-only interactions that corrupt the agent's long-term memory and influence future responses. Recent work demonstrated that the MINJA...
Safe2Harm: Semantic Isomorphism Attacks for Jailbreaking Large Language Models
Large Language Models (LLMs) have demonstrated exceptional performance across various tasks, but their security vulnerabilities can be exploited by attackers to generate harmful content, causing adverse impacts across various societal domains. Most existing jailbreak methods revolve around Prompt...
Securing Large Language Models (LLMs) from Prompt Injection Attacks
Large Language Models (LLMs) are increasingly being deployed in real-world applications, but their flexibility exposes them to prompt injection attacks. These attacks leverage the model's instruction-following ability to make it perform malicious tasks. Recent work has proposed JATMO, a task-specif...
The WASM Cloak: Evaluating Browser Fingerprinting Defenses under WebAssembly Based Obfuscation
Browser fingerprinting defenses have historically focused on detecting JavaScript (JS)-based tracking techniques. However, the widespread adoption of WebAssembly (WASM) introduces a potential blind spot, as adversaries can convert JS to WASM's low-level binary format to obfuscate malicious logic. This...
Your Agent Can Defend Itself against Backdoor Attacks
Despite their growing adoption across domains, large language model (LLM)-powered agents face significant security risks from backdoor attacks during training and fine-tuning. These compromised agents can subsequently be manipulated to execute malicious operations when presented with specific...
LLM Security: Vulnerabilities, Attacks, Defenses, and Countermeasures
As large language models (LLMs) continue to evolve, it is critical to assess the security threats and vulnerabilities that may arise both during their training phase and after models have been deployed. This survey seeks to define and categorize the various attacks targeting LLMs, distinguishing...
OET: Optimization-Based Prompt Injection Evaluation Toolkit
Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language understanding and generation, enabling their widespread adoption across various domains. However, their susceptibility to prompt injection attacks poses significant security risks, as adversarial inputs can...