Defenses and Enablers for Skill Injection Attacks on Terminal Based Agents
Large language model (LLM) agents increasingly rely on reusable skills, i.e., documents describing task-specific procedures. However, this introduces a new attack surface for agents to manage. We study two complementary directions for this threat. First, we evaluate guardian-based defenses: an...
Persona Attack: Incremental Memory Injection Jailbreak Attack against Large Language Models
As Large Language Models evolve for user convenience, vulnerability to jailbreak attacks continues to be reported despite ongoing efforts in safety training. Traditional jailbreak techniques typically focus on a single prompt injection, neglecting the models' ability to remember the flow of...
Adversarial Vulnerability under Temporal Concept Drift: A Longitudinal Study of Android Malware Detection
We present a longitudinal, drift-aware evaluation of adversarial robustness across more than a decade of Android applications using static and dynamic feature representations extracted from emulator and real-device executions. The dataset is organized into yearly slices and evaluated under three...
Evaluating Jailbreaking Vulnerabilities in LLMs Deployed As Assistants for Smart Grid Operations: A Benchmark against NERC Standards
The deployment of Large Language Models (LLMs) as assistants in electric grid operations promises to streamline compliance and decision-making but exposes new vulnerabilities to prompt-based adversarial attacks. This paper evaluates the risk of jailbreaking LLMs, i.e., circumventing safety alignmen...
AutoRISE: Agent-Driven Strategy Evolution for Red-Teaming Large Language Models
Automated red-teaming methods for large language models typically optimize attack prompts within a fixed, human-designed strategy, leaving the attack strategy itself unchanged. We instead optimize the strategy. We propose AutoRISE, a method that searches over executable attack programs rather tha...
BadSkill: Backdoor Attacks on Agent Skills Via Model-In-Skill Poisoning
Agent ecosystems increasingly rely on installable skills to extend functionality, and some skills bundle learned model artifacts as part of their execution logic. This creates a supply-chain risk that is not captured by prompt injection or ordinary plugin misuse: a third-party skill may appear...
How Vulnerable Are AI Agents to Indirect Prompt Injections? Insights from a Large-Scale Public Competition
LLM-based agents are increasingly deployed in high-stakes settings where they process external data sources such as emails, documents, and code repositories. This creates exposure to indirect prompt injection attacks, where adversarial instructions embedded in external content manipulate agent...
When Scanners Lie: Evaluator Instability in LLM Red-Teaming
Automated LLM vulnerability scanners are increasingly used to assess security risks by measuring different attack type success rates (ASR). Yet the validity of these measurements hinges on an often-overlooked component: the evaluator who determines whether an attack has succeeded. In this study, we...
IU: Imperceptible Universal Backdoor Attack
Backdoor attacks pose a critical threat to the security of deep neural networks, yet existing efforts on universal backdoors often rely on visually salient patterns, making them easier to detect and less practical at scale. In this work, we introduce a novel imperceptible universal backdoor attac...
Self-Purification Mitigates Backdoors in Multimodal Diffusion Language Models
Multimodal Diffusion Language Models (MDLMs) have recently emerged as a competitive alternative to their autoregressive counterparts. Yet their vulnerability to backdoor attacks remains largely unexplored. In this work, we show that well-established data-poisoning pipelines can successfully implant...
AdapTools: Adaptive Tool-Based Indirect Prompt Injection Attacks on Agentic LLMs
The integration of external data services (e.g., Model Context Protocol, MCP) has made large language model-based agents increasingly powerful for complex task execution. However, this advancement introduces critical security vulnerabilities, particularly indirect prompt injection (IPI) attacks...
Evasion of IoT Malware Detection Via Dummy Code Injection
The Internet of Things (IoT) has revolutionized connectivity by linking billions of devices worldwide. However, this rapid expansion has also introduced severe security vulnerabilities, making IoT devices attractive targets for malware such as the Mirai botnet. Power side-channel analysis has...
PINA: Prompt Injection Attack against Navigation Agents
Navigation agents powered by large language models (LLMs) convert natural language instructions into executable plans and actions. Compared to text-based applications, their security is far more critical: a successful prompt injection attack does not just alter outputs but can directly misguide...
Sockpuppetting: Jailbreaking LLMs without Optimization through Output Prefix Injection
As open-weight large language models (LLMs) increase in capabilities, safeguarding them against malicious prompts and understanding possible attack vectors becomes ever more important. While automated jailbreaking methods like GCG (Zou et al., 2023) remain effective, they often require substantial...
LLM-Driven Feature-Level Adversarial Attacks on Android Malware Detectors
The rapid growth in both the scale and complexity of Android malware has driven the widespread adoption of machine learning (ML) techniques for scalable and accurate malware detection. Despite their effectiveness, these models remain vulnerable to adversarial attacks that introduce carefully crafte...
Odysseus: Jailbreaking Commercial Multimodal LLM-Integrated Systems Via Dual Steganography
By integrating language understanding with perceptual modalities such as images, multimodal large language models (MLLMs) constitute a critical substrate for modern AI systems, particularly intelligent agents operating in open and interactive environments. However, their increasing accessibility al...
6DAttack: Backdoor Attacks in the 6DoF Pose Estimation
Deep learning advances have enabled accurate six-degree-of-freedom (6DoF) object pose estimation, widely used in robotics, AR/VR, and autonomous systems. However, backdoor attacks pose significant security risks. While most research focuses on 2D vision, 6DoF pose estimation remains largely...
CIS-BA: Continuous Interaction Space Based Backdoor Attack for Object Detection in the Real-World
Object detection models deployed in real-world applications such as autonomous driving face serious threats from backdoor attacks. Despite their practical effectiveness, existing methods are inherently limited in both capability and robustness due to their dependence on single-trigger-single-objec...
Whispering poetry at AI can make it break its own rules
Most of the big AI makers don't like people using their models for unsavory activity. Ask one of the mainstream AI models how to make a bomb or create nerve gas and you'll get the standard "I don't help people do harmful things" response. That has spawned a cat-and-mouse game of people who try to...
Prompt Injection Through Poetry
In a new paper, "Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models," researchers found that turning LLM prompts into poetry resulted in jailbreaking the models: Abstract: We present evidence that adversarial poetry functions as a universal single-turn...