5 matches found
AgentShield: Deception-Based Compromise Detection for Tool-Using LLM Agents
Defenses against indirect prompt injection IPI in tool-using LLM agents share two structural weaknesses. First, they all attempt to prevent attacks rather than detect the compromises that slip through. Second, they have only been evaluated in English, leaving users of low-resource languages such ...
A new era of agents, a new era of posture
The rise of AI Agents marks one of the most exciting shifts in technology today. Unlike traditional applications or cloud resources, these agents are not passive components- they reason, make decisions, invoke tools, and interact with other agents and systems on behalf of users. This autonomy...
Prompt Attacks Reveal Superficial Knowledge Removal in Unlearning Methods
In this work, we show that some machine unlearning methods may fail when subjected to straightforward prompt attacks. We systematically evaluate eight unlearning techniques across three model families, and employ output-based, logit-based, and probe analysis to determine to what extent supposedly...
InjectLab: a Tactical Framework for Adversarial Threat Modeling against Large Language Models
Large Language Models LLMs are changing the way people interact with technology. Tools like ChatGPT and Claude AI are now common in business, research, and everyday life. But with that growth comes new risks, especially prompt-based attacks that exploit how these models process language. InjectLa...
Exploiting DeepSeek-R1: Breaking Down Chain of Thought Security
This entry explores how the Chain of Thought reasoning in the DeepSeek-R1 AI model can be susceptible to prompt attacks, insecure output generation, and sensitive data theft...