109 matches found
ADR: An Agentic Detection System for Enterprise Agentic AI Security
We present the Agentic AI Detection and Response ADR system, the first large-scale, production-proven enterprise framework for securing AI agents operating through the Model Context Protocol MCP. We identify three persistent challenges in this domain: 1 limited observability -- existing Endpoint...
A Red Teaming Framework for Evaluating Robustness of AI-Enabled Security Orchestration, Automation, and Response Systems
AI-enabled Security Orchestration, Automation, and Response SOAR systems increasingly employ autonomous agents for cyber defense, yet their resilience to adaptive adversaries is underexplored. We introduce an autonomous red teaming framework that integrates large language models LLMs with...
IPI-Proxy: An Intercepting Proxy for Red-Teaming Web-Browsing AI Agents against Indirect Prompt Injection
Web-browsing AI agents are increasingly deployed in enterprise settings under strict whitelists of approved domains, yet adversaries can still influence them by embedding hidden instructions in the HTML pages those domains serve. Existing red-teaming resources fall short of this scenario:...
LLMs for Secure Hardware Design and Related Problems: Opportunities and Challenges
The integration of Large Language Models LLMs into Electronic Design Automation EDA and hardware security is rapidly reshaping the semiconductor industry. While LLMs offer unprecedented capabilities in generating Register Transfer Level RTL code, automating testbenches, and bridging the semantic...
MonitoringBench: Semi-Automated Red-Teaming for Agent Monitoring
We introduce a red-teaming methodology that exposes harder-to-catch attacks for coding-agent monitors, suggesting that current practices may under-elicit attacks and overstate monitor performance. We identify three challenges with current red-teaming. First, mode collapse in attack generation,...
MT-JailBench: A Modular Benchmark for Understanding Multi-Turn Jailbreak Attacks
Multi-turn jailbreaks exploit the ability of large language models to accumulate and act on conversational context. Instead of stating a harmful request directly, an attacker can gradually steer the conversation toward an unsafe answer. Recent methods demonstrate this risk, but they are usually...
Redefining AI Red Teaming in the Agentic Era: From Weeks to Hours
AI systems are entering critical domains like healthcare, finance, and defense, yet remain vulnerable to adversarial attacks. While AI red teaming is a primary defense, current approaches force operators into manual, library-specific workflows. Operators spend weeks hand-crafting workflows -...
STARE: Step-Wise Temporal Alignment and Red-Teaming Engine for Multi-Modal Toxicity Attack
Red-teaming Vision-Language Models is essential for identifying vulnerabilities where adversarial image-text inputs trigger toxic outputs. Existing approaches treat image generation as a black box, returning only terminal toxicity scores and leaving open the question of when and how toxic semanti...
Training a General Purpose Automated Red Teaming Model
Automated methods for red teaming LLMs are an important tool to identify LLM vulnerabilities that may not be covered in static benchmarks, allowing for more thorough probing. They can also adapt to each specific LLM to discover weaknesses unique to it. Most current automated red teaming methods a...
Adaptive Instruction Composition for Automated LLM Red-Teaming
Many approaches to LLM red-teaming leverage an attacker LLM to discover jailbreaks against a target. Several of them task the attacker with identifying effective strategies through trial and error, resulting in a semantically limited range of successes. Another approach discovers diverse attacks ...
SkillAttack: Automated Red Teaming of Agent Skills through Attack Path Refinement
LLM-based agent systems increasingly rely on agent skills sourced from open registries to extend their capabilities, yet the openness of such ecosystems makes skills difficult to thoroughly vet. Existing attacks rely on injecting malicious instructions into skills, making them easily detectable b...
TreeTeaming: Autonomous Red-Teaming of Vision-Language Models Via Hierarchical Strategy Exploration
The rapid advancement of Vision-Language Models VLMs has brought their safety vulnerabilities into sharp focus. However, existing red teaming methods are fundamentally constrained by an inherent linear exploration paradigm, confining them to optimizing within a predefined strategy set and...
T-MAP: Red-Teaming LLM Agents with Trajectory-Aware Evolutionary Search
While prior red-teaming efforts have focused on eliciting harmful text outputs from large language models LLMs, such approaches fail to capture agent-specific vulnerabilities that emerge through multi-step tool execution, particularly in rapidly growing ecosystems such as the Model Context Protoc...
PISmith: Reinforcement Learning-Based Red Teaming for Prompt Injection Defenses
Prompt injection poses serious security risks to real-world LLM applications, particularly autonomous agents. Although many defenses have been proposed, their robustness against adaptive attacks remains insufficiently evaluated, potentially creating a false sense of security. In this work, we...
The Face of Penetration Testing is Changing: Announcing Metasploit Pro 5.0.0
The role and demand for red-teaming capabilities are growing, as more exploitable CVEs make their way into criminal hands. Being proactive is no longer a capability that can be reserved for annual tests, but a continuous assessment to determine exposure and even through the validation of an...
Trojan Horses in Recruiting: A Red-Teaming Case Study on Indirect Prompt Injection in Standard Vs. Reasoning Models
As Large Language Models LLMs are increasingly integrated into automated decision-making pipelines, specifically within Human Resources HR, the security implications of Indirect Prompt Injection IPI become critical. While a prevailing hypothesis posits that "Reasoning" or "Chain-of-Thought" Model...
Aegis: Towards Governance, Integrity, and Security of AI Voice Agents
With the rapid advancement and adoption of Audio Large Language Models ALLMs, voice agents are now being deployed in high-stakes domains such as banking, customer service, and IT support. However, their vulnerabilities to adversarial misuse still remain unexplored. While prior work has examined...
Top AI Tools for Red Teaming in 2026
Red teaming has undergone a radical evolution. Modern organizations can no longer rely solely on human creativity or…...
Putting Privacy to the Test: Introducing Red Teaming for Research Data Anonymization
Recently, the data protection practices of researchers in human-computer interaction and elsewhere have gained attention. Initial results suggest that researchers struggle with anonymization, partly due to a lack of clear, actionable guidance. In this work, we propose simulating re-identification...
AJAR: Adaptive Jailbreak Architecture for Red-Teaming
As Large Language Models LLMs evolve from static chatbots into autonomous agents capable of tool execution, the landscape of AI safety is shifting from content moderation to action security. However, existing red-teaming frameworks remain bifurcated: they either focus on rigid, script-based text...