Lucene search
K

216 matches found

Packet Storm News
Packet Storm News
added 2026/06/11 12:0 a.m.4 views

MAStrike: Shapley-Guided Collusive Red-Teaming on Multi-Agent Systems

Hierarchical multi-agent systems MAS are rapidly being deployed in high-stakes workflows across domains such as finance and software engineering. In these systems, safety and security are inherently distributed across role-specialized agents, significantly expanding the attack surface, particular...

5.5AI score
Exploits0
Packet Storm News
Packet Storm News
added 2026/06/10 12:0 a.m.7 views

PI-Hunter: Automated Red-Teaming for Exposing and Localizing Prompt Injections

Large Language Models LLMs are rapidly evolving into agentic systems that interact with external tools and environments, introducing new security risks such as indirect prompt injection attacks through untrusted external sources. Existing defenses mainly focus on blocking malicious content at...

5.4AI score
Exploits0
Microsoft Secure
Microsoft Secure
added 2026/06/04 7:14 p.m.12 views

Updating the taxonomy of failure modes in agentic AI systems: What a year of red teaming taught us

In this article 1. Why the Taxonomy Needed Updating 2. Seven new failure modes 3. Operational findings: What red teaming showed 4. New mitigations 5. What to do this quarter When the Microsoft AI Red Team published the Taxonomy of Failure Modes in Agentic AI Systems in April 2025, the goal was a...

8.8CVSS5.8AI score0.08016EPSS
Exploits5
Packet Storm News
Packet Storm News
added 2026/06/04 12:0 a.m.26 views

RedEdit: Agentic Red-Teaming of Image Safety Classifiers Via MCTS-Guided Photo-Editing

Image safety classifiers serve as a critical component of contemporary content moderation systems on the internet. However, their resilience against user-style malicious image editing remains underexplored. Such behaviors are highly prevalent in daily scenarios but difficult to fully reproduce. T...

5.5AI score
Exploits0
Packet Storm News
Packet Storm News
added 2026/06/02 12:0 a.m.10 views

Domain-Conditioned Safety in Frontier Computer-Using Agents: A 793-Episode Browser Benchmark, a Coding-Domain Cross-Reference, and a Reproducibility Audit of Recent Red-Teaming

Recent computer-using-agent CUA red-teaming papers report prompt-injection attack success rates ASR of 42-98%, but these headline numbers cluster on retired models and on the most-vulnerable model in each paper's panel. We ask whether those techniques, reproduced as hand-crafted templates, still...

5.5AI score
Exploits0
GithubExploit
GithubExploit
added 2026/06/01 10:12 a.m.58 views

-cascade-scan

cascade-scan AI Agent security evaluation framework — autom...

6.5AI score
Exploits0
Packet Storm News
Packet Storm News
added 2026/05/17 12:0 a.m.25 views

ADR: An Agentic Detection System for Enterprise Agentic AI Security

We present the Agentic AI Detection and Response ADR system, the first large-scale, production-proven enterprise framework for securing AI agents operating through the Model Context Protocol MCP. We identify three persistent challenges in this domain: 1 limited observability -- existing Endpoint...

5.8AI score
Exploits0
Packet Storm News
Packet Storm News
added 2026/05/16 12:0 a.m.8 views

A Red Teaming Framework for Evaluating Robustness of AI-Enabled Security Orchestration, Automation, and Response Systems

AI-enabled Security Orchestration, Automation, and Response SOAR systems increasingly employ autonomous agents for cyber defense, yet their resilience to adaptive adversaries is underexplored. We introduce an autonomous red teaming framework that integrates large language models LLMs with...

5.8AI score
Exploits0
Packet Storm News
Packet Storm News
added 2026/05/12 12:0 a.m.16 views

IPI-Proxy: An Intercepting Proxy for Red-Teaming Web-Browsing AI Agents against Indirect Prompt Injection

Web-browsing AI agents are increasingly deployed in enterprise settings under strict whitelists of approved domains, yet adversaries can still influence them by embedding hidden instructions in the HTML pages those domains serve. Existing red-teaming resources fall short of this scenario:...

5.8AI score
Exploits0
Packet Storm News
Packet Storm News
added 2026/05/12 12:0 a.m.11 views

Do Androids Dream of Breaking the Game? Systematically Auditing AI Agent Benchmarks with BenchJack

Agent benchmarks have become the de facto measure of frontier AI competence, guiding model selection, investment, and deployment. However, reward hacking, where agents maximize a score without performing the intended task, emerges spontaneously in frontier models without overfitting. We argue tha...

5.8AI score
Exploits0
The Hacker News
The Hacker News
added 2026/05/11 11:30 a.m.24 views

Your Purple Team Isn't Purple — It's Just Red and Blue in the Same Room

Defending a network at 2 am looks a lot like this: an analyst copy-pasting a hash from a PDF into a SIEM query. A red team script is being rewritten by hand so the blue team can use it. A patch waiting on a change-approval window that's longer than the exploitation window itself. Nobody in that...

5.9AI score
Exploits0
Packet Storm News
Packet Storm News
added 2026/05/11 12:0 a.m.6 views

LLMs for Secure Hardware Design and Related Problems: Opportunities and Challenges

The integration of Large Language Models LLMs into Electronic Design Automation EDA and hardware security is rapidly reshaping the semiconductor industry. While LLMs offer unprecedented capabilities in generating Register Transfer Level RTL code, automating testbenches, and bridging the semantic...

5.8AI score
Exploits0
Packet Storm News
Packet Storm News
added 2026/05/10 12:0 a.m.6 views

MonitoringBench: Semi-Automated Red-Teaming for Agent Monitoring

We introduce a red-teaming methodology that exposes harder-to-catch attacks for coding-agent monitors, suggesting that current practices may under-elicit attacks and overstate monitor performance. We identify three challenges with current red-teaming. First, mode collapse in attack generation,...

5.9AI score
Exploits0
Packet Storm News
Packet Storm News
added 2026/05/09 12:0 a.m.24 views

MT-JailBench: A Modular Benchmark for Understanding Multi-Turn Jailbreak Attacks

Multi-turn jailbreaks exploit the ability of large language models to accumulate and act on conversational context. Instead of stating a harmful request directly, an attacker can gradually steer the conversation toward an unsafe answer. Recent methods demonstrate this risk, but they are usually...

5.7AI score
Exploits0
Packet Storm News
Packet Storm News
added 2026/05/07 12:0 a.m.13 views

Autonomous Adversary: Red-Teaming in the Age of LLM

Language Model Agents LMAs are emerging as a powerful primitive for augmenting red-team operations. They can support attack planning, adversary emulation, and the orchestration of multi-step activity such as lateral movement, a core enabling capability of advanced persistent threat APT campaigns...

5.9AI score
Exploits0
Packet Storm News
Packet Storm News
added 2026/05/06 12:0 a.m.18 views

DecodingTrust-Agent Platform (DTap): A Controllable and Interactive Red-Teaming Platform for AI Agents

AI agents are increasingly deployed across diverse domains to automate complex workflows through long-horizon and high-stakes action executions. Due to their high capability and flexibility, such agents raise significant security and safety concerns. A growing number of real-world incidents have...

5.8AI score
Exploits0
Packet Storm News
Packet Storm News
added 2026/05/05 12:0 a.m.6 views

Redefining AI Red Teaming in the Agentic Era: From Weeks to Hours

AI systems are entering critical domains like healthcare, finance, and defense, yet remain vulnerable to adversarial attacks. While AI red teaming is a primary defense, current approaches force operators into manual, library-specific workflows. Operators spend weeks hand-crafting workflows -...

5.8AI score
Exploits0
Packet Storm News
Packet Storm News
added 2026/05/03 12:0 a.m.4 views

Trojan Hippo: Weaponizing Agent Memory for Data Exfiltration

Memory systems enable otherwise-stateless LLM agents to persist user information across sessions, but also introduce a new attack surface. We characterize the Trojan Hippo attack, a class of persistent memory attacks that operates in a more realistic threat model than prior memory poisoning work:...

5.8AI score
Exploits0
Packet Storm News
Packet Storm News
added 2026/05/01 12:0 a.m.4 views

STARE: Step-Wise Temporal Alignment and Red-Teaming Engine for Multi-Modal Toxicity Attack

Red-teaming Vision-Language Models is essential for identifying vulnerabilities where adversarial image-text inputs trigger toxic outputs. Existing approaches treat image generation as a black box, returning only terminal toxicity scores and leaving open the question of when and how toxic semanti...

5.8AI score
Exploits0
Packet Storm News
Packet Storm News
added 2026/04/24 12:0 a.m.4 views

Training a General Purpose Automated Red Teaming Model

Automated methods for red teaming LLMs are an important tool to identify LLM vulnerabilities that may not be covered in static benchmarks, allowing for more thorough probing. They can also adapt to each specific LLM to discover weaknesses unique to it. Most current automated red teaming methods a...

5.6AI score
Exploits0
Rows per page
Query Builder