4 matches found

Packet Storm News
added 2026/05/03 12:00 a.m., 1 view

Trojan Hippo: Weaponizing Agent Memory for Data Exfiltration

Memory systems enable otherwise-stateless LLM agents to persist user information across sessions, but also introduce a new attack surface. We characterize the Trojan Hippo attack, a class of persistent memory attacks that operates in a more realistic threat model than prior memory poisoning work:...

AI score: 5.8
Exploits: 0
Packet Storm News
added 2026/04/20 12:00 a.m., 2 views

ARES: Adaptive Red-Teaming and End-To-End Repair of Policy-Reward System

Reinforcement Learning from Human Feedback (RLHF) is central to aligning Large Language Models (LLMs), yet it introduces a critical vulnerability: an imperfect Reward Model (RM) can become a single point of failure when it fails to penalize unsafe behaviors. While existing red-teaming approaches...

AI score: 5.8
Exploits: 0
Packet Storm News
added 2025/10/08 12:00 a.m., 4 views

RedTWIZ: Diverse LLM Red Teaming Via Adaptive Attack Planning

This paper presents the vision, scientific contributions, and technical details of RedTWIZ: an adaptive and diverse multi-turn red teaming framework, to audit the robustness of Large Language Models (LLMs) in AI-assisted software development. Our work is driven by three major research streams: 1...

AI score: 7
Exploits: 0
Packet Storm News
added 2025/10/02 12:00 a.m., 8 views

RedCodeAgent: Automatic Red-Teaming Agent against Diverse Code Agents

Code agents have gained widespread adoption due to their strong code generation capabilities and integration with code interpreters, enabling dynamic execution, debugging, and interactive programming. While these advancements have streamlined complex workflows, they have also...

AI score: 7.9
Exploits: 0