17 matches found

Packet Storm News
added 2026/05/18 12:00 a.m., 6 views

Babel: Jailbreaking Safety Attention Via Obfuscation Distribution Optimized Sampling

Despite rigorous safety alignment, Large Language Models (LLMs) remain vulnerable to jailbreak attacks. Existing black-box methods often rely on heuristic templates or exhaustive trials, lacking mechanistic interpretability and query efficiency. In this study, we investigate an intrinsic...

AI score: 5.8
Exploits: 0

Packet Storm News
added 2026/05/10 12:00 a.m., 2 views

Position: AI Security Policy Should Target Systems, Not Models

We present swarm-attack, an open-source adversarial testing framework in which multiple lightweight LLM agents coordinate through shared memory, parallel exploration, and evolutionary optimization. Together, our results demonstrate that both safety bypass of frontier models and software...

AI score: 5.9
Exploits: 0

Packet Storm News
added 2026/04/30 12:00 a.m., 1 view

Jailbroken Frontier Models Retain Their Capabilities

As language model safeguards become more robust, attackers are pushed toward developing increasingly complex jailbreaks. Prior work has found that this complexity imposes a "jailbreak tax" that degrades the target model's task performance. We show that this tax scales inversely with model...

AI score: 5.8
Exploits: 0

Packet Storm News
added 2026/04/27 12:00 a.m., 1 view

Jailbreaking Frontier Foundation Models through Intention Deception

Large vision-language models exhibit remarkable capability but remain highly susceptible to jailbreaking. Existing safety training approaches aim to have the model learn a refusal boundary between safe and unsafe, based on the user's intent. It has been found that this binary training regime ofte...

AI score: 5.3
Exploits: 0

Packet Storm News
added 2026/03/21 12:00 a.m., 4 views

T-MAP: Red-Teaming LLM Agents with Trajectory-Aware Evolutionary Search

While prior red-teaming efforts have focused on eliciting harmful text outputs from large language models (LLMs), such approaches fail to capture agent-specific vulnerabilities that emerge through multi-step tool execution, particularly in rapidly growing ecosystems such as the Model Context Protoc...

AI score: 6
Exploits: 0

Packet Storm News
added 2026/03/16 12:00 a.m., 1 view

How Vulnerable Are AI Agents to Indirect Prompt Injections? Insights from a Large-Scale Public Competition

LLM-based agents are increasingly deployed in high-stakes settings where they process external data sources such as emails, documents, and code repositories. This creates exposure to indirect prompt injection attacks, where adversarial instructions embedded in external content manipulate agent...

AI score: 5.8
Exploits: 0

Packet Storm News
added 2026/03/02 12:00 a.m., 6 views

ZeroDayBench: Evaluating LLM Agents on Unseen Zero-Day Vulnerabilities for Cyberdefense

Large language models (LLMs) are increasingly being deployed as software engineering agents that autonomously contribute to repositories. A major benefit these agents present is their ability to find and patch security vulnerabilities in the codebases they oversee. To estimate the capability of...

AI score: 6
Exploits: 0

Packet Storm News
added 2026/02/16 12:00 a.m., 6 views

AI Arms and Influence: Frontier Models Exhibit Sophisticated Reasoning in Simulated Nuclear Crises

Today's leading AI models engage in sophisticated behaviour when placed in strategic competition. They spontaneously attempt deception, signaling intentions they do not intend to follow; they demonstrate rich theory of mind, reasoning about adversary beliefs and anticipating their actions; and th...

AI score: 5.5
Exploits: 0

Packet Storm News
added 2026/02/13 12:00 a.m., 3 views

In-Context Autonomous Network Incident Response: An End-To-End Large Language Model Agent Approach

Rapidly evolving cyberattacks demand incident response systems that can autonomously learn and adapt to changing threats. Prior work has extensively explored the reinforcement learning approach, which involves learning response strategies through extensive simulation of the incident. While this...

AI score: 5.5
Exploits: 0

Packet Storm News
added 2026/01/31 12:00 a.m., 4 views

Jailbreaking LLMs Via Calibration

Safety alignment in Large Language Models (LLMs) often creates a systematic discrepancy between a model's aligned output and the underlying pre-aligned data distribution. We propose a framework in which the effect of safety alignment on next-token prediction is modeled as a systematic distortion of...

AI score: 5.4
Exploits: 0

Packet Storm News
added 2025/12/10 12:00 a.m., 1 view

LLM-PEA: Leveraging Large Language Models against Phishing Email Attacks

Email phishing is one of the most prevalent and globally consequential vectors of cyber intrusion. As systems increasingly deploy Large Language Model (LLM) applications, these systems face evolving phishing email threats that exploit their fundamental architectures. Current LLMs require...

AI score: 7.2
Exploits: 0

Packet Storm News
added 2025/12/02 12:00 a.m., 4 views

Is Vibe Coding Safe? Benchmarking Vulnerability of Agent-Generated Code in Real-World Tasks

Vibe coding is a new programming paradigm in which human engineers instruct large language model (LLM) agents to complete complex coding tasks with little supervision. Although it is increasingly adopted, are vibe coding outputs really safe to deploy in production? To answer this question, we propo...

AI score: 6.9
Exploits: 0

Packet Storm News
added 2025/10/15 12:00 a.m., 5 views

Toward Cybersecurity-Expert Small Language Models

Large language models (LLMs) are transforming everyday applications, yet deployment in cybersecurity lags due to a lack of high-quality, domain-specific models and training datasets. To address this gap, we present CyberPal 2.0, a family of cybersecurity-expert small language models (SLMs) ranging fr...

AI score: 7
Exploits: 0

Packet Storm News
added 2025/06/21 12:00 a.m., 3 views

AIRTBench: Measuring Autonomous AI Red Teaming Capabilities in Language Models

We introduce AIRTBench, an AI red teaming benchmark for evaluating language models' ability to autonomously discover and exploit Artificial Intelligence and Machine Learning (AI/ML) security vulnerabilities. The benchmark consists of 70 realistic black-box capture-the-flag (CTF) challenges from the...

AI score: 7.7
Exploits: 0

Packet Storm News
added 2025/06/17 12:00 a.m., 0 views

SHADE-Arena: Evaluating Sabotage and Monitoring in LLM Agents

As Large Language Models (LLMs) are increasingly deployed as autonomous agents in complex and long-horizon settings, it is critical to evaluate their ability to sabotage users by pursuing hidden objectives. We study the ability of frontier LLMs to evade monitoring and achieve harmful hidden goals...

AI score: 7.1
Exploits: 0

Packet Storm News
added 2025/05/31 12:00 a.m., 2 views

Teaching an Old LLM Secure Coding: Localized Preference Optimization on Distilled Preferences

LLM-generated code often contains security issues. We address two key challenges in improving secure code generation. First, obtaining high-quality training data covering a broad set of security issues is critical. To address this, we introduce a method for distilling a preference dataset of...

AI score: 7.4
Exploits: 0

Packet Storm News
added 2025/05/05 12:00 a.m., 2 views

RepliBench: Evaluating the Autonomous Replication Capabilities of Language Model Agents

Uncontrollable autonomous replication of language model agents poses a critical safety risk. To better understand this risk, we introduce RepliBench, a suite of evaluations designed to measure autonomous replication capabilities. RepliBench is derived from a decomposition of these capabilities...

AI score: 7.2
Exploits: 0