
371 matches found

Packet Storm News
added 2026/05/28 12:00 a.m., 6 views

Persona Attack: Incremental Memory Injection Jailbreak Attack against Large Language Models

As Large Language Models evolve for user convenience, vulnerability to jailbreak attacks continues to be reported despite ongoing efforts in safety training. Traditional jailbreak techniques typically focus on a single prompt injection, neglecting the models' ability to remember the flow of...

AI score: 5.8
Exploits: 0

Packet Storm News
added 2026/05/27 12:00 a.m., 6 views

Evolving Skill-Structured Attack Memory Enhances LLM Jailbreaking

Jailbreak attacks on large language models (LLMs) aim to induce LLMs to produce content that they are expected to refuse. Automated black-box jailbreak generation is especially important for safety evaluation, where the attacker observes only model outputs and needs to automatically search for...

AI score: 5.8
Exploits: 0

Packet Storm News
added 2026/05/26 12:00 a.m., 8 views

BAIT: Boundary-Guided Disclosure Escalation Via Self-Conditioned Reasoning

In this work, we propose BAIT (Boundary-Aware Iterative Trap), a three-step jailbreak framework that approaches malicious goals through internal disclosure. BAIT first asks the model to identify the protection boundary, then requires it to refine that boundary, and finally requests a detailed...

AI score: 5.8
Exploits: 0

Packet Storm News
added 2026/05/23 12:00 a.m., 9 views

Reasoning As an Attack Surface: Adaptive Evolutionary CoT Jailbreaks for LLMs

Large Reasoning Models (LRMs) have demonstrated remarkable capabilities in reasoning and generation tasks and are increasingly deployed in real-world applications. However, their explicit chain-of-thought (CoT) mechanism introduces new security risks, making them particularly vulnerable to jailbreak...

AI score: 5.8
Exploits: 0

Packet Storm News
added 2026/05/18 12:00 a.m., 7 views

Babel: Jailbreaking Safety Attention Via Obfuscation Distribution Optimized Sampling

Despite rigorous safety alignment, Large Language Models (LLMs) remain vulnerable to jailbreak attacks. Existing black-box methods often rely on heuristic templates or exhaustive trials, lacking mechanistic interpretability and query efficiency. In this study, we investigate an intrinsic...

AI score: 5.8
Exploits: 0

Packet Storm News
added 2026/05/11 12:00 a.m., 4 views

Re-Triggering Safeguards within LLMs for Jailbreak Detection

This paper proposes a jailbreaking prompt detection method for large language models (LLMs) to defend against jailbreak attacks. Although recent LLMs are equipped with built-in safeguards, it remains possible to craft jailbreaking prompts that bypass them. We argue that such jailbreaking prompts ar...

AI score: 5.8
Exploits: 0

Packet Storm News
added 2026/05/10 12:00 a.m., 3 views

Position: AI Security Policy Should Target Systems, Not Models

We present swarm-attack, an open-source adversarial testing framework in which multiple lightweight LLM agents coordinate through shared memory, parallel exploration, and evolutionary optimization. Together, our results demonstrate that both safety bypass of frontier models and software...

AI score: 5.9
Exploits: 0

Packet Storm News
added 2026/05/09 12:00 a.m., 2 views

The Art of the Jailbreak: Formulating Jailbreak Attacks for LLM Security beyond Binary Scoring

Jailbreak attacks -- adversarial prompts that bypass LLM alignment through purely linguistic manipulation -- pose a growing operational security threat, yet the field lacks large-scale, reproducible infrastructure for generating, categorizing, and evaluating them systematically. This paper...

AI score: 5.7
Exploits: 0

Packet Storm News
added 2026/05/08 12:00 a.m., 4 views

OrchJail: Jailbreaking Tool-Calling Text-To-Image Agents by Orchestration-Guided Fuzzing

Tool-calling text-to-image (T2I) agents can plan and execute multi-step tool chains to accomplish complex generation and editing queries. However, this capability introduces a new safety attack surface: harmful outputs may arise from tool orchestration, where individually benign steps combine into...

AI score: 5.9
Exploits: 0

Packet Storm News
added 2026/05/04 12:00 a.m., 2 views

Revisiting JBShield: Breaking and Rebuilding Representation-Level Jailbreak Defenses

Defending large language models (LLMs) against jailbreak attacks, such as Greedy Coordinate Gradient (GCG), remains a challenge, particularly under adaptive threat models where an attacker directly targets the defense mechanism. JBShield, a recent jailbreak defense with a 0% attack success rate in so...

AI score: 5.8
Exploits: 0

Packet Storm News
added 2026/05/04 12:00 a.m., 1 view

ContextualJailbreak: Evolutionary Red-Teaming Via Simulated Conversational Priming

Large language models (LLMs) remain vulnerable to jailbreak attacks that bypass safety alignment and elicit harmful responses. A growing body of work shows that contextual priming, where earlier turns covertly bias later replies, constitutes a powerful attack surface, with hand-crafted multi-turn...

AI score: 5.8
Exploits: 0

Packet Storm News
added 2026/05/02 12:00 a.m., 1 view

Tracing the Dynamics of Refusal: Exploiting Latent Refusal Trajectories for Robust Jailbreak Detection

Representation Engineering typically relies on static refusal vectors derived from terminal representations. We move beyond this paradigm, demonstrating that refusal is a dynamic and sparse process rather than a localized outcome. Using Causal Tracing, we uncover the Refusal Trajectory, a persiste...

AI score: 5.8
Exploits: 0

Packet Storm News
added 2026/04/30 12:00 a.m., 1 view

Jailbroken Frontier Models Retain Their Capabilities

As language model safeguards become more robust, attackers are pushed toward developing increasingly complex jailbreaks. Prior work has found that this complexity imposes a "jailbreak tax" that degrades the target model's task performance. We show that this tax scales inversely with model...

AI score: 5.8
Exploits: 0

Packet Storm News
added 2026/04/25 12:00 a.m., 2 views

Evaluating Jailbreaking Vulnerabilities in LLMs Deployed As Assistants for Smart Grid Operations: A Benchmark against NERC Standards

The deployment of Large Language Models (LLMs) as assistants in electric grid operations promises to streamline compliance and decision-making but exposes new vulnerabilities to prompt-based adversarial attacks. This paper evaluates the risk of jailbreaking LLMs, i.e., circumventing safety alignmen...

AI score: 5.3
Exploits: 0

Packet Storm News
added 2026/04/23 12:00 a.m., 0 views

AutoRISE: Agent-Driven Strategy Evolution for Red-Teaming Large Language Models

Automated red-teaming methods for large language models typically optimize attack prompts within a fixed, human-designed strategy, leaving the attack strategy itself unchanged. We instead optimize the strategy. We propose AutoRISE, a method that searches over executable attack programs rather tha...

AI score: 5.3
Exploits: 0

Packet Storm News
added 2026/04/22 12:00 a.m., 3 views

AVISE: Framework for Evaluating the Security of AI Systems

As artificial intelligence (AI) systems are increasingly deployed across critical domains, their security vulnerabilities pose growing risks of high-profile exploits and consequential system failures. Yet systematic approaches to evaluating AI security remain underdeveloped. In this paper, we...

AI score: 5.8
Exploits: 0

Packet Storm News
added 2026/04/18 12:00 a.m., 3 views

HarmChip: Evaluating Hardware Security Centric LLM Safety Via Jailbreak Benchmarking

The integration of large language models (LLMs) into electronic design automation (EDA) workflows has introduced powerful capabilities for RTL generation, verification, and design optimization, but also raises critical security concerns. Malicious LLM outputs in this domain pose hardware-level threat...

AI score: 5.8
Exploits: 0

Packet Storm News
added 2026/04/13 12:00 a.m., 1 view

TEMPLATEFUZZ: Fine-Grained Chat Template Fuzzing for Jailbreaking and Red Teaming LLMs

Large Language Models (LLMs) are increasingly deployed across diverse domains, yet their vulnerability to jailbreak attacks, where adversarial inputs bypass safety mechanisms to elicit harmful outputs, poses significant security risks. While prior work has primarily focused on prompt injection...

AI score: 5.8
Exploits: 0

Packet Storm News
added 2026/04/11 12:00 a.m., 3 views

Jailbreaking the Matrix: Nullspace Steering for Controlled Model Subversion

Large language models remain vulnerable to jailbreak attacks -- inputs designed to bypass safety mechanisms and elicit harmful responses -- despite advances in alignment and instruction tuning. We propose Head-Masked Nullspace Steering (HMNS), a circuit-level intervention that (i) identifies attentio...

AI score: 5.8
Exploits: 0

GithubExploit
added 2026/03/31 6:56 a.m., 93 views

ha-ps4-jb

🎮 PS4 JB Web Server — Home Assistant Add-on A Home Assistant...

AI score: 5.8
Exploits: 0