376 matches found

Malwarebytes
added 2026/06/15 2:32 p.m.
21 views

Claude Fable 5 and Mythos 5 “abruptly disabled” after US gov. ban

Anthropic has been ordered by the US government to cut off its newest Claude Fable 5 and Mythos 5 models for fear of abuse by adversaries. Reuters reports that Anthropic said it will "abruptly disable" its most advanced AI models for all users after the US government ordered it to suspend access...

AI score: 5.6
Exploits: 0

Rapid7 Blog
added 2026/06/11 1:00 p.m.
27 views

Criminal AI-as-a-Service in 2026: How the Underground Market Is Operationalizing Cybercrime

Introduction: The underground market for criminally oriented generative AI has moved beyond the early hype surrounding 'malicious chatbots.' The gradual integration of AI as a productivity layer within cybercrime operations has become the dominant story, indicating that while the potential for ful...

AI score: 6.2
Exploits: 0

OSSF Malicious Packages
added 2026/06/11 1:56 a.m.
9 views

Malicious code in jailbreak-code (npm)

Source: amazon-inspector 9f729dde017c78154685be850893a9f3ebd58bf0b5cb1229e7e49fb09b14f5d5. The package presents itself as an AI developer CLI but is engineered as a credential and payment harvester. src/c2.ts hardcodes a Discord webhook URL...

AI score: 5.5
Exploits: 0
References: 2

OSV
added 2026/06/11 1:56 a.m.
18 views

MAL-2026-5543 Malicious code in jailbreak-code (npm)

Source: amazon-inspector 9f729dde017c78154685be850893a9f3ebd58bf0b5cb1229e7e49fb09b14f5d5. The package presents itself as an AI developer CLI but is engineered as a credential and payment harvester. src/c2.ts hardcodes a Discord webhook URL...

AI score: 5.5
Exploits: 0
References: 2

Packet Storm News
added 2026/06/10 12:00 a.m.
12 views

Grammar-Constrained Decoding Can Jailbreak LLMs into Generating Malicious Code

Large Language Models (LLMs) are increasingly used for code generation, raising concerns that they may be misused to produce malicious code. Meanwhile, Grammar-Constrained Decoding (GCD) has been widely adopted to improve the reliability of LLM-generated code by enforcing syntactic validity. In this...

AI score: 5.3
Exploits: 0

Packet Storm News
added 2026/05/28 12:00 a.m.
19 views

Persona Attack: Incremental Memory Injection Jailbreak Attack against Large Language Models

As Large Language Models evolve for user convenience, vulnerability to jailbreak attacks continues to be reported despite ongoing efforts in safety training. Traditional jailbreak techniques typically focus on a single prompt injection, neglecting the models' ability to remember the flow of...

AI score: 5.8
Exploits: 0

Packet Storm News
added 2026/05/27 12:00 a.m.
8 views

Evolving Skill-Structured Attack Memory Enhances LLM Jailbreaking

Jailbreak attacks on large language models (LLMs) aim to induce LLMs to produce content that they are expected to refuse. Automated black-box jailbreak generation is especially important for safety evaluation, where the attacker observes only model outputs and needs to automatically search for...

AI score: 5.8
Exploits: 0

Packet Storm News
added 2026/05/26 12:00 a.m.
11 views

BAIT: Boundary-Guided Disclosure Escalation Via Self-Conditioned Reasoning

In this work, we propose BAIT (Boundary-Aware Iterative Trap), a three-step jailbreak framework that approaches malicious goals through internal disclosure. BAIT first asks the model to identify the protection boundary, then requires it to refine that boundary, and finally requests a detailed...

AI score: 5.8
Exploits: 0

Packet Storm News
added 2026/05/23 12:00 a.m.
16 views

Reasoning As an Attack Surface: Adaptive Evolutionary CoT Jailbreaks for LLMs

Large Reasoning Models (LRMs) have demonstrated remarkable capabilities in reasoning and generation tasks and are increasingly deployed in real-world applications. However, their explicit chain-of-thought (CoT) mechanism introduces new security risks, making them particularly vulnerable to jailbreak...

AI score: 5.8
Exploits: 0

Packet Storm News
added 2026/05/18 12:00 a.m.
10 views

Babel: Jailbreaking Safety Attention Via Obfuscation Distribution Optimized Sampling

Despite rigorous safety alignment, Large Language Models (LLMs) remain vulnerable to jailbreak attacks. Existing black-box methods often rely on heuristic templates or exhaustive trials, lacking mechanistic interpretability and query efficiency. In this study, we investigate an intrinsic...

AI score: 5.8
Exploits: 0

Packet Storm News
added 2026/05/11 12:00 a.m.
7 views

Re-Triggering Safeguards within LLMs for Jailbreak Detection

This paper proposes a jailbreaking prompt detection method for large language models (LLMs) to defend against jailbreak attacks. Although recent LLMs are equipped with built-in safeguards, it remains possible to craft jailbreaking prompts that bypass them. We argue that such jailbreaking prompts ar...

AI score: 5.8
Exploits: 0

Packet Storm News
added 2026/05/10 12:00 a.m.
8 views

Position: AI Security Policy Should Target Systems, Not Models

We present swarm-attack, an open-source adversarial testing framework in which multiple lightweight LLM agents coordinate through shared memory, parallel exploration, and evolutionary optimization. Together, our results demonstrate that both safety bypass of frontier models and software...

AI score: 5.9
Exploits: 0

Packet Storm News
added 2026/05/09 12:00 a.m.
9 views

The Art of the Jailbreak: Formulating Jailbreak Attacks for LLM Security beyond Binary Scoring

Jailbreak attacks -- adversarial prompts that bypass LLM alignment through purely linguistic manipulation -- pose a growing operational security threat, yet the field lacks large-scale, reproducible infrastructure for generating, categorizing, and evaluating them systematically. This paper...

AI score: 5.7
Exploits: 0

Packet Storm News
added 2026/05/08 12:00 a.m.
7 views

OrchJail: Jailbreaking Tool-Calling Text-To-Image Agents by Orchestration-Guided Fuzzing

Tool-calling text-to-image (T2I) agents can plan and execute multi-step tool chains to accomplish complex generation and editing queries. However, this capability introduces a new safety attack surface: harmful outputs may arise from tool orchestration, where individually benign steps combine into...

AI score: 5.9
Exploits: 0

Packet Storm News
added 2026/05/04 12:00 a.m.
4 views

Revisiting JBShield: Breaking and Rebuilding Representation-Level Jailbreak Defenses

Defending large language models (LLMs) against jailbreak attacks, such as Greedy Coordinate Gradient (GCG), remains a challenge, particularly under adaptive threat models where an attacker directly targets the defense mechanism. JBShield, a recent jailbreak defense with a 0% attack success rate in so...

AI score: 5.8
Exploits: 0

Packet Storm News
added 2026/05/04 12:00 a.m.
4 views

ContextualJailbreak: Evolutionary Red-Teaming Via Simulated Conversational Priming

Large language models (LLMs) remain vulnerable to jailbreak attacks that bypass safety alignment and elicit harmful responses. A growing body of work shows that contextual priming, where earlier turns covertly bias later replies, constitutes a powerful attack surface, with hand-crafted multi-turn...

AI score: 5.8
Exploits: 0

Packet Storm News
added 2026/05/02 12:00 a.m.
4 views

Tracing the Dynamics of Refusal: Exploiting Latent Refusal Trajectories for Robust Jailbreak Detection

Representation Engineering typically relies on static refusal vectors derived from terminal representations. We move beyond this paradigm, demonstrating that refusal is a dynamic and sparse process rather than a localized outcome. Using Causal Tracing, we uncover the Refusal Trajectory, a persiste...

AI score: 5.8
Exploits: 0

Packet Storm News
added 2026/04/30 12:00 a.m.
5 views

Jailbroken Frontier Models Retain Their Capabilities

As language model safeguards become more robust, attackers are pushed toward developing increasingly complex jailbreaks. Prior work has found that this complexity imposes a "jailbreak tax" that degrades the target model's task performance. We show that this tax scales inversely with model...

AI score: 5.8
Exploits: 0

Packet Storm News
added 2026/04/25 12:00 a.m.
4 views

Evaluating Jailbreaking Vulnerabilities in LLMs Deployed As Assistants for Smart Grid Operations: A Benchmark against NERC Standards

The deployment of Large Language Models (LLMs) as assistants in electric grid operations promises to streamline compliance and decision-making but exposes new vulnerabilities to prompt-based adversarial attacks. This paper evaluates the risk of jailbreaking LLMs, i.e., circumventing safety alignmen...

AI score: 5.3
Exploits: 0

Packet Storm News
added 2026/04/23 12:00 a.m.
2 views

AutoRISE: Agent-Driven Strategy Evolution for Red-Teaming Large Language Models

Automated red-teaming methods for large language models typically optimize attack prompts within a fixed, human-designed strategy, leaving the attack strategy itself unchanged. We instead optimize the strategy. We propose AutoRISE, a method that searches over executable attack programs rather tha...

AI score: 5.3
Exploits: 0