
10 matches found

Packet Storm News
added 2026/05/04 12:00 a.m., 2 views

A Validated Prompt Bank for Malicious Code Generation: Separating Executable Weapons from Security Knowledge in 1,554 Consensus-Labeled Prompts

Existing benchmarks of language-model refusal on malicious-coding tasks routinely conflate requests for executable malicious software with requests for harmful security knowledge. This conflation matters because the two request types plausibly trigger distinct refusal pathways in safety-aligned...

5.8 AI score
Exploits: 0
Packet Storm News
added 2026/04/30 12:00 a.m., 2 views

Jailbroken Frontier Models Retain Their Capabilities

As language model safeguards become more robust, attackers are pushed toward developing increasingly complex jailbreaks. Prior work has found that this complexity imposes a "jailbreak tax" that degrades the target model's task performance. We show that this tax scales inversely with model...

5.8 AI score
Exploits: 0
Packet Storm News
added 2026/01/04 12:00 a.m., 8 views

OpenRT: An Open-Source Red Teaming Framework for Multimodal LLMs

The rapid integration of Multimodal Large Language Models (MLLMs) into critical applications is increasingly hindered by persistent safety vulnerabilities. However, existing red-teaming benchmarks are often fragmented, limited to single-turn text interactions, and lack the scalability required for...

7.2 AI score
Exploits: 0
Schneier on Security
added 2025/11/28 2:54 p.m., 4 views

Prompt Injection Through Poetry

In a new paper, "Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models," researchers found that turning LLM prompts into poetry resulted in jailbreaking the models. Abstract: We present evidence that adversarial poetry functions as a universal single-turn...

7.2 AI score
Exploits: 0
Packet Storm News
added 2025/09/04 12:00 a.m., 3 views

False Sense of Security: Why Probing-Based Malicious Input Detection Fails to Generalize

Large Language Models (LLMs) can comply with harmful instructions, raising serious safety concerns despite their impressive capabilities. Recent work has leveraged probing-based approaches to study the separability of malicious and benign inputs in LLMs' internal representations, and researchers ha...

7.2 AI score
Exploits: 0
Packet Storm News
added 2025/06/22 12:00 a.m., 3 views

Towards Safety and Security Testing of Cyberphysical Power Systems by Shape Validation

The increasing complexity of cyberphysical power systems leads to larger attack surfaces to be exploited by malicious actors and a higher risk of faults through misconfiguration. We propose to meet those risks with a declarative approach to describe cyberphysical power systems and to automaticall...

7 AI score
Exploits: 0
Packet Storm News
added 2025/06/06 12:00 a.m., 3 views

The Scales of Justitia: a Comprehensive Survey on Safety Evaluation of LLMs

With the rapid advancement of artificial intelligence technology, Large Language Models (LLMs) have demonstrated remarkable potential in the field of Natural Language Processing (NLP), including areas such as content generation, human-computer interaction, machine translation, and code generation,...

7.5 AI score
Exploits: 0
Packet Storm News
added 2025/05/21 12:00 a.m., 2 views

Are Vision-Language Models Safe in the Wild? A Meme-Based Benchmark Study

Rapid deployment of vision-language models (VLMs) magnifies safety risks, yet most evaluations rely on artificial images. This study asks: How safe are current VLMs when confronted with meme images that ordinary users share? To investigate this question, we introduce MemeSafetyBench, a...

7.1 AI score
Exploits: 0
Packet Storm News
added 2025/04/28 12:00 a.m., 2 views

SAGE: a Generic Framework for LLM Safety Evaluation

Whitepaper titled "SAGE: A Generic Framework for LLM Safety Evaluation"...

7 AI score
Exploits: 0
Packet Storm News
added 2025/04/26 12:00 a.m., 2 views

T2VShield: Model-Agnostic Jailbreak Defense for Text-To-Video Models

The rapid development of generative artificial intelligence has made text-to-video models essential for building future multimodal world simulators. However, these models remain vulnerable to jailbreak attacks, where specially crafted prompts bypass safety mechanisms and lead to the generation of...

7.1 AI score
Exploits: 0