
5 matches found

Packet Storm News
added 2026/05/28 12:00 a.m., 11 views

Dissecting the Black Box: Circuit-Level Analysis of LLM Vulnerability Detection

Large language models (LLMs) can detect software vulnerabilities, but how do they actually identify vulnerable code? We address this question using mechanistic interpretability: analyzing the internal computations of a neural network to understand its reasoning process. Using Circuit Tracer on...

AI score: 5.9
Exploits: 0
Packet Storm News
added 2026/05/18 12:00 a.m., 11 views

Babel: Jailbreaking Safety Attention Via Obfuscation Distribution Optimized Sampling

Despite rigorous safety alignment, Large Language Models (LLMs) remain vulnerable to jailbreak attacks. Existing black-box methods often rely on heuristic templates or exhaustive trials, lacking mechanistic interpretability and query efficiency. In this study, we investigate an intrinsic...

AI score: 5.8
Exploits: 0
Packet Storm News
added 2026/04/11 12:00 a.m., 9 views

Jailbreaking the Matrix: Nullspace Steering for Controlled Model Subversion

Large language models remain vulnerable to jailbreak attacks -- inputs designed to bypass safety mechanisms and elicit harmful responses -- despite advances in alignment and instruction tuning. We propose Head-Masked Nullspace Steering (HMNS), a circuit-level intervention that (i) identifies attentio...

AI score: 5.8
Exploits: 0
Packet Storm News
added 2026/02/02 12:00 a.m., 4 views

The Trigger in the Haystack: Extracting and Reconstructing LLM Backdoor Triggers

Detecting whether a model has been poisoned is a longstanding problem in AI security. In this work, we present a practical scanner for identifying sleeper agent-style backdoors in causal language models. Our approach relies on two key findings: first, sleeper agents tend to memorize poisoning dat...

AI score: 5.4
Exploits: 0
Packet Storm News
added 2025/06/22 12:00 a.m., 3 views

Mechanistic Interpretability in the Presence of Architectural Obfuscation

Architectural obfuscation - e.g., permuting hidden-state tensors, linearly transforming embedding tables, or remapping tokens - has recently gained traction as a lightweight substitute for heavyweight cryptography in privacy-preserving large-language-model (LLM) inference. While recent work has sho...

AI score: 6.9
Exploits: 0
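As a toy illustration of the first technique named in the abstract above (permuting hidden-state tensors), the sketch below applies one secret permutation to the rows of a first weight matrix and the columns of a second. The two-layer network and the helpers `forward` and `obfuscate` are hypothetical and are not taken from the paper. Under these assumptions, the output is unchanged while the hidden activations come out in shuffled order. Integer weights are used so the equality check is exact.

```python
import random

def matvec(W, x):
    # Multiply matrix W (a list of rows) by vector x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(v):
    # Elementwise nonlinearity; it commutes with any reordering of coordinates.
    return [max(0, a) for a in v]

def forward(W1, W2, x):
    # Hypothetical toy network: hidden = relu(W1 x), out = W2 hidden.
    hidden = relu(matvec(W1, x))
    return hidden, matvec(W2, hidden)

def obfuscate(W1, W2, perm):
    # Hypothetical permutation obfuscation: reorder the rows of W1 and the
    # columns of W2 with the same secret permutation. The hidden state is
    # shuffled, but the final output is mathematically unchanged.
    W1p = [W1[p] for p in perm]
    W2p = [[row[p] for p in perm] for row in W2]
    return W1p, W2p

rng = random.Random(0)
d_in, d_hidden, d_out = 3, 5, 2
W1 = [[rng.randint(-3, 3) for _ in range(d_in)] for _ in range(d_hidden)]
W2 = [[rng.randint(-3, 3) for _ in range(d_hidden)] for _ in range(d_out)]
perm = list(range(d_hidden))
rng.shuffle(perm)  # the secret permutation
x = [1, -2, 3]

h, y = forward(W1, W2, x)
h_obf, y_obf = forward(*obfuscate(W1, W2, perm), x)
```

Here `y_obf` equals `y`, and `h_obf` equals `[h[p] for p in perm]`: the obfuscated hidden state holds the same values, only in a different coordinate order.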