Lucene search
K

4 matches found

Packet Storm News
Packet Storm News
added 2026/05/27 12:0 a.m.5 views

Refusal Before Decoding: Detecting and Exploiting Refusal Signals in Intermediate LLM Activations

In this paper, we investigate whether refusal behavior can be predicted from LLM intermediate activations before decoding using linear probes trained on residual stream activations at each transformer block. We find that refusal is linearly decodable well before the final layer, indicating that...

5.7AI score
Exploits0
Packet Storm News
Packet Storm News
added 2026/05/11 12:0 a.m.3 views

Threat Modelling Using Domain-Adapted Language Models: Empirical Evaluation and Insights

Large Language ModelsLLMs are increasingly explored for cybersecurity applications such as vulnerability detection. In the domain of threat modelling, prior work has primarily evaluated a number of general-purpose Large Language Models under limited prompting settings. In this study, we extend th...

5.8AI score
Exploits0
Packet Storm News
Packet Storm News
added 2026/02/04 12:0 a.m.3 views

Bypassing AI Control Protocols Via Agent-As-A-Proxy Attacks

As AI agents automate critical workloads, they remain vulnerable to indirect prompt injection IPI attacks. Current defenses rely on monitoring protocols that jointly evaluate an agent's Chain-of-Thought CoT and tool-use actions to ensure alignment with user intent. We demonstrate that these...

5.5AI score
Exploits0
Packet Storm News
Packet Storm News
added 2025/06/25 12:0 a.m.2 views

Leaner Training, Lower Leakage: Revisiting Memorization in LLM Fine-Tuning with LoRA

Memorization in large language models LLMs makes them vulnerable to data extraction attacks. While pre-training memorization has been extensively studied, fewer works have explored its impact in fine-tuning, particularly for LoRA fine-tuning, a widely adopted parameter-efficient method. In this...

6.9AI score
Exploits0
Rows per page
Query Builder