Refusal Before Decoding: Detecting and Exploiting Refusal Signals in Intermediate LLM Activations
In this paper, we investigate whether refusal behavior can be predicted from LLM intermediate activations before decoding using linear probes trained on residual stream activations at each transformer block. We find that refusal is linearly decodable well before the final layer, indicating that...
Can Developers Rely on LLMs for Secure IaC Development?
We investigated the capabilities of GPT-4o and Gemini 2.0 Flash for secure Infrastructure as Code (IaC) development. For security smell detection, on the Stack Overflow dataset, which primarily contains small, simplified code snippets, the models detected at least 71% of security smells when prompt...
New Exam Security Questions in the AI Era: Comparing AI-Generated Item Similarity between Naive and Detail-Guided Prompting Approaches
Large language models (LLMs) have emerged as powerful tools for generating domain-specific multiple-choice questions (MCQs), offering efficiency gains for certification boards but raising new concerns about examination security. This study investigated whether LLM-generated items created with...