15 matches found
Involuntary In-Context Learning: Exploiting Few-Shot Pattern Completion to Bypass Safety Alignment in GPT-5.4
Safety alignment in large language models relies on behavioral training that can be overridden when sufficiently strong in-context patterns compete with learned refusal behaviors. We introduce Involuntary In-Context Learning IICL, an attack class that uses abstract operator framing with few-shot...
Hallucination-Resistant Security Planning with a Large Language Model
Large language models LLMs are promising tools for supporting security management tasks, such as incident response planning. However, their unreliability and tendency to hallucinate remain significant challenges. In this paper, we address these challenges by introducing a principled framework for...
DrAttack
DrAttack: Prompt Decomposition and Reconstruction Makes Powerf...
On Selecting Few-Shot Examples for LLM-Based Code Vulnerability Detection
Large language models LLMs have demonstrated impressive capabilities for many coding tasks, including summarization, translation, completion, and code generation. However, detecting code vulnerabilities remains a challenging task for LLMs. An effective way to improve LLM performance is in-context...
Attractive Metadata Attack: Inducing LLM Agents to Invoke Malicious Tools
Large language model LLM agents have demonstrated remarkable capabilities in complex reasoning and decision-making by leveraging external tools. However, this tool-centric paradigm introduces a previously underexplored attack surface: adversaries can manipulate tool metadata -- such as names,...
In-Context Learning of Vision Language Models for Detection of Physical and Digital Attacks against Face Recognition Systems
Recent advances in biometric systems have significantly improved the detection and prevention of fraudulent activities. However, as detection methods improve, attack techniques become increasingly sophisticated. Attacks on face recognition systems can be broadly divided into physical and digital...
Attention Slipping: a Mechanistic Understanding of Jailbreak Attacks and Defenses in LLMs
As large language models LLMs become more integral to society and technology, ensuring their safety becomes essential. Jailbreak attacks exploit vulnerabilities to bypass safety guardrails, posing a significant threat. However, the mechanisms enabling these attacks are not well understood. In thi...
SV-LLM: an Agentic Approach for SoC Security Verification Using Large Language Models
Ensuring the security of complex system-on-chips SoCs designs is a critical imperative, yet traditional verification techniques struggle to keep pace due to significant challenges in automation, scalability, comprehensiveness, and adaptability. The advent of large language models LLMs, with their...
Can In-Context Reinforcement Learning Recover from Reward Poisoning Attacks?
We study the corruption-robustness of in-context reinforcement learning ICRL, focusing on the Decision-Pretrained Transformer DPT, Lee et al., 2023. To address the challenge of reward poisoning attacks targeting the DPT, we propose a novel adversarial training framework, called Adversarially...
On Automating Security Policies with Contemporary LLMs
The complexity of modern computing environments and the growing sophistication of cyber threats necessitate a more robust, adaptive, and automated approach to security enforcement. In this paper, we present a framework leveraging large language models LLMs for automating attack mitigation policy...
Hijacking Large Language Models Via Adversarial In-Context Learning
In-context learning ICL has emerged as a powerful paradigm leveraging LLMs for specific downstream tasks by utilizing labeled examples as demonstrations demos in the preconditioned prompts. Despite its promising performance, crafted adversarial attacks pose a notable threat to the robustness of...
VulBinLLM: LLM-Powered Vulnerability Detection for Stripped Binaries
Recognizing vulnerabilities in stripped binary files presents a significant challenge in software security. Although some progress has been made in generating human-readable information from decompiled binary files with Large Language Models LLMs, effectively and scalably detecting vulnerabilitie...
PIG: Privacy Jailbreak Attack on LLMs Via Gradient-Based Iterative In-Context Optimization
Large Language Models LLMs excel in various domains but pose inherent privacy risks. Existing methods to evaluate privacy leakage in LLMs often use memorized prefixes or simple instructions to extract data, both of which well-alignment models can easily block. Meanwhile, Jailbreak attacks bypass...
LASHED: LLMs and Static Hardware Analysis for Early Detection of RTL Bugs
While static analysis is useful in detecting early-stage hardware security bugs, its efficacy is limited because it requires information to form checks and is often unable to explain the security impact of a detected vulnerability. Large Language Models can be useful in filling these gaps by...
How Private Is Your Attention? Bridging Privacy with In-Context Learning
In-context learning ICL-the ability of transformer-based models to perform new tasks from examples provided at inference time-has emerged as a hallmark of modern language models. While recent works have investigated the mechanisms underlying ICL, its feasibility under formal privacy constraints...