4 matches found
LITMUS: Benchmarking Behavioral Jailbreaks of LLM Agents in Real OS Environments
The rapid proliferation of LLM-based autonomous agents in real operating system environments introduces a new category of safety risk beyond content safety: behavior jailbreak, where an adversary induces an agent to execute dangerous OS-level operations with irreversible consequences. Existing...
SoK: Taxonomy and Evaluation of Prompt Security in Large Language Models
Large Language Models LLMs have rapidly become integral to real-world applications, powering services across diverse sectors. However, their widespread deployment has exposed critical security risks, particularly through jailbreak prompts that can bypass model alignment and induce harmful outputs...
VulnRepairEval: an Exploit-Based Evaluation Framework for Assessing Large Language Model Vulnerability Repair Capabilities
The adoption of Large Language Models LLMs for automated software vulnerability patching has shown promising outcomes on carefully curated evaluation sets. Nevertheless, existing datasets predominantly rely on superficial validation methods rather than exploit-based verification, leading to...
Jailbreak Distillation: Renewable Safety Benchmarking
Large language models LLMs are rapidly deployed in critical applications, raising urgent needs for robust safety benchmarking. We propose Jailbreak Distillation JBDistill, a novel benchmark construction framework that "distills" jailbreak attacks into high-quality and easily-updatable safety...