Jailbreak Distillation: Renewable Safety Benchmarking
Large language models (LLMs) are being rapidly deployed in critical applications, raising an urgent need for robust safety benchmarking. We propose Jailbreak Distillation (JBDistill), a novel benchmark construction framework that "distills" jailbreak attacks into high-quality and easily-updatable safety...