Jailbreak-Zero: A Path to Pareto Optimal Red Teaming for Large Language Models
This paper introduces Jailbreak-Zero, a novel red teaming methodology that shifts the paradigm of Large Language Model LLM safety evaluation from a constrained example-based approach to a more expansive and effective policy-based framework. By leveraging an attack LLM to generate a high volume of...