Jailbreaking Large Language Models through Iterative Tool-Disguised Attacks via Reinforcement Learning
Large language models (LLMs) have demonstrated remarkable capabilities across diverse applications; however, they remain critically vulnerable to jailbreak attacks that elicit harmful responses violating human values and safety guidelines. Despite extensive research on defense mechanisms, existing...