Scaling Patterns in Adversarial Alignment: Evidence from Multi-LLM Jailbreak Experiments
Large language models LLMs increasingly operate in multi-agent and safety-critical settings, raising open questions about how their vulnerabilities scale when models interact adversarially. This study examines whether larger models can systematically jailbreak smaller ones - eliciting harmful or...