3 matches found
Not All Tokens Are Created Equal: Query-Efficient Jailbreak Fuzzing for LLMs
Large Language ModelsLLMs are widely deployed, yet are vulnerable to jailbreak prompts that elicit policy-violating outputs. Although prior studies have uncovered these risks, they typically treat all tokens as equally important during prompt mutation, overlooking the varying contributions of...
SAFEx: Analyzing Vulnerabilities of MoE-Based LLMs Via Stable Safety-Critical Expert Identification
Large language models based on Mixture-of-Experts have achieved substantial gains in efficiency and scalability, yet their architectural uniqueness introduces underexplored safety alignment challenges. Existing safety alignment strategies, predominantly designed for dense models, are ill-suited t...
AiXamine: Simplified LLM Safety and Security
Evaluating Large Language Models LLMs for safety and security remains a complex task, often requiring users to navigate a fragmented landscape of ad hoc benchmarks, datasets, metrics, and reporting formats. To address this challenge, we present aiXamine, a comprehensive black-box evaluation...