2 matches found
Measuring and Exploiting Confirmation Bias in LLM-Assisted Security Code Review
Security code reviews increasingly rely on systems integrating Large Language Models LLMs, ranging from interactive assistants to autonomous agents in CI/CD pipelines. We study whether confirmation bias i.e., the tendency to favor interpretations that align with prior expectations affects LLM-bas...
Jailbreak Mimicry: Automated Discovery of Narrative-Based Jailbreaks for Large Language Models
Large language models LLMs remain vulnerable to sophisticated prompt engineering attacks that exploit contextual framing to bypass safety mechanisms, posing significant risks in cybersecurity applications. We introduce Jailbreak Mimicry, a systematic methodology for training compact attacker mode...