CAIN: Hijacking LLM-Humans Conversations Via a Two-Stage Malicious System Prompt Generation and Refining Framework
Large language models LLMs have advanced many applications, but are also known to be vulnerable to adversarial attacks. In this work, we introduce a novel security threat: hijacking AI-human conversations by manipulating LLMs' system prompts to produce malicious answers only to specific targeted...