NegBLEURT Forest: Leveraging Inconsistencies for Detecting Jailbreak Attacks
Jailbreak attacks designed to bypass safety mechanisms pose a serious threat by prompting LLMs to generate harmful or inappropriate content, despite alignment with ethical guidelines. Crafting universal filtering rules remains difficult due to their inherent dependence on specific contexts. To...
Can ChatGPT Perform Image Splicing Detection? A Preliminary Study
Multimodal Large Language Models (MLLMs) like GPT-4V are capable of reasoning across text and image modalities, showing promise in a variety of complex vision-language tasks. In this preliminary study, we investigate the out-of-the-box capabilities of GPT-4V in the domain of image forensics,...