6 matches found
TrojanPraise: Jailbreak LLMs Via Benign Fine-Tuning
The demand of customized large language models LLMs has led to commercial LLMs offering black-box fine-tuning APIs, yet this convenience introduces a critical security loophole: attackers could jailbreak the LLMs by fine-tuning them with malicious data. Though this security issue has recently bee...
VEIL: Jailbreaking Text-To-Video Models Via Visual Exploitation from Implicit Language
Jailbreak attacks can circumvent model safety guardrails and reveal critical blind spots. Prior attacks on text-to-video T2V models typically add adversarial perturbations to obviously unsafe prompts, which are often easy to detect and defend. In contrast, we show that benign-looking prompts...
How Can We Effectively Use LLMs for Phishing Detection?: Evaluating the Effectiveness of Large Language Model-Based Phishing Detection Models
Large language models LLMs have emerged as a promising phishing detection mechanism, addressing the limitations of traditional deep learning-based detectors, including poor generalization to previously unseen websites and a lack of interpretability. However, LLMs' effectiveness for phishing...
ArtPerception: ASCII Art-Based Jailbreak on LLMs with Recognition Pre-Test
The integration of Large Language Models LLMs into computer applications has introduced transformative capabilities but also significant security challenges. Existing safety alignments, which primarily focus on semantic interpretation, leave LLMs vulnerable to attacks that use non-standard data...
Talking like a Phisher: LLM-Based Attacks on Voice Phishing Classifiers
Voice phishing vishing remains a persistent threat in cybersecurity, exploiting human trust through persuasive speech. While machine learning ML-based classifiers have shown promise in detecting malicious call transcripts, they remain vulnerable to adversarial manipulations that preserve semantic...
CAIN: Hijacking LLM-Humans Conversations Via a Two-Stage Malicious System Prompt Generation and Refining Framework
Large language models LLMs have advanced many applications, but are also known to be vulnerable to adversarial attacks. In this work, we introduce a novel security threat: hijacking AI-human conversations by manipulating LLMs' system prompts to produce malicious answers only to specific targeted...