26 matches found
Security Is Relative: Training-Free Vulnerability Detection Via Multi-Agent Behavioral Contract Synthesis
Deep learning for vulnerability detection has shown promising results on early benchmarks, but recent evaluations reveal catastrophic degradation: models achieving F1 0.68 on legacy datasets collapse to 0.031 under strict deduplication. We identify the root cause as the semantic ambiguity problem...
Not All Tokens Are Created Equal: Query-Efficient Jailbreak Fuzzing for LLMs
Large Language ModelsLLMs are widely deployed, yet are vulnerable to jailbreak prompts that elicit policy-violating outputs. Although prior studies have uncovered these risks, they typically treat all tokens as equally important during prompt mutation, overlooking the varying contributions of...
Analysis of LLMs against Prompt Injection and Jailbreak Attacks
Large Language Models LLMs are widely deployed in real-world systems. Given their broader applicability, prompt engineering has become an efficient tool for resource-scarce organizations to adopt LLMs for their own purposes. At the same time, LLMs are vulnerable to prompt-based attacks. Thus,...
Jailbreaking Leaves a Trace: Understanding and Detecting Jailbreak Attacks from Internal Representations of Large Language Models
Jailbreaking large language models LLMs has emerged as a critical security challenge with the widespread deployment of conversational AI systems. Adversarial users exploit these models through carefully crafted prompts to elicit restricted or unsafe outputs, a phenomenon commonly referred to as...
TrojanPraise: Jailbreak LLMs Via Benign Fine-Tuning
The demand of customized large language models LLMs has led to commercial LLMs offering black-box fine-tuning APIs, yet this convenience introduces a critical security loophole: attackers could jailbreak the LLMs by fine-tuning them with malicious data. Though this security issue has recently bee...
Blue Teaming Function-Calling Agents
We present an experimental evaluation that assesses the robustness of four open source LLMs claiming function-calling capabilities against three different attacks, and we measure the effectiveness of eight different defences. Our results show how these models are not safe by default, and how the...
Emoji-Based Jailbreaking of Large Language Models
Large Language Models LLMs are integral to modern AI applications, but their safety alignment mechanisms can be bypassed through adversarial prompt engineering. This study investigates emoji-based jailbreaking, where emoji sequences are embedded in textual prompts to trigger harmful and unethical...
How Can We Effectively Use LLMs for Phishing Detection?: Evaluating the Effectiveness of Large Language Model-Based Phishing Detection Models
Large language models LLMs have emerged as a promising phishing detection mechanism, addressing the limitations of traditional deep learning-based detectors, including poor generalization to previously unseen websites and a lack of interpretability. However, LLMs' effectiveness for phishing...
AthenaBench: A Dynamic Benchmark for Evaluating LLMs in Cyber Threat Intelligence
Large Language Models LLMs have demonstrated strong capabilities in natural language reasoning, yet their application to Cyber Threat Intelligence CTI remains limited. CTI analysis involves distilling large volumes of unstructured reports into actionable knowledge, a process where LLMs could...
On Selecting Few-Shot Examples for LLM-Based Code Vulnerability Detection
Large language models LLMs have demonstrated impressive capabilities for many coding tasks, including summarization, translation, completion, and code generation. However, detecting code vulnerabilities remains a challenging task for LLMs. An effective way to improve LLM performance is in-context...
Enhanced MLLM Black-Box Jailbreaking Attacks and Defenses
Multimodal large language models MLLMs comprise of both visual and textual modalities to process vision language tasks. However, MLLMs are vulnerable to security-related issues, such as jailbreak attacks that alter the model's input to induce unauthorized or harmful responses. The incorporation o...
Multimodal Safety Is Asymmetric: Cross-Modal Exploits Unlock Black-Box MLLMs Jailbreaks
Multimodal large language models MLLMs have demonstrated significant utility across diverse real-world applications. But MLLMs remain vulnerable to jailbreaks, where adversarial inputs can collapse their safety constraints and trigger unethical responses. In this work, we investigate jailbreaks i...
When Intelligence Fails: An Empirical Study on Why LLMs Struggle with Password Cracking
The remarkable capabilities of Large Language Models LLMs in natural language understanding and generation have sparked interest in their potential for cybersecurity applications, including password guessing. In this study, we conduct an empirical investigation into the efficacy of pre-trained LL...
SASER: Stego Attacks on Open-Source LLMs
Open-source large language models LLMs have demonstrated considerable dominance over proprietary LLMs in resolving neural processing tasks, thanks to the collaborative and sharing nature. Although full access to source codes, model parameters, and training data lays the groundwork for transparenc...
ArtPerception: ASCII Art-Based Jailbreak on LLMs with Recognition Pre-Test
The integration of Large Language Models LLMs into computer applications has introduced transformative capabilities but also significant security challenges. Existing safety alignments, which primarily focus on semantic interpretation, leave LLMs vulnerable to attacks that use non-standard data...
Beyond Surface Alignment: Rebuilding LLMs Safety Mechanism Via Probabilistically Ablating Refusal Direction
Jailbreak attacks pose persistent threats to large language models LLMs. Current safety alignment methods have attempted to address these issues, but they experience two significant limitations: insufficient safety alignment depth and unrobust internal defense mechanisms. These limitations make...
Between a Rock and a Hard Place: Exploiting Ethical Reasoning to Jailbreak LLMs
Large language models LLMs have undergone safety alignment efforts to mitigate harmful outputs. However, as LLMs become more sophisticated in reasoning, their intelligence may introduce new security risks. While traditional jailbreak attacks relied on singlestep attacks, multi-turn jailbreak...
Consiglieres in the Shadow: Understanding the Use of Uncensored Large Language Models in Cybercrimes
The advancement of AI technologies, particularly Large Language Models LLMs, has transformed computing while introducing new security and privacy risks. Prior research shows that cybercriminals are increasingly leveraging uncensored LLMs ULLMs as backends for malicious services. Understanding the...
Mitigating Jailbreaks with Intent-Aware LLMs
Despite extensive safety-tuning, large language models LLMs remain vulnerable to jailbreak attacks via adversarially crafted instructions, reflecting a persistent trade-off between safety and task performance. In this work, we propose Intent-FT, a simple and lightweight fine-tuning approach that...
PentestJudge: Judging Agent Behavior against Operational Requirements
We introduce PentestJudge, a system for evaluating the operations of penetration testing agents. PentestJudge is a large language model LLM-as-judge with access to tools that allow it to consume arbitrary trajectories of agent states and tool call history to determine whether a security agent's...