115 matches found
Patcher: Post-Hoc Patching of Backdoored Large Language Models
Large language models remain vulnerable to jailbreak backdoor attacks, where adversaries poison safety alignment data to embed hidden triggers that bypass safety mechanisms. Existing defenses often require comprehensive attack information or multiple triggered examples, making them impractical wh...
Detecting Trojaned DNNs Via Spectral Regression Analysis
Modern DNNs are repeatedly fine-tuned to incorporate new data and functionality. This evolutionary workflow introduces a security risk when updated data cannot be fully trusted, as adversaries may implant Trojans during fine-tuning. We present MIST, a Trojan detection approach that analyzes how a...
Backdooring Masked Diffusion Language Models
Masked diffusion language models MDLMs are emerging as a compelling new paradigm for text generation, but their training-time security remains largely unexplored. Existing backdoor attacks on Gaussian diffusion models or autoregressive language models do not directly apply to MDLMs because MDLMs...
On Fixing Insecure AI-Generated Code through Model Fine-Tuning and Prompting Strategies
The security of AI-generated code remains a major obstacle to its widespread adoption. Although code generation models achieve strong performance on functional benchmarks, their outputs frequently contain bugs and security weaknesses that undermine their trustworthiness. Prior work has explored a...
How Code Representation Shapes False-Positive Dynamics in Cross-Language LLM Vulnerability Detection
How code representation format shapes false positive behaviour in cross-language LLM vulnerability detection remains poorly understood. We systematically vary training intensity and code representation format, comparing raw source text with pruned Abstract Syntax Trees at both training time and...
Secret Stealing Attacks on Local LLM Fine-Tuning through Supply-Chain Model Code Backdoors
Local fine-tuning datasets routinely contain sensitive secrets such as API keys, personal identifiers, and financial records. Although ''local offline fine-tuning'' is often viewed as a privacy boundary, we reveal that compromised model code is sufficient to steal them. Current passive...
XekRung Technical Report
We present XekRung, a frontier large language model for cybersecurity, designed to provide comprehensive security capabilities. To achieve this, we develop diverse data synthesis pipelines tailored to the cybersecurity domain, enabling the scalable construction of high-quality training data and...
MARD: A Multi-Agent Framework for Robust Android Malware Detection
With the rapid evolution of Android applications, traditional machine learning-based detection models suffer from concept drift. Additionally, they are constrained by shallow features, lacking deep semantic understanding and interpretability of decisions. Although Large Language Models LLMs...
Evaluating LLM-Generated Obfuscated XSS Payloads for Machine Learning-Based Detection
Cross-site scripting XSS remains a persistent web security vulnerability, especially because obfuscation can change the surface form of a malicious payload while preserving its behavior. These transformations make it difficult for traditional and machine learning-based detection systems to reliab...
DP-FlogTinyLLM: Differentially Private Federated Log Anomaly Detection Using Tiny LLMs
Modern distributed systems generate massive volumes of log data that are critical for detecting anomalies and cyber threats. However, in real world settings, these logs are often distributed across multiple organizations and cannot be centralized due to privacy and security constraints. Existing...
ARES: Adaptive Red-Teaming and End-To-End Repair of Policy-Reward System
Reinforcement Learning from Human Feedback RLHF is central to aligning Large Language Models LLMs, yet it introduces a critical vulnerability: an imperfect Reward Model RM can become a single point of failure when it fails to penalize unsafe behaviors. While existing red-teaming approaches...
LLM4CodeRE: Generative AI for Code Decompilation Analysis and Reverse Engineering
Code decompilation analysis is a fundamental yet challenging task in malware reverse engineering, particularly due to the pervasive use of sophisticated obfuscation techniques. Although recent large language models LLMs have shown promise in translating low-level representations into high-level...
An Agentic Multi-Agent Architecture for Cybersecurity Risk Management
Getting a real cybersecurity risk assessment for a small organization is expensive -- a NIST CSF-aligned engagement runs $15,000 on the low end, takes weeks, and depends on practitioners who are genuinely scarce. Most small companies skip it entirely. We built a six-agent AI system where each age...
TraceGuard: Process-Guided Firewall against Reasoning Backdoors in Large Language Models
The deployment of Large Reasoning Models LRMs in high-stakes decision-making pipelines has introduced a novel and opaque attack surface: reasoning backdoors. In these attacks, the model's intermediate Chain-of-Thought CoT is manipulated to provide a linguistically plausible but logically fallacio...
Detecting PowerShell-Based Fileless Cryptojacking Attacks Using Machine Learning
With the emergence of remote code execution RCE vulnerabilities in ubiquitous libraries and advanced social engineering techniques, threat actors have started conducting widespread fileless cryptojacking attacks. These attacks have become effective with stealthy techniques based on PowerShell-bas...
MultiVer: Zero-Shot Multi-Agent Vulnerability Detection
We present MultiVer, a zero-shot multi-agent system for vulnerability detection that achieves state-of-the-art recall without fine-tuning. A four-agent ensemble security, correctness, performance, style with union voting achieves 82.7% recall on PyVul, exceeding fine-tuned GPT-3.5 81.3% by 1.4...
From SFT to RL: Demystifying the Post-Training Pipeline for LLM-Based Vulnerability Detection
The integration of LLMs into vulnerability detection VD has shifted the field toward interpretable and context-aware analysis. While post-training methods have shown promise in general coding tasks, their systematic application to VD remains underexplored. In this paper, we present the first...
GoodVibe: Security-By-Vibe for LLM-Based Code Generation
Large language models LLMs are increasingly used for code generation in fast, informal development workflows, often referred to as vibe coding, where speed and convenience are prioritized, and security requirements are rarely made explicit. In this setting, models frequently produce functionally...
A one-prompt attack that breaks LLM safety alignment
Large language models LLMs and diffusion models now power a wide range of applications, from document assistance to text-to-image generation, and users increasingly expect these systems to be safety-aligned by default. Yet safety alignment is only as robust as its weakest failure mode. Despite...
The Trigger in the Haystack: Extracting and Reconstructing LLM Backdoor Triggers
Detecting whether a model has been poisoned is a longstanding problem in AI security. In this work, we present a practical scanner for identifying sleeper agent-style backdoors in causal language models. Our approach relies on two key findings: first, sleeper agents tend to memorize poisoning dat...