13 matches found
LITMUS: Benchmarking Behavioral Jailbreaks of LLM Agents in Real OS Environments
The rapid proliferation of LLM-based autonomous agents in real operating system environments introduces a new category of safety risk beyond content safety: behavior jailbreak, where an adversary induces an agent to execute dangerous OS-level operations with irreversible consequences. Existing...
Automation-Exploit-Legacy
Automation-Exploit Legacy Prototype This repository contain...
MARD: A Multi-Agent Framework for Robust Android Malware Detection
With the rapid evolution of Android applications, traditional machine learning-based detection models suffer from concept drift. Additionally, they are constrained by shallow features, lacking deep semantic understanding and interpretability of decisions. Although Large Language Models LLMs...
A Multi-Agent Framework for Automated Exploit Generation with Constraint-Guided Comprehension and Reflection
Open-source libraries are widely used in modern software development, introducing significant security vulnerabilities. While static analysis tools can identify potential vulnerabilities at scale, they often generate overwhelming reports with high false positive rates. Automated Exploit Generatio...
AEGIS: From Clues to Verdicts -- Graph-Guided Deep Vulnerability Reasoning Via Dialectics and Meta-Auditing
Large Language Models LLMs are increasingly adopted for vulnerability detection, yet their reasoning remains fundamentally unsound. We identify a root cause shared by both major mitigation paradigms agent-based debate and retrieval augmentation: reasoning in an ungrounded deliberative space that...
SCAFFOLD-CEGIS: Preventing Latent Security Degradation in LLM-Driven Iterative Code Refinement
The application of large language models to code generation has evolved from one-shot generation to iterative refinement, yet the evolution of security throughout iteration remains insufficiently understood. Through comparative experiments on three mainstream LLMs, this paper reveals the iterativ...
AXE: An Agentic EXploit Engine for Confirming Zero-Day Vulnerability Reports
Vulnerability detection tools are widely adopted in software projects, yet they often overwhelm maintainers with false positives and non-actionable reports. Automated exploitation systems can help validate these reports; however, existing approaches typically operate in isolation from detection...
CyberExplorer: Benchmarking LLM Offensive Security Capabilities in a Real-World Attacking Simulation Environment
Real-world offensive security operations are inherently open-ended: attackers explore unknown attack surfaces, revise hypotheses under uncertainty, and operate without guaranteed success. Existing LLM-based offensive agent evaluations rely on closed-world settings with predefined goals and binary...
Multi-Agent Collaborative Intrusion Detection for Low-Altitude Economy IoT: An LLM-Enhanced Agentic AI Framework
The rapid expansion of low-altitude economy Internet of Things LAE-IoT networks has created unprecedented security challenges due to dynamic three-dimensional mobility patterns, distributed autonomous operations, and severe resource constraints. Traditional intrusion detection systems designed fo...
Exploring Traffic Simulation and Cybersecurity Strategies Using Large Language Models
Intelligent Transportation Systems ITS are increasingly vulnerable to sophisticated cyberattacks due to their complex, interconnected nature. Ensuring the cybersecurity of these systems is paramount to maintaining road safety and minimizing traffic disruptions. This study presents a novel...
AURA: a Multi-Agent Intelligence Framework for Knowledge-Enhanced Cyber Threat Attribution
Effective attribution of Advanced Persistent Threats APTs increasingly hinges on the ability to correlate behavioral patterns and reason over complex, varied threat intelligence artifacts. We present AURA Attribution Using Retrieval-Augmented Agents, a multi-agent, knowledge-enhanced framework fo...
MalGEN: a Generative Agent Framework for Modeling Malicious Software in Cybersecurity
The dual use nature of Large Language Models LLMs presents a growing challenge in cybersecurity. While LLM enhances automation and reasoning for defenders, they also introduce new risks, particularly their potential to be misused for generating evasive, AI crafted malware. Despite this emerging...
ThreatLens: LLM-Guided Threat Modeling and Test Plan Generation for Hardware Security Verification
Current hardware security verification processes predominantly rely on manual threat modeling and test plan generation, which are labor-intensive, error-prone, and struggle to scale with increasing design complexity and evolving attack methodologies. To address these challenges, we propose...