659 matches found
Knowledge-To-Data: LLM-Driven Synthesis of Structured Network Traffic for Testbed-Free IDS Evaluation
Realistic, large-scale, and well-labeled cybersecurity datasets are essential for training and evaluating Intrusion Detection Systems IDS. However, they remain difficult to obtain due to privacy constraints, data sensitivity, and the cost of building controlled collection environments such as...
PT-2026-20455
Name of the Vulnerable Software and Affected Versions Linux kernel affected versions not specified Description A flaw exists in the Linux kernel's virtio crypto component related to spinlock protection when handling virtqueue notifications. Specifically, when a virtual machine boots with a single...
SoK: Understanding (New) Security Issues across AI4Code Use Cases
AI-for-Code AI4Code systems are reshaping software engineering, with tools like GitHub Copilot accelerating code generation, translation, and vulnerability detection. Alongside these advances, however, security risks remain pervasive: insecure outputs, biased benchmarks, and susceptibility to...
ai.aitia:arrowhead-application-library-java-spring (>=4.4.0.0 <=4.6.0.0), androidx.baselineprofile.apptarget:androidx.baselineprofile.apptarget.gradle.plugin (>=1.2.0-alpha12 <=1.2.0-alpha14) +2660 more potentially affected by CVE-2024-29371 via org.bitbucket.b_c:jose4j (>=0.4.1 <=0.9.5)
org.bitbucket.bc:jose4j MAVEN version =0.4.1, =4.4.0.0, =1.2.0-alpha12, =1.2.0-alpha12, =1.2.0-alpha12, =1.2.0-alpha12, =1.2.0-alpha07, =1.2.0-alpha12, =1.2.0-alpha07, =2.6.0, =2.6.0, =2.6.0, =1.0.0-alpha01, =1.0.0-alpha01,...
ai.aitia:arrowhead-application-library-java-spring (>=4.4.0.0 <=4.6.0.0), androidx.baselineprofile.apptarget:androidx.baselineprofile.apptarget.gradle.plugin (>=1.2.0-alpha12 <=1.2.0-alpha14) +2660 more potentially affected by CVE-2024-29371 via org.bitbucket.b_c:jose4j (>=0.4.1 <=0.9.5)
org.bitbucket.bc:jose4j MAVEN version =0.4.1, =4.4.0.0, =1.2.0-alpha12, =1.2.0-alpha12, =1.2.0-alpha12, =1.2.0-alpha12, =1.2.0-alpha07, =1.2.0-alpha12, =1.2.0-alpha07, =2.6.0, =2.6.0, =2.6.0, =1.0.0-alpha01, =1.0.0-alpha01,...
PentestEval: Benchmarking LLM-Based Penetration Testing with Modular and Stage-Level Design
Penetration testing is essential for assessing and strengthening system security against real-world threats, yet traditional workflows remain highly manual, expertise-intensive, and difficult to scale. Although recent advances in Large Language Models LLMs offer promising opportunities for...
AIs Exploiting Smart Contracts
I have long maintained that smart contracts are a dumb idea: that a human process is actually a security feature. Here's some interesting research on training AIs to automatically exploit smart contracts: AI models are increasingly good at cyber tasks, as we've written about before. But what is t...
TeleAI-Safety: A Comprehensive LLM Jailbreaking Benchmark Towards Attacks, Defenses, and Evaluations
While the deployment of large language models LLMs in high-value industries continues to expand, the systematic assessment of their safety against jailbreak and prompt-based attacks remains insufficient. Existing safety evaluation benchmarks and frameworks are often limited by an imbalanced...
Beyond Detection: A Comprehensive Benchmark and Study on Representation Learning for Fine-Grained Webshell Family Classification
Malicious WebShells pose a significant and evolving threat by compromising critical digital infrastructures and endangering public services in sectors such as healthcare and finance. While the research community has made significant progress in WebShell detection i.e., distinguishing malicious...
PBFuzz: Agentic Directed Fuzzing for PoV Generation
Proof-of-Vulnerability PoV input generation is a critical task in software security and supports downstream applications such as path generation and validation. Generating a PoV input requires solving two sets of constraints: 1 reachability constraints for reaching vulnerable code locations, and ...
Safe2Harm: Semantic Isomorphism Attacks for Jailbreaking Large Language Models
Large Language Models LLMs have demonstrated exceptional performance across various tasks, but their security vulnerabilities can be exploited by attackers to generate harmful content, causing adverse impacts across various societal domains. Most existing jailbreak methods revolve around Prompt...
Is Vibe Coding Safe? Benchmarking Vulnerability of Agent-Generated Code in Real-World Tasks
Vibe coding is a new programming paradigm in which human engineers instruct large language model LLM agents to complete complex coding tasks with little supervision. Although it is increasingly adopted, are vibe coding outputs really safe to deploy in production? To answer this question, we propo...
BackportBench: A Multilingual Benchmark for Automated Backporting of Patches
Many modern software projects evolve rapidly to incorporate new features and security patches. It is important for users to update their dependencies to safer versions, but many still use older, vulnerable package versions because upgrading can be difficult and may break their existing codebase...
Large Language Models Cannot Reliably Detect Vulnerabilities in JavaScript: The First Systematic Benchmark and Evaluation
Researchers have proposed numerous methods to detect vulnerabilities in JavaScript, especially those assisted by Large Language Models LLMs. However, the actual capability of LLMs in JavaScript vulnerability detection remains questionable, necessitating systematic evaluation and comprehensive...
GAPS: Guiding Dynamic Android Analysis with Static Path Synthesis
Dynamically resolving method reachability in Android applications remains a critical and largely unsolved problem. Despite notable advancements in GUI testing and static call graph construction, current tools are insufficient for reliably driving execution toward specific target methods, especial...
Clustering Malware at Scale: A First Full-Benchmark Study
Recent years have shown that malware attacks still happen with high frequency. Malware experts seek to categorize and classify incoming samples to confirm their trustworthiness or prove their maliciousness. One of the ways in which groups of malware samples can be identified is through malware...
BrowseSafe: Understanding and Preventing Prompt Injection within AI Browser Agents
The integration of artificial intelligence AI agents into web browsers introduces security challenges that go beyond traditional web application threat models. Prior work has identified prompt injection as a new attack vector for web agents, yet the resulting impact within real-world environments...
DUALGUAGE: Automated Joint Security-Functionality Benchmarking for Secure Code Generation
Large language models LLMs and autonomous coding agents are increasingly used to generate software across a wide range of domains. Yet a core requirement remains unmet: ensuring that generated code is secure without compromising its functional correctness. Existing benchmarks and evaluations for...
Building Browser Agents: Architecture, Security, and Practical Solutions
Browser agents enable autonomous web interaction but face critical reliability and security challenges in production. This paper presents findings from building and operating a production browser agent. The analysis examines where current approaches fail and what prevents safe autonomous operatio...
ThreadFuzzer: Fuzzing Framework for Thread Protocol
With the rapid growth of IoT, secure and efficient mesh networking has become essential. Thread has emerged as a key protocol, widely used in smart-home and commercial systems, and serving as a core transport layer in the Matter standard. This paper presents ThreadFuzzer, the first dedicated...