647 matches found
Is Vibe Coding Safe? Benchmarking Vulnerability of Agent-Generated Code in Real-World Tasks
Vibe coding is a new programming paradigm in which human engineers instruct large language model LLM agents to complete complex coding tasks with little supervision. Although it is increasingly adopted, are vibe coding outputs really safe to deploy in production? To answer this question, we propo...
BackportBench: A Multilingual Benchmark for Automated Backporting of Patches
Many modern software projects evolve rapidly to incorporate new features and security patches. It is important for users to update their dependencies to safer versions, but many still use older, vulnerable package versions because upgrading can be difficult and may break their existing codebase...
Large Language Models Cannot Reliably Detect Vulnerabilities in JavaScript: The First Systematic Benchmark and Evaluation
Researchers have proposed numerous methods to detect vulnerabilities in JavaScript, especially those assisted by Large Language Models LLMs. However, the actual capability of LLMs in JavaScript vulnerability detection remains questionable, necessitating systematic evaluation and comprehensive...
Clustering Malware at Scale: A First Full-Benchmark Study
Recent years have shown that malware attacks still happen with high frequency. Malware experts seek to categorize and classify incoming samples to confirm their trustworthiness or prove their maliciousness. One of the ways in which groups of malware samples can be identified is through malware...
GAPS: Guiding Dynamic Android Analysis with Static Path Synthesis
Dynamically resolving method reachability in Android applications remains a critical and largely unsolved problem. Despite notable advancements in GUI testing and static call graph construction, current tools are insufficient for reliably driving execution toward specific target methods, especial...
BrowseSafe: Understanding and Preventing Prompt Injection within AI Browser Agents
The integration of artificial intelligence AI agents into web browsers introduces security challenges that go beyond traditional web application threat models. Prior work has identified prompt injection as a new attack vector for web agents, yet the resulting impact within real-world environments...
DUALGUAGE: Automated Joint Security-Functionality Benchmarking for Secure Code Generation
Large language models LLMs and autonomous coding agents are increasingly used to generate software across a wide range of domains. Yet a core requirement remains unmet: ensuring that generated code is secure without compromising its functional correctness. Existing benchmarks and evaluations for...
Building Browser Agents: Architecture, Security, and Practical Solutions
Browser agents enable autonomous web interaction but face critical reliability and security challenges in production. This paper presents findings from building and operating a production browser agent. The analysis examines where current approaches fail and what prevents safe autonomous operatio...
ThreadFuzzer: Fuzzing Framework for Thread Protocol
With the rapid growth of IoT, secure and efficient mesh networking has become essential. Thread has emerged as a key protocol, widely used in smart-home and commercial systems, and serving as a core transport layer in the Matter standard. This paper presents ThreadFuzzer, the first dedicated...
Securing AI Agents against Prompt Injection Attacks
Retrieval-augmented generation RAG systems have become widely used for enhancing large language model capabilities, but they introduce significant security vulnerabilities through prompt injection attacks. We present a comprehensive benchmark for evaluating prompt injection risks in RAG-enabled A...
Can MLLMs Detect Phishing? A Comprehensive Security Benchmark Suite Focusing on Dynamic Threats and Multimodal Evaluation in Academic Environments
The rapid proliferation of Multimodal Large Language Models MLLMs has introduced unprecedented security challenges, particularly in phishing detection within academic environments. Academic institutions and researchers are high-value targets, facing dynamic, multilingual, and context-dependent...
LFreeDA: Label-Free Drift Adaptation for Windows Malware Detection
Machine learning ML-based malware detectors degrade over time as concept drift introduces new and evolving families unseen during training. Retraining is limited by the cost and time of manual labeling or sandbox analysis. Existing approaches mitigate this via drift detection and selective...
Beyond Fixed and Dynamic Prompts: Embedded Jailbreak Templates for Advancing LLM Security
As the use of large language models LLMs continues to expand, ensuring their safety and robustness has become a critical challenge. In particular, jailbreak attacks that bypass built-in safety mechanisms are increasingly recognized as a tangible threat across industries, driving the need for...
Adaptive Dual-Layer Web Application Firewall (ADL-WAF) Leveraging Machine Learning for Enhanced Anomaly and Threat Detection
Web Application Firewalls are crucial for protecting web applications against a wide range of cyber threats. Traditional Web Application Firewalls often struggle to effectively distinguish between malicious and legitimate traffic, leading to limited efficacy in threat detection. To overcome these...
PATCHEVAL: A New Benchmark for Evaluating LLMs on Patching Real-World Vulnerabilities
Software vulnerabilities are increasing at an alarming rate. However, manual patching is both time-consuming and resource-intensive, while existing automated vulnerability repair AVR techniques remain limited in effectiveness. Recent advances in large language models LLMs have opened a new paradi...
EUVD-2025-178450
Malicious code in import-benchmark-warn-node-catch npm...
EUVD-2025-175867
Malicious code in try-benchmark-assert-module-protected npm...
MAL-2025-189692 Malicious code in string-compile-module-benchmark-report (npm)
--- -= Per source details. Do not edit below this line.=- Source: amazon-inspector 4a32edf99d2adb837e219033e596267b024daa93bbe2f9e9e2030e20bfffdffd This package appears to be part of the tea.xyz token reward campaign that flooded npm. These packages typically contain autopublish scripts auto.js,...
EUVD-2025-180005
Malicious code in boolean-double-benchmark-star-node npm...
EUVD-2025-179252
Malicious code in double-benchmark-pipe-hash-virtualize npm...