929 matches found
Understanding Concept Drift with Deprecated Permissions in Android Malware Detection
Permission analysis is a widely used method for Android malware detection. It involves examining the permissions requested by an application to access sensitive data or perform potentially malicious actions. In recent years, various machine learning ML algorithms have been applied to Android...
Out of Distribution, out of Luck: How Well Can LLMs Trained on Vulnerability Datasets Detect Top 25 CWE Weaknesses?
Automated vulnerability detection research has made substantial progress, yet its real-world impact remains limited. Current vulnerability datasets suffer from issues including label inaccuracy rates of 20-71%, extensive duplication, and poor coverage of critical CWE types. These issues create a...
SUSE CVE-2024-47187
Suricata is a network Intrusion Detection System, Intrusion Prevention System and Network Security Monitoring engine. Prior to version 7.0.7, missing initialization of the random seed for "thash" leads to datasets having predictable hash table behavior. This can lead to dataset file loading to us...
HumanSAM: Classifying Human-Centric Forgery Videos in Human Spatial, Appearance, and Motion Anomaly
Numerous synthesized videos from generative models, especially human-centric ones that simulate realistic human actions, pose significant threats to human information security and authenticity. While progress has been made in binary forgery video detection, the lack of fine-grained understanding ...
Tab-MIA: a Benchmark Dataset for Membership Inference Attacks on Tabular Data in LLMs
Large language models LLMs are increasingly trained on tabular data, which, unlike unstructured text, often contains personally identifiable information PII in a highly structured and explicit format. As a result, privacy risks arise, since sensitive records can be inadvertently retained by the...
Talking like a Phisher: LLM-Based Attacks on Voice Phishing Classifiers
Voice phishing vishing remains a persistent threat in cybersecurity, exploiting human trust through persuasive speech. While machine learning ML-based classifiers have shown promise in detecting malicious call transcripts, they remain vulnerable to adversarial manipulations that preserve semantic...
FaultLine: Automated Proof-Of-Vulnerability Generation Using LLM Agents
Despite the critical threat posed by software security vulnerabilities, reports are often incomplete, lacking the proof-of-vulnerability PoV tests needed to validate fixes and prevent regressions. These tests are crucial not only for ensuring patches work, but also for helping developers understa...
A Bayesian Incentive Mechanism for Poison-Resilient Federated Learning
Federated learning FL enables collaborative model training across decentralized clients while preserving data privacy. However, its open-participation nature exposes it to data-poisoning attacks, in which malicious actors submit corrupted model updates to degrade the global model. Existing defens...
An Investigation of Ear-EEG Signals for a Novel Biometric Authentication System
This work explores the feasibility of biometric authentication using EEG signals acquired through in-ear devices, commonly referred to as ear-EEG. Traditional EEG-based biometric systems, while secure, often suffer from low usability due to cumbersome scalp-based electrode setups. In this study, ...
Accelerating Automatic Program Repair with Dual Retrieval-Augmented Fine-Tuning and Patch Generation on Large Language Models
Automated Program Repair APR is essential for ensuring software reliability and quality while enhancing efficiency and reducing developers' workload. Although rule-based and learning-based APR methods have demonstrated their effectiveness, their performance was constrained by the defect type of...
The Man behind the Sound: Demystifying Audio Private Attribute Profiling Via Multimodal Large Language Model Agents
Our research uncovers a novel privacy risk associated with multimodal large language models MLLMs: the ability to infer sensitive personal attributes from audio data -- a technique we term audio private attribute profiling. This capability poses a significant threat, as audio can be covertly...
PhreshPhish: a Real-World, High-Quality, Large-Scale Phishing Website Dataset and Benchmark
Phishing remains a pervasive and growing threat, inflicting heavy economic and reputational damage. While machine learning has been effective in real-time detection of phishing attacks, progress is hindered by lack of large, high-quality datasets and benchmarks. In addition to poor-quality due to...
REAL-IoT: Characterizing GNN Intrusion Detection Robustness under Practical Adversarial Attack
Graph Neural Network GNN-based network intrusion detection systems NIDS are often evaluated on single datasets, limiting their ability to generalize under distribution drift. Furthermore, their adversarial robustness is typically assessed using synthetic perturbations that lack realism. This...
Exploit for SQL Injection in Phpgurukul Student_Result_Management_System
PoCVulDb PoC of CVEs 4m3rr0r CVE-2025-7534https://github.c...
DATABench: Evaluating Dataset Auditing in Deep Learning from an Adversarial Perspective
The widespread application of Deep Learning across diverse domains hinges critically on the quality and composition of training datasets. However, the common lack of disclosure regarding their usage raises significant privacy and copyright concerns. Dataset auditing techniques, which aim to...
MalVol-25: a Diverse, Labelled and Detailed Volatile Memory Dataset for Malware Detection and Response Testing and Validation
This paper addresses the critical need for high-quality malware datasets that support advanced analysis techniques, particularly machine learning and agentic AI frameworks. Existing datasets often lack diversity, comprehensive labelling, and the complexity necessary for effective machine learning...
ZKPROV: a Zero-Knowledge Approach to Dataset Provenance for Large Language Models
As the deployment of large language models LLMs grows in sensitive domains, ensuring the integrity of their computational provenance becomes a critical challenge, particularly in regulated sectors such as healthcare, where strict requirements are applied in dataset usage. We introduce ZKPROV, a...
Intelligent ARP Spoofing Detection Using Multi-Layered Machine Learning (ML) Techniques for IoT Networks
Address Resolution Protocol ARP spoofing remains a critical threat to IoT networks, enabling attackers to intercept, modify, or disrupt data transmission by exploiting ARP's lack of authentication. The decentralized and resource-constrained nature of IoT environments amplifies this vulnerability,...
HARPT: a Corpus for Analyzing Consumers' Trust and Privacy Concerns in Mobile Health Apps
We present HARPT, a large-scale annotated corpus of mobile health app store reviews aimed at advancing research in user privacy and trust. The dataset comprises over 480,000 user reviews labeled into seven categories that capture critical aspects of trust in applications, trust in providers and...
Optimizing Resource Allocation and Energy Efficiency in Federated Fog Computing for IoT
Address Resolution Protocol ARP spoofing attacks severely threaten Internet of Things IoT networks by allowing attackers to intercept, modify, or block communications. Traditional detection methods are insufficient due to high false positives and poor adaptability. This research proposes a...