34 matches found
Bridging the Smart City Cybersecurity Data Gap through AI-Driven Synthetic Dataset Generation
Smart cities rely on interconnected cyber-physical systems that integrate sensors, IoT devices, cloud platforms, and AI-driven services and decision-making. While these systems enhance city services, they also introduce complex cybersecurity challenges due to their large attack surfaces,...
Tracing the Chain: Deep Learning for Stepping-Stone Intrusion Detection
Stepping-stone intrusions SSIs are a prevalent network evasion technique in which attackers route sessions through chains of compromised intermediate hosts to obscure their origin. Effective SSI detection requires correlating the incoming and outgoing flows at each relay host at extremely low fal...
Agentic Knowledge Distillation: Autonomous Training of Small Language Models for SMS Threat Detection
SMS-based phishing smishing attacks have surged, yet training effective on-device detectors requires labelled threat data that quickly becomes outdated. To deal with this issue, we present Agentic Knowledge Distillation, which consists of a powerful LLM acts as an autonomous teacher that fine-tun...
ThreatsDay Bulletin: RustFS Flaw, Iranian Ops, WebUI RCE, Cloud Leaks, and 12 More Stories
The internet never stays quiet. Every week, new hacks, scams, and security problems show up somewhere. This week's stories show how fast attackers change their tricks, how small mistakes turn into big risks, and how the same old tools keep finding new ways to break in. Read on to catch up before...
Needles in a Haystack: Using Forensic Network Science to Uncover Insider Trading
Although the automation and digitisation of anti-financial crime investigation has made significant progress in recent years, detecting insider trading remains a unique challenge, partly due to the limited availability of labelled data. To address this challenge, we propose using a data-driven...
A Practical Framework for Evaluating Medical AI Security: Reproducible Assessment of Jailbreaking and Privacy Vulnerabilities across Clinical Specialties
Medical Large Language Models LLMs are increasingly deployed for clinical decision support across diverse specialties, yet systematic evaluation of their robustness to adversarial misuse and privacy leakage remains inaccessible to most researchers. Existing security benchmarks require GPU cluster...
Synthetic Data: AI'S New Weapon against Android Malware
The ever-increasing number of Android devices and the accelerated evolution of malware, reaching over 35 million samples by 2024, highlight the critical importance of effective detection methods. Attackers are now using Artificial Intelligence to create sophisticated malware variations that can...
AutoMalDesc: Large-Scale Script Analysis for Cyber Threat Research
Generating thorough natural language explanations for threat detections remains an open problem in cybersecurity research, despite significant advances in automated malware detection systems. In this work, we present AutoMalDesc, an automated static analysis summarization framework that, followin...
MalDataGen: A Modular Framework for Synthetic Tabular Data Generation in Malware Detection
High-quality data scarcity hinders malware detection, limiting ML performance. We introduce MalDataGen, an open-source modular framework for generating high-fidelity synthetic tabular data using modular deep learning models e.g., WGAN-GP, VQ-VAE. Evaluated via dual validation TR-TS/TS-TR, seven...
LLM-Generated Samples for Android Malware Detection
Android malware continues to evolve through obfuscation and polymorphism, posing challenges for both signature-based defenses and machine learning models trained on limited and imbalanced datasets. Synthetic data has been proposed as a remedy for scarcity, yet the role of large language models LL...
An Ethically Grounded LLM-Based Approach to Insider Threat Synthesis and Detection
Insider threats are a growing organizational problem due to the complexity of identifying their technical and behavioral elements. A large research body is dedicated to the study of insider threats from technological, psychological, and educational perspectives. However, research in this domain h...
Generative AI for Critical Infrastructure in Smart Grids: a Unified Framework for Synthetic Data Generation and Anomaly Detection
In digital substations, security events pose significant challenges to the sustained operation of power systems. To mitigate these challenges, the implementation of robust defense strategies is critically important. A thorough process of anomaly identification and detection in information and...
A Review of Privacy Metrics for Privacy-Preserving Synthetic Data Generation
Privacy Preserving Synthetic Data Generation PP-SDG has emerged to produce synthetic datasets from personal data while maintaining privacy and utility. Differential privacy DP is the property of a PP-SDG mechanism that establishes how protected individuals are when sharing their sensitive data. I...
Taming Data Challenges in ML-Based Security Tasks: Lessons from Integrating Generative AI
Machine learning-based supervised classifiers are widely used for security tasks, and their improvement has been largely focused on algorithmic advancements. We argue that data challenges that negatively impact the performance of these classifiers have received limited attention. We address the...
Generative AI for Vulnerability Detection in 6G Wireless Networks: Advances, Case Study, and Future Directions
The rapid advancement of 6G wireless networks, IoT, and edge computing has significantly expanded the cyberattack surface, necessitating more intelligent and adaptive vulnerability detection mechanisms. Traditional security methods, while foundational, struggle with zero-day exploits, adversarial...
Private Evolution Converges
Private Evolution PE is a promising training-free method for differentially private DP synthetic data generation. While it achieves strong performance in some domains e.g., images and text, its behavior in others e.g., tabular data is less consistent. To date, the only theoretical analysis of the...
From Threat to Tool: Leveraging Refusal-Aware Injection Attacks for Safety Alignment
Safely aligning large language models LLMs often demands extensive human-labeled preference data, a process that's both costly and time-consuming. While synthetic data offers a promising alternative, current methods frequently rely on complex iterative prompting or auxiliary models. To address...
A Certified Unlearning Approach without Access to Source Data
With the growing adoption of data privacy regulations, the ability to erase private or copyrighted information from trained models has become a crucial requirement. Traditional unlearning methods often assume access to the complete training dataset, which is unrealistic in scenarios where the...
Synthetic Tabular Data: Methods, Attacks and Defenses
Synthetic data is often positioned as a solution to replace sensitive fixed-size datasets with a source of unlimited matching data, freed from privacy concerns. There has been much progress in synthetic data generation over the last decade, leveraging corresponding advances in machine learning an...
Privacy Amplification through Synthetic Data: Insights from Linear Regression
Synthetic data inherits the differential privacy guarantees of the model used to generate it. Additionally, synthetic data may benefit from privacy amplification when the generative model is kept hidden. While empirical studies suggest this phenomenon, a rigorous theoretical understanding is stil...