8 matches found
Self-Supervised Learning for Android Malware Detection on a Time-Stamped Dataset
Android malware detectors built with machine learning often suffer from temporal bias: models are trained and evaluated without respecting apps' actual release times, inflating accuracy and weakening real-world robustness. We address this by constructing a time-stamped dataset of benign and...
Credential Leakage in LLM Agent Skills: A Large-Scale Empirical Study
Third-party skills extend LLM agents with powerful capabilities but often handle sensitive credentials in privileged environments, making leakage risks poorly understood. We present the first large-scale empirical study of this problem, analyzing 17,022 skills sampled from 170,226 on SkillsMP usi...
Assessing Spear-Phishing Website Generation in Large Language Model Coding Agents
Large Language Models are expanding beyond being a tool humans use and into independent agents that can observe an environment, reason about solutions to problems, make changes that impact those environments, and understand how their actions impacted their environment. One of the most common...
Privacy Practices of Browser Agents
This paper presents a systematic evaluation of the privacy behaviors and attributes of eight recent, popular browser agents. Browser agents are software that automate Web browsing using large language models and ancillary tooling. However, the automated capabilities that make browser agents...
Exploring Hidden Geographic Disparities in Android Apps
While mobile app evolution has been widely studied, geographical variation in app behavior remains largely unexplored. This paper presents a large-scale study of location-based Android app differentiation, uncovering two important and underexamined phenomena with security and fairness implication...
AutoMalDesc: Large-Scale Script Analysis for Cyber Threat Research
Generating thorough natural language explanations for threat detections remains an open problem in cybersecurity research, despite significant advances in automated malware detection systems. In this work, we present AutoMalDesc, an automated static analysis summarization framework that, followin...
Breaking Agent Backbones: Evaluating the Security of Backbone LLMs in AI Agents
AI agents powered by large language models LLMs are being deployed at scale, yet we lack a systematic understanding of how the choice of backbone LLM affects agent security. The non-deterministic sequential nature of AI agents complicates security modeling, while the integration of traditional...
LLMail-Inject: a Dataset from a Realistic Adaptive Prompt Injection Challenge
Indirect Prompt Injection attacks exploit the inherent limitation of Large Language Models LLMs to distinguish between instructions and data in their inputs. Despite numerous defense proposals, the systematic evaluation against adaptive adversaries remains limited, even when successful attacks ca...