Mitigating Distribution Shift in Graph-Based Android Malware Classification Via Function Metadata and LLM Embeddings
Graph-based malware classifiers can achieve over 94% accuracy on standard Android datasets, yet we find they suffer accuracy drops of up to 45% when evaluated on previously unseen malware variants from the same family, a scenario where strong generalization would typically be expected. This...
Membership Inference Attacks for Unseen Classes
Shadow model attacks are the state-of-the-art approach for membership inference attacks on machine learning models. However, these attacks typically assume an adversary has access to a background nonmember data distribution that matches the distribution the target model was trained on. We initiat...
Aurora: Are Android Malware Classifiers Reliable under Distribution Shift?
The performance figures of modern drift-adaptive malware classifiers appear promising, but does this translate to genuine operational reliability? The standard evaluation paradigm primarily focuses on baseline performance metrics, neglecting confidence-error alignment and operational stability...
JailbreaksOverTime: Detecting Jailbreak Attacks under Distribution Shift
Safety and security remain critical concerns in AI deployment. Despite safety training through reinforcement learning from human feedback (RLHF) [32], language models remain vulnerable to jailbreak attacks that bypass safety guardrails. Universal jailbreaks, prefixes that can circumvent alignment fo...
Monitor and Recover: a Paradigm for Future Research on Distribution Shift in Learning-Enabled Cyber-Physical Systems
Given the known vulnerability of neural networks to distribution shift, maintaining reliability in learning-enabled cyber-physical systems poses a salient challenge. In response, many existing methods adopt a detect-and-abstain methodology, aiming to detect distribution shift at inference time so...