647 matches found
Towards Effective Complementary Security Analysis Using Large Language Models
A key challenge in security analysis is the manual evaluation of potential security weaknesses generated by static application security testing SAST tools. Numerous false positives FPs in these reports reduce the effectiveness of security analysis. We propose using Large Language Models LLMs to...
Tech-ASan: Two-Stage Check for Address Sanitizer
Address Sanitizer ASan is a sharp weapon for detecting memory safety violations, including temporal and spatial errors hidden in C/C++ programs during execution. However, ASan incurs significant runtime overhead, which limits its efficiency in testing large software. The overhead mainly comes fro...
[SECURITY] Fedora 42 Update: golang-x-perf-0-0.28.20250326git02a15fd.fc42
This package holds the source for various tools related to performance measurement, storage, and analysis. - cmd/benchstat contains a command-line tool that computes and 7 compares statistics about benchmarks. - cmd/benchsave contains a command-line tool for publishing benchmark results. - storag...
[SECURITY] Fedora 41 Update: golang-x-perf-0-0.28.20250326git02a15fd.fc41
This package holds the source for various tools related to performance measurement, storage, and analysis. - cmd/benchstat contains a command-line tool that computes and 7 compares statistics about benchmarks. - cmd/benchsave contains a command-line tool for publishing benchmark results. - storag...
MAYA: Addressing Inconsistencies in Generative Password Guessing through a Unified Benchmark
Recent advances in generative models have led to their application in password guessing, with the aim of replicating the complexity, structure, and patterns of human-created passwords. Despite their potential, inconsistencies and inadequate evaluation methodologies in prior research have hindered...
Uncovering Reliable Indicators: Improving IoC Extraction from Threat Reports
Indicators of Compromise IoCs are critical for threat detection and response, marking malicious activity across networks and systems. Yet, the effectiveness of automated IoC extraction systems is fundamentally limited by one key issue: the lack of high-quality ground truth. Current extraction too...
LLM Unlearning Should Be Form-Independent
Large Language Model LLM unlearning aims to erase or suppress undesirable knowledge within the model, offering promise for controlling harmful or private information to prevent misuse. However, recent studies highlight its limited efficacy in real-world scenarios, hindering practical adoption. In...
LLMs Caught in the Crossfire: Malware Requests and Jailbreak Challenges
The widespread adoption of Large Language Models LLMs has heightened concerns about their security, particularly their vulnerability to jailbreak attacks that leverage crafted prompts to generate malicious outputs. While prior research has been conducted on general security capabilities of LLMs,...
SoK: Data Reconstruction Attacks against Machine Learning Models: Definition, Metrics, and Benchmark
Data reconstruction attacks, which aim to recover the training dataset of a target model with limited access, have gained increasing attention in recent years. However, there is currently no consensus on a formal definition of data reconstruction attacks or appropriate evaluation metrics for...
VPI-Bench: Visual Prompt Injection Attacks for Computer-Use Agents
Computer-Use Agents CUAs with full system access enable powerful task automation but pose significant security and privacy risks due to their ability to manipulate files, access user data, and execute arbitrary commands. While prior work has focused on browser-based agents and HTML-level attacks,...
Con Instruction: Universal Jailbreaking of Multimodal Large Language Models Via Non-Textual Modalities
Existing attacks against multimodal language models MLLMs primarily communicate instructions through text accompanied by adversarial images. In contrast, we exploit the capabilities of MLLMs to interpret non-textual instructions, specifically, adversarial images or audio generated by our novel...
Data Flows in You: Benchmarking and Improving Static Data-Flow Analysis on Binary Executables
Data-flow analysis is a critical component of security research. Theoretically, accurate data-flow analysis in binary executables is an undecidable problem, due to complexities of binary code. Practically, many binary analysis engines offer some data-flow analysis capability, but we lack...
A Comprehensive Real-World Assessment of Audio Watermarking Algorithms: Will They Survive Neural Codecs?
We introduce the Robust Audio Watermarking Benchmark RAW-Bench, a benchmark for evaluating deep learning-based audio watermarking methods with standardized and systematic comparisons. To simulate real-world usage, we introduce a comprehensive audio attack pipeline with various distortions such as...
VideoMarkBench: Benchmarking Robustness of Video Watermarking
The rapid development of video generative models has led to a surge in highly realistic synthetic videos, raising ethical concerns related to disinformation and copyright infringement. Recently, video watermarking has been proposed as a mitigation strategy by embedding invisible marks into...
PandaGuard: Systematic Evaluation of LLM Safety against Jailbreaking Attacks
Large language models LLMs have achieved remarkable capabilities but remain vulnerable to adversarial prompts known as jailbreaks, which can bypass safety alignment and elicit harmful outputs. Despite growing efforts in LLM safety research, existing evaluations are often fragmented, focused on...
Capability-Based Scaling Laws for LLM Red-Teaming
As large language models grow in capability and agency, identifying vulnerabilities through red-teaming becomes vital for safe deployment. However, traditional prompt-engineering approaches may prove ineffective once red-teaming turns into a weak-to-strong problem, where target models surpass...
Fixing 7,400 Bugs for 1$: Cheap Crash-Site Program Repair
The rapid advancement of bug-finding techniques has led to the discovery of more vulnerabilities than developers can reasonably fix, creating an urgent need for effective Automated Program Repair APR methods. However, the complexity of modern bugs often makes precise root cause analysis difficult...
LAMDA: a Longitudinal Android Malware Benchmark for Concept Drift Analysis
Machine learning ML-based malware detection systems often fail to account for the dynamic nature of real-world training and test data distributions. In practice, these distributions evolve due to frequent changes in the Android ecosystem, adversarial development of new malware families, and the...
Benchmarking Poisoning Attacks against Retrieval-Augmented Generation
Retrieval-Augmented Generation RAG has proven effective in mitigating hallucinations in large language models by incorporating external knowledge during inference. However, this integration introduces new security vulnerabilities, particularly to poisoning attacks. Although prior work has explore...
CVE-2024-27508
Atheme 7.2.12 contains a memory leak vulnerability in /atheme/src/crypto-benchmark/main.c...