685 matches found
Scalable Defense against In-The-Wild Jailbreaking Attacks with Safety Context Retrieval
Large Language Models (LLMs) are known to be vulnerable to jailbreaking attacks, wherein adversaries exploit carefully engineered prompts to induce harmful or unethical responses. Such threats have raised critical concerns about the safety and reliability of LLMs in real-world deployment. While...
FragFake: a Dataset for Fine-Grained Detection of Edited Images with Vision Language Models
Fine-grained edited image detection of localized edits in images is crucial for assessing content authenticity, especially given that modern diffusion models and image editing methods can produce highly realistic manipulations. However, this domain faces three challenges: (1) Binary classifiers yie...
Leveraging Large Language Models for Command Injection Vulnerability Analysis in Python: an Empirical Study on Popular Open-Source Projects
Command injection vulnerabilities are a significant security threat in dynamic languages like Python, particularly in widely used open-source projects where security issues can have extensive impact. With the proven effectiveness of Large Language Models (LLMs) in code-related tasks, such as testing...
Is Your Prompt Safe? Investigating Prompt Injection Attacks against Open-Source LLMs
Whitepaper called Is Your Prompt Safe? Investigating Prompt Injection Attacks Against Open-Source LLMs...
From Assistants to Adversaries: Exploring the Security Risks of Mobile LLM Agents
The growing adoption of large language models (LLMs) has led to a new paradigm in mobile computing--LLM-powered mobile AI agents--capable of decomposing and automating complex tasks directly on smartphones. However, the security implications of these agents remain largely unexplored. In this paper,...
Cross-Cloud Data Privacy Protection: Optimizing Collaborative Mechanisms of AI Systems by Integrating Federated Learning and LLMs
In the age of cloud computing, data privacy protection has become a major challenge, especially when sharing sensitive data across cloud environments. However, how to optimize collaboration across cloud environments remains an unresolved problem. In this paper, we combine federated learning with...
One Shot Dominance: Knowledge Poisoning Attack on Retrieval-Augmented Generation Systems
Large Language Models (LLMs) enhanced with Retrieval-Augmented Generation (RAG) have shown improved performance in generating accurate responses. However, the dependence on external knowledge bases introduces potential security vulnerabilities, particularly when these knowledge bases are publicly...
MorphMark: Flexible Adaptive Watermarking for Large Language Models
Watermarking by altering token sampling probabilities based on a red-green list is a promising method for tracing the origin of text generated by large language models (LLMs). However, existing watermark methods often struggle with a fundamental dilemma: improving watermark effectiveness the...
Fragments to Facts: Partial-Information Fragment Inference from LLMs
Large language models (LLMs) can leak sensitive training data through memorization and membership inference attacks. Prior work has primarily focused on strong adversarial assumptions, including attacker access to entire samples or long, ordered prefixes, leaving open the question of how vulnerable...
On Membership Inference Attacks in Knowledge Distillation
Nowadays, Large Language Models (LLMs) are trained on huge datasets, some including sensitive information. This poses a serious privacy concern because privacy attacks such as Membership Inference Attacks (MIAs) may detect this sensitive information. While knowledge distillation compresses LLMs into...
Benchmarking LLMs in an Embodied Environment for Blue Team Threat Hunting
As cyber threats continue to grow in scale and sophistication, blue team defenders increasingly require advanced tools to proactively detect and mitigate risks. Large Language Models (LLMs) offer promising capabilities for enhancing threat analysis. However, their effectiveness in real-world blue...
ProxyPrompt: Securing System Prompts against Prompt Extraction Attacks
The integration of large language models (LLMs) into a wide range of applications has highlighted the critical role of well-crafted system prompts, which require extensive testing and domain expertise. These prompts enhance task performance but may also encode sensitive information and filtering...
MPMA: Preference Manipulation Attack against Model Context Protocol
Model Context Protocol (MCP) standardizes interface mapping for large language models (LLMs) to access external data and tools, which revolutionizes the paradigm of tool selection and facilitates the rapid expansion of the LLM agent tool ecosystem. However, as the MCP is increasingly adopted,...
Diverging Towards Hallucination: Detection of Failures in Vision-Language Models Via Multi-Token Aggregation
Vision-language models (VLMs) now rival human performance on many multimodal tasks, yet they still hallucinate objects or generate unsafe text. Current hallucination detectors, e.g., single-token linear probing (SLP) and PTrue, typically analyze only the logit of the first generated token or just its...
SafeTrans: LLM-Assisted Transpilation from C to Rust
Rust is a strong contender for a memory-safe alternative to C as a "systems" programming language, but porting the vast amount of existing C code to Rust is a daunting task. In this paper, we evaluate the potential of large language models (LLMs) to automate the transpilation of C code to idiomatic...
On Technique Identification and Threat-Actor Attribution Using LLMs and Embedding Models
Attribution of cyber-attacks remains a complex but critical challenge for cyber defenders. Currently, manual extraction of behavioral indicators from dense forensic documentation causes significant attribution delays, especially following major incidents at the international scale. This research...
Cape: Context-Aware Prompt Perturbation Mechanism with Differential Privacy
Large Language Models (LLMs) have gained significant popularity due to their remarkable capabilities in text understanding and generation. However, despite their widespread deployment in inference services such as ChatGPT, concerns about the potential leakage of sensitive user data have arisen...
S3C2 Summit 2024-09: Industry Secure Software Supply Chain Summit
While providing economic and software development value, software supply chains are only as strong as their weakest link. Over the past several years, there has been an exponential increase in cyberattacks, specifically targeting vulnerable links in critical software supply chains. These attacks...
Adversarial Attack on Large Language Models Using Exponentiated Gradient Descent
As Large Language Models (LLMs) are widely used, understanding them systematically is key to improving their safety and realizing their full potential. Although many models are aligned using techniques such as reinforcement learning from human feedback (RLHF), they are still vulnerable to jailbreakin...
Red Teaming the Mind of the Machine: a Systematic Evaluation of Prompt Injection and Jailbreak Vulnerabilities in LLMs
Large Language Models (LLMs) are increasingly integrated into consumer and enterprise applications. Despite their capabilities, they remain susceptible to adversarial attacks such as prompt injection and jailbreaks that override alignment safeguards. This paper provides a systematic investigation o...