3 matches found
Securing Large Language Models (LLMs) from Prompt Injection Attacks
Large Language Models (LLMs) are increasingly being deployed in real-world applications, but their flexibility exposes them to prompt injection attacks. These attacks leverage the model's instruction-following ability to make it perform malicious tasks. Recent work has proposed JATMO, a task-specif...
Enhancing Watermarking Quality for LLMs Via Contextual Generation States Awareness
Recent advancements in watermarking techniques have enabled the embedding of secret messages into AI-generated text (AIGT), serving as an important mechanism for AIGT detection. Existing methods typically interfere with the generation processes of large language models (LLMs) to embed signals within...
Robust LLM Fingerprinting Via Domain-Specific Watermarks
As open-source language models (OSMs) grow more capable and are widely shared and finetuned, ensuring model provenance, i.e., identifying the origin of a given model instance, has become an increasingly important issue. At the same time, existing backdoor-based model fingerprinting techniques often...