Sparse Autoencoders Are Capable LLM Jailbreak Mitigators
Jailbreak attacks remain a persistent threat to large language model safety. We propose Context-Conditioned Delta Steering (CC-Delta), an SAE-based defense that identifies jailbreak-relevant sparse features by comparing token-level representations of the same harmful request with and without...
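The snippet is truncated, so the exact comparison CC-Delta performs is not given here. As a hedged illustration of the general idea of contrasting sparse-feature activations between two versions of the same request, the sketch below averages per-token SAE activations for each version and ranks features by the size of the difference. The two versions are left abstract (`acts_a`, `acts_b`), and the function names and the mean-then-rank scheme are assumptions for illustration, not the authors' method.

```python
def mean_activation(acts: list[list[float]]) -> list[float]:
    """Average per-token SAE feature activations (tokens x features) over tokens."""
    n_tokens = len(acts)
    n_features = len(acts[0])
    return [sum(tok[f] for tok in acts) / n_tokens for f in range(n_features)]


def top_delta_features(
    acts_a: list[list[float]], acts_b: list[list[float]], k: int
) -> list[tuple[int, float]]:
    """Hypothetical helper: rank features by |mean_a - mean_b| and keep the top k.

    Returns (feature_index, signed_delta) pairs, largest absolute delta first.
    """
    mean_a = mean_activation(acts_a)
    mean_b = mean_activation(acts_b)
    deltas = [a - b for a, b in zip(mean_a, mean_b)]
    ranked = sorted(range(len(deltas)), key=lambda f: abs(deltas[f]), reverse=True)
    return [(f, deltas[f]) for f in ranked[:k]]


# Toy activations: 2 tokens x 3 features for each version of a request.
acts_a = [[0.0, 2.0, 0.5], [0.0, 4.0, 0.5]]
acts_b = [[1.0, 0.0, 0.5], [1.0, 0.0, 0.5]]
print(top_delta_features(acts_a, acts_b, k=2))  # [(1, 3.0), (0, -1.0)]
```

Ranking by absolute delta keeps both features that switch on and features that switch off between the two versions; a real system would work on much larger activation tensors.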
Jailbreaking Leaves a Trace: Understanding and Detecting Jailbreak Attacks from Internal Representations of Large Language Models
Jailbreaking large language models (LLMs) has emerged as a critical security challenge with the widespread deployment of conversational AI systems. Adversarial users exploit these models through carefully crafted prompts to elicit restricted or unsafe outputs, a phenomenon commonly referred to as...
Kill It with FIRE: On Leveraging Latent Space Directions for Runtime Backdoor Mitigation in Deep Neural Networks
Machine learning models are increasingly present in our everyday lives; as a result, they become targets of adversarial attackers seeking to manipulate the systems we interact with. A well-known vulnerability is a backdoor introduced into a neural network by poisoned training data or a malicious...
Discord will limit profiles to teen-appropriate mode until you verify your age
Discord announced it will put all existing and new profiles in teen-appropriate mode by default in early March. The teen-appropriate profile mode will remain in place until users prove they are adults. To change a profile to “full access” will require verification by Discord’s age inference model...
SecCodePRM: A Process Reward Model for Code Security
Large Language Models are rapidly becoming core components of modern software development workflows, yet ensuring code security remains challenging. Existing vulnerability detection pipelines either rely on static analyzers or use LLM/GNN-based detectors trained with coarse program-level...
Next-Generation Cyberattack Detection with Large Language Models: Anomaly Analysis across Heterogeneous Logs
This project explores large language models (LLMs) for anomaly detection across heterogeneous log sources. Traditional intrusion detection systems suffer from high false positive rates, semantic blindness, and data scarcity, as logs are inherently sensitive, making clean datasets rare. We address...
ACORN-IDS: Adaptive Continual Novelty Detection for Intrusion Detection Systems
Intrusion Detection Systems (IDS) must maintain reliable detection performance under rapidly evolving benign traffic patterns and the continual emergence of cyberattacks, including zero-day threats with no labeled data available. However, most machine learning-based IDS approaches either assume...
Inference-Time Backdoors Via Hidden Instructions in LLM Chat Templates
Open-weight language models are increasingly used in production settings, raising new security challenges. One prominent threat in this context is backdoor attacks, in which adversaries embed hidden behaviors in language models that activate under specific conditions. Previous work has assumed th...
NVIDIA Triton Inference Server EVBufferToJson Uncaught Exception Denial-of-Service Vulnerability
This vulnerability allows remote attackers to create a denial-of-service condition on affected installations of NVIDIA Triton Inference Server. Authentication is not required to exploit this vulnerability. The specific flaw exists within the EVBufferToJson method. The issue results from the lack ...
GHSA-J7X9-7J54-2V3H Hugging Face Text Generation Inference vulnerable to Uncontrolled Resource Consumption
A vulnerability in huggingface/text-generation-inference version 3.3.6 allows unauthenticated remote attackers to exploit unbounded external image fetching during input validation in VLM mode. The issue arises when the router scans inputs for Markdown image links and performs a blocking HTTP GET...
CVE-2026-0599
CVE-2026-0599 concerns huggingface/text-generation-inference version 3.3.6, where unauthenticated attackers can trigger a resource-exhaustion DoS via unbounded external image fetching during input validation in VLM mode. The router scans inputs for Markdown image links and issues a blocking HTTP ...
EUVD-2026-5137
A vulnerability in huggingface/text-generation-inference version 3.3.6 allows unauthenticated remote attackers to exploit unbounded external image fetching during input validation in VLM mode. The issue arises when the router scans inputs for Markdown image links and performs a blocking HTTP GET...
CVE-2026-0599 Unbounded External Image Fetch in Validation Leads to Resource-Exhaustion DoS in huggingface/text-generation-inference
A vulnerability in huggingface/text-generation-inference version 3.3.6 allows unauthenticated remote attackers to exploit unbounded external image fetching during input validation in VLM mode. The issue arises when the router scans inputs for Markdown image links and performs a blocking HTTP GET...
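The advisories describe a validator that scans prompts for Markdown image links and fetches each one with a blocking HTTP GET, with no bound on how many links or how much data. The sketch below shows one way to bound that work before any fetch happens: cap the number of image links, accept only http/https URLs, and fetch with a timeout and a byte limit. The limits, the regex, and the names (`extract_image_urls`, `fetch_image_bounded`, `TooManyImages`) are hypothetical illustrations, not TGI's code or its actual patch.

```python
import re
import urllib.request
from urllib.parse import urlparse

# Hypothetical limits chosen for illustration only.
MAX_IMAGES = 4
MAX_IMAGE_BYTES = 5 * 1024 * 1024
# Matches Markdown image syntax ![alt](url) and captures the URL.
MARKDOWN_IMAGE = re.compile(r"!\[[^\]]*\]\(([^)\s]+)\)")


class TooManyImages(ValueError):
    """Raised when a prompt references more images than the validator allows."""


def extract_image_urls(prompt: str, max_images: int = MAX_IMAGES) -> list[str]:
    """Collect http/https image URLs, rejecting the prompt once the cap is exceeded."""
    urls = []
    for match in MARKDOWN_IMAGE.finditer(prompt):
        url = match.group(1)
        if urlparse(url).scheme not in ("http", "https"):
            continue  # skip file://, data:, and other schemes
        urls.append(url)
        if len(urls) > max_images:
            raise TooManyImages(f"prompt references more than {max_images} images")
    return urls


def fetch_image_bounded(url: str, timeout_s: float = 5.0,
                        max_bytes: int = MAX_IMAGE_BYTES) -> bytes:
    """Fetch one image; the timeout applies to each blocking socket operation."""
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        data = resp.read(max_bytes + 1)  # read at most one byte past the limit
    if len(data) > max_bytes:
        raise ValueError("image exceeds size limit")
    return data
```

Counting links before fetching means an oversized prompt is rejected without any network traffic; the per-fetch timeout and byte cap then limit what each remaining request can cost.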
The Trigger in the Haystack: Extracting and Reconstructing LLM Backdoor Triggers
Detecting whether a model has been poisoned is a longstanding problem in AI security. In this work, we present a practical scanner for identifying sleeper agent-style backdoors in causal language models. Our approach relies on two key findings: first, sleeper agents tend to memorize poisoning dat...
Text Generation Inference Resource Management Error Vulnerability
Text Generation Inference is a Rust, Python, and gRPC server developed by Hugging Face for text generation inference. Version 3.3.6 of Text Generation Inference contains a resource management vulnerability. This vulnerability stems from the unbounded fetching of external images during input...
PT-2026-5654
Name of the Vulnerable Software and Affected Versions: huggingface/text-generation-inference version 3.3.6; huggingface/text-generation-inference versions prior to 3.3.7
Description: A flaw exists in huggingface/text-generation-inference that allows unauthenticated remote attackers to cause a...
Evaluating Large Language Models for Security Bug Report Prediction
Early detection of security bug reports (SBRs) is critical for timely vulnerability mitigation. We present an evaluation of prompt-based engineering and fine-tuning approaches for predicting SBRs using Large Language Models (LLMs). Our findings reveal a distinct trade-off between the two approaches...
Timing Attack
OctoPrint is vulnerable to a timing attack. The vulnerability is due to a character-by-character API key comparison with early termination, which allows a network-based attacker to infer valid API keys by measuring response times and guessing the key one character at a time...
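The leak comes from the loop stopping at the first mismatched character, so the work done (and the response time) grows with the length of the correct prefix. The sketch below makes that visible by counting comparisons instead of measuring time, and contrasts it with Python's `hmac.compare_digest`, which avoids content-based short-circuiting. The function names and the example key are hypothetical; this is not OctoPrint's code.

```python
import hmac


def naive_key_check(provided: str, expected: str) -> tuple[bool, int]:
    """Character-by-character comparison with early termination (vulnerable).

    Returns (match, steps), where steps counts characters compared before the
    loop stopped; on a real server, steps shows up as response time.
    """
    steps = 0
    for a, b in zip(provided, expected):
        steps += 1
        if a != b:
            return False, steps  # early exit leaks how many leading chars matched
    return len(provided) == len(expected), steps


def constant_time_key_check(provided: str, expected: str) -> bool:
    """Compare with hmac.compare_digest, which does not short-circuit on content."""
    return hmac.compare_digest(provided.encode("utf-8"), expected.encode("utf-8"))


expected_key = "K7Q2"  # hypothetical API key
print(naive_key_check("A000", expected_key))  # (False, 1): first char wrong
print(naive_key_check("K700", expected_key))  # (False, 3): two chars guessed right
print(constant_time_key_check("K7Q2", expected_key))  # True
```

An attacker who sees the step count (or its timing proxy) rise from 1 to 3 learns that "K7" is a correct prefix and can move on to the next character; `compare_digest` removes that signal for equal-length inputs.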