685 matches found
Vulnerabilities in Partial TEE-Shielded LLM Inference with Precomputed Noise
The deployment of large language models LLMs on third-party devices requires new ways to protect model intellectual property. While Trusted Execution Environments TEEs offer a promising solution, their performance limits can lead to a critical compromise: using a precomputed, static secret basis ...
A one-prompt attack that breaks LLM safety alignment
Large language models LLMs and diffusion models now power a wide range of applications, from document assistance to text-to-image generation, and users increasingly expect these systems to be safety-aligned by default. Yet safety alignment is only as robust as its weakest failure mode. Despite...
AI chat app leak exposes 300 million messages tied to 25 million users
An independent security researcher uncovered a major data breach affecting Chat & Ask AI, one of the most popular AI chat apps on Google Play and Apple App Store, with more than 50 million users. The researcher claims to have accessed 300 million messages from over 25 million users due to an...
RECUR: Resource Exhaustion Attack Via Recursive-Entropy Guided Counterfactual Utilization and Reflection
Large Reasoning Models LRMs employ reasoning to address complex tasks. Such explicit reasoning requires extended context lengths, resulting in substantially higher resource consumption. Prior work has shown that adversarially crafted inputs can trigger redundant reasoning processes, exposing LRMs...
ShallowJail: Steering Jailbreaks against Large Language Models
Large Language ModelsLLMs have been successful in numerous fields. Alignment has usually been applied to prevent them from harmful purposes. However, aligned LLMs remain vulnerable to jailbreak attacks that deliberately mislead them into producing harmful outputs. Existing jailbreaks are either...
Clouding the Mirror: Stealthy Prompt Injection Attacks Targeting LLM-Based Phishing Detection
Phishing sites continue to grow in volume and sophistication. Recent work leverages large language models LLMs to analyze URLs, HTML, and rendered content to decide whether a website is a phishing site. While these approaches are promising, LLMs are inherently vulnerable to prompt injection PI...
Detecting backdoored language models at scale
Today, we are releasing new research on detecting backdoors in open-weight language models. Our research highlights several key properties of language model backdoors, laying the groundwork for a practical scanner designed to detect backdoored models at scale and improve overall trust in AI...
The Trigger in the Haystack: Extracting and Reconstructing LLM Backdoor Triggers
Detecting whether a model has been poisoned is a longstanding problem in AI security. In this work, we present a practical scanner for identifying sleeper agent-style backdoors in causal language models. Our approach relies on two key findings: first, sleeper agents tend to memorize poisoning dat...
Co-RedTeam: Orchestrated Security Discovery and Exploitation with LLM Agents
Large language models LLMs have shown promise in assisting cybersecurity tasks, yet existing approaches struggle with automatic vulnerability discovery and exploitation due to limited interaction, weak execution grounding, and a lack of experience reuse. We propose Co-RedTeam, a security-aware...
Toxic_Flow_Analysis_Framework_For_Agentic_AI
Toxic Flow Analysis TFA Framework A Secure-by-Design framew...
Iran-Linked RedKitten Cyber Campaign Targets Human Rights NGOs and Activists
A Farsi-speaking threat actor aligned with Iranian state interests is suspected to be behind a new campaign targeting non-governmental organizations and individuals involved in documenting recent human rights abuses. The activity, observed by HarfangLab in January 2026, has been codenamed...
Semantic-Aware Advanced Persistent Threat Detection Using Autoencoders on LLM-Encoded System Logs
Advanced Persistent Threats APTs are among the most challenging cyberattacks to detect. They are carried out by highly skilled attackers who carefully study their targets and operate in a stealthy, long-term manner. Because APTs exhibit "low-and-slow" behavior, traditional statistical methods and...
AEGIS: White-Box Attack Path Generation Using LLMs and Training Effectiveness Evaluation for Large-Scale Cyber Defence Exercises
Creating attack paths for cyber defence exercises requires substantial expert effort. Existing automation requires vulnerability graphs or exploit sets curated in advance, limiting where it can be applied. We present AEGIS, a system that generates attack paths using LLMs, white-box access, and...
Now You Hear Me: Audio Narrative Attacks against Large Audio-Language Models
Large audio-language models increasingly operate on raw speech inputs, enabling more seamless integration across domains such as voice assistants, education, and clinical triage. This transition, however, introduces a distinct class of vulnerabilities that remain largely uncharacterized. We exami...
No More, No Less: Least-Privilege Language Models
Least privilege is a core security principle: grant each request only the minimum access needed to achieve its goal. Deployed language models almost never follow it, instead being exposed through a single API endpoint that serves all users and requests. This gap exists not because least privilege...
Evaluating Large Language Models for Security Bug Report Prediction
Early detection of security bug reports SBRs is critical for timely vulnerability mitigation. We present an evaluation of prompt-based engineering and fine-tuning approaches for predicting SBRs using Large Language Models LLMs. Our findings reveal a distinct trade-off between the two approaches...
The Semantic Trap: Do Fine-Tuned LLMs Learn Vulnerability Root Cause or Just Functional Pattern?
LLMs demonstrate promising performance in software vulnerability detection after fine-tuning. However, it remains unclear whether these gains reflect a genuine understanding of vulnerability root causes or merely an exploitation of functional patterns. In this paper, we identify a critical failur...
Researchers Find 175,000 Publicly Exposed Ollama AI Servers Across 130 Countries
A new joint investigation by SentinelOne SentinelLABS, and Censys has revealed that the open-source artificial intelligence AI deployment has created a vast "unmanaged, publicly accessible layer of AI compute infrastructure" that spans 175,000 unique Ollama hosts across 130 countries. These...
A Systematic Literature Review on LLM Defenses against Prompt Injection and Jailbreaking: Expanding NIST Taxonomy
The rapid advancement and widespread adoption of generative artificial intelligence GenAI and large language models LLMs has been accompanied by the emergence of new security vulnerabilities and challenges, such as jailbreaking and other prompt injection attacks. These maliciously crafted inputs...
User-Centric Phishing Detection: A RAG and LLM-Based Approach
The escalating sophistication of phishing emails necessitates a shift beyond traditional rule-based and conventional machine-learning-based detectors. Although large language models LLMs offer strong natural language understanding, using them as standalone classifiers often yields elevated...