10 matches found
Quality-Diversity Evolution for Discovering Diverse Vulnerabilities in LLM Safety
Current approaches to LLM adversarial testing suffer from coverage gaps: manual red-teaming does not scale, LLM-as-attacker methods exhibit mode collapse, and gradient-based approaches produce uninterpretable gibberish. We introduce a quality-diversity evolutionary framework that operates at the...
Hacker Used Claude Code, GPT-4.1 to Exfiltrate Hundreds of Millions of Mexican Records
A lone hacker used Claude Code and GPT-4.1 to exfiltrate hundreds of millions of Mexican citizen records from 9 government agencies...
Kill-Chain Canaries: Stage-Level Tracking of Prompt Injection across Attack Surfaces and Model Safety Tiers
We present a stage-decomposed analysis of prompt injection attacks against five frontier LLM agents. Prior work measures task-level attack success rate (ASR); we localize the pipeline stage at which each model's defense activates. We instrument every run with a cryptographic canary token...
Trustworthiness Calibration Framework for Phishing Email Detection Using Large Language Models
Phishing emails continue to pose a persistent challenge to online communication, exploiting human trust and evading automated filters through realistic language and adaptive tactics. While large language models (LLMs) such as GPT-4 and LLaMA-3-8B achieve strong accuracy in text classification, thei...
CVE-2025-50709
An issue in Perplexity AI GPT-4 allows a remote attacker to obtain sensitive information via a GET parameter...
PT-2025-38157
Name of the Vulnerable Software and Affected Versions: Perplexity AI GPT-4 (affected versions not specified). Description: An issue in Perplexity AI GPT-4 allows a remote attacker to obtain sensitive information via a GET parameter. Recommendations: At the moment, there is no information about a new...
Security Assessment of DeepSeek and GPT Series Models against Jailbreak Attacks
The widespread deployment of large language models (LLMs) has raised critical concerns over their vulnerability to jailbreak attacks, i.e., adversarial prompts that bypass alignment mechanisms and elicit harmful or policy-violating outputs. While proprietary models like GPT-4 have undergone extensi...
LLM vs. SAST: a Technical Analysis on Detecting Coding Bugs of GPT4-Advanced Data Analysis
With the rapid advancements in Natural Language Processing (NLP), large language models (LLMs) like GPT-4 have gained significant traction in diverse applications, including security vulnerability scanning. This paper investigates the efficacy of GPT-4 in identifying software vulnerabilities compared...
ACSE-Eval: Can LLMs Threat Model Real-World Cloud Infrastructure?
While Large Language Models have shown promise in cybersecurity applications, their effectiveness in identifying security threats within cloud deployments remains unexplored. This paper introduces AWS Cloud Security Engineering Eval, a novel dataset for evaluating LLMs' cloud security threat...
Enhancing Security and Strengthening Defenses in Automated Short-Answer Grading Systems
This study examines vulnerabilities in transformer-based automated short-answer grading systems used in medical education, with a focus on how these systems can be manipulated through adversarial gaming strategies. Our research identifies three main types of gaming strategies that exploit the...