3 matches found
Discovering Universal Activation Directions for PII Leakage in Language Models
Modern language models exhibit rich internal structure, yet little is known about how privacy-sensitive behaviors, such as personally identifiable information PII leakage, are represented and modulated within their hidden states. We present UniLeak, a mechanistic-interpretability framework that...
(0Day) OpenAI ChatGPT Improper Input Validation Model Policy Bypass Vulnerability
This vulnerability allows remote attackers to bypass policy restictions on affected versions of OpenAI ChatGPT. Authentication is required to exploit this vulnerability. The specific flaw exists within the interface to the ChatGPT-Vision Data model. The issue results from the lack of proper...
A week in security (November 06 – November 12)
Last week on Malwarebytes Labs: Defeating Little Brother requires a new outlook on privacy: Lock and Code S04E23 Medical research data Advarra stolen after SIM swap Okta breach happened after employee logged into personal Google account Introducing ThreatDown: A new chapter for Malwarebytes...