2 matches found
Jailbreaking the Matrix: Nullspace Steering for Controlled Model Subversion
Large language models remain vulnerable to jailbreak attacks -- inputs designed to bypass safety mechanisms and elicit harmful responses -- despite advances in alignment and instruction tuning. We propose Head-Masked Nullspace Steering HMNS, a circuit-level intervention that i identifies attentio...
Information Disclosure
Apache Hive Query Language is vulnerable to information disclosure. The vulnerability is possible because it does not enforce the policy to restrict users from creating views on tables with column masking rules defined...