4 matches found
Babel: Jailbreaking Safety Attention Via Obfuscation Distribution Optimized Sampling
Despite rigorous safety alignment, Large Language Models LLMs remain vulnerable to jailbreak attacks. Existing black-box methods often rely on heuristic templates or exhaustive trials, lacking mechanistic interpretability and query efficiency. In this study, we investigate an intrinsic...
Red-Teaming Text-To-Image Systems by Rule-Based Preference Modeling
Text-to-image T2I models raise ethical and safety concerns due to their potential to generate inappropriate or harmful images. Evaluating these models' security through red-teaming is vital, yet white-box approaches are limited by their need for internal access, complicating their use with...
GitGot - Semi-automated, Feedback-Driven Tool To Rapidly Search Through Troves Of Public Data On GitHub For Sensitive Secrets
GitGot is a semi-automated, feedback-driven tool to empower users to rapidly search through troves of public data on GitHub for sensitive secrets. How it Works During search sessions, users will provide feedback to GitGot about search results to ignore, and GitGot prunes the set of results. Users...
General Purpose Fuzzing: Honggfuzz
Honggfuzz is a general-purpose fuzzing tool. Given a starting corpus of test files, Hongfuzz supplies and modifies input to a test program and utilize the ptrace API / POSIX signal interface to detect and log crashes. Features Easy setup : No complicated configuration files or setup necessary —...