CVE Search Engine - Security Vulnerabilities and Exploits Search Tool

1 match found

Packet Storm News • added 2025/06/09 12:00 a.m.

IF-GUIDE: Influence Function-Guided Detoxification of LLMs

We study how training data contributes to the emergence of toxic behaviors in large language models. Most prior work on reducing model toxicity adopts reactive approaches, such as fine-tuning pre-trained and potentially toxic models to align them with human values. In contrast, we propose a...

AI score: 7.1

Exploits: 0