ReGA: Representation-Guided Abstraction for Model-Based Safeguarding of LLMs
Large Language Models LLMs have achieved significant success in various tasks, yet concerns about their safety and security have emerged. In particular, they pose risks in generating harmful content and vulnerability to jailbreaking attacks. To analyze and monitor machine learning models,...