CVE Search Engine - Security Vulnerabilities and Exploits Search Tool

show all

5 matches found

Packet Storm News•added 2026/04/20 12:0 a.m.•23 views

ARES: Adaptive Red-Teaming and End-To-End Repair of Policy-Reward System

Reinforcement Learning from Human Feedback RLHF is central to aligning Large Language Models LLMs, yet it introduces a critical vulnerability: an imperfect Reward Model RM can become a single point of failure when it fails to penalize unsafe behaviors. While existing red-teaming approaches...

5.8AI score

SaveExploits0

Packet Storm News•added 2026/02/10 12:0 a.m.•19 views

SecCodePRM: A Process Reward Model for Code Security

Large Language Models are rapidly becoming core components of modern software development workflows, yet ensuring code security remains challenging. Existing vulnerability detection pipelines either rely on static analyzers or use LLM/GNN-based detectors trained with coarse program-level...

5.7AI score

SaveExploits0

Packet Storm News•added 2025/07/14 12:0 a.m.•7 views

PRM-Free Security Alignment of Large Models Via Red Teaming and Adversarial Training

Large Language Models LLMs have demonstrated remarkable capabilities across diverse applications, yet they pose significant security risks that threaten their safe deployment in critical domains. Current security alignment methodologies predominantly rely on Process Reward Models PRMs to evaluate...

7.1AI score

SaveExploits0

Packet Storm News•added 2025/06/06 12:0 a.m.•7 views

Saffron-1: Towards an Inference Scaling Paradigm for LLM Safety Assurance

Existing safety assurance research has primarily focused on training-phase alignment to instill safe behaviors into LLMs. However, recent studies have exposed these methods' susceptibility to diverse jailbreak attacks. Concurrently, inference scaling has significantly advanced LLM reasoning...

7AI score

SaveExploits0

Packet Storm News•added 2025/06/03 12:0 a.m.•7 views

BadReward: Clean-Label Poisoning of Reward Models in Text-To-Image RLHF

Reinforcement Learning from Human Feedback RLHF is crucial for aligning text-to-image T2I models with human preferences. However, RLHF's feedback mechanism also opens new pathways for adversaries. This paper demonstrates the feasibility of hijacking T2I models by poisoning a small fraction of...

6.9AI score

SaveExploits0

Rows per page

Query Builder

Family

Bulletin Type

Min CVSS Score

Date

Order by