6 matches found

Packet Storm News
added 2026/04/20 12:0 a.m. · 2 views

ARES: Adaptive Red-Teaming and End-To-End Repair of Policy-Reward System

Reinforcement Learning from Human Feedback (RLHF) is central to aligning Large Language Models (LLMs), yet it introduces a critical vulnerability: an imperfect Reward Model (RM) can become a single point of failure when it fails to penalize unsafe behaviors. While existing red-teaming approaches...

AI score: 5.8
Exploits: 0
Packet Storm News
added 2026/02/05 12:0 a.m. · 2 views

Persistent Human Feedback, LLMs, and Static Analyzers for Secure Code Generation and Vulnerability Detection

Existing literature heavily relies on static analysis tools to evaluate LLMs for secure code generation and vulnerability detection. We reviewed 1,080 LLM-generated code samples, built a human-validated ground-truth, and compared the outputs of two widely used static security tools, CodeQL and...

AI score: 5.5
Exploits: 0
Packet Storm News
added 2025/06/23 12:0 a.m. · 3 views

Adaptive Alert Prioritisation in Security Operations Centres Via Learning to Defer with Human Feedback

Alert prioritisation (AP) is crucial for security operations centres (SOCs) to manage the overwhelming volume of alerts and ensure timely detection and response to genuine threats, while minimising alert fatigue. Although predictive AI can process large alert volumes and identify known patterns, it...

AI score: 7
Exploits: 0
Schneier on Security
added 2024/09/11 11:3 a.m. · 6 views

Evaluating the Effectiveness of Reward Modeling of Generative AI Systems

New research evaluating the effectiveness of reward modeling during Reinforcement Learning from Human Feedback (RLHF): "SEAL: Systematic Error Analysis for Value ALignment." The paper introduces quantitative metrics for evaluating the effectiveness of modeling and aligning human values: Abstract:...

AI score: 7.2
Exploits: 0
Schneier on Security
added 2023/09/08 11:5 a.m. · 27 views

LLMs and Tool Use

Last March, just two weeks after GPT-4 was released, researchers at Microsoft quietly announced a plan to compile millions of APIs--tools that can do everything from ordering a pizza to solving physics equations to controlling the TV in your living room--into a compendium that would be made...

AI score: 6.6
Exploits: 0
Malwarebytes
added 2022/08/10 5:0 p.m. · 12 views

Now it's BlenderBot's turn to make shocking, inappropriate, and untrue remarks

Last Friday, Meta unveiled its new BlenderBot 3 AI chatbot, a conversational AI prototype. The company said its chatbot is designed to learn by having natural conversations with people online. It also improves its skills via human feedback. Meta also asserts with confidence that the more the AI...

AI score: 0.9
Exploits: 0