3 matches found
Secure Reinforcement Learning: On Model-Free Detection of Man in the Middle Attacks
We consider the problem of learning-based man-in-the-middle MITM attacks in cyber-physical systems CPS, and extend our previously proposed Bellman Deviation Detection BDD framework for model-free reinforcement learning RL. We refine the standard MDP attack model by allowing the reward function to...
PRM-Free Security Alignment of Large Models Via Red Teaming and Adversarial Training
Large Language Models LLMs have demonstrated remarkable capabilities across diverse applications, yet they pose significant security risks that threaten their safe deployment in critical domains. Current security alignment methodologies predominantly rely on Process Reward Models PRMs to evaluate...
MorphMark: Flexible Adaptive Watermarking for Large Language Models
Watermarking by altering token sampling probabilities based on red-green list is a promising method for tracing the origin of text generated by large language models LLMs. However, existing watermark methods often struggle with a fundamental dilemma: improving watermark effectiveness the...