RAS-Eval: a Comprehensive Benchmark for Security Evaluation of LLM Agents in Real-World Environments
The rapid deployment of large language model (LLM) agents in critical domains such as healthcare and finance necessitates robust security frameworks. To address the absence of standardized benchmarks for evaluating these agents in dynamic environments, we introduce RAS-Eval, a comprehensive security...