WebSP-Eval: Evaluating Web Agents on Website Security and Privacy Tasks
Web agents automate browser tasks, ranging from simple form completion to complex workflows like ordering groceries. While current benchmarks evaluate general-purpose performancee.g., WebArena or safety against malicious actionse.g., SafeArena, no existing framework assesses an agent's ability to...