3 matches found
A Common Pool of Privacy Problems: Legal and Technical Lessons from a Large-Scale Web-Scraped Machine Learning Dataset
We investigate the contents of web-scraped data for training AI systems, at sizes where human dataset curators and compilers no longer manually annotate every sample. Building off of prior privacy concerns in machine learning models, we ask: What are the legal privacy implications of web-scraped...
PYSEC-2025-7 Posts scraped data to IP address associated with other malware distribution attacks.
Published in 2021, the imblog package is a Python library that scrapes data from a blog page to an IP address associated with other malware distribution attacks...
Google Open Sources Magika: AI-Powered File Identification Tool
Google has announced that it's open-sourcing Magika, an artificial intelligence AI-powered tool to identify file types, to help defenders accurately detect binary and textual file types. "Magika outperforms conventional file identification methods providing an overall 30% accuracy boost and up to...