Tag: data scraping

Mar 5, 2026

AI System Shows Ability to Reidentify Anonymous Online Accounts

Researchers from ETH Zurich, Anthropic and the Machine Learning Alignment and Theory Scholars program have built an automated AI system that can link pseudonymous online profiles to real identities. Using large language models to analyze writing style, posting patterns and other clues, the system correctly matched up to 68 percent of accounts at 90 percent precision, far outpacing traditional methods. The experiment cost only a few dollars per profile, which shows how low the cost barrier to large-scale deanonymization has become. The study warns that online anonymity may be less secure than many assume, especially as AI capabilities continue to improve.

Oct 23, 2025

Reddit Sues Perplexity and Three Other Firms Over Unauthorized Data Scraping

Reddit has filed a lawsuit against AI startup Perplexity and three data-scraping companies (SerpApi, Oxylabs and AWMProxy), accusing them of extracting Reddit content from search results without a license. The complaint alleges that the defendants used the scraped material to power AI answer engines, violating Reddit’s licensing terms. Reddit, which has begun licensing its data to major tech firms, is seeking damages and an injunction to stop further unauthorized use. The case underscores the growing tension between online platforms and AI developers over the use of publicly available content for training models.

Oct 22, 2025

Reddit Sues Perplexity and Data Scrapers Over Alleged Illegal Content Harvesting

Reddit has filed a lawsuit against Perplexity and three data-scraping service providers (SerpApi, Oxylabs and AWMProxy), accusing them of large-scale, unlawful circumvention of the platform’s data protections. The complaint alleges that Perplexity, as a customer of at least one scraper, used stolen Reddit content to power its AI answer engine despite a cease-and-desist letter sent in May 2024. Reddit claims the defendants’ tactics amount to a data-laundering operation that threatens the value of its user-generated content, which the company has begun licensing to AI firms such as OpenAI and Google.