**News Outlets Accuse Perplexity of Plagiarism and Unethical Web Scraping**

The line between fair use and plagiarism, and between routine web scraping and unethical summarization, is becoming increasingly blurred in the age of generative AI. Perplexity AI, a startup that combines a search engine with a large language model, is at the center of a controversy surrounding its approach to gathering and using online content.

Perplexity generates detailed responses to user queries rather than just providing links. Unlike many AI companies, Perplexity doesn't train its own foundation models. Instead, it uses open or commercially available ones to turn information gathered from the internet into answers.

However, a series of accusations in June suggests that Perplexity's approach may be unethical. Forbes accused the startup of plagiarizing one of its news articles in its beta Perplexity Pages feature. Wired also accused Perplexity of illicitly scraping its website, along with other sites.

Perplexity maintains that it has done nothing wrong. It says it has honored publishers' requests not to scrape their content and that it is operating within the bounds of fair use under copyright law. The company, backed by Nvidia and Jeff Bezos, is working to raise $250 million at a valuation of nearly $3 billion.

Surreptitiously Scraping Web Content

Wired reported that Perplexity has ignored the Robots Exclusion Protocol to surreptitiously scrape areas of websites that publishers do not want bots to access. The publication observed a machine tied to Perplexity doing this on its own news site, as well as across other publications under its parent company, Condé Nast.
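
Under the Robots Exclusion Protocol, a site publishes its rules in a plain-text robots.txt file, and a well-behaved crawler checks those rules before requesting a page. The Python sketch below uses the standard library's `urllib.robotparser` to show that check. The robots.txt contents, the `ExampleAIBot` user agent, and the example.com URLs are hypothetical and are not taken from Wired's or Perplexity's actual configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a news site: one named bot is barred from the
# whole site, and every other bot is barred only from /drafts/.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /drafts/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly instead of fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the parsed rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    for agent, url in [
        ("ExampleAIBot", "https://news.example.com/story"),
        ("OtherBot", "https://news.example.com/story"),
        ("OtherBot", "https://news.example.com/drafts/scoop"),
    ]:
        print(agent, url, allowed(rules, agent, url))
```

The check runs entirely on the client. The protocol has no enforcement mechanism of its own, so a client that skips it can still request a disallowed page.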

Developer Robb Knight conducted a similar experiment and came to the same conclusion. Both Wired's reporters and Knight tested their suspicions by asking Perplexity to summarize a series of URLs and then watching, on the server side, as an IP address associated with Perplexity visited those sites. Perplexity then "summarized" the text from those URLs, though in the case of one dummy website with limited content that Wired created for this purpose, it returned text from the page verbatim.
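
The server-side half of that test amounts to filtering an access log for requests from a known address range. The sketch below assumes logs in the Common Log Format that web servers such as Apache can emit. The address block (the reserved documentation range 203.0.113.0/24) and the log lines are made up for illustration and do not represent Perplexity's real IP addresses.

```python
import ipaddress
import re

# Common Log Format: host ident authuser [time] "request line" status bytes
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)

# Hypothetical block standing in for addresses tied to a particular bot
# (203.0.113.0/24 is reserved for documentation examples).
WATCHED_NETWORK = ipaddress.ip_network("203.0.113.0/24")

# Made-up log lines for illustration only.
SAMPLE_LOG = [
    '198.51.100.7 - - [20/Jun/2024:10:00:01 +0000] "GET / HTTP/1.1" 200 512',
    '203.0.113.42 - - [20/Jun/2024:10:00:05 +0000] "GET /dummy-article HTTP/1.1" 200 2048',
    '198.51.100.7 - - [20/Jun/2024:10:00:09 +0000] "GET /about HTTP/1.1" 200 256',
]


def visits_from(network, log_lines):
    """Yield (time, path) for each request whose client IP falls in `network`."""
    for line in log_lines:
        match = LOG_PATTERN.match(line)
        if not match:
            continue  # not a Common Log Format line
        try:
            client = ipaddress.ip_address(match["ip"])
        except ValueError:
            continue  # host field holds a hostname rather than an IP address
        if client in network:
            yield match["time"], match["path"]


if __name__ == "__main__":
    for time, path in visits_from(WATCHED_NETWORK, SAMPLE_LOG):
        print(time, path)
```

Matching on the client address rather than the User-Agent header is a deliberate choice here: the User-Agent is self-reported, so a client can send any string it likes.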

Plagiarism or Fair Use?

Forbes accused Perplexity of plagiarizing its scoop about former Google CEO Eric Schmidt developing AI-powered combat drones. Wired also accused Perplexity of plagiarism, stating that the startup plagiarized the very article that called out Perplexity for surreptitiously scraping its web content.

Wired's reporters said the Perplexity chatbot "produced a six-paragraph, 287-word text closely summarizing the conclusions of the story and the evidence used to reach them."

Perplexity's head of business, Dmitry Shevelenko, argued that summarizing a URL isn't the same thing as crawling. "Crawling is when you're just going around sucking up information and adding it to your index," Shevelenko said. He noted that Perplexity's IP might show up as a visitor to a website that is "otherwise kind of prohibited from robots.txt" only when a user puts a URL into their query, which "doesn't meet the definition of crawling."

However, to Wired and many other publishers, that is a distinction without a difference: visiting a URL and pulling information from it to summarize the text looks a lot like scraping when it happens thousands of times a day.