August 11, 2025
Vishal Mathur
Cloudflare-Perplexity Dispute Highlights Challenges of AI Agents on the Web
In recent days, a battle has been brewing that may realign the contours of web standards with artificial intelligence (AI), the idea of an open web, and how data is collected by AI companies. Internet infrastructure giant Cloudflare fired the first shots, alleging that Perplexity uses stealth to access and collect data from websites that have explicitly asked not to be crawled. The AI company counters with a philosophical argument, questioning whether, with the rise of AI-powered assistants and user-driven agents, the boundary between what counts as "just a bot" and what serves the immediate needs of real people has become increasingly blurred.
Cloudflare's Allegations
Cloudflare CEO Matthew Prince has described AI as an existential threat to publishers. The company claims Perplexity accesses websites in ways that evade site owners' preferences, specifically by ignoring the directives in a website's robots.txt file, which tells web crawlers which parts of a site they can access.
"We are observing stealth crawling behaviour from Perplexity, an AI-powered answer engine. Although Perplexity initially crawls from their declared user agent, when they are presented with a network block, they appear to obscure their crawling identity in an attempt to circumvent the website's preferences," Cloudflare detailed in a technical post.
Cloudflare also shared that customers who disallowed Perplexity crawling in their robots.txt files reported that Perplexity was still able to access their content despite being blocked.
Cloudflare says its own tests replicated this obfuscation behavior. It contrasted the result with OpenAI's ChatGPT crawler, which it found respects robots.txt directives and stops crawling when disallowed.
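For readers unfamiliar with the mechanism at the centre of the dispute, the following is a minimal sketch of how a well-behaved crawler might consult robots.txt before fetching a page, using Python's standard urllib.robotparser module. The robots.txt content, the example.com URL, the may_fetch helper, and the user-agent names are all assumptions made for illustration; they are not taken from Cloudflare's post or from any real site's configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named AI crawler is disallowed everywhere,
# all other crawlers are allowed. The user-agent token is an assumption.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/articles/1"  # placeholder URL
    for agent in ("PerplexityBot", "SomeOtherBot"):
        verdict = "allowed" if may_fetch(agent, url) else "disallowed"
        print(f"{agent}: {verdict}")
```

Note that the check runs entirely on the crawler's side and keys off the user-agent name the crawler chooses to present. robots.txt is a voluntary convention rather than an enforcement mechanism, which is why a crawler that changes its declared identity would sidestep a rule like the one above.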
Perplexity's Response: AI Assistants Are Different
Perplexity's reply does not directly address the obfuscation claims but instead highlights what it sees as a fundamental difference between AI assistants and traditional web crawlers. The company describes its AI tools as "user-driven" agents that fetch information on demand rather than building massive databases through systematic crawling.
"When you ask Perplexity a question that requires current information, say, 'What are the latest reviews for that new restaurant?', the AI doesn't already have that information sitting in a database somewhere. Instead, it goes to the relevant websites, reads the content, and brings back a summary tailored to your specific question. This is fundamentally different from traditional web crawling, in which crawlers systematically visit millions of pages to build massive databases, whether anyone asked for that specific information or not."
Perplexity insists that it does not store or train on the fetched information and that its occasional use of a third-party cloud browser service (BrowserBase) is minimal and unrelated to the bulk of its traffic.
The Broader Implications
This dispute is more than a technical disagreement; it signals a deeper conversation about how AI changes the web and search. AI chatbots are increasingly becoming default search tools, replacing traditional search engines. Google itself is layering AI features into its search results. Cloudflare notes that since July, over 2.5 million sites have blocked AI training access, and it promotes "pay per crawl" models to compensate creators. Publishers may want more control, consent, or payment for automated AI access to their content.
Conclusion
The Cloudflare-Perplexity controversy raises fundamental questions about the future of the web. Can the collaborative, trust-based model that has governed the internet survive the aggressive data collection needs of AI systems? How will web standards evolve to address AI agents that blur the lines between bots and human-driven assistants? While immediate changes may not be evident, Cloudflare's spotlight on robots.txt has sparked an important conversation about AI, data access, and publisher rights.
Originally published at Hindustan Times on Aug 11, 2025.