June 28, 2024, 4:26 p.m.

Brief news summary

Amazon Web Services (AWS) is investigating whether Perplexity AI has violated its rules, according to Wired. AWS's cloud division is looking into allegations that a Perplexity AI crawler hosted on AWS servers ignores the Robots Exclusion Protocol, a web standard that lets site owners specify which pages bots may access. Wired previously reported that a virtual machine hosted on AWS visited various publishers' websites despite their robots.txt instructions. To test whether Perplexity was using its content, Wired entered article headlines into Perplexity AI's chatbot and received closely paraphrased answers. Other AI companies may also be bypassing robots.txt files to gather content. AWS said that customers are responsible for complying with its terms of service and that it investigates reports of abuse. Perplexity denied violating the Robots Exclusion Protocol but admitted that its crawler ignores robots.txt when users include a specific URL in a chatbot query.

Amazon Web Services is investigating whether Perplexity AI is violating its rules, Wired reports. Specifically, Amazon's cloud division is looking into allegations that Perplexity AI uses a crawler, hosted on AWS servers, that disregards the Robots Exclusion Protocol. Under this web standard, a site owner places a robots.txt file on a domain to tell bots which pages they may or may not access. Compliance is voluntary, but reputable companies have generally honored these instructions since the protocol was introduced in the 1990s.

In an earlier article, Wired identified a virtual machine on an Amazon Web Services server, at IP address 44.221.181.252, that ignored the robots.txt instructions on Wired's own website. The machine reportedly visited various Condé Nast properties multiple times over the preceding three months to scrape their content. Other publications, including The Guardian, Forbes, and The New York Times, also reported multiple visits from the same machine. To check whether Perplexity was scraping its content, Wired entered headlines or brief descriptions of its articles into Perplexity's chatbot.

The chatbot's responses closely paraphrased the articles "with minimal attribution." A recent Reuters report suggests that Perplexity is not the only AI company bypassing robots.txt files to gather content for training large language models. However, Wired gave Amazon information only about Perplexity AI's crawler.

Amazon Web Services said, "AWS’s terms of service prohibit abusive and illegal activities, and our customers are responsible for complying with those terms." The company added that it routinely receives reports of alleged abuse and investigates them accordingly.

Perplexity spokesperson Sara Platnick responded to Amazon's inquiries by asserting that the company's crawlers abide by the Robots Exclusion Protocol and do not violate the AWS Terms of Service. Platnick also noted that Amazon's review of Wired's media inquiry followed Amazon's standard protocol for investigating reports of potential resource abuse. However, Platnick admitted to Wired that PerplexityBot will disregard robots.txt when a user includes a specific URL in a chatbot query.

Perplexity CEO Aravind Srinivas had previously denied accusations that his company ignored the Robots Exclusion Protocol and then lied about doing so. Srinivas did acknowledge that Perplexity uses third-party web crawlers in addition to its own, and that the bot Wired identified was one of them.

Update, June 28, 2024, 2:20 PM ET: This post has been updated to include Perplexity's statement to Engadget.

Update, June 28, 2024, 8:27 PM ET: This post has been updated to include a statement from Amazon Web Services.
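Because the Robots Exclusion Protocol is advisory, honoring it comes down to whether a crawler bothers to check the file before fetching a page. As a minimal illustrative sketch (not Perplexity's, Wired's, or AWS's code), the Python snippet below uses the standard library's `urllib.robotparser` to perform that check. The robots.txt contents, the `example.com` URLs, and the `is_allowed` helper are hypothetical; a real crawler would download the file from the site's `/robots.txt` path rather than embed it. Nothing in the protocol enforces the result: a crawler that never runs such a check simply fetches the page anyway.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a publisher might serve at https://example.com/robots.txt.
# The first group blocks one named bot entirely; the "*" group applies to all others.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the hypothetical robots.txt above permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network request is made here.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed("PerplexityBot", "https://example.com/story"))    # False
print(is_allowed("SomeOtherBot", "https://example.com/story"))     # True
print(is_allowed("SomeOtherBot", "https://example.com/private/x")) # False
```

In this example the blanket `Disallow: /` rule applies only to the user agent named `PerplexityBot`, while the `User-agent: *` group keeps every other bot out of the `/private/` path alone.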
