Cloudflare launches a tool to combat AI bots

Image: grey robot head on red background. Image Credits: Getty Images

Cloudflare, the publicly traded cloud service provider, has launched a new, free tool to prevent bots from scraping websites hosted on its platform for data to train AI models.

Some AI vendors, including Google, OpenAI and Apple, allow website owners to block the bots they use for data scraping and model training by amending their site’s robots.txt, the text file that tells bots which pages they can access on a website. But, as Cloudflare points out in a post announcing its bot-combating tool, not all AI scrapers respect this.
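To make the robots.txt mechanism concrete, here is a minimal sketch in Python using the standard library's `urllib.robotparser`. The file contents are illustrative rather than taken from any real site, and the user-agent tokens (`GPTBot`, `Google-Extended`, `Applebot-Extended`) are assumed to be the ones OpenAI, Google and Apple publish for opting out of AI training. The helper name `is_allowed` and the example URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: block three AI-training crawlers, allow everyone else.
# The user-agent tokens are assumptions; check each vendor's documentation.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/articles/1"  # hypothetical URL
    for agent in ("GPTBot", "Google-Extended", "Applebot-Extended", "Mozilla/5.0"):
        print(agent, is_allowed(ROBOTS_TXT, agent, page))
```

Note that the check only describes what a well-behaved crawler would do with the file. Nothing in robots.txt stops a scraper that chooses not to read it, which is the gap Cloudflare says its tool is meant to close.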

“Customers don’t want AI bots visiting their websites, and especially those that do so dishonestly,” the company writes on its official blog. “We fear that some AI companies intent on circumventing rules to access content will persistently adapt to evade bot detection.”

So, in an attempt to address the problem, Cloudflare analyzed AI bot and crawler traffic to fine-tune automatic bot detection models. The models consider, among other factors, whether an AI bot might be trying to evade detection by mimicking the appearance and behavior of someone using a web browser.

“When bad actors attempt to crawl websites at scale, they generally use tools and frameworks that we are able to fingerprint,” Cloudflare writes. “Based on these signals, our models [are] able to appropriately flag traffic from evasive AI bots as bots.”
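Cloudflare does not publish the signals its models use, so the following is only a toy illustration of the general idea of fingerprinting, not the company's method. It assumes one simple, hypothetical signal: a request whose user agent claims to be a browser but that lacks headers browsers typically send. The function name, token list and header list are all invented for this sketch.

```python
# Toy heuristic, NOT Cloudflare's detection logic. All names and signal
# choices here are hypothetical and chosen only for illustration.
BROWSER_TOKENS = ("mozilla", "chrome", "safari", "firefox")
EXPECTED_BROWSER_HEADERS = ("accept", "accept-language", "accept-encoding")


def looks_like_disguised_bot(headers: dict[str, str]) -> bool:
    """Flag requests that claim to be a browser but omit typical browser headers."""
    # Header names are case-insensitive, so normalize them before comparing.
    normalized = {name.lower(): value for name, value in headers.items()}
    user_agent = normalized.get("user-agent", "").lower()
    claims_browser = any(token in user_agent for token in BROWSER_TOKENS)
    missing = [name for name in EXPECTED_BROWSER_HEADERS if name not in normalized]
    return claims_browser and bool(missing)


if __name__ == "__main__":
    scripted = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64)"}
    browser = {
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)",
        "Accept": "text/html",
        "Accept-Language": "en-US",
        "Accept-Encoding": "gzip",
    }
    print(looks_like_disguised_bot(scripted))
    print(looks_like_disguised_bot(browser))
```

A single rule like this is easy to evade and would misfire on some legitimate clients, which is presumably why Cloudflare describes trained models that weigh many signals rather than one check.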

Cloudflare has set up a form for hosts to report suspected AI bots and crawlers and says that it’ll continue to manually blacklist AI bots over time.

The problem of AI bots has come into sharp relief as the generative AI boom fuels the demand for model training data.

Many sites, wary of AI vendors training models on their content without alerting or compensating them, have opted to block AI scrapers and crawlers. Around 26% of the top 1,000 sites on the web have blocked OpenAI’s bot, according to one study; another found that more than 600 news publishers had blocked the bot.

Blocking isn’t a surefire protection, however. As alluded to earlier, some vendors appear to be ignoring standard bot exclusion rules to gain a competitive advantage in the AI race. AI search engine Perplexity was recently accused of impersonating legitimate visitors to scrape content from websites, and OpenAI and Anthropic are said to have at times ignored robots.txt rules.

In a letter to publishers last month, content licensing startup TollBit said that, in fact, it sees “many AI agents” ignoring the robots.txt standard.

Tools like Cloudflare’s could help, but only if they prove accurate at detecting clandestine AI bots. And they won’t solve a more intractable problem: publishers that block specific AI crawlers risk losing referral traffic from AI tools like Google’s AI Overviews, which leave out sites that block those crawlers.
