![](https://cdn.prod.website-files.com/6213ddd7bd3eb8fb93bf1da4/67a61c7f3a0e4e70333b1833_id-schema%20(3).png)
![Bar chart showing increase over time with Momentic logo](https://cdn.prod.website-files.com/6213ddd7bd3eb80dfdbf1d95/641afc78bf851ccb01dcb500_momentic-article-1200%C3%971200.png)
Oh hey! AI search is changing how people find our content. ChatGPT, Claude, Perplexity - these tools are a growing source of website traffic (Semrush just reported a 300% jump in domains getting ChatGPT traffic in the second half of last year).
Most AI crawlers can access your content by default. But with how fast this space is moving, it's super helpful to know exactly which crawlers are out there and verify they can actually see your site. I've put together a complete list and found this rad tool called Knowatoa that makes checking access by user agent very simple.
Here are the major AI crawlers you should have on your radar:
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.1; +https://openai.com/gptbot)
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot)
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot)
Mozilla/5.0 (compatible; anthropic-ai/1.0; +http://www.anthropic.com/bot.html)
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ClaudeBot/1.0; +claudebot@anthropic.com)
Mozilla/5.0 (compatible; claude-web/1.0; +http://www.anthropic.com/bot.html)
Mozilla/5.0 (compatible; Google-Extended/1.0; +http://www.google.com/bot.html)
Applebot: Mozilla/5.0 (compatible; Applebot/1.0; +http://www.apple.com/bot.html)
Applebot-Extended: Mozilla/5.0 (compatible; Applebot-Extended/1.0; +http://www.apple.com/bot.html)
Mozilla/5.0 (compatible; BingBot/1.0; +http://www.bing.com/bot.html)
FacebookBot: Mozilla/5.0 (compatible; FacebookBot/1.0; +http://www.facebook.com/bot.html)
Meta External Fetcher: Mozilla/5.0 (compatible; meta-externalagent/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler))
LinkedInBot/1.0 (compatible; Mozilla/5.0; Jakarta Commons-HttpClient/3.1 +http://www.linkedin.com)
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 (Amazonbot/0.1; +https://developer.amazon.com/support/amazonbot)
Mozilla/5.0 (compatible; Bytespider/1.0; +http://www.bytedance.com/bot.html)
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)
Mozilla/5.0 (compatible; YouBot (+http://www.you.com))
Mozilla/5.0 (compatible; DuckAssistBot/1.0; +http://www.duckduckgo.com/bot.html)
Mozilla/5.0 (compatible; AI2Bot/1.0; +http://www.allenai.org/crawler)
Mozilla/5.0 (compatible; CCBot/1.0; +http://www.commoncrawl.org/bot.html)
Mozilla/5.0 (compatible; cohere-ai/1.0; +http://www.cohere.ai/bot.html)
Mozilla/5.0 (compatible; omgili/1.0; +http://www.omgili.com/bot.html)
Timpibot/0.8 (+http://www.timpi.io)
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 (.NET CLR 3.5.30729; Diffbot/0.1; +http://www.diffbot.com)
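If you want to spot these crawlers in your own access logs, a rough starting point is to look for the product token inside each user agent string. The sketch below is only an illustration: the `AI_CRAWLER_TOKENS` list and the `identify_ai_crawler` helper are names I made up for this example, the tokens are simply pulled from the strings listed above, and user agents can be spoofed, so treat a match as a hint rather than proof.

```python
from typing import Optional

# Product tokens taken from the user agent strings listed above.
# More specific tokens come first so "Applebot-Extended" is not
# reported as plain "Applebot".
AI_CRAWLER_TOKENS = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot",
    "anthropic-ai", "ClaudeBot", "claude-web",
    "Google-Extended", "Applebot-Extended", "Applebot",
    "BingBot", "FacebookBot", "meta-externalagent", "LinkedInBot",
    "Amazonbot", "Bytespider", "PerplexityBot", "YouBot",
    "DuckAssistBot", "AI2Bot", "CCBot", "cohere-ai", "omgili",
    "Timpibot", "Diffbot",
]


def identify_ai_crawler(user_agent: str) -> Optional[str]:
    """Return the first known crawler token found in user_agent, else None."""
    lowered = user_agent.lower()
    for token in AI_CRAWLER_TOKENS:
        # Case-insensitive substring match on the product token.
        if token.lower() in lowered:
            return token
    return None


if __name__ == "__main__":
    sample = "Mozilla/5.0 (compatible; DuckAssistBot/1.0; +http://www.duckduckgo.com/bot.html)"
    print(identify_ai_crawler(sample))
```

Substring matching keeps the example short. For anything beyond a quick log review, you would probably also want to verify the requesting IP against the ranges each vendor publishes.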
Here are some robots.txt configuration snippets that allow major AI crawlers while maintaining standard SEO best practices:
# Example robots.txt entries to allow specific AI Crawlers

# Allen Institute (AI2Bot)
User-agent: AI2Bot
Allow: /

# Amazon (Amazonbot)
User-agent: Amazonbot
Allow: /

# Anthropic (Anthropic AI Bot)
User-agent: anthropic-ai
Allow: /

# Anthropic (ClaudeBot)
User-agent: ClaudeBot
Allow: /

# Anthropic (Claude Web)
User-agent: claude-web
Allow: /

# Apple (Applebot)
User-agent: Applebot
Allow: /

# Apple (Applebot-Extended)
User-agent: Applebot-Extended
Allow: /

# Microsoft (BingBot)
User-agent: BingBot
Allow: /

# ByteDance (Bytespider)
User-agent: Bytespider
Allow: /

# Common Crawl (CCBot)
User-agent: CCBot
Allow: /

# OpenAI (ChatGPT-User)
User-agent: ChatGPT-User
Allow: /

# OpenAI (GPTBot)
User-agent: GPTBot
Allow: /

# OpenAI (OAI-SearchBot)
User-agent: OAI-SearchBot
Allow: /

# Cohere (cohere-ai)
User-agent: cohere-ai
Allow: /

# Diffbot (DiffBot)
User-agent: DiffBot
Allow: /

# DuckDuckGo (DuckAssistBot)
User-agent: DuckAssistBot
Allow: /

# Meta (FacebookBot)
User-agent: FacebookBot
Allow: /

# Meta (Meta External Fetcher)
User-agent: meta-externalagent
Allow: /

# Google (Google-Extended)
User-agent: Google-Extended
Allow: /

# LinkedIn (LinkedInBot)
User-agent: LinkedInBot
Allow: /

# Omgili (omgili)
User-agent: omgili
Allow: /

# Perplexity (PerplexityBot)
User-agent: PerplexityBot
Allow: /

# Timpi (Timpibot)
User-agent: Timpibot
Allow: /

# You.com (YouBot)
User-agent: YouBot
Allow: /
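To sanity-check rules like these before deploying them, you can run the file through Python's standard-library `urllib.robotparser`. This is a minimal sketch: the `allowed_agents` helper and the `SAMPLE_ROBOTS_TXT` text are made up for this example, and Python's parser may not interpret every edge case the same way each crawler's own parser does.

```python
from typing import Dict, List
from urllib.robotparser import RobotFileParser


def allowed_agents(robots_txt: str, agents: List[str], url: str = "/") -> Dict[str, bool]:
    """Map each user agent token to whether robots_txt lets it fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


# Hypothetical robots.txt used only for this example.
SAMPLE_ROBOTS_TXT = """\
# OpenAI (GPTBot)
User-agent: GPTBot
Allow: /

# Common Crawl (CCBot)
User-agent: CCBot
Disallow: /
"""

if __name__ == "__main__":
    report = allowed_agents(SAMPLE_ROBOTS_TXT, ["GPTBot", "CCBot", "PerplexityBot"])
    for agent, ok in report.items():
        print(f"{agent}: {'allowed' if ok else 'blocked'}")
```

In this sample, `PerplexityBot` has no matching group and there is no `User-agent: *` fallback, so the parser treats it as allowed, which matches the "accessible by default" behavior described earlier.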
Even with the correct robots.txt configuration, your web server or firewall might still block AI crawlers. I recommend using Knowatoa's AI Search Console to validate your setup - it'll check your site against 24 different AI user agents and flag any access issues.
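If you'd like a quick do-it-yourself spot check as well, one rough approach is to request a page while sending each crawler's user agent string and see which ones get rejected. The sketch below rests on a few assumptions: `fetch_status` and `find_blocked` are hypothetical helper names, the "blocked" status codes (401, 403, 429, and 5xx) are my own guess at what a firewall rejection looks like, and `https://example.com/` is a placeholder. It also only tests user-agent-based rules from your own IP address, so it can't reproduce blocks that depend on the crawler's real IP range.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict


def fetch_status(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Request url with the given User-Agent header and return the HTTP status."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError; the status code is what we want.
        return err.code


def find_blocked(
    url: str,
    user_agents: Dict[str, str],
    fetch: Callable[[str, str], int] = fetch_status,
) -> Dict[str, int]:
    """Return {crawler name: status} for agents that look blocked."""
    blocked = {}
    for name, user_agent in user_agents.items():
        status = fetch(url, user_agent)
        # Assumption: these codes signal a server or firewall rejection.
        if status in (401, 403, 429) or status >= 500:
            blocked[name] = status
    return blocked


if __name__ == "__main__":
    agents = {
        "PerplexityBot": "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
                         "PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)",
        "Timpibot": "Timpibot/0.8 (+http://www.timpi.io)",
    }
    # Replace the placeholder URL with a page on your own site.
    print(find_blocked("https://example.com/", agents))
```

The `fetch` parameter is injectable so the blocking logic can be exercised without touching the network. Connection failures (`URLError`) are left to propagate rather than being counted as blocks.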
Otherwise, you can use Merkle's robots.txt tester to audit user agents one by one.
As AI search continues to mature, this list will keep growing. I'll update this post as new crawlers emerge. Drop me a comment if you spot any I've missed!
Big shoutout to Mike Buckbee and his fantastic tool Knowatoa, which helps me stay on top of these crawlers/user agents. The AI Search Console tool is particularly helpful for validating your site's accessibility to AI crawlers.
Originally published: February 7, 2025 | Last updated: February 7, 2025