Key takeaway
AI crawlers identify themselves through user-agent strings, and keeping those strings current in your robots.txt lets you guide how language models interact with your work. Most LLM-based AI search crawlers announce themselves with a user-agent string: a short bit of text that tells your server “who” is making the request. When you spot GPTBot, ClaudeBot, PerplexityBot, or any of the newer strings below in your server access logs, you know an AI model is indexing, scraping, or quoting your page. Keep your robots.txt file and firewall rules up to date so the right agents can read your content while the wrong ones stay out.
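To make that concrete, here is roughly what a GPTBot request looks like on the wire. The path, host, and version number are illustrative; the exact string varies by crawler release, so treat this as an example rather than the canonical value:
GET /blog/ HTTP/1.1
Host: example.com
User-Agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.1; +https://openai.com/gptbot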
Quick definitions
- AI crawler: A bot that copies public web pages so a large language model can learn from them.
- AI user-agent: The string that identifies that crawler in HTTP requests. You use it in robots.txt rules.
- Robots.txt: A plain-text file at the root of your site that tells crawlers what they may fetch. Add one block for each User-agent you want to allow or block.
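For example, the smallest useful block names one agent and gives it one directive. GPTBot is used here purely as an illustration; this pair of lines blocks it from the entire site:
User-agent: GPTBot
Disallow: /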
Why you should care
Server logs show AI search bots, which now account for a growing share of your brand's impressions. Knowing which agents the frontier LLMs (and a few others) use helps you encourage or discourage that traffic responsibly.
- AI search bots (ChatGPT, Claude, Bing Copilot, and Perplexity) send measurable referral traffic to websites.
- Clear robots.txt rules let helpful agents in and keep abusive scrapers out.
- If you have access to server log files, you can see how often AI/LLM bots are hitting your website so you can create a baseline.
Most AI crawlers can access your content by default. But with how fast this space is moving, it's super helpful to know exactly which crawlers are out there and verify they can actually see your site.
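If you want to put a number on that baseline right away, a quick log one-liner works. This is a sketch that assumes a combined-format access log named access.log; adjust the path and the bot list to match your stack:
grep -Eio "gptbot|oai-searchbot|chatgpt-user|claudebot|perplexitybot|bingbot|amazonbot|bytespider" access.log | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -rn
The output is a per-bot hit count you can re-run monthly to see whether AI crawler traffic is trending up or down.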
Complete AI crawler list
This list was updated in November 2025; the previous version was from April 2025. A growing number of site owners use the information below in their firewall allow-lists.
<div class="tbl-wrap"><table style="border-radius:12px;"><thead><tr><th>User-agent token</th><th>Vendor</th><th>Bot description</th><th>robots.txt snippet</th></tr></thead><tbody><tr><td><code>GPTBot</code></td><td>OpenAI</td><td>Crawler that collects data for training GPT models. Block this to prevent your content from being used in model training.</td><td><code>User-agent: GPTBot<br>Disallow: /</code></td></tr><tr><td><code>OAI-SearchBot</code></td><td>OpenAI</td><td>Indexes pages for ChatGPT's search and citation features. Used for retrieval-augmented generation.</td><td><code>User-agent: OAI-SearchBot<br>Allow: /</code></td></tr><tr><td><code>ChatGPT-User</code></td><td>OpenAI</td><td>Fetches URLs when a ChatGPT user requests specific pages or when ChatGPT needs to cite sources during conversations.</td><td><code>User-agent: ChatGPT-User<br>Allow: /</code></td></tr><tr><td><code>ChatGPT-User/2.0</code></td><td>OpenAI</td><td>Updated version of ChatGPT-User with enhanced capabilities for on-demand content fetching.</td><td><code>User-agent: ChatGPT-User/2.0<br>Allow: /</code></td></tr><tr><td><code>anthropic-ai</code></td><td>Anthropic</td><td>Collects web data for training Claude models. Primary training data crawler.</td><td><code>User-agent: anthropic-ai<br>Disallow: /</code></td></tr><tr><td><code>ClaudeBot</code></td><td>Anthropic</td><td>Retrieves URLs for citations and real-time information during Claude chat sessions.</td><td><code>User-agent: ClaudeBot<br>Allow: /</code></td></tr><tr><td><code>claude-web</code></td><td>Anthropic</td><td>Undocumented crawler from Anthropic. Purpose unclear but appears to fetch web content for Claude.</td><td><code>User-agent: claude-web<br>Allow: /</code></td></tr><tr><td><code>PerplexityBot</code></td><td>Perplexity</td><td>Indexes websites to build Perplexity AI's search engine database.</td><td><code>User-agent: PerplexityBot<br>Allow: /</code></td></tr><tr><td><code>Perplexity-User</code></td><td>Perplexity</td><td>Fetches pages when users click citations in Perplexity results. Treated as human-triggered traffic.</td><td><code>User-agent: Perplexity-User<br>Allow: /</code></td></tr><tr><td><code>Google-Extended</code></td><td>Google</td><td>Controls access for Gemini AI training. NOTE: This is only a robots.txt token - it uses existing Googlebot user agents, not a separate crawler.</td><td><code>User-agent: Google-Extended<br>Disallow: /</code></td></tr><tr><td><code>Googlebot</code></td><td>Google</td><td>Primary crawler for Google Search indexing.</td><td><code>User-agent: Googlebot<br>Allow: /</code></td></tr><tr><td><code>Bingbot</code></td><td>Microsoft</td><td>Crawler for Bing Search and Bing Chat (Copilot).</td><td><code>User-agent: Bingbot<br>Allow: /</code></td></tr><tr><td><code>Amazonbot</code></td><td>Amazon</td><td>Crawls sites for Alexa, Fire OS AI features, and product recommendations.</td><td><code>User-agent: Amazonbot<br>Allow: /</code></td></tr><tr><td><code>Applebot</code></td><td>Apple</td><td>Indexes content for Siri and Spotlight search.</td><td><code>User-agent: Applebot<br>Allow: /</code></td></tr><tr><td><code>Applebot-Extended</code></td><td>Apple</td><td>Collects data for Apple's AI model training. 
Opt-in only.</td><td><code>User-agent: Applebot-Extended<br>Allow: /</code></td></tr><tr><td><code>FacebookBot</code></td><td>Meta</td><td>Generates link previews for Facebook and Instagram.</td><td><code>User-agent: FacebookBot<br>Allow: /</code></td></tr><tr><td><code>meta-externalagent</code></td><td>Meta</td><td>Backup fetcher for Meta platforms when FacebookBot fails.</td><td><code>User-agent: meta-externalagent<br>Allow: /</code></td></tr><tr><td><code>LinkedInBot</code></td><td>LinkedIn</td><td>Extracts preview data for links shared on LinkedIn.</td><td><code>User-agent: LinkedInBot<br>Allow: /</code></td></tr><tr><td><code>Bytespider</code></td><td>ByteDance</td><td>Powers TikTok search, content recommendations, and AI features across ByteDance products.</td><td><code>User-agent: Bytespider<br>Allow: /</code></td></tr><tr><td><code>DuckAssistBot</code></td><td>DuckDuckGo</td><td>Gathers data for DuckAssist AI answer feature.</td><td><code>User-agent: DuckAssistBot<br>Allow: /</code></td></tr><tr><td><code>cohere-ai</code></td><td>Cohere</td><td>Collects training data for Cohere's language models.</td><td><code>User-agent: cohere-ai<br>Allow: /</code></td></tr><tr><td><code>AI2Bot</code></td><td>Allen Institute</td><td>Academic crawler for Semantic Scholar and AI research projects.</td><td><code>User-agent: AI2Bot<br>Allow: /</code></td></tr><tr><td><code>CCBot</code></td><td>Common Crawl</td><td>Creates open datasets used by many AI projects and researchers.</td><td><code>User-agent: CCBot<br>Allow: /</code></td></tr><tr><td><code>Diffbot</code></td><td>Diffbot</td><td>Converts web pages into structured data for machine learning pipelines.</td><td><code>User-agent: Diffbot<br>Allow: /</code></td></tr><tr><td><code>omgili</code></td><td>Omgili</td><td>Specializes in indexing forums, comments, and discussion boards.</td><td><code>User-agent: omgili<br>Allow: /</code></td></tr><tr><td><code>Timpibot</code></td><td>Timpi</td><td>Decentralized search crawler with lower traffic volume.</td><td><code>User-agent: Timpibot<br>Allow: /</code></td></tr><tr><td><code>YouBot</code></td><td>You.com</td><td>Powers You.com's AI search and browser assistant features.</td><td><code>User-agent: YouBot<br>Allow: /</code></td></tr><tr><td><code>MistralAI-User</code></td><td>Mistral</td><td>Fetches content for citations in Mistral's Le Chat assistant.</td><td><code>User-agent: MistralAI-User<br>Allow: /</code></td></tr><tr><td><code>GoogleAgent-Mariner</code></td><td>Google</td><td>Agentic browser from Google's Project Mariner. Available to AI Ultra subscribers ($249.99/month).</td><td><code>User-agent: GoogleAgent-Mariner<br>Allow: /</code></td></tr><tr><td>Standard Chrome UA</td><td>OpenAI</td><td>ChatGPT Atlas browser - uses identical user agent to Chrome, making it indistinguishable from regular browsers. Cannot be blocked via robots.txt user agent alone.</td><td>N/A - Use IP blocking</td></tr><tr><td>No known UA</td><td>xAI</td><td>Grok crawler - Documented user agents (GrokBot, xAI-Grok, Grok-DeepSearch) are rarely seen. Grok confirmed it uses iPhone user-agent strings to avoid blocks.</td><td>Cannot be reliably blocked</td></tr></tbody></table></div>
Robots.txt examples for different AI use cases
The examples below cover two common goals: maximizing discovery and selectively blocking crawlers for privacy. Use them as starting points and build block/allow patterns that align with your own content strategy and business requirements.
# You can paste any of these blocks into robots.txt or a firewall rule. grouped by company to make things readable for y'all.

# ——— OPENAI ———
# Search (shows my webpages as links inside ChatGPT search). NOT used for model training.
User-agent: OAI-SearchBot
Allow: /

# User-driven browsing from ChatGPT and Custom GPTs. Acts after a human click.
User-agent: ChatGPT-User
User-agent: ChatGPT-User/2.0
Allow: /

# Model-training crawler. Opt-out here if I don’t want content in GPT-4o or GPT-5.
User-agent: GPTBot
Disallow: /private/ # example private folder
Allow: / # everything else

# ——— ANTHROPIC (Claude) ———
User-agent: anthropic-ai # bulk model training
Allow: /
User-agent: ClaudeBot # chat citation fetch
User-agent: claude-web # web-focused crawl
Allow: /

# ——— PERPLEXITY ———
User-agent: PerplexityBot # index builder
Allow: /
User-agent: Perplexity-User # human-triggered visit
Allow: /

# ——— GOOGLE (Gemini) ———
User-agent: Google-Extended
Allow: /

# ——— MICROSOFT (Bing / Copilot) ———
User-agent: Bingbot
Allow: /

# ——— AMAZON ———
User-agent: Amazonbot
Allow: /

# ——— APPLE ———
User-agent: Applebot
User-agent: Applebot-Extended
Allow: /

# ——— META ———
User-agent: FacebookBot
User-agent: meta-externalagent
Allow: /

# ——— LINKEDIN ———
User-agent: LinkedInBot
Allow: /

# ——— BYTEDANCE ———
User-agent: Bytespider
Allow: /

# ——— DUCKDUCKGO ———
User-agent: DuckAssistBot
Allow: /

# ——— COHERE ———
User-agent: cohere-ai
Allow: /

# ——— ALLEN INSTITUTE / COMMON CRAWL / OTHER RESEARCH ———
User-agent: AI2Bot
User-agent: CCBot
User-agent: Diffbot
User-agent: omgili
Allow: /

# ——— EMERGING SEARCH START-UPS ———
User-agent: Timpibot
User-agent: YouBot
Allow: /
The example above groups user agents by company and labels each with its purpose (search, user-triggered browsing, or model training) so you can control the buckets that matter to your use case.
Robots.txt best‑practice checklist
- List every AI agent you care about. Use the table above or search your logs for tokens such as gptbot, bingbot, claudebot, perplexitybot, google-extended, amazonbot, and duckassistbot (see the quick check after this list).
- Add a directive after every User-agent: line, at least one Allow or Disallow. A lone name won't do anything.
- Use blank lines between blocks to avoid merge errors.
- Re-test after big LLM releases and updates. New versions sometimes ignore older rules or ship entirely new user-agent names.
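For the quick check mentioned above, you can list which agents your live robots.txt currently names. The domain here is a placeholder; swap in your own:
curl -s https://example.com/robots.txt | grep -iE "^user-agent"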
Testing tips & code snippets
Why test? To confirm that every AI user agent above can (or cannot) reach the website exactly as I intend.
Oh, and this nugget from Jori Ford:
There's 226 crawlers that Cloudflare found out. They found them out because they did IP mining and reverse lookups. Because here's the beautiful new world of AI that you're not going to be used to. AI lies. AI doesn't have any standards. They just want to get what they need. So guess what? You have 226 unknown bots traversing your planet, and you don't know who they are unless you have their IP address. Guess where you can get that? Your log files.

Web-based “all bots” check (UI)
<div class="tbl-wrap"><table style="border-radius:12px;"><thead><tr><th>Tool</th><th>What it checks</th><th>How to use it</th></tr></thead><tbody><tr><td><a href="https://knowatoa.com">Knowatoa AI Search Console</a></td><td>Tests your robots.txt against 24 different AI crawlers (GPTBot, ClaudeBot, PerplexityBot, etc.) and shows which ones you're blocking vs. allowing.</td><td>Enter your URL, run the audit, fix anything marked as blocked that should be allowed (or vice versa).</td></tr><tr><td><a href="https://technicalseo.com/tools/robots-txt/">Merkle robots.txt Tester</a></td><td>Tests individual crawler behavior when you need to verify a specific user-agent string that Knowatoa doesn't cover.</td><td>Paste the user-agent name, check if your robots.txt is allowing or blocking it.</td></tr></tbody></table></div>
Checking server logs for AI crawler activity
Run this on your Nginx or Apache logs to see which AI crawlers have been hitting your website:
grep -Ei "gptbot|oai-searchbot|chatgpt-user|claudebot|perplexitybot|google-extended|bingbot" access.log \| awk '{print $1,$4,$7,$12}' | head
You will see hits like:
203.0.113.10 - - [25/Apr/2025:08:14:22 -0600] "GET /blog/ HTTP/1.1" 200 15843 "-" "GPTBot/1.1"
Next: if a bot I expect is absent, either I blocked it in robots.txt or it simply hasn't crawled the site yet. I re-run the Knowatoa audit or the Merkle robots.txt Tester to figure out which it is.
Firewall rules (when robots.txt isn't enough)
Tip: Use firewall rules sparingly; start with robots.txt and escalate only if abuse appears in your logs.
Here's how to navigate there in Cloudflare: Rules → Firewall Rules → Create rule
<div class="tbl-wrap"><table style="border-radius:12px;"><thead><tr><th>Purpose</th><th>Expression</th><th>Action</th></tr></thead><tbody><tr><td>Block GPT model training</td><td>(http.user_agent contains "GPTBot")</td><td>Block</td></tr><tr><td>Allow ChatGPT user traffic</td><td>(http.user_agent contains "ChatGPT-User")</td><td>Allow</td></tr></tbody></table></div>
Nginx
# Block Perplexity indexer but allow user clicks
if ($http_user_agent ~* "PerplexityBot") {
    return 403;
}
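For more than one bot, a map is more idiomatic in Nginx than stacking if blocks. This is a sketch that assumes you can edit the http {} context; the patterns below are examples, so swap in your own block list:
# Put the map in the http {} context.
map $http_user_agent $block_ai_bot {
    default          0;
    ~*GPTBot         1;
    ~*anthropic-ai   1;
    ~*Bytespider     1;
}

server {
    # ... your existing server config ...
    if ($block_ai_bot) {
        return 403;
    }
}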
Robots.txt template for AI bots and crawlers
# Block model-training crawlers
User-agent: GPTBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

# Allow AI search crawlers
User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

# Allow user-triggered agents
User-agent: ChatGPT-User
Allow: /

User-agent: ChatGPT-User/2.0
Allow: /

User-agent: Perplexity-User
Allow: /

# Allow everything else
User-agent: *
Allow: /
Each User-agent: line identifies which crawler the rules apply to. The Allow or Disallow directives that follow tell the bot what it can access. Without at least one directive, the block does nothing.
Two common robots.txt patterns
Allow a bot to crawl everything:
User-agent: GPTBot
Allow: /
Block a bot from everything:
User-agent: PerplexityBot
Disallow: /
You can group several user-agents that share the same rule:
User-agent: ChatGPT-User
User-agent: ChatGPT-User/2.0
Allow: /
Or give a crawler different rules for different paths:
User-agent: ClaudeBot
Allow: /public/
Disallow: /private/
Use blank lines between blocks so the file is easier to read.
Agentic browser crawlers
<div class="tbl-wrap"><table><thead><tr><th>Agent</th><th>User Agent</th><th>Status</th></tr></thead><tbody><tr><td>ChatGPT Atlas</td><td>Standard Chrome user agent string</td><td>Available for Mac (Windows/iOS/Android coming soon). Requires ChatGPT Plus, Pro, or Business. Uses standard Chrome user agent, making it indistinguishable from regular Chrome traffic.</td></tr><tr><td>OpenAI Operator</td><td>No known user agent</td><td>Integrated into ChatGPT as "agent mode" (July 2025). Runs in a remote browser that appears like Chrome.</td></tr><tr><td>Google Project Mariner</td><td>GoogleAgent-Mariner</td><td>Available to AI Ultra subscribers ($249.99/month) in the U.S. Runs on virtual machines in the cloud.</td></tr><tr><td>Anthropic Computer Use</td><td>Claude-Web (separate crawler)</td><td>API capability for Claude 3.5+. Also available as "Claude for Chrome" extension for Max plan users. Uses screenshots to interact with desktops/browsers in virtualized environments.</td></tr><tr><td>xAI Grok</td><td>GrokBot/1.0<br>xAI-Grok/1.0<br>Grok-DeepSearch/1.0</td><td>User agents are documented but webmasters report rarely seeing them in practice. Grok reportedly uses iPhone user-agent strings in some cases.</td></tr></tbody></table></div>
These agents operate differently from traditional crawlers. They use real browsers (or headless browsers) to interact with websites, making them harder to identify and control through standard robots.txt rules alone. Atlas is particularly notable because it's completely indistinguishable from Chrome in server logs. If you need to manage access, use IP-based restrictions or Cloudflare rules in addition to robots.txt.
OpenAI publishes IP ranges for its crawlers, which you can use to build IP-based allow or block rules when the user-agent string alone isn't reliable.
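For crawlers that support it (Googlebot, Bingbot, and Applebot document this method), you can also verify an IP with a reverse-DNS lookup plus a forward confirmation. A minimal sketch; the IP and hostname suffixes below are illustrative, and most AI-only crawlers are better checked against their published IP range files instead:
import socket

def verify_crawler_ip(ip, expected_suffixes):
    """Reverse-resolve the IP, check the hostname suffix, then forward-confirm it."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)           # reverse DNS
        if not hostname.endswith(expected_suffixes):
            return False
        return ip in socket.gethostbyname_ex(hostname)[2]   # forward confirmation
    except (socket.herror, socket.gaierror):
        return False

# Example: does this IP really belong to Googlebot? (illustrative IP)
print(verify_crawler_ip("66.249.66.1", (".googlebot.com", ".google.com")))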
FAQs
What is an AI crawler in robots.txt?
A bot that requests your pages for model training or AI-powered answers. The User-agent: line tells it what it can access.
Is User-agent: * enough?
Not always. The wildcard catches most crawlers, but some AI bots ignore it. List the specific crawlers you care about.
What's the best open-data web crawler?
I'm not the person to ask about best, but Common Crawl (CCBot) releases monthly snapshots that anyone can access.
What do you mean by "top user agents"?
The crawlers in this guide account for roughly 95% of AI crawler traffic, based on log data from production sites.
Do bots have to follow robots.txt?
No. Most reputable crawlers do, but it's not legally binding. Anthropic got criticized in 2024 for ignoring robots.txt rules. If you need to block something for real, use firewall rules or password-protect the content.
How do AI crawlers fit into my audience funnel?
GPTBot ──► Your Website
                │
                ▼
           LLM Training
                │
                ▼
           Human Prompt
                │
                ▼
      Agent UI (e.g. ChatGPT)
          ╱     │     ╲
         ╱      │      ╲
  Search-GPT    │    Operator (unknown user agent)
      │         │         │
      │         ▼         ▼
      │      Response  ChatGPT-User
      │         ╲         │
      │          ▼        │
      └────────►(merge)◄──┘
                │
                ▼
           Human Click
                │
                ▼
           Your Website
Ending Tip
Even with the correct robots.txt configuration, your web server or firewall might still block AI crawlers. I recommend using Knowatoa's AI Search Console to validate your setup: it'll check your site against 24 different AI user agents and flag any access issues.

Otherwise you can use Merkle's robots.txt tester to audit user agents one-by-one.

<div class="post-note">Questions or missing agents? Let me know! I update this guide whenever new data rolls in. Drop me a comment on LinkedIn if you spot any I've missed!</div>
Originally published: February 7, 2025 | Last updated: November 2025