Presets:

Many AI crawlers (GPTBot, ClaudeBot, PerplexityBot, etc.) voluntarily honor robots.txt, but some ignore it. For stronger protection, combine robots.txt with server-side blocks or the X-Robots-Tag HTTP header.
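As a sketch of such a server-side block, an nginx snippet along these lines returns 403 to matching user agents. The bot list in the regex is illustrative, not exhaustive, and the `map` block must live in the `http` context of your config:

```nginx
# Illustrative sketch: deny requests whose User-Agent matches known AI
# crawlers. Extend the regex with any other bots you want to block.
map $http_user_agent $is_ai_bot {
    default                                      0;
    "~*(GPTBot|ClaudeBot|CCBot|PerplexityBot)"   1;
}

server {
    listen 80;
    server_name example.com;

    if ($is_ai_bot) {
        return 403;
    }
}
```

Unlike robots.txt, this enforces the block regardless of whether the crawler chooses to cooperate, though bots that spoof a browser user agent will still get through.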


Frequently asked questions

What is a robots.txt file?

A robots.txt file tells search engine crawlers which parts of your site they may or may not access. It lives at the root of your domain and is the first file most crawlers fetch when they visit.
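A minimal robots.txt illustrating the format (the path and sitemap URL are placeholders):

```
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
```

The User-agent line says which crawlers the group applies to (* means all), and each Disallow line lists a path prefix those crawlers should not fetch.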

Where should I place my robots.txt file?

It must be at the root of your domain — for example, https://example.com/robots.txt. Subdirectory locations are ignored. Each subdomain needs its own robots.txt (e.g. blog.example.com/robots.txt is separate from example.com/robots.txt).

What's the difference between Allow and Disallow?

Disallow tells a crawler not to fetch the listed paths. Allow explicitly permits paths that would otherwise match a Disallow rule — useful for carving out exceptions. An empty Disallow: line means allow everything.
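For example, assuming a site with a /private/ area that has one public subfolder:

```
User-agent: *
Disallow: /private/
Allow: /private/press/

# An empty Disallow allows everything for the named agent:
User-agent: Googlebot
Disallow:
```

Modern crawlers resolve conflicts by the most specific (longest) matching rule, so pages under /private/press/ stay crawlable even though /private/ is blocked.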

How do I block AI crawlers from scraping my site?

Use the Block AI crawlers preset above. It adds Disallow: / rules for GPTBot, ClaudeBot, CCBot, PerplexityBot, Google-Extended, Amazonbot and other major AI training crawlers. Note that not all AI bots honor robots.txt — combine with server-side blocks for stronger protection.
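The generated rules take this shape, one group per bot (a representative subset, not the full preset):

```
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```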

Can robots.txt fully block a search engine from indexing my pages?

No. robots.txt controls crawling, not indexing. A page blocked in robots.txt can still appear in search results if other pages link to it. To prevent indexing, use a noindex meta tag or HTTP header instead.
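The meta tag version goes in the page's head. Note that the crawler must be able to fetch the page to see it, so a page you want deindexed must not also be blocked in robots.txt:

```html
<!-- In the page's <head>: -->
<meta name="robots" content="noindex">
```

For non-HTML resources such as PDFs, send the equivalent HTTP response header instead: X-Robots-Tag: noindex.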

Do I need a robots.txt file if I want everything crawled?

No. If you have no robots.txt file, crawlers assume everything is allowed. A robots.txt is only required when you want to restrict access or point crawlers to your sitemap.
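If you want a file anyway — most commonly to advertise your sitemap — a fully permissive robots.txt looks like this (the sitemap URL is a placeholder):

```
User-agent: *
Disallow:

Sitemap: https://example.com/sitemap.xml
```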

How do I test if my robots.txt works?

Upload the file to your domain root and visit yourdomain.com/robots.txt in a browser to confirm it is served. For deeper testing, use the robots.txt report in Google Search Console (which replaced the old standalone robots.txt Tester) or Bing Webmaster Tools. These show exactly which URLs are blocked or allowed for specific crawlers.
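You can also sanity-check rules locally with Python's standard-library parser before deploying. A minimal sketch, using made-up rules and URLs — note that urllib.robotparser applies the first matching rule rather than the longest one, so the Allow exception is listed before the broader Disallow:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: /private/ is blocked except its /press/ subfolder.
rules = """\
User-agent: *
Allow: /private/press/
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())
rp.modified()  # mark the rules as loaded; without this, can_fetch() conservatively returns False

print(rp.can_fetch("*", "https://example.com/private/secret.html"))        # False
print(rp.can_fetch("*", "https://example.com/private/press/launch.html"))  # True
print(rp.can_fetch("*", "https://example.com/index.html"))                 # True
```

For a file already deployed, point the parser at it with set_url() and read() instead of parse().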