Presets:

Many AI crawlers (GPTBot, ClaudeBot, PerplexityBot, etc.) voluntarily honor robots.txt, but some ignore it. For stronger protection, combine robots.txt with server-side blocks or the X-Robots-Tag HTTP header.
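As a sketch of such a server-side block, an nginx snippet along these lines returns 403 to matching user agents. The bot list in the regex is illustrative, not exhaustive, and the `map` block must live in the `http` context of your config:

```nginx
# Illustrative sketch: deny requests whose User-Agent matches known AI
# crawlers. Extend the regex with any other bots you want to block.
map $http_user_agent $is_ai_bot {
    default                                      0;
    "~*(GPTBot|ClaudeBot|CCBot|PerplexityBot)"   1;
}

server {
    listen 80;
    server_name example.com;

    if ($is_ai_bot) {
        return 403;
    }
}
```

Unlike robots.txt, this enforces the block regardless of whether the crawler chooses to cooperate, though bots that spoof a browser user agent will still get through.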


Frequently asked questions

What is a robots.txt file?

A robots.txt file tells search engine crawlers which parts of your site they may or may not access. It lives at the root of your domain and is the first file most crawlers fetch when they visit.
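A minimal robots.txt illustrating the format (the path and sitemap URL are placeholders):

```
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
```

The User-agent line says which crawlers the group applies to (* means all), and each Disallow line lists a path prefix those crawlers should not fetch.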

Where should I place my robots.txt file?

It must be at the root of your domain — for example, https://example.com/robots.txt. Subdirectory locations are ignored. Each subdomain needs its own robots.txt (e.g. blog.example.com/robots.txt is separate from example.com/robots.txt).

What's the difference between Allow and Disallow?

Disallow tells a crawler not to fetch the listed paths. Allow explicitly permits paths that would otherwise match a Disallow rule — useful for carving out exceptions. An empty Disallow: line means allow everything.
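For example, assuming a site with a /private/ area that has one public subfolder:

```
User-agent: *
Disallow: /private/
Allow: /private/press/

# An empty Disallow allows everything for the named agent:
User-agent: Googlebot
Disallow:
```

Modern crawlers resolve conflicts by the most specific (longest) matching rule, so pages under /private/press/ stay crawlable even though /private/ is blocked.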

How do I block AI crawlers from scraping my site?

Use the Block AI crawlers preset above. It adds Disallow: / rules for GPTBot, ClaudeBot, CCBot, PerplexityBot, Google-Extended, Amazonbot and other major AI training crawlers. Note that not all AI bots honor robots.txt — combine with server-side blocks for stronger protection.
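The generated rules take this shape, one group per bot (a representative subset, not the full preset):

```
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```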

Can robots.txt fully block a search engine from indexing my pages?

No. robots.txt controls crawling, not indexing. A page blocked in robots.txt can still appear in search results if other pages link to it. To prevent indexing, use a noindex meta tag or HTTP header instead.
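The meta tag version goes in the page's head. Note that the crawler must be able to fetch the page to see it, so a page you want deindexed must not also be blocked in robots.txt:

```html
<!-- In the page's <head>: -->
<meta name="robots" content="noindex">
```

For non-HTML resources such as PDFs, send the equivalent HTTP response header instead: X-Robots-Tag: noindex.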

Do I need a robots.txt file if I want everything crawled?

No. If you have no robots.txt file, crawlers assume everything is allowed. A robots.txt is only required when you want to restrict access or point crawlers to your sitemap.
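If you want a file anyway — most commonly to advertise your sitemap — a fully permissive robots.txt looks like this (the sitemap URL is a placeholder):

```
User-agent: *
Disallow:

Sitemap: https://example.com/sitemap.xml
```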

How do I test if my robots.txt works?

Upload the file to your domain root and visit yourdomain.com/robots.txt in a browser to confirm it is served. For deeper testing, use the robots.txt report in Google Search Console (which replaced the old standalone robots.txt Tester) or Bing Webmaster Tools. These show exactly which URLs are blocked or allowed for specific crawlers.
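You can also sanity-check rules locally with Python's standard-library parser before deploying. A minimal sketch, using made-up rules and URLs — note that urllib.robotparser applies the first matching rule rather than the longest one, so the Allow exception is listed before the broader Disallow:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: /private/ is blocked except its /press/ subfolder.
rules = """\
User-agent: *
Allow: /private/press/
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())
rp.modified()  # mark the rules as loaded; without this, can_fetch() conservatively returns False

print(rp.can_fetch("*", "https://example.com/private/secret.html"))        # False
print(rp.can_fetch("*", "https://example.com/private/press/launch.html"))  # True
print(rp.can_fetch("*", "https://example.com/index.html"))                 # True
```

For a file already deployed, point the parser at it with set_url() and read() instead of parse().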