Robots.txt Generator
Generate, validate, and download a robots.txt file for any platform. Configure rules for 20+ bots including all major search engines and AI training crawlers. Includes a bonus llms.txt generator.
🔍 Search Engines
Googlebot
Google — web search crawler
Googlebot Image
Google — image search crawler
Bingbot
Microsoft Bing search crawler
Yahoo! Slurp
Yahoo search crawler
DuckDuckBot
DuckDuckGo search crawler
Baiduspider
Baidu (Chinese) search crawler
YandexBot
Yandex (Russian) search crawler
🤖 AI Crawlers
GPTBot (OpenAI)
OpenAI ChatGPT training crawler
ChatGPT User
OpenAI: fetches pages when ChatGPT users ask it to browse
CCBot (Common Crawl)
Common Crawl — AI training dataset
Claude (Anthropic)
Anthropic Claude training crawler
ClaudeBot
Anthropic Claude crawler
Google Bard/Gemini
Google-Extended token (not a separate crawler): controls use of your content for Gemini AI training
Meta AI Crawler
Meta AI model training crawler
PerplexityBot
Perplexity AI search crawler
ByteSpider (TikTok)
ByteDance/TikTok AI training crawler
Cohere AI
Cohere AI training crawler
omgili / Webz.io
AI data collection & scraping service
📣 Social Media
Facebook Bot
Facebook link preview crawler
Twitterbot
Twitter/X link preview crawler
LinkedInBot
LinkedIn link preview crawler
📢 Ad Networks
AdSense Bot
Google AdSense content analyser
AdsBot Google
Google Ads quality checker
Updates live as you change settings. Place this file at yourdomain.com/robots.txt
What Is a robots.txt File?
A robots.txt file is a plain-text file placed at the root of your domain that tells web crawlers which pages or directories they may and may not crawl. Well-behaved bots request it before crawling anything else on your site, but compliance is voluntary: it is a set of instructions, not access control. Getting it right is one of the most impactful technical SEO tasks you can do.
```text
# Minimal valid robots.txt
User-agent: *
Allow: /
Disallow: /admin/
Sitemap: https://yourdomain.com/sitemap.xml
```
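A generated robots.txt boils down to groups of User-agent lines followed by their rules, plus optional Sitemap lines. The Python sketch below assembles that text from a hypothetical `Group` structure and a hypothetical `build_robots_txt` function; it illustrates the file format and is not this tool's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """One robots.txt group: user-agents that share the same rules (hypothetical)."""
    user_agents: list[str]
    # Each rule is a (directive, path) pair such as ("Disallow", "/admin/").
    rules: list[tuple[str, str]] = field(default_factory=list)


def build_robots_txt(groups: list[Group], sitemaps: list[str]) -> str:
    """Render groups, then sitemap URLs, as robots.txt text."""
    blocks = []
    for group in groups:
        lines = [f"User-agent: {agent}" for agent in group.user_agents]
        lines += [f"{directive}: {path}" for directive, path in group.rules]
        blocks.append("\n".join(lines))
    # Sitemap lines do not belong to any group, so they are emitted last.
    if sitemaps:
        blocks.append("\n".join(f"Sitemap: {url}" for url in sitemaps))
    # A blank line separates blocks; the file ends with a newline.
    return "\n\n".join(blocks) + "\n"


robots = build_robots_txt(
    [Group(["*"], [("Allow", "/"), ("Disallow", "/admin/")])],
    ["https://yourdomain.com/sitemap.xml"],
)
print(robots)
```

The output matches the minimal file above except for the blank line before Sitemap:, which is optional.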
Control Crawl Budget
Tell Googlebot not to waste time crawling admin panels, staging pages, or search result pages — so it spends more time on your important content.
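You can sanity-check crawl-budget rules before deploying them with Python's standard-library `urllib.robotparser`. The paths below (/admin/, /search/, /staging/) are hypothetical examples. This parser only does prefix matching with first-match-wins ordering and does not understand * or $ wildcards, so keep rules tested this way simple.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical crawl-budget rules: keep Googlebot out of low-value areas.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /admin/
Disallow: /search/
Disallow: /staging/
"""

parser = RobotFileParser()
# parse() takes an iterable of lines, so no network access is needed.
parser.parse(ROBOTS_TXT.splitlines())

for path in ["/blog/robots-guide", "/search/?q=shoes", "/admin/options.php"]:
    allowed = parser.can_fetch("Googlebot", "https://yourdomain.com" + path)
    print(f"{path}: {'crawl' if allowed else 'skip'}")
```

Because the file has no User-agent: * group, bots other than Googlebot are unaffected by these rules.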
Block AI Training Bots
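Crawlers such as OpenAI's GPTBot, Common Crawl's CCBot, and Anthropic's ClaudeBot collect web content used to train AI models, and Google's Google-Extended token controls whether your content may be used for Gemini. Block them individually or all at once.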
Declare Your Sitemap
Adding a Sitemap: line lets any crawler that reads robots.txt find your sitemap, so you do not have to submit it to each search engine separately.
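Sitemap: lines sit outside any user-agent group, so a parser can collect them from anywhere in the file. Python's `urllib.robotparser` exposes them through `site_maps()` (Python 3.8+); the URLs below are placeholders.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
Sitemap: https://yourdomain.com/sitemap.xml

User-agent: *
Disallow: /admin/

Sitemap: https://yourdomain.com/blog/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Sitemap lines are collected regardless of where they appear in the file.
print(parser.site_maps())
# ['https://yourdomain.com/sitemap.xml', 'https://yourdomain.com/blog/sitemap.xml']
```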
AI Bot Reference — User-Agent Strings
Use these exact strings in your robots.txt User-agent: directives to target specific AI crawlers.
| User-agent | Company | Purpose | Block with |
|---|---|---|---|
| GPTBot | OpenAI | ChatGPT training data | Disallow: / |
| ChatGPT-User | OpenAI | User-initiated browsing in ChatGPT | Disallow: / |
| CCBot | Common Crawl | AI training datasets | Disallow: / |
| anthropic-ai | Anthropic | Claude model training | Disallow: / |
| ClaudeBot | Anthropic | Claude crawling | Disallow: / |
| Google-Extended | Google | Bard/Gemini AI training | Disallow: / |
| PerplexityBot | Perplexity | AI search indexing | Disallow: / |
| Bytespider | ByteDance | TikTok AI training | Disallow: / |
| FacebookBot | Meta | Meta AI model training | Disallow: / |
| cohere-ai | Cohere | Cohere model training | Disallow: / |
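Several user-agents can share one group by stacking User-agent: lines above a single rule. This Python sketch builds such a group from the tokens in the table and checks it with the standard-library `urllib.robotparser`; the helper name `ai_block_group` is hypothetical, and the result only matters for bots that choose to honour robots.txt.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens copied from the table above.
AI_BOTS = ["GPTBot", "ChatGPT-User", "CCBot", "anthropic-ai", "ClaudeBot",
           "Google-Extended", "PerplexityBot", "Bytespider", "FacebookBot",
           "cohere-ai"]


def ai_block_group(tokens: list[str]) -> str:
    """One group: every listed agent shares a single 'Disallow: /' rule."""
    lines = [f"User-agent: {token}" for token in tokens]
    lines.append("Disallow: /")
    return "\n".join(lines) + "\n"


# Block the AI bots, but leave every other crawler unrestricted.
robots_txt = ai_block_group(AI_BOTS) + "\nUser-agent: *\nAllow: /\n"

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

print(parser.can_fetch("GPTBot", "https://yourdomain.com/article"))     # False
print(parser.can_fetch("Googlebot", "https://yourdomain.com/article"))  # True
```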
robots.txt Syntax Reference
| Directive | Example | Meaning |
|---|---|---|
| User-agent: * | User-agent: * | Applies to all crawlers |
| User-agent: Googlebot | User-agent: Googlebot | Applies only to Googlebot |
| Disallow: /path/ | Disallow: /admin/ | Block this path and everything under it |
| Disallow: / | Disallow: / | Block the entire site |
| Disallow: | Disallow: | Empty = allow everything (same as Allow: /) |
| Allow: /path/ | Allow: /public/ | Explicitly allow (overrides a shorter, broader Disallow; the longest matching rule wins) |
| Disallow: /*? | Disallow: /*?*filter= | * matches any characters; /*? blocks every URL with a query string, while the example blocks only URLs with a filter= parameter |
| Disallow: /*.pdf$ | Disallow: /*.pdf$ | $ = end of URL; blocks every URL ending in .pdf |
| Crawl-delay: N | Crawl-delay: 10 | Wait N seconds between requests (ignored by Google) |
| Sitemap: URL | Sitemap: https://x.com/sitemap.xml | Declare your XML sitemap location |
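Google and RFC 9309 resolve conflicting rules by picking the most specific (longest) matching pattern, with Allow winning a tie, and they treat * as "any characters" and a trailing $ as "end of URL". Python's `urllib.robotparser` does not implement these rules, so the following is a small self-contained sketch that does. The function names are hypothetical, and it is a simplified illustration (no percent-encoding normalisation), not a production parser.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex matched from the start.

    '*' matches any sequence of characters; a trailing '$' anchors the
    pattern to the end of the URL path plus query string.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything literally, turning each '*' into '.*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Return True if `path` may be crawled under one group's rules.

    `rules` holds (directive, pattern) pairs such as ("Disallow", "/admin/").
    The longest matching pattern wins; on a tie, Allow beats Disallow.
    """
    best_len, allowed = -1, True  # no matching rule means allowed
    for directive, pattern in rules:
        if not pattern:  # an empty value (e.g. "Disallow:") matches nothing
            continue
        if pattern_to_regex(pattern).match(path):
            is_allow = directive.lower() == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed


rules = [
    ("Disallow", "/admin/"),
    ("Allow", "/admin/help/"),
    ("Disallow", "/*?*filter="),
    ("Disallow", "/*.pdf$"),
]
print(is_allowed(rules, "/admin/settings"))         # False: /admin/ matches
print(is_allowed(rules, "/admin/help/faq"))         # True: longer Allow wins
print(is_allowed(rules, "/shop?filter=red"))        # False: wildcard rule
print(is_allowed(rules, "/guides/robots.pdf"))      # False: URL ends in .pdf
print(is_allowed(rules, "/guides/robots.pdf?v=2"))  # True: $ needs the URL to end there
```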
How to Deploy Your robots.txt
1. Generate and download: use the generator above to configure your rules, then click Download to get your robots.txt file.
2. Upload to your site root: place the file in your website's root directory, typically the folder that serves your homepage (for example, the one containing index.html). It must be reachable at yourdomain.com/robots.txt, not at a path such as /assets/robots.txt.
3. Verify it's live: open https://yourdomain.com/robots.txt in your browser. You should see the plain-text contents of your file.
4. Notify Google: in Google Search Console, open Settings → robots.txt to view the robots.txt report, then choose Request a recrawl from the menu next to your file. This asks Google to re-read the file sooner instead of waiting for its cache (typically up to 24 hours) to expire.
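A robots.txt file only applies to the host it is served from, so a crawler derives its location from the scheme, host, and port of the page it wants to fetch. This minimal Python sketch shows that derivation using `urllib.parse`; the function name `robots_url` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the one location where robots.txt applies for this page's host."""
    parts = urlsplit(page_url)
    # Keep scheme and host (including any port); drop path, query, fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://yourdomain.com/blog/post?id=7"))
# https://yourdomain.com/robots.txt
print(robots_url("https://blog.yourdomain.com/2024/hello"))
# https://blog.yourdomain.com/robots.txt
```

The second call shows why each subdomain needs its own file: the host changes, so the robots.txt location changes with it.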
Frequently Asked Questions
- What is a robots.txt generator?
- A robots.txt generator is a free online tool that creates a correctly formatted robots.txt file from rules you configure in a visual interface, rather than writing the syntax by hand. It helps prevent formatting mistakes that could accidentally block Googlebot from your site.
- Does robots.txt stop my page from appearing in Google?
- No. robots.txt prevents crawling, not indexing. If other pages link to a blocked URL, Google may still index it without ever crawling it. To keep a page out of search results, use a noindex meta tag or an X-Robots-Tag HTTP header instead, and leave the page crawlable: if robots.txt blocks the page, Google never sees the noindex instruction.
- Can I block AI bots like ChatGPT from my website?
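- Yes, for bots that honour robots.txt. OpenAI's GPTBot, Anthropic's ClaudeBot, CCBot, and others state that they respect it, and Google checks the Google-Extended token before using your content for Gemini. Compliance is voluntary, though, so robots.txt cannot stop a crawler that chooses to ignore it. Use the AI Crawlers section above to select which ones to block with Disallow: /.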
- What is the difference between robots.txt and noindex?
- robots.txt controls access (crawling) — it tells bots whether they can visit a URL. noindex controls indexing — it tells Google not to show the page in search results. A page can be crawled but not indexed (noindex), or blocked from crawling but still indexed if linked from elsewhere.
- Where exactly should I put my robots.txt file?
- It must be in your website's root directory, accessible at https://yourdomain.com/robots.txt. It cannot be in a subdirectory. Each subdomain needs its own robots.txt (e.g. blog.yourdomain.com/robots.txt).
- Is robots.txt case sensitive?
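- Partly. Directive names (User-agent, Allow, Disallow, Sitemap) and user-agent values are matched case-insensitively, so disallow: works the same as Disallow:, although the capitalisation shown is the convention. Path values are case-sensitive regardless of your server's operating system: Disallow: /admin/ does not block /Admin/. The file name itself must be lowercase: robots.txt.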
- What is llms.txt and should I create one?
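- llms.txt is a proposed convention, not a ratified standard: a Markdown file at yourdomain.com/llms.txt that gives language models a concise summary of your site and links to its most useful content. Some sites also use it to state usage preferences, such as whether training is permitted and who to contact for licensing. It is not an enforcement mechanism and major AI providers are not known to honour it, so it complements rather than replaces robots.txt rules. Creating one now simply states your position early.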
- How long does it take for Google to see robots.txt changes?
- Google generally caches robots.txt for up to 24 hours. To speed things up, open the robots.txt report in Google Search Console (Settings → robots.txt) and request a recrawl of the file.
- Should I disallow CSS and JavaScript files?
- No, never. Google needs to fetch your CSS and JavaScript to render the page and understand its layout and content. Blocking these files was a common mistake years ago and can still harm how pages are indexed and ranked. Current best practice is to leave all CSS and JS files crawlable.
- What is Crawl-delay and does Google respect it?
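- Crawl-delay asks bots to wait N seconds between requests to reduce server load. Bing honours it, but support elsewhere varies (Yandex, for example, has said it no longer uses the directive). Google ignores it entirely. Search Console's crawl-rate setting was retired in early 2024; if Googlebot is overloading your server, Google recommends temporarily returning 500, 503, or 429 responses.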
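The case-sensitivity and Crawl-delay answers can be checked with Python's standard-library `urllib.robotparser`: it reads directive names case-insensitively, matches paths case-sensitively, and reports a group's Crawl-delay through `crawl_delay()` (Python 3.6+). The rules and bot names below are illustrative, and this parser's behaviour does not imply that any particular search engine honours Crawl-delay.

```python
from urllib.robotparser import RobotFileParser

# Deliberately mixed-case directive names; the path value is lowercase.
ROBOTS_TXT = """\
USER-AGENT: Bingbot
crawl-delay: 10
DISALLOW: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

base = "https://yourdomain.com"
print(parser.can_fetch("Bingbot", base + "/admin/users"))  # False: path matches
print(parser.can_fetch("Bingbot", base + "/Admin/users"))  # True: /Admin/ is not /admin/
print(parser.crawl_delay("Bingbot"))                       # 10
print(parser.crawl_delay("Googlebot"))                     # None: no matching group
```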
