Texterfly

Robots.txt Generator

Generate, validate, and download a robots.txt file for any platform. Configure rules for 20+ bots including all major search engines and AI training crawlers. Includes a bonus llms.txt generator.

All Major Bots · AI Bot Blocking · Live Validation · 6 Presets · llms.txt Generator · Download File · 100% Free
Bot Rules

🔍 Search Engines

Googlebot

Google — web search crawler

Googlebot Image

Google — image search crawler

Bingbot

Microsoft Bing search crawler

Yahoo! Slurp

Yahoo search crawler

DuckDuckBot

DuckDuckGo search crawler

Baiduspider

Baidu (Chinese) search crawler

YandexBot

Yandex (Russian) search crawler

🤖 AI Crawlers

GPTBot (OpenAI)

OpenAI ChatGPT training crawler

ChatGPT-User

OpenAI agent for user-initiated ChatGPT browsing

CCBot (Common Crawl)

Common Crawl — AI training dataset

Claude (Anthropic)

Anthropic Claude training crawler

ClaudeBot

Anthropic Claude crawler

Google-Extended (Gemini)

Google AI training control token (not a separate crawler)

Meta AI Crawler

Meta AI model training crawler

PerplexityBot

Perplexity AI search crawler

Bytespider (TikTok)

ByteDance/TikTok AI training crawler

Cohere AI

Cohere AI training crawler

omgili / Webz.io

AI data collection & scraping service

📣 Social Media

Facebook Bot

Facebook link preview crawler

Twitterbot

Twitter/X link preview crawler

LinkedInBot

LinkedIn link preview crawler

📢 Ad Networks

AdSense Bot

Google AdSense content analyser

AdsBot Google

Google Ads quality checker

robots.txt Output

The output updates live as you change settings. Place the file at yourdomain.com/robots.txt.

What Is a robots.txt File?

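A robots.txt file is a plain-text file placed at the root of your domain that tells web crawlers which pages or directories they may and may not crawl. Well-behaved bots fetch it before crawling the rest of your site, but it is a request rather than an access control: compliance is voluntary. Getting it right is one of the most important technical SEO tasks, because a single wrong rule (such as Disallow: /) can stop search engines from crawling your whole site.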

# Minimal valid robots.txt
User-agent: *
Allow: /
Disallow: /admin/

Sitemap: https://yourdomain.com/sitemap.xml
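Major crawlers such as Googlebot evaluate these rules as described in RFC 9309: the most specific (longest) matching path wins, and Allow wins a tie, so the order of Allow: / and Disallow: /admin/ does not matter to them. The following is a minimal Python sketch of that evaluation for the file above, assuming a single User-agent: * group and no * or $ wildcards; parse_rules and is_allowed are illustrative helpers, not a library API.

```python
# Minimal sketch of RFC 9309-style rule evaluation for the example file above.
# Assumptions: one "User-agent: *" group, no wildcards (* or $) in the paths.

MINIMAL_ROBOTS = """\
User-agent: *
Allow: /
Disallow: /admin/

Sitemap: https://yourdomain.com/sitemap.xml
"""


def parse_rules(text: str) -> list[tuple[str, str]]:
    """Collect (directive, path) pairs from Allow/Disallow lines."""
    rules = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        field = field.strip().lower()  # directive names are case-insensitive
        if field in ("allow", "disallow"):
            rules.append((field, value.strip()))
    return rules


def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Longest matching path wins; Allow wins a tie; no match means allowed."""
    best_len, allowed = -1, True
    for field, rule_path in rules:
        if rule_path and path.startswith(rule_path):  # empty value restricts nothing
            if len(rule_path) > best_len or (
                len(rule_path) == best_len and field == "allow"
            ):
                best_len, allowed = len(rule_path), field == "allow"
    return allowed


rules = parse_rules(MINIMAL_ROBOTS)
print(is_allowed(rules, "/blog/post-1"))     # True: only "Allow: /" matches
print(is_allowed(rules, "/admin/settings"))  # False: "/admin/" is longer than "/"
```

Not every parser works this way. Python's standard urllib.robotparser checks rules in file order and stops at the first match, so for that parser the Allow: / line would also permit /admin/.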

🔍 Control Crawl Budget

Tell Googlebot not to waste time crawling admin panels, staging pages, or internal search result pages, so it spends more time on your important content.

🤖 Block AI Training Bots

GPTBot, CCBot, ClaudeBot, and Google-Extended: AI companies use these crawlers (or, in Google's case, a robots.txt control token) to collect content for training their models. Block them individually or all at once.

🗺️ Declare Your Sitemap

Adding a Sitemap: line lets any crawler that reads your robots.txt find your sitemap, without you submitting it to each search engine separately.
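As an illustration of how a crawler picks up that line, Python's standard urllib.robotparser exposes every Sitemap URL through RobotFileParser.site_maps() (Python 3.8 and later). The file content below is a placeholder; note that the Sitemap line is independent of any User-agent group, so it can sit anywhere in the file.

```python
from urllib.robotparser import RobotFileParser

# Placeholder robots.txt content; Sitemap may appear before or after any group.
ROBOTS = """\
Sitemap: https://yourdomain.com/sitemap.xml

User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS.splitlines())  # parse text directly, no network fetch

# site_maps() returns a list of Sitemap URLs, or None if the file declares none.
print(parser.site_maps())  # ['https://yourdomain.com/sitemap.xml']
```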

AI Bot Reference — User-Agent Strings

Use these exact strings in your robots.txt User-agent: directives to target specific AI crawlers.

| User-agent | Company | Purpose | Block with |
| --- | --- | --- | --- |
| GPTBot | OpenAI | ChatGPT training data | Disallow: / |
| ChatGPT-User | OpenAI | User-initiated ChatGPT browsing | Disallow: / |
| CCBot | Common Crawl | AI training datasets | Disallow: / |
| anthropic-ai | Anthropic | Claude model training | Disallow: / |
| ClaudeBot | Anthropic | Claude crawling | Disallow: / |
| Google-Extended | Google | Gemini (formerly Bard) AI training; a control token, not a separate crawler | Disallow: / |
| PerplexityBot | Perplexity | AI search indexing | Disallow: / |
| Bytespider | ByteDance | TikTok AI training | Disallow: / |
| FacebookBot | Meta | Meta AI model training | Disallow: / |
| cohere-ai | Cohere | Cohere model training | Disallow: / |
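Blocking several AI crawlers comes down to emitting one group per user-agent string from the table. The sketch below shows what such output could look like; build_robots_txt is a hypothetical helper written for illustration, not this generator's actual code.

```python
from typing import Optional

# User-agent tokens taken from the table above.
AI_AGENTS = [
    "GPTBot", "ChatGPT-User", "CCBot", "anthropic-ai", "ClaudeBot",
    "Google-Extended", "PerplexityBot", "Bytespider", "FacebookBot", "cohere-ai",
]


def build_robots_txt(blocked: list[str], sitemap: Optional[str] = None) -> str:
    """Hypothetical helper: block each listed agent, allow everyone else."""
    lines: list[str] = []
    for agent in blocked:
        # One group per agent keeps each rule easy to toggle individually.
        lines += [f"User-agent: {agent}", "Disallow: /", ""]
    # Catch-all group: crawlers not named above may crawl everything.
    lines += ["User-agent: *", "Allow: /"]
    if sitemap:
        lines += ["", f"Sitemap: {sitemap}"]
    return "\n".join(lines) + "\n"


print(build_robots_txt(AI_AGENTS, sitemap="https://yourdomain.com/sitemap.xml"))
```

RFC 9309 also lets several User-agent lines share one group (for example User-agent: GPTBot, then User-agent: CCBot, then a single Disallow: /), which gives a shorter file with the same effect.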

robots.txt Syntax Reference

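| Directive | Example | Meaning |
| --- | --- | --- |
| User-agent: * | User-agent: * | Applies to all crawlers |
| User-agent: Googlebot | User-agent: Googlebot | Applies only to Googlebot; Googlebot then ignores the User-agent: * group |
| Disallow: /path/ | Disallow: /admin/ | Blocks this path and everything under it |
| Disallow: / | Disallow: / | Blocks the entire site |
| Disallow: (empty value) | Disallow: | Empty value allows everything (same as Allow: /) |
| Allow: /path/ | Allow: /public/ | Explicitly allows a path; the longest matching rule wins, so it overrides a broader Disallow |
| Disallow: /*? | Disallow: /*?*filter= | Wildcard: * matches any run of characters. /*? blocks every URL with a query string; the example blocks only URLs whose query contains filter= |
| Disallow: /*.pdf$ | Disallow: /*.pdf$ | $ anchors the end of the URL: blocks all URLs ending in .pdf |
| Crawl-delay: N | Crawl-delay: 10 | Wait N seconds between requests (ignored by Google) |
| Sitemap: URL | Sitemap: https://x.com/sitemap.xml | Declares your XML sitemap location (must be a full URL) |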
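The * and $ wildcards add a small pattern language on top of prefix matching. A minimal Python sketch, assuming RFC 9309 semantics (longest match wins, Allow wins ties) and that the rules for the crawler's group have already been selected; the rule list is a made-up example mixing rows from the table, and rule_to_regex and is_allowed are illustrative names.

```python
import re

# Sketch of RFC 9309 path matching with the * and $ wildcards.
# Assumption: these rules already belong to the crawler's matching group.


def rule_to_regex(path: str) -> re.Pattern:
    anchored = path.endswith("$")  # "$" pins the match to the end of the URL
    if anchored:
        path = path[:-1]
    # Escape everything, then turn each escaped "*" back into "any characters".
    pattern = re.escape(path).replace(r"\*", ".*")
    return re.compile(pattern + ("$" if anchored else ""))


def is_allowed(rules: list[tuple[str, str]], url_path: str) -> bool:
    """Most specific (longest) matching rule wins; Allow wins ties."""
    best_len, allowed = -1, True
    for directive, path in rules:
        if not path:  # an empty Disallow: restricts nothing
            continue
        if rule_to_regex(path).match(url_path):  # match() anchors at the start
            if len(path) > best_len or (len(path) == best_len and directive == "allow"):
                best_len, allowed = len(path), directive == "allow"
    return allowed


RULES = [
    ("disallow", "/*.pdf$"),
    ("disallow", "/*?*filter="),
    ("disallow", "/docs/"),
    ("allow", "/docs/public/"),
]

for url in ["/guide.pdf", "/guide.pdf?v=2", "/shop?filter=red", "/docs/a", "/docs/public/a"]:
    print(url, is_allowed(RULES, url))
```

Note that /guide.pdf?v=2 stays allowed: because of the $, the rule has to match all the way to the end of the path and query string.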

How to Deploy Your robots.txt

  1. Generate and download

     Use the generator above to configure your rules, then click Download to get your robots.txt file.

  2. Upload to your site root

     Place the file in your website's root directory, the same folder that contains your index.html or homepage. It must be accessible at yourdomain.com/robots.txt (not /assets/robots.txt).

  3. Verify it's live

     Open https://yourdomain.com/robots.txt in your browser. You should see the plain-text content of your file.

  4. Notify Google

     In Google Search Console, open Settings → robots.txt and request a recrawl of the file. This asks Google to fetch the updated file sooner instead of waiting for its cached copy (usually kept for up to 24 hours) to expire.
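The "Verify it's live" step can be partly automated. A minimal sketch, assuming you fetch https://yourdomain.com/robots.txt with any HTTP client and pass in the status code, Content-Type header, and decoded body; robots_txt_problems is a hypothetical helper, and its checks are rules of thumb, not Google's validation logic.

```python
def robots_txt_problems(status: int, content_type: str, body: str) -> list[str]:
    """Hypothetical check: list likely deployment problems (heuristics only)."""
    problems = []
    if status != 200:
        problems.append(f"expected HTTP 200, got {status}")
    if not content_type.lower().startswith("text/plain"):
        problems.append(f"expected a text/plain Content-Type, got {content_type!r}")
    if body.lstrip().startswith("<"):
        # Single-page apps often serve index.html for every unknown path.
        problems.append("body looks like HTML, not robots.txt")
    if not any(line.strip().lower().startswith("user-agent:") for line in body.splitlines()):
        problems.append("no User-agent line found")
    return problems


print(robots_txt_problems(200, "text/plain; charset=utf-8", "User-agent: *\nDisallow:\n"))  # []
print(robots_txt_problems(200, "text/html", "<!doctype html><html></html>"))
```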

Frequently Asked Questions

What is a robots.txt generator?
A robots.txt generator is a free online tool that creates a correctly formatted robots.txt file by letting you configure rules through a visual interface rather than writing the syntax by hand. It helps you avoid syntax errors that could accidentally block Googlebot from your site.
Does robots.txt stop my page from appearing in Google?
Not reliably. robots.txt prevents crawling, not indexing. If other sites link to a blocked page, Google may still index its URL without crawling it. To keep a page out of search results, use a noindex meta tag or an X-Robots-Tag HTTP header instead, and leave the page crawlable: if robots.txt blocks it, Google never sees the noindex.
Can I block AI bots like ChatGPT from my website?
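Yes, for crawlers that honour robots.txt. OpenAI (GPTBot), Anthropic (ClaudeBot), Common Crawl (CCBot), and Google (the Google-Extended token) all document robots.txt controls for their AI crawling, but compliance is voluntary: a crawler that ignores the file is not technically prevented from fetching your pages. Use the AI Crawlers section above to select which ones to block with Disallow: /.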
What is the difference between robots.txt and noindex?
robots.txt controls access (crawling) — it tells bots whether they can visit a URL. noindex controls indexing — it tells Google not to show the page in search results. A page can be crawled but not indexed (noindex), or blocked from crawling but still indexed if linked from elsewhere.
Where exactly should I put my robots.txt file?
It must be in your website's root directory, accessible at https://yourdomain.com/robots.txt. It cannot be in a subdirectory. Each subdomain needs its own robots.txt (e.g. blog.yourdomain.com/robots.txt).
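Because a robots.txt file only governs the scheme, host, and port it is served from, you can derive the one file that applies to any page URL. A minimal sketch using Python's standard urllib.parse; robots_url is an illustrative name.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt location that governs page_url.

    robots.txt applies per scheme + host + port, so the page's path,
    query, and fragment are discarded.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://yourdomain.com/blog/post-1?ref=x"))  # https://yourdomain.com/robots.txt
print(robots_url("https://blog.yourdomain.com/archive/"))      # https://blog.yourdomain.com/robots.txt
print(robots_url("http://yourdomain.com:8080/app"))            # http://yourdomain.com:8080/robots.txt
```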
Is robots.txt case sensitive?
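Partly. Directive names (User-agent, Allow, Disallow, Sitemap) are not case-sensitive, though the capitalisation shown is the convention. User-agent values are matched case-insensitively by major crawlers. Path values are case-sensitive: Disallow: /admin/ does not block /Admin/. The file name itself must be lowercase robots.txt.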
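You can observe this split in a real parser. Python's standard urllib.robotparser, for example, lower-cases directive names and user-agent tokens before comparing them but matches paths exactly. The file below deliberately uses odd capitalisation.

```python
from urllib.robotparser import RobotFileParser

# Deliberately odd capitalisation in the directive names and user-agent token.
ROBOTS = """\
USER-AGENT: googlebot
disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS.splitlines())

print(parser.can_fetch("Googlebot/2.1", "/admin/"))  # False: directive and agent still match
print(parser.can_fetch("Googlebot/2.1", "/Admin/"))  # True: path comparison is case-sensitive
```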
What is llms.txt and should I create one?
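llms.txt is an emerging proposal, not a ratified standard, for a file at yourdomain.com/llms.txt addressed to AI language models. It can summarise your content for them and state how it may be used, for example whether training is permitted and whom to contact for licensing. Compliance is voluntary and there is no guarantee AI providers will honour it, so keep your robots.txt rules as the primary control. Creating one now establishes your rights posture early.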
How long does it take for Google to see robots.txt changes?
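Google usually caches robots.txt for up to 24 hours, and can keep a cached copy longer if refetching fails (for example, because of server errors). To speed things up, request a recrawl from the robots.txt report in Google Search Console (Settings → robots.txt); the older robots.txt Tester has been retired.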
Should I disallow CSS and JavaScript files?
No, never. Google needs to fetch your CSS and JavaScript to render your pages and understand their layout and content. Blocking these files was a common mistake in older robots.txt files and can still hurt how your pages are rendered and ranked. Modern best practice is to leave all CSS and JS files fully crawlable.
What is Crawl-delay and does Google respect it?
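Crawl-delay asks bots to wait N seconds between requests to reduce server load. It is not part of the official robots.txt standard, so support varies: Bingbot honours it, but Google ignores it. To slow Googlebot down temporarily, Google's documentation recommends returning 500, 503, or 429 status codes; the old crawl-rate setting in Search Console has been retired.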
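As an illustration of how a crawler that does honour Crawl-delay might read the value, Python's standard urllib.robotparser exposes it through RobotFileParser.crawl_delay() (Python 3.6 and later). The file content and the polite_wait helper are hypothetical, for illustration only.

```python
import time
from urllib.robotparser import RobotFileParser

ROBOTS = """\
User-agent: Bingbot
Crawl-delay: 10

User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS.splitlines())

print(parser.crawl_delay("Bingbot"))    # 10
print(parser.crawl_delay("Googlebot"))  # None: the * group sets no delay


def polite_wait(agent: str, default: float = 1.0) -> None:
    """Hypothetical crawler helper: sleep for the declared delay, else a default."""
    delay = parser.crawl_delay(agent)
    time.sleep(delay if delay is not None else default)
```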
