Your robots.txt file is a small text file that sits at the root of your website and tells search engines and other bots which pages they can and cannot access. It's been around since the mid-1990s, and most business owners have never thought about it. But in the age of AI search, this tiny file has become one of the most important factors in whether AI systems can find and recommend your business.
If your robots.txt is blocking AI crawlers—and many are, either intentionally or by accident—you're invisible to a rapidly growing number of potential customers. This guide shows you exactly how to fix that.
What Is robots.txt and Why Does It Matter?
Every website has a robots.txt file (or should). You can see yours right now by typing your website URL followed by /robots.txt in your browser. For example: https://yourbusiness.com/robots.txt
Skyline Property Management in Denver, CO discovered the hard way how much robots.txt matters. Their WordPress security plugin had silently added rules blocking GPTBot, Claude-Web, and PerplexityBot—they had no idea. When a prospective tenant told them “I asked ChatGPT for the best property management companies in Denver and you didn't come up,” they investigated. After removing the AI crawler blocks and adding explicit Allow directives along with a sitemap reference, Skyline began appearing in AI-generated recommendations within weeks. They estimate the fix—a five-minute robots.txt edit—has driven at least 20 qualified landlord leads through AI search channels in the six months since.
A robots.txt file contains instructions for web crawlers, the automated programs that search engines and AI systems use to read and index your website. When Googlebot, GPTBot (OpenAI's crawler), or any other well-behaved bot visits your site, it checks your robots.txt file first to see whether it is allowed to access your content.
If your robots.txt says "don't crawl this site" to an AI bot, a compliant crawler will not fetch your pages, so the AI system behind it has little to go on when deciding whether to recommend you.
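The check a compliant crawler performs can be approximated with Python's standard-library urllib.robotparser. This is a minimal sketch, not how any particular vendor's crawler is implemented; the sample rules, the can_crawl helper, and the yourbusiness.com URL are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser

# Sample rules (an assumption for illustration): GPTBot is blocked,
# every other bot is allowed.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules in robots_txt let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


page = "https://yourbusiness.com/services"
print(can_crawl(ROBOTS_TXT, "GPTBot", page))     # False
print(can_crawl(ROBOTS_TXT, "Googlebot", page))  # True
```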
The AI Crawlers You Need to Know About
Here are the most important AI crawlers and what they do:
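- GPTBot: OpenAI's main crawler. OpenAI's documentation describes it as collecting content that may be used to train its models; a separate agent, OAI-SearchBot, gathers pages for ChatGPT search results
- ChatGPT-User: OpenAI's agent for real-time page fetches when a user asks ChatGPT to browse the web
- Claude-Web: a user-agent token associated with Anthropic's Claude AI assistant (Anthropic's current documentation names ClaudeBot as its crawler, so check for both)
- PerplexityBot: Perplexity AI's crawler for its AI-powered search engine
- Google-Extended: a control token rather than a separate crawler. It tells Google whether content fetched by its regular crawlers may be used for AI training and Gemini features, and it does not affect regular Google Search
- Googlebot: Google's traditional search crawler (you almost certainly want this allowed)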
The Recommended robots.txt Configuration
For most local businesses, you want to allow all major search engines and AI crawlers to access your site. Here's the recommended configuration:
Recommended robots.txt for Local Businesses
    # Named search and AI crawlers share one group with the default rule.
    # A crawler follows only the most specific group that matches it, so the
    # Disallow lines below must sit in the same group as these User-agent lines.
    User-agent: Googlebot
    User-agent: Bingbot
    User-agent: GPTBot
    User-agent: ChatGPT-User
    User-agent: Claude-Web
    User-agent: PerplexityBot
    User-agent: Google-Extended
    User-agent: *
    Allow: /
    # Block access to admin and private areas
    Disallow: /admin/
    Disallow: /private/
    Disallow: /wp-admin/
    Disallow: /cart/
    Disallow: /checkout/

    # Point to your sitemap
    Sitemap: https://yourbusiness.com/sitemap.xml
Understanding the Syntax
robots.txt uses a simple format that anyone can understand:
- User-agent: specifies which bot the following rules apply to. Use * for all bots.
- Allow: / means "you can access everything starting from the root."
- Disallow: /admin/ means "you cannot access anything in the /admin/ directory."
- Sitemap: tells crawlers where to find your sitemap, making it easier for them to discover all your pages.
- # comments are notes for humans and are ignored by bots.
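One consequence of the User-agent rule is easy to miss: a crawler follows only the group that matches it most specifically, so a bot with its own group does not also pick up the rules listed under *. The sketch below illustrates this with Python's urllib.robotparser. ExampleBot is a made-up bot name, and the sample file is an assumption chosen to show the effect.

```python
from urllib.robotparser import RobotFileParser

# Googlebot has its own group; every other bot falls back to the * group.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Return True if agent may fetch path under ROBOTS_TXT."""
    return parser.can_fetch(agent, path)


# Googlebot matches its own group, so the /admin/ rule under * is not applied.
print(allowed("Googlebot", "/admin/"))    # True
# ExampleBot (hypothetical) has no group of its own and uses the * rules.
print(allowed("ExampleBot", "/admin/"))   # False
print(allowed("ExampleBot", "/services")) # True
```

If a rule should apply to a named bot, it has to appear in that bot's group. Note that urllib.robotparser applies the first matching rule in file order, whereas RFC 9309 specifies that the longest matching path wins, so results can differ from real crawlers when Allow and Disallow rules overlap.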
Common Mistakes to Avoid
Mistake 1: Blocking Everything by Default
Some websites have a robots.txt that looks like this:
DO NOT do this (blocks all bots)
    User-agent: *
    Disallow: /
This tells every bot—Google, Bing, ChatGPT, Perplexity, all of them—to stay away from your entire site. If you find this in your robots.txt, change it immediately. This is sometimes left over from development or staging environments and can silently destroy your search visibility.
Mistake 2: Not Having a robots.txt at All
If your site doesn't have a robots.txt file, most bots will crawl everything by default. That's not terrible, but it means you're missing the opportunity to point crawlers to your sitemap and to keep them out of admin areas, shopping carts, and other pages that don't belong in search or AI results.
Mistake 3: Blocking AI Crawlers Specifically
Some website templates and security plugins block AI crawlers by default. Check your robots.txt for lines like:
Check for these blocking rules
    User-agent: GPTBot
    Disallow: /

    User-agent: Claude-Web
    Disallow: /

    User-agent: PerplexityBot
    Disallow: /
If you see rules like these, remove them unless you have a specific reason to block these bots. For local businesses, blocking AI crawlers means blocking potential customers.
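Checking for these rules by eye gets tedious on a long file. As a rough sketch, the hypothetical blocked_crawlers helper below uses Python's urllib.robotparser to report which AI crawlers cannot fetch the homepage. The agent list mirrors the crawlers named in this guide, and the sample text is an assumption for illustration.

```python
from typing import List, Sequence
from urllib.robotparser import RobotFileParser

# User-agent tokens taken from the crawler list in this guide.
AI_CRAWLERS = (
    "GPTBot",
    "ChatGPT-User",
    "Claude-Web",
    "PerplexityBot",
    "Google-Extended",
)


def blocked_crawlers(
    robots_txt: str,
    agents: Sequence[str] = AI_CRAWLERS,
    path: str = "/",
) -> List[str]:
    """Return the agents that the given robots.txt text blocks from path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, path)]


SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: PerplexityBot
Disallow: /
"""

print(blocked_crawlers(SAMPLE))
# ['GPTBot', 'Claude-Web', 'PerplexityBot']
```

The same helper also catches the blanket block from Mistake 1: given "User-agent: *" followed by "Disallow: /", it reports every crawler in the list.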
How to Edit Your robots.txt File
WordPress
If you use Yoast SEO, go to Yoast SEO > Tools > File Editor to edit robots.txt directly from your dashboard. With Rank Math, go to Rank Math > General Settings > Edit robots.txt. You can also edit the file directly via FTP or your hosting control panel—it's located in the root directory of your website.
Shopify
Shopify automatically generates a robots.txt file. To customize it, go to Online Store > Themes > Edit Code and add a robots.txt.liquid file in the Templates folder. Shopify's documentation provides the default template you can modify.
Custom/Static Sites
Create or edit the robots.txt file in the root directory of your website (the same place as your homepage index.html file). Upload it via FTP, your hosting file manager, or your deployment pipeline.
Selective Access: Allow AI for Content, Block for Training
Some business owners are concerned about AI companies using their content for training purposes. If you want AI search engines to find your business but prefer your content not be used for model training, you can use a more nuanced configuration:
Allow search, restrict training
    # Allow OpenAI's GPTBot
    User-agent: GPTBot
    Allow: /

    # Allow real-time ChatGPT browsing
    User-agent: ChatGPT-User
    Allow: /

    # Block training-specific crawlers
    User-agent: CCBot
    Disallow: /

    User-agent: Google-Extended
    Disallow: /

    # Allow all standard search
    User-agent: Googlebot
    Allow: /

    User-agent: Bingbot
    Allow: /
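This approach lets AI search tools access your content for real-time recommendations while restricting crawlers primarily used for training data collection. The distinction isn't always clean, and vendors change which agent does what (OpenAI, for instance, documents GPTBot as a training crawler and OAI-SearchBot as its search crawler), so check each vendor's current crawler documentation before relying on this split. Even so, it gives you more control.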
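If you maintain several sites, a per-crawler policy like this can be generated rather than typed by hand. The sketch below is one possible approach, not a standard tool: build_robots_txt and POLICY are hypothetical names, and the policy simply mirrors the configuration above. The result is parsed back with urllib.robotparser as a sanity check.

```python
from typing import Mapping, Optional
from urllib.robotparser import RobotFileParser


def build_robots_txt(
    policy: Mapping[str, bool], sitemap_url: Optional[str] = None
) -> str:
    """Emit one group per agent: Allow: / if True, Disallow: / if False."""
    lines = []
    for agent, allowed in policy.items():
        lines.append(f"User-agent: {agent}")
        lines.append("Allow: /" if allowed else "Disallow: /")
        lines.append("")  # blank line between groups
    if sitemap_url:
        lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines).rstrip("\n") + "\n"


# Mirrors the selective configuration above (True = allow, False = block).
POLICY = {
    "GPTBot": True,
    "ChatGPT-User": True,
    "CCBot": False,
    "Google-Extended": False,
    "Googlebot": True,
    "Bingbot": True,
}

robots_txt = build_robots_txt(POLICY, "https://yourbusiness.com/sitemap.xml")
print(robots_txt)

# Sanity check: parse the generated text and confirm the policy holds.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
for agent, allowed in POLICY.items():
    assert parser.can_fetch(agent, "/") == allowed
```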
Verify Your Configuration
After making changes, verify everything is working:
- Visit yourbusiness.com/robots.txt in your browser to confirm your changes are live
- Use the robots.txt report in Google Search Console (it replaced the older robots.txt tester) to check for fetch and parsing errors
- Run a free Sigma Score audit to see if AI crawlers can access your site
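The first check can also be scripted. The following is a minimal sketch using only Python's standard library. The helper names and the User-Agent string are assumptions, and a missing file (HTTP 404) is treated as "nothing is disallowed", matching the default behavior described under Mistake 2.

```python
from typing import Optional
from urllib.error import HTTPError
from urllib.parse import urlsplit, urlunsplit
from urllib.request import Request, urlopen
from urllib.robotparser import RobotFileParser


def robots_url(site: str) -> str:
    """Build the robots.txt URL at the root of the given site or page URL."""
    parts = urlsplit(site if "://" in site else "https://" + site)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def fetch_robots(site: str, timeout: float = 10.0) -> Optional[str]:
    """Download the live robots.txt; return None if the site has none (404)."""
    request = Request(
        robots_url(site),
        headers={"User-Agent": "robots-check-sketch/0.1"},  # made-up name
    )
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except HTTPError as err:
        if err.code == 404:
            return None
        raise


def crawler_allowed(site: str, agent: str, path: str = "/") -> bool:
    """Fetch the live file and report whether agent may crawl path."""
    text = fetch_robots(site)
    if text is None:
        return True  # no robots.txt: nothing is disallowed
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser.can_fetch(agent, path)


print(robots_url("yourbusiness.com"))  # https://yourbusiness.com/robots.txt
```

Calling crawler_allowed("yourbusiness.com", "GPTBot") against your own domain makes a live request, so run it after your changes have been deployed.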
Not sure what your current setup looks like? Our Sigma Score audit checks your robots.txt configuration as part of its AI readiness evaluation. It will tell you exactly which crawlers are blocked and which are allowed. If you want us to handle the configuration for you, our optimization packages include robots.txt configuration for all major AI crawlers.
How Sigma Agents Applies This
At Sigma Agents, robots.txt configuration is one of the first things we check in every client engagement. Our AI visibility audit scans your robots.txt for blocking rules against all major AI crawlers—GPTBot, ChatGPT-User, Claude-Web, PerplexityBot, and Google-Extended—and flags any configurations that are silently preventing AI discovery. We have found that roughly 30% of the local business websites we audit have at least one AI crawler blocked, often without the owner's knowledge.
Beyond simply allowing access, we configure robots.txt as part of a broader AI accessibility strategy. This includes adding your sitemap URL, ensuring proper crawl directives for your most important pages, and pairing your robots.txt with an llms.txt file that gives AI systems a structured summary of your business. We also implement selective access rules for clients who want to allow AI search features while restricting training-only crawlers.
The robots.txt fix is often the fastest win in our entire optimization process. A five-minute configuration change can open the door to an entirely new customer acquisition channel. We include ongoing monitoring to ensure future website updates, plugin changes, or platform migrations do not inadvertently re-block the crawlers that are sending you business.
The Bottom Line
Editing your robots.txt file takes only a few minutes, and it can dramatically affect whether AI systems can find and recommend your business. Check yours today, make sure AI crawlers are allowed, and point them to your sitemap so they can efficiently discover all your pages.
It's one of the simplest and most impactful technical changes you can make for your business's online visibility. Unlike many SEO tasks, it rarely needs redoing, though it is worth rechecking after plugin updates or platform migrations.