Technical SEO · Feb 20, 2025 · 7 min read

How to Configure robots.txt for AI Crawlers

Your robots.txt file controls which AI systems can access your content. Learn how to configure it for GPTBot, Claude-Web, PerplexityBot, and more.

Sigma Agents Team

sigmaagents.ai

Your robots.txt file is a small text file that sits at the root of your website and tells search engines and other bots which pages they may and may not crawl. Well-behaved crawlers follow it voluntarily; it is a set of instructions, not a lock. It's been around since the mid-1990s, and most business owners have never thought about it. But in the age of AI search, this tiny file decides whether AI crawlers can read your site at all, which makes it a major factor in whether AI systems can find and recommend your business.

If your robots.txt is blocking AI crawlers—and many are, either intentionally or by accident—you're invisible to a rapidly growing number of potential customers. This guide shows you exactly how to fix that.

What Is robots.txt and Why Does It Matter?

Most websites have a robots.txt file, and yours should too. You can check right now by typing your website URL followed by /robots.txt in your browser, for example https://yourbusiness.com/robots.txt. If you get a "page not found" error, you don't have one.

Skyline Property Management in Denver, CO discovered the hard way how much robots.txt matters. Their WordPress security plugin had silently added rules blocking GPTBot, Claude-Web, and PerplexityBot—they had no idea. When a prospective tenant told them “I asked ChatGPT for the best property management companies in Denver and you didn't come up,” they investigated. After removing the AI crawler blocks and adding explicit Allow directives along with a sitemap reference, Skyline began appearing in AI-generated recommendations within weeks. They estimate the fix—a five-minute robots.txt edit—has driven at least 20 qualified landlord leads through AI search channels in the six months since.

robots.txt contains instructions for web crawlers, the automated programs that search engines and AI systems use to read and index your website. When Googlebot, GPTBot (OpenAI's crawler), or any other well-behaved bot visits your site, it fetches your robots.txt file (and typically caches it for a while) to see which parts of your content it's allowed to access.

If your robots.txt tells an AI bot "don't crawl this site," that bot won't read your pages. The AI system behind it then can't learn about your business from your own website, which makes it far less likely to recommend you.
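A compliant crawler runs this check with a robots.txt parser before fetching a page. As a rough illustration (not how any particular AI company implements its crawler), Python's standard-library urllib.robotparser module answers the same question. The robots.txt text and the may_crawl helper below are hypothetical examples; the file blocks GPTBot and nobody else.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that shuts out one AI crawler and nobody else.
EXAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /
"""


def may_crawl(robots_text: str, user_agent: str, url: str) -> bool:
    """Return True if robots_text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://yourbusiness.com/services/"
    for bot in ("GPTBot", "Googlebot"):
        verdict = "allowed" if may_crawl(EXAMPLE_ROBOTS, bot, page) else "blocked"
        print(f"{bot}: {verdict}")  # prints "GPTBot: blocked", then "Googlebot: allowed"
```

Googlebot is allowed here simply because no rule mentions it and there is no catch-all group: anything not disallowed is open by default.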

The AI Crawlers You Need to Know About

Here are the most important AI crawlers and what they do:

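  • GPTBot — OpenAI's crawler for collecting content that may be used to train its AI models
  • OAI-SearchBot — OpenAI's crawler for ChatGPT's search features; sites that block it are not shown in ChatGPT search answers
  • ChatGPT-User — fetches pages on demand when a ChatGPT user asks it to visit a page or look something up on the web
  • ClaudeBot — Anthropic's crawler for the Claude AI assistant (older guides and plugins often use the earlier token Claude-Web; listing both does no harm)
  • PerplexityBot — Perplexity AI's crawler for its AI-powered search engine
  • Google-Extended — not a separate crawler but a robots.txt control token: Googlebot does the crawling, and this token decides whether that content may be used for Gemini model training and grounding. It does not affect whether or how your site appears in Google Search
  • Googlebot — Google's traditional search crawler (you almost certainly want this allowed)
  • Bingbot — Microsoft Bing's search crawler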
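If you want to see which of these crawlers actually visit, your server's access logs record each request's User-Agent header. A minimal sketch, assuming you can pull those header strings out of your logs: the identify_crawler helper and its token list are illustrative (not an official registry), and the sample headers are simplified. Google-Extended is left out on purpose because it is only a robots.txt token and never appears in a request header. User-Agent strings can also be faked, so a match is a hint, not proof.

```python
from typing import Optional

# Crawler tokens to look for (illustrative, non-exhaustive). Matching is
# case-insensitive because vendors differ: Bing writes "bingbot".
CRAWLER_TOKENS = (
    "GPTBot",
    "OAI-SearchBot",
    "ChatGPT-User",
    "ClaudeBot",
    "Claude-Web",
    "PerplexityBot",
    "Googlebot",
    "Bingbot",
)


def identify_crawler(user_agent_header: str) -> Optional[str]:
    """Return the first known crawler token found in a User-Agent header."""
    header = user_agent_header.lower()
    for token in CRAWLER_TOKENS:
        if token.lower() in header:
            return token
    return None


if __name__ == "__main__":
    # Simplified, illustrative header strings (real ones are longer).
    samples = [
        "Mozilla/5.0 (compatible; GPTBot/1.1; +https://openai.com/gptbot)",
        "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    ]
    for ua in samples:
        print(identify_crawler(ua) or "not a listed crawler")
```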

The Recommended robots.txt Configuration

For most local businesses, you want to allow all major search engines and AI crawlers to access your site. Here's the recommended configuration:

Recommended robots.txt for Local Businesses

# Search engines and AI crawlers, plus every other bot (*).
# Consecutive User-agent lines form ONE group, so every rule
# below applies to all of them, including the Disallow lines.
User-agent: Googlebot
User-agent: Bingbot
User-agent: GPTBot
User-agent: OAI-SearchBot
User-agent: ChatGPT-User
User-agent: ClaudeBot
User-agent: Claude-Web
User-agent: PerplexityBot
User-agent: Google-Extended
User-agent: *
# Block access to admin and private areas
Disallow: /admin/
Disallow: /private/
Disallow: /wp-admin/
Disallow: /cart/
Disallow: /checkout/
# Allow everything else
Allow: /

# Point to your sitemap
Sitemap: https://yourbusiness.com/sitemap.xml
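Before uploading a configuration like this, you can sanity-check it locally. This sketch uses Python's urllib.robotparser to confirm that each named crawler can reach the homepage but not /wp-admin/, and that a Sitemap line is present. The file contents, bot list, and check helper are assumptions for illustration. One caveat: Python's parser applies the first matching rule while Google applies the most specific one, so the example lists the Disallow lines before Allow: / to get the same answer from both.

```python
from typing import List
from urllib.robotparser import RobotFileParser

# Example file: the named crawlers and * share one group, so the
# Disallow lines apply to all of them. Disallow lines come first
# because Python's parser uses the first matching rule.
ROBOTS_TXT = """\
User-agent: Googlebot
User-agent: Bingbot
User-agent: GPTBot
User-agent: ChatGPT-User
User-agent: PerplexityBot
User-agent: *
Disallow: /admin/
Disallow: /wp-admin/
Disallow: /cart/
Allow: /

Sitemap: https://yourbusiness.com/sitemap.xml
"""

BOTS = ["Googlebot", "Bingbot", "GPTBot", "ChatGPT-User", "PerplexityBot"]
SITE = "https://yourbusiness.com"  # placeholder domain


def check(robots_text: str) -> List[str]:
    """Return a list of problems; an empty list means the file looks right."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    problems = []
    for bot in BOTS:
        if not parser.can_fetch(bot, f"{SITE}/"):
            problems.append(f"{bot} is blocked from the homepage")
        if parser.can_fetch(bot, f"{SITE}/wp-admin/"):
            problems.append(f"{bot} can crawl /wp-admin/")
    if not parser.site_maps():
        problems.append("no Sitemap line found")
    return problems


if __name__ == "__main__":
    print(check(ROBOTS_TXT) or "looks good")
```

Running check on a file where GPTBot has its own separate "User-agent: GPTBot / Allow: /" group reports that GPTBot can crawl /wp-admin/, because a bot with its own group ignores the rules written under User-agent: *.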

Understanding the Syntax

robots.txt uses a simple format that anyone can understand:

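  • User-agent: specifies which bot the following rules apply to. Use * for all bots.
  • Several User-agent lines in a row form one group that shares the rules beneath them.
  • Each bot follows only the most specific group that matches it and ignores the others. If GPTBot has its own group, the rules under User-agent: * no longer apply to it, so any Disallow lines you want it to respect must appear in its group too.
  • Allow: / means "you can access everything starting from the root."
  • Disallow: /admin/ means "you cannot access anything in the /admin/ directory." When an Allow rule and a Disallow rule both match a URL, Google follows the more specific (longer) one.
  • Sitemap: gives the full URL of your sitemap, making it easier for crawlers to discover all your pages. It applies to every bot, wherever it appears in the file.
  • # comments are notes for humans and are ignored by bots.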
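Groups are where most robots.txt surprises come from: a crawler that matches its own User-agent group ignores the rules written for User-agent: *. This sketch parses a hypothetical two-group file with Python's urllib.robotparser to show the effect; for this file its answers match Google's documented behavior. "OtherBot" is a made-up name standing in for any crawler without a group of its own.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: GPTBot gets its own group, and the admin rule
# lives only in the catch-all (*) group.
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /

User-agent: *
Disallow: /wp-admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

admin_url = "https://yourbusiness.com/wp-admin/"
# GPTBot matches its own group and never reads the * group,
# so the /wp-admin/ rule does not apply to it.
print("GPTBot:", parser.can_fetch("GPTBot", admin_url))
# A bot with no group of its own falls back to the * group.
print("OtherBot:", parser.can_fetch("OtherBot", admin_url))
```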

Common Mistakes to Avoid

Mistake 1: Blocking Everything by Default

Some websites have a robots.txt that looks like this:

DO NOT do this (blocks all bots)

User-agent: *
Disallow: /

This tells every bot—Google, Bing, ChatGPT, Perplexity, all of them—to stay away from your entire site. If you find this in your robots.txt, change it immediately. This is sometimes left over from development or staging environments and can silently destroy your search visibility.
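Because a leftover Disallow: / from staging can ship silently, some teams add a pre-deploy check. A minimal sketch, assuming your deploy script can run Python: blocks_everyone and guard_deploy are hypothetical helpers, and "AnyBot" is a placeholder crawler name assumed not to have a group of its own, so it falls through to the User-agent: * rules.

```python
import sys
from pathlib import Path
from urllib.robotparser import RobotFileParser


def blocks_everyone(robots_text: str) -> bool:
    """True if the catch-all (*) group disallows the site root."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    # "AnyBot" is a placeholder name assumed to have no group of its
    # own, so the answer comes from the User-agent: * rules.
    return not parser.can_fetch("AnyBot", "/")


def guard_deploy(robots_path: Path) -> None:
    """Stop the deploy (exit code 1) if the file would block every bot."""
    text = robots_path.read_text(encoding="utf-8")
    if blocks_everyone(text):
        sys.exit(f"{robots_path}: 'Disallow: /' applies to all bots; not deploying")


# Example call in a deploy script (the path is hypothetical):
# guard_deploy(Path("public/robots.txt"))
```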

Mistake 2: Not Having a robots.txt at All

If your site doesn't have a robots.txt file, most bots will crawl everything by default. That's not terrible, but it means you're missing the opportunity to point crawlers to your sitemap and to keep them out of admin areas, shopping carts, and other pages that don't need to be crawled. (robots.txt controls crawling, not indexing: to keep a page out of search results, use a noindex tag instead.)

Mistake 3: Blocking AI Crawlers Specifically

Some website templates and security plugins block AI crawlers by default. Check your robots.txt for lines like:

Check for these blocking rules

User-agent: GPTBot
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: PerplexityBot
Disallow: /

If you see rules like these, remove them unless you have a specific reason to block these bots. For local businesses, blocking AI crawlers means blocking potential customers.
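To audit an existing file for rules like these, a short script can ask the parser about each AI crawler in turn. The crawler list and the blocked_ai_crawlers helper are illustrative assumptions; the sample input mirrors the plugin-style rules shown in this section.

```python
from typing import List
from urllib.robotparser import RobotFileParser

# AI crawler tokens to audit (an illustrative list; extend as needed).
AI_CRAWLERS = ["GPTBot", "ChatGPT-User", "Claude-Web", "ClaudeBot",
               "PerplexityBot", "Google-Extended"]

# Plugin-style blocking rules like the ones shown above.
PLUGIN_RULES = """\
User-agent: GPTBot
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: PerplexityBot
Disallow: /
"""


def blocked_ai_crawlers(robots_text: str, path: str = "/") -> List[str]:
    """List the AI crawlers that robots_text keeps away from path."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return [bot for bot in AI_CRAWLERS if not parser.can_fetch(bot, path)]


if __name__ == "__main__":
    print(blocked_ai_crawlers(PLUGIN_RULES))  # ['GPTBot', 'Claude-Web', 'PerplexityBot']
```

Note that these rules leave ClaudeBot untouched: blocking an outdated token does not block the crawler that replaced it, which is one more reason to audit by name.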

How to Edit Your robots.txt File

WordPress

If you use Yoast SEO, go to Yoast SEO > Tools > File Editor to edit robots.txt directly from your dashboard. With Rank Math, go to Rank Math > General Settings > Edit robots.txt. You can also edit the file directly via FTP or your hosting control panel—it's located in the root directory of your website.

Shopify

Shopify automatically generates a robots.txt file. To customize it, go to Online Store > Themes > Edit Code and add a robots.txt.liquid file in the Templates folder. Shopify's documentation provides the default template you can modify.

Custom/Static Sites

Create or edit the robots.txt file in the root directory of your website (the same place as your homepage index.html file). Upload it via FTP, your hosting file manager, or your deployment pipeline.
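If your site is built by a script or CI pipeline, you can generate robots.txt during the build so it always lands in the published root folder. A rough sketch: the domain, the private paths, and the render_robots and write_robots helpers are placeholders, and the output folder (often named something like public/ or dist/) depends on your static-site tool.

```python
from pathlib import Path
from typing import List

# Placeholder values; substitute your own domain and private paths.
SITE_URL = "https://yourbusiness.com"
PRIVATE_PATHS = ["/admin/", "/private/", "/cart/", "/checkout/"]


def render_robots(site_url: str, private_paths: List[str]) -> str:
    """Build robots.txt text: one catch-all group plus a Sitemap line."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {path}" for path in private_paths]
    lines += ["Allow: /", "", f"Sitemap: {site_url.rstrip('/')}/sitemap.xml"]
    return "\n".join(lines) + "\n"


def write_robots(output_dir: Path) -> Path:
    """Write robots.txt into the folder your host serves as the site root."""
    output_dir.mkdir(parents=True, exist_ok=True)
    target = output_dir / "robots.txt"
    target.write_text(render_robots(SITE_URL, PRIVATE_PATHS), encoding="utf-8")
    return target


# Example build step (the output folder name is hypothetical):
# write_robots(Path("public"))
```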

Selective Access: Allow AI for Content, Block for Training

Some business owners are concerned about AI companies using their content for training purposes. If you want AI search engines to find your business but prefer your content not be used for model training, you can use a more nuanced configuration:

Allow search, restrict training

# Allow OpenAI's search crawler (ChatGPT search results)
User-agent: OAI-SearchBot
Allow: /

# Allow real-time ChatGPT browsing
User-agent: ChatGPT-User
Allow: /

# Block training-focused crawlers
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# Allow all standard search
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

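This approach lets AI search tools access your content for real-time recommendations while restricting crawlers primarily used for training data collection. OpenAI treats its bots independently, so blocking GPTBot does not block OAI-SearchBot or ChatGPT-User, and blocking Google-Extended does not remove you from Google Search. The distinction isn't always clean, but it gives you more control.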
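A policy like this is easy to get backwards, so it is worth testing. The sketch below uses Python's urllib.robotparser to check a sample selective file against an intended split between search and training bots. That split (which bots count as search and which as training) is an assumption you should keep current with each vendor's crawler documentation, and policy_mismatches is a hypothetical helper.

```python
from typing import List
from urllib.robotparser import RobotFileParser

# Sample selective policy: search bots allowed, training bots blocked.
SELECTIVE = """\
User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
"""

# Intended split (an assumption; review it as vendors change their bots).
SHOULD_ALLOW = ["OAI-SearchBot", "ChatGPT-User", "Googlebot", "Bingbot"]
SHOULD_BLOCK = ["GPTBot", "CCBot", "Google-Extended"]


def policy_mismatches(robots_text: str) -> List[str]:
    """Return every bot whose access differs from the intended split."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    wrong = [bot for bot in SHOULD_ALLOW if not parser.can_fetch(bot, "/")]
    wrong += [bot for bot in SHOULD_BLOCK if parser.can_fetch(bot, "/")]
    return wrong


if __name__ == "__main__":
    print(policy_mismatches(SELECTIVE) or "policy matches")
```

A file that allows GPTBot "for search" would fail this check, since GPTBot is OpenAI's training crawler.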

Verify Your Configuration

After making changes, verify everything is working:

  • Visit yourbusiness.com/robots.txt in your browser to confirm your changes are live
  • Use the robots.txt report in Google Search Console (the replacement for the retired robots.txt Tester) to confirm Google can fetch your file and to spot errors
  • Run a free Sigma Score audit to see if AI crawlers can access your site
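To script the first check, Python's urllib.robotparser can also fetch a live file. The sketch below assumes your site serves robots.txt over HTTPS; load_live and summarize are hypothetical helpers. Some hosts or firewalls reject Python's default User-Agent, so a blocked or failed fetch from this script is not proof that real crawlers are blocked.

```python
from typing import Dict, List
from urllib.robotparser import RobotFileParser

# Crawlers to report on (illustrative list).
AGENTS = ["Googlebot", "Bingbot", "GPTBot", "OAI-SearchBot",
          "ChatGPT-User", "ClaudeBot", "PerplexityBot"]


def load_live(domain: str) -> RobotFileParser:
    """Fetch and parse https://<domain>/robots.txt (needs network access)."""
    parser = RobotFileParser(f"https://{domain}/robots.txt")
    # read() treats 401/403 as "disallow all" and other 4xx responses
    # (such as a missing file) as "allow all".
    parser.read()
    return parser


def summarize(parser: RobotFileParser, path: str = "/") -> Dict[str, List[str]]:
    """Group agents by whether they may fetch path, and list sitemaps."""
    return {
        "allowed": [a for a in AGENTS if parser.can_fetch(a, path)],
        "blocked": [a for a in AGENTS if not parser.can_fetch(a, path)],
        "sitemaps": parser.site_maps() or [],
    }


# Example (hypothetical domain):
# print(summarize(load_live("yourbusiness.com")))
```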

Not sure what your current setup looks like? Our Sigma Score audit checks your robots.txt configuration as part of its AI readiness evaluation. It will tell you exactly which crawlers are blocked and which are allowed. If you want us to handle the configuration for you, our optimization packages include robots.txt configuration for all major AI crawlers.

How Sigma Agents Applies This

At Sigma Agents, robots.txt configuration is one of the first things we check in every client engagement. Our AI visibility audit scans your robots.txt for blocking rules against all major AI crawlers—GPTBot, ChatGPT-User, Claude-Web, PerplexityBot, and Google-Extended—and flags any configurations that are silently preventing AI discovery. We have found that roughly 30% of the local business websites we audit have at least one AI crawler blocked, often without the owner's knowledge.

Beyond simply allowing access, we configure robots.txt as part of a broader AI accessibility strategy. This includes adding your sitemap URL, ensuring proper crawl directives for your most important pages, and pairing your robots.txt with an llms.txt file that gives AI systems a structured summary of your business. We also implement selective access rules for clients who want to allow AI search features while restricting training-only crawlers.

The robots.txt fix is often the fastest win in our entire optimization process. A five-minute configuration change can open the door to an entirely new customer acquisition channel. We include ongoing monitoring to ensure future website updates, plugin changes, or platform migrations do not inadvertently re-block the crawlers that are sending you business.

The Bottom Line

Fixing your robots.txt takes only a few minutes, and it can dramatically affect whether AI systems can find and recommend your business. Check yours today, make sure AI crawlers are allowed, and point them to your sitemap so they can efficiently discover all your pages.

It's one of the simplest and most impactful technical changes you can make for your business's online visibility, and unlike many SEO tasks, it rarely needs revisiting. Just re-check it after plugin updates, redesigns, or platform migrations.
