Gary Illyes’ Post

Analyst at Google

You have probably heard that your robots.txt file MUST be at example.com/robots.txt. The Robots Exclusion Protocol is 30 years old this year, and I'm here to tell you that what you heard on the internet is not entirely true (shocker). Say you have a CDN and a main site, with two robots.txt files: one at https://cdn.example.com/robots.txt and one at https://www.example.com/robots.txt. You could instead have just one central robots.txt with all the rules, say on your CDN, which might help you keep track of all the rules you need to manage. All you have to do is redirect https://www.example.com/robots.txt to https://cdn.example.com/robots.txt, and crawlers that comply with RFC 9309 will just use the redirect target as the robots.txt file of https://www.example.com/. Weird or what. Now I wonder if the parsed robots.txt file actually needs to be called robots.txt.
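As a rough illustration of the redirect behaviour described in the post, here is a minimal Python sketch of how a crawler might resolve a site's robots.txt and then apply the rules it finds at the redirect target to URLs on the original host. The names `resolve_robots`, `fake_fetch`, and `FAKE_RESPONSES` are hypothetical, the responses are canned rather than fetched over the network, and the limit of five hops reflects RFC 9309's recommendation that crawlers follow at least five consecutive redirects. This is not how any particular crawler is implemented.

```python
from urllib.parse import urljoin, urlsplit
from urllib.robotparser import RobotFileParser

MAX_REDIRECTS = 5  # RFC 9309: crawlers SHOULD follow at least five redirects
REDIRECT_CODES = {301, 302, 303, 307, 308}


def robots_url(site_url):
    """Return the default robots.txt location for the host of site_url."""
    parts = urlsplit(site_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


def resolve_robots(site_url, fetch, max_redirects=MAX_REDIRECTS):
    """Follow redirects from the host's /robots.txt, even across hosts.

    `fetch` is a hypothetical callable: fetch(url) -> (status, location, body).
    Returns (final_url, status, body); status is None if the redirect
    budget is exhausted.
    """
    url = robots_url(site_url)
    for _ in range(max_redirects + 1):  # first request plus allowed hops
        status, location, body = fetch(url)
        if status in REDIRECT_CODES and location:
            url = urljoin(url, location)  # Location may be relative
            continue
        return url, status, body
    return url, None, ""


# Canned responses standing in for real HTTP requests (assumption).
FAKE_RESPONSES = {
    "https://www.example.com/robots.txt":
        (301, "https://cdn.example.com/robots.txt", ""),
    "https://cdn.example.com/robots.txt":
        (200, None, "User-agent: *\nDisallow: /private/\n"),
}


def fake_fetch(url):
    return FAKE_RESPONSES[url]


final_url, status, body = resolve_robots(
    "https://www.example.com/some/page", fake_fetch
)

# The rules come from the CDN host but are applied to www URLs by path.
rules = RobotFileParser()
rules.parse(body.splitlines())
```

With these canned responses, `final_url` is the CDN copy, and `rules.can_fetch("*", ...)` is evaluated against paths on https://www.example.com/, which is the point of the post: the file was served from another host, yet it governs the host that redirected to it.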

Deborah Carver 🪩

Creator of The Content Technologist, web evangelist, and results-focused digital content strategy consultant

3w

If "what you heard on the internet is not entirely true," isn't it the responsibility of the company that creates the gateway to the web's traffic to create the source of truth or correct the record? It would be awesome to have a version of this for webmasters clearly available on Google's documentation for Robots dot text, so we don't have to hang on every word of Google employee social posts. Also, please rebrand "robots dot text" while you are at it. 😉

Shaikat Ray

Technical SEO at SelfCanonical.com

3w

Gary Illyes Up to 5 redirects, I believe? 🤔

Joe Handaya

💼 Group Web, Blog & SEO Manager at Ruangguru | 📈 Independent Professional #SEO Consultant | 💘 Deeply in love with KabarGames

3w

Hi Gary, does the exact same concept also apply to sitemap XML files? I need your word on this, because I'm having trouble with my developer right now, haha. Maybe John Mueller can enlighten me too. Thanks in advance.

Shalwal Singha

Empowering Digital Enablement for SMBs

3w

The timing of this post is mysteriously on point. Cheers and Thank You for this simple PoA.

robots.txt handles relative paths, and I haven't found any source saying that robots.txt can work cross-domain. I'm afraid that with a redirect, the rules could become invalid for the location the file is redirected from. That is how I understand Google's documentation on robots.txt: https://developers.google.com/search/docs/crawling-indexing/robots/robots_txt

Saeed Khosravi

Passionate about Driving Business Growth Online. Internet Marketing Strategist, SEO Enthusiast, MIB Graduate. Founder & CEO @ Nexunom, and Creator of ReviewTool.com, Allintitle.co, and Tavata.com

3w

Interesting. I also encountered sites that had redirected their robots.txt file while analyzing the robots.txt files of over 80K websites as part of my "Universal Web Crawler Blocking Report" research (https://saeedkhosravi.ca/universal-web-crawler-blocking-report/). At the time, I figured that the redirects were part of a complete 301 redirect of the site to a new domain, as I didn't know they might have redirected just the robots.txt file in order to manage it from a central place. This is interesting because in future updates I can check how many of the sites in my pool actually redirect only their robots.txt files.

Putra K

Data Center | Cloud | AI | IT Forensics | E-commerce | Cyber Security | DevOps | Full Stack Developer | Data Science | LinkedIn Strategist | Digital Marketer | Enterprise Search Engine Optimization | Content Writer

2w

The current handling of special characters in robots.txt files, as outlined in RFC 9309, presents some ambiguities and potential inefficiencies. The specification requires crawlers to support special characters such as #, $, and *, and to use percent-encoding for these characters in URIs. This approach can lead to inconsistencies and confusion, especially when interpreting patterns in robots.txt files. 
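For readers unfamiliar with the special characters mentioned here, the following is a small Python sketch of how `*` and `$` in a robots.txt path pattern are commonly interpreted: `*` stands for zero or more of any character, and a trailing `$` anchors the pattern to the end of the path. The helper names `rule_to_regex` and `rule_matches` are hypothetical, and the sketch deliberately ignores percent-encoding normalization and `#` comment stripping, so it is a simplification rather than a full RFC 9309 matcher.

```python
import re


def rule_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    Assumes `pattern` has already had any '#' comment removed.
    """
    anchored = pattern.endswith("$")  # trailing '$' means "path ends here"
    if anchored:
        pattern = pattern[:-1]
    # Escape literal pieces and join them with '.*' wherever '*' appeared.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def rule_matches(pattern, path):
    """True if the rule pattern matches the path, starting at its beginning."""
    return rule_to_regex(pattern).match(path) is not None
```

For example, under these assumptions `rule_matches("/*.php$", "/index.php")` holds while `rule_matches("/*.php$", "/index.php?x=1")` does not, and a plain pattern such as `/fish` matches any path that begins with it, including `/fishing`.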

Olivian-Claudiu Stoica

Senior SEO Specialist | Saas Enthusiast | Analytics Geek | Search Me on Google

2w

Both subdomains have the robots.txt file in the same place (the root) in your example: www.example.com/robots.txt and cdn.example.com/robots.txt. If you wanted to place the file in a folder like /robots/robots.txt, would you have to implement a redirect from .com/robots.txt to /robots/robots.txt or not? I got confused after your first paragraph. Thanks!

Kind of interesting. I thought the CDN and the main site should each have their own robots.txt, since they are different domains.

Adi Shwartzman

StorySEO - SEO Consultant I dont buy or sell links please STOP contact me about it

3w

I wonder if there is a way of avoiding keyword stuffing when explaining technical stuff, probably not
