You've probably heard before that your robots.txt file MUST be at example.com/robots.txt. The Robots Exclusion Protocol is 30 years old this year, and I'm here to tell you that what you heard on the internet is not entirely true (shocker).

Say you have a CDN and you have your main site. You have two robots.txt files: one at https://cdn.example.com/robots.txt and one at https://www.example.com/robots.txt. You could have just one central robots.txt with all the rules, say on your CDN, which might help you keep track of all the rules you need to manage. All you have to do is redirect https://www.example.com/robots.txt to https://cdn.example.com/robots.txt, and crawlers that comply with RFC 9309 will simply use the redirect target as the robots.txt file of https://www.example.com/. Weird or what.

Now I wonder if the parsed robots.txt file actually needs to be called robots.txt
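To make the mechanics concrete, here's a minimal sketch of how an RFC 9309-compliant crawler resolves a redirected robots.txt. The redirect map and the `resolve_robots_txt` helper are hypothetical illustrations (real crawlers follow actual HTTP 3xx responses); the five-hop cap reflects the RFC's guidance that crawlers should follow at least five consecutive redirects before they may treat the file as unavailable.

```python
MAX_REDIRECTS = 5  # RFC 9309: follow at least five consecutive redirects

def resolve_robots_txt(url, redirects):
    """Follow a redirect chain; the body at the final URL is treated
    as the robots.txt of the ORIGINAL host that was requested."""
    for _ in range(MAX_REDIRECTS):
        if url not in redirects:
            return url  # no further redirect: this body governs the original host
        url = redirects[url]
    # Beyond the cap, a crawler may treat robots.txt as unavailable
    raise RuntimeError("redirect limit exceeded")

# Hypothetical setup matching the post: www redirects its robots.txt to the CDN
redirect_map = {
    "https://www.example.com/robots.txt": "https://cdn.example.com/robots.txt",
}

final = resolve_robots_txt("https://www.example.com/robots.txt", redirect_map)
# The rules fetched from the CDN now apply to https://www.example.com/
```

The key point the sketch illustrates: the crawler associates the redirect *target's* contents with the *original* host, which is what makes a single central robots.txt workable.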
Gary Illyes Up to 5 redirects, I believe? 🤔
Hi Gary, does the same exact concept also apply to sitemap.xml as well? Need your word on this case, coz I'm having trouble with my developer now haha. Maybe John Mueller can enlighten me too, thanks in advance
The timing of this post is mysteriously on point. Cheers and Thank You for this simple PoA.
Robots.txt handles relative paths, and I haven't found any source saying robots.txt rules can work cross-domain. I'm afraid that with a redirection, the robots rules can become invalid for the location they were redirected from. At least that's how I understand Google's documentation on robots.txt: https://developers.google.com/search/docs/crawling-indexing/robots/robots_txt
Interesting. I also encountered sites that had redirected their robots.txt file while analyzing the robots.txt files of over 80K websites as part of my "Universal Web Crawler Blocking Report" research (https://saeedkhosravi.ca/universal-web-crawler-blocking-report/). At the time, I figured the redirections were set as part of a complete 301 redirection of the site to a new domain, as I didn't know they might have redirected just the robots.txt file for the purpose of managing it from a central place. This is interesting, as in my future updates I can check how many of the sites in my pool actually redirect only their robots.txt files.
The current handling of special characters in robots.txt files, as outlined in RFC 9309, presents some ambiguities and potential inefficiencies. The specification requires crawlers to support special characters such as #, $, and *, and to use percent-encoding for these characters in URIs. This approach can lead to inconsistencies and confusion, especially when interpreting patterns in robots.txt files.
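To illustrate where those ambiguities come from, here's a naive sketch of the kind of pattern matching RFC 9309 describes: `*` matches any character sequence, a trailing `$` anchors the end of the path, and both sides are normalized to a consistent percent-encoding so that equivalent encodings (e.g. `/caf%C3%A9` and `/café`) compare equal. This is an illustrative toy, not a full or authoritative RFC 9309 matcher.

```python
import re
from urllib.parse import quote, unquote

def robots_pattern_matches(pattern: str, path: str) -> bool:
    """Naive sketch: does a robots.txt path pattern match a URI path?"""
    def normalize(p: str) -> str:
        # Decode then re-encode so equivalent percent-encodings compare
        # equal; keep '/' and the special characters '*' and '$' literal.
        return quote(unquote(p), safe="/*$")

    pattern = normalize(pattern)
    path = normalize(path)

    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]

    # Translate '*' wildcards into a regex, escaping everything else
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    regex = "^" + regex + ("$" if anchored else "")
    return re.match(regex, path) is not None

# e.g. robots_pattern_matches("/*.php$", "/index.php") matches,
# while "/index.php" against "/admin$" does not.
```

Even this small sketch shows the pain points the comment describes: whether to compare encoded or decoded forms, and how `%2A` (an encoded `*`) should differ from a literal wildcard, are exactly the spots where implementations diverge.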
Both subdomains have the robots.txt file in the same place (the root) in your example: www.example.com/robots.txt & cdn.example.com/robots.txt. If you wanted to place the file in a folder like /robots/robots.txt, would you have to implement a redirect from /robots.txt to /robots/robots.txt or not? I got confused after your first paragraph. Thanks!
Kinda interesting. I thought the CDN and the main site should each have their own robots.txt since they are different domains
I wonder if there is a way of avoiding keyword stuffing when explaining technical stuff, probably not
If "what you heard on the internet is not entirely true," isn't it the responsibility of the company that creates the gateway to the web's traffic to create the source of truth or correct the record? It would be awesome to have a version of this for webmasters clearly available on Google's documentation for Robots dot text, so we don't have to hang on every word of Google employee social posts. Also, please rebrand "robots dot text" while you are at it. 😉