Robots.txt is a text file that website owners create to tell search engine robots how to crawl pages on their websites. The robots.txt file is part of the Robots Exclusion Protocol (REP), a group of web standards that regulate how robots crawl the web, access and index content, and serve that content to users.
In practice, robots.txt files indicate whether certain user agents (crawling software) can or cannot crawl parts of a website. These crawl instructions are specified by “disallowing” or “allowing” the behavior of particular user agents.
Syntax:
User-agent: [user-agent name]
Disallow: [URL string not to be crawled]
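For example, the following sketch (the /private/ path and the Bingbot group are purely illustrative) blocks all crawlers from a single directory and blocks one named crawler from the entire site; a blank line separates the two groups of directives:

User-agent: *
Disallow: /private/

User-agent: Bingbot
Disallow: /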
The robots.txt file is publicly available: to see the robots.txt file of any website, just add “/robots.txt” to the end of the root domain to see that website’s directives. Anyone can see which pages you do or don’t want crawled, so don’t use the robots.txt file to hide private user information. Every subdomain on a root domain uses its own, separate robots.txt file. The filename is case-sensitive, which means the file must be named exactly "robots.txt" and not "Robots.txt", "robots.TXT", or otherwise. A robots.txt file must be placed in a website’s top-level directory so that crawlers can find it.
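As a quick illustration, this Python sketch fetches and prints a site's robots.txt by appending "/robots.txt" to the root domain (www.example.com is a placeholder; substitute any real domain):

from urllib.request import urlopen

# Build the robots.txt URL by appending "/robots.txt" to the root domain
url = "https://www.example.com/robots.txt"  # placeholder domain

# Fetch and print the publicly available file
with urlopen(url) as response:
    print(response.read().decode("utf-8"))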
Robots.txt files control crawler access to certain areas of your site. While modifying a robots.txt file incorrectly can be dangerous, especially if you accidentally block Googlebot from crawling your entire site, there are some situations in which a robots.txt file can be very useful.
Some common use cases include keeping entire sections of a website (such as a staging area) private, keeping internal search results pages out of public search results, preventing search engines from crawling certain files (such as images or PDFs), pointing crawlers to the location of sitemaps, and specifying a crawl delay to keep crawlers from overloading your servers.
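A file covering several of these cases might look like the following sketch; the paths and the sitemap URL are hypothetical, and the Crawl-delay directive is honored by some crawlers but ignored by others, including Googlebot:

User-agent: *
# Keep internal search results pages out of crawls
Disallow: /search/
# Keep a private staging section off-limits
Disallow: /staging/
# Ask supporting crawlers to wait 10 seconds between requests
Crawl-delay: 10

# Point crawlers to the sitemap
Sitemap: https://www.example.com/sitemap.xml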
Search engines have two main jobs: crawling the web to discover content, and indexing that content so it can be served up to searchers looking for information.
After arriving at a website, the search engine crawler looks for a robots.txt file. If it finds one, the crawler reads that file first before continuing through the site. Because the robots.txt file contains instructions about how the search engine should crawl the website, the directives found there guide the crawler’s further action on this particular site. If the site does not have a robots.txt file, the crawler proceeds to crawl the rest of the site.
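A minimal sketch of this flow in Python, using the standard library's urllib.robotparser (the domain, page, and crawler name are placeholders); if the file is missing, the parser defaults to allowing everything, mirroring the behavior described above:

import urllib.robotparser

site = "https://www.example.com"   # placeholder root domain
page = site + "/some-page.html"    # placeholder page to crawl

# Step 1: look for the site's robots.txt before crawling anything else
parser = urllib.robotparser.RobotFileParser()
parser.set_url(site + "/robots.txt")
parser.read()  # fetches and parses the file; a missing file means "allow all"

# Step 2: let the directives (if any) decide what may be crawled
if parser.can_fetch("MyCrawler", page):
    print("Allowed to crawl:", page)   # proceed to fetch and index the page
else:
    print("Blocked by robots.txt:", page)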