robots.txt is a small text file located in the root directory of a website. It is used to control search engine bots: it gives webmasters a way to tell search engines which files or pages should be crawled (i.e. visited) and which should not.
The robots.txt file can be created with a simple text editor; just make sure it is named "robots.txt". Search engines read and analyze the lines in this file before they crawl the site. Each entry in the robots.txt has two parts:
The first part is called the "User-agent." It names the bot that the entry applies to, e.g. Googlebot.
To address all crawling bots at once, start the entry with "User-agent: *". The lines that follow then apply to every bot.
In the second part, you define what may and may not be crawled, using Allow and Disallow. Each of these lines tells a bot whether it is allowed to crawl a given path. You could start with "Disallow: /", which means that the listed bots may not crawl any files or pages.
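To make the two-part structure concrete, here is a minimal Python sketch that splits robots.txt text into entries, each holding its user-agents and its Allow/Disallow rules. The function name parse_groups is hypothetical and the parsing is deliberately simplified; it is an illustration, not how any particular search engine reads the file.

```python
def parse_groups(text):
    """Split robots.txt text into (user_agents, rules) entries (simplified sketch)."""
    groups = []
    agents, rules = [], []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if rules:  # a User-agent line after rules starts a new entry
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value)
        elif field in ("allow", "disallow") and agents:
            rules.append((field, value))
    if agents:
        groups.append((agents, rules))
    return groups


EXAMPLE = """\
User-agent: *
Disallow: /
"""

groups = parse_groups(EXAMPLE)
print(groups)  # [(['*'], [('disallow', '/')])]
```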
A simple robots.txt file contains 2 lines and allows all bots to crawl and read all files and pages of your site.
# Full access to your site:
User-agent: *
Disallow:
The next example shows the content of a robots.txt file that doesn't allow any page on your site to be crawled, so search engines cannot read its content:
# Website closed to search engines:
User-agent: *
Disallow: /
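One way to check how the two files above behave is Python's standard-library urllib.robotparser module. The sketch below is only a local check; the helper can_crawl and the bot name "ExampleBot" are made up for illustration, and real crawlers may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt, user_agent, url):
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


FULL_ACCESS = "User-agent: *\nDisallow:\n"
CLOSED = "User-agent: *\nDisallow: /\n"

# "ExampleBot" is a hypothetical crawler name used only for this check.
url = "http://www.yourwebsite.com/page.html"
open_result = can_crawl(FULL_ACCESS, "ExampleBot", url)
closed_result = can_crawl(CLOSED, "ExampleBot", url)
print(open_result)    # True
print(closed_result)  # False
```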
Access to certain files or pages can be denied by having the following lines in your robots.txt:
User-agent: *
Disallow: /news/
Disallow: /daily.html
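Disallow values are matched against the start of the URL path, which the following sketch illustrates with Python's urllib.robotparser. The bot name "ExampleBot" and the tested paths are assumptions chosen for the example.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /news/
Disallow: /daily.html
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

base = "http://www.yourwebsite.com"
paths = ["/news/today.html", "/daily.html", "/news", "/index.html"]
# "ExampleBot" is a hypothetical crawler name.
results = {path: parser.can_fetch("ExampleBot", base + path) for path in paths}

for path, allowed in results.items():
    print(path, "allowed" if allowed else "blocked")
```

Note that "/news" without the trailing slash does not start with "/news/", so it is not covered by the first rule.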
In order to deny access to your website for only certain search engine bots, you must address each bot in the User-Agent part of an entry:
User-agent: Googlebot
Disallow: /
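The effect of addressing a single bot can be checked the same way. In this sketch, again using Python's urllib.robotparser, "Bingbot" simply stands in for any crawler that the entry does not name.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "http://www.yourwebsite.com/"
googlebot_allowed = parser.can_fetch("Googlebot", url)
other_allowed = parser.can_fetch("Bingbot", url)  # no entry names this bot

print(googlebot_allowed)  # False
print(other_allowed)      # True
```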
You can also use Allow to explicitly allow a crawler to read one page or file:
User-agent: Googlebot
Disallow: /folder1/
Allow: /folder1/exampledata.html
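When an Allow and a Disallow rule both match a URL, Google documents that the most specific (longest) matching path wins, with Allow winning ties. The following hand-rolled Python sketch makes that precedence explicit; the function is_allowed is hypothetical, handles plain path prefixes only (no wildcards), and other crawlers may resolve conflicts differently.

```python
def is_allowed(rules, path):
    """Longest-match precedence for one entry's (directive, prefix) rules."""
    best_len = -1
    allowed = True  # a path that no rule matches may be crawled
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # skip empty rules and prefixes that do not match
        is_allow = directive.lower() == "allow"
        # A longer prefix is more specific; on equal length, Allow wins.
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


RULES = [("Disallow", "/folder1/"), ("Allow", "/folder1/exampledata.html")]

print(is_allowed(RULES, "/folder1/exampledata.html"))  # True
print(is_allowed(RULES, "/folder1/other.html"))        # False
print(is_allowed(RULES, "/index.html"))                # True
```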
The robots.txt file can be used to keep crawlers away from pages that you don't want to be found in a search engine. For example, you might want to keep unnecessary picture galleries from showing up in Google's search results. Keep in mind that robots.txt controls crawling rather than indexing: a blocked URL can still appear in search results if other sites link to it.
You can additionally state where the sitemap.xml is on your site in the robots.txt. This information can help search engines find even more pages and content on your website.
User-Agent: *
Disallow:
Sitemap: http://www.yourwebsite.com/sitemap.xml
The same applies to video or picture sitemaps:
User-Agent: *
Disallow:
Sitemap: http://www.yourwebsite.com/sitemap.xml
Sitemap: http://www.yourwebsite.com/video-sitemap.xml
Sitemap: http://www.yourwebsite.com/picture-sitemap.xml
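As a quick check that the Sitemap lines are readable by a parser, Python's urllib.robotparser can list them through site_maps(), which is available from Python 3.8 on. This is a minimal sketch using the placeholder URLs from the example above.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-Agent: *
Disallow:
Sitemap: http://www.yourwebsite.com/sitemap.xml
Sitemap: http://www.yourwebsite.com/video-sitemap.xml
Sitemap: http://www.yourwebsite.com/picture-sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns None when the file lists no sitemaps.
sitemaps = parser.site_maps() or []
for sitemap_url in sitemaps:
    print(sitemap_url)
```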