A robots.txt file lets you control how search engine bots crawl your website. You can block certain bots entirely, or ask them to stay out of certain parts of your site. It is one of the most direct controls you have over how search engines access your site, which means you need to know what you are doing: one wrong rule can seriously hurt your site's visibility and, in the worst case, make your pages drop out of the results of major search engines. To help you out, here are some tips for managing the robots.txt file effectively:
Upload it to Root Folder
Sometimes people upload the robots.txt file to a subfolder. Make sure the file sits in the root folder of your website. For example, if the URL of your website is http://www.example.com, the robots.txt file should be at http://www.example.com/robots.txt .
Never upload it to a subfolder. In that case the URL would look like http://www.example.com/folder/robots.txt, and search engines will not find it there. They only request the file from the root of the host, so the directives in a misplaced file are ignored.
If your website is on a subdomain, for example http://mywebsite.example.com, the robots.txt file should be at http://mywebsite.example.com/robots.txt . Each subdomain needs its own file.
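The location rule above can be sketched in a few lines of Python. This is only an illustration using the standard urllib.parse module; the function name robots_txt_url is made up for this example. It shows that whatever page you start from, the robots.txt address depends only on the scheme and the host.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the robots.txt URL for the host that serves page_url.

    Hypothetical helper: it keeps the scheme and host (including any
    subdomain) and replaces the path with /robots.txt.
    """
    parts = urlsplit(page_url)
    # Drop the original path, query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_txt_url("http://www.example.com/folder/page.html"))
    print(robots_txt_url("http://mywebsite.example.com/news/"))
```

Note that a page inside /folder/ still maps to the file at the root, and the subdomain gets its own separate address.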
Know the Directives
Most robots.txt files start with the line User-agent: *. The ‘*’ matches the bots of all search engines. If you want to address one specific bot, replace the ‘*’ with the name of that bot.
For example, to restrict Google's image bot, write User-agent: Googlebot-Image. To target Google's main crawler, write User-agent: Googlebot.
Once you have selected the user agent, add Disallow or Allow lines to tell it which sections of your website it may crawl. For example, if you want all search engine bots to stay out of the news section of your website, add the following lines:
User-agent: *
Disallow: /news/*
These lines tell all search engine bots not to crawl the news section, which in most cases keeps its URLs out of search engine result pages. (A blocked URL can still be listed without a description if other sites link to it.) Now, if you wanted to keep only Google's bot out of the news section, you would add the following lines instead:
User-agent: Googlebot
Disallow: /news/*
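A per-bot rule like the one above can be checked locally before you upload the file. The following is a minimal sketch using Python's standard urllib.robotparser module; the helper name is_allowed and the Bingbot agent are only illustrative. It assumes the rule is written as /news/ rather than /news/*, because the standard-library parser does plain prefix matching and may not interpret the ‘*’ wildcard the way search engines do.

```python
from urllib.robotparser import RobotFileParser

# Assumption: the rule uses the plain prefix /news/ instead of /news/*,
# since urllib.robotparser matches rules by prefix only.
RULES = """\
User-agent: Googlebot
Disallow: /news/
"""


def is_allowed(rules_text, user_agent, url):
    """Hypothetical helper: parse rules_text and ask whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    story = "http://www.example.com/news/story.html"
    about = "http://www.example.com/about.html"
    print("Googlebot, news page:", is_allowed(RULES, "Googlebot", story))
    print("Googlebot, other page:", is_allowed(RULES, "Googlebot", about))
    print("Bingbot, news page:", is_allowed(RULES, "Bingbot", story))
```

With these rules only Googlebot is kept out of the news section; a bot that is not named in the file, and has no ‘*’ group to fall back on, is allowed everywhere.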
You can also block a single page. Say, for example, you want to block http://www.example.com/block.html. In this case, add the following line under the relevant User-agent line:
Disallow: /block.html
Sometimes you may need to block all HTML pages of your website. In that case, add the following line, in which ‘*’ matches any sequence of characters and ‘$’ marks the end of the URL:
Disallow: /*.html$
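To see how the ‘*’ and ‘$’ characters behave in patterns like the ones above, here is a rough Python sketch that turns a Disallow pattern into a regular expression. It is not how any particular search engine implements matching. It only follows the commonly documented behaviour: ‘*’ matches any run of characters, a trailing ‘$’ anchors the end of the URL, and without ‘$’ a rule matches any URL that starts with the pattern. The function names are invented for this example.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regular expression."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything literally, then let each '*' match any characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(pattern, path):
    """Return True if path (path plus optional query string) matches the pattern."""
    # re.match anchors at the start of the path, which gives prefix matching.
    return pattern_to_regex(pattern).match(path) is not None


if __name__ == "__main__":
    for rule, path in [
        ("/*.html$", "/block.html"),
        ("/*.html$", "/page.html?x=1"),
        ("/block.html", "/block.html?x=1"),
        ("/news/*", "/news/today/story"),
    ]:
        print(rule, path, is_blocked(rule, path))
```

One thing this makes visible: with the ‘$’ at the end, a URL such as /page.html?x=1 is not covered by /*.html$, because it does not end in .html.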