You can control how search engines crawl your website by putting the right commands in a Robots.txt file. Before setting one up, let's look at the basic syntax.
Robots.txt syntax and commands.
1. User-agent: This line names the search engine robot (crawler) that the rules following it apply to.
For example:
User-agent: Googlebot (Means Google's crawler, Googlebot)
User-agent: Teoma (Means the crawler from Ask.com, which identifies itself as Teoma)
In the Robots.txt file, you can name each user agent individually, or address all of them at once with the asterisk (*) wildcard.
User-agent: * (Means all search engine robots/crawlers)
2. Allow/Disallow: These lines tell the user agent named above which parts of the website it may or may not crawl. Each line takes a path, which can be a single file, a directory, or / for the whole site.
For example:
User-agent: *
Disallow: / (Means no robot is allowed to crawl anything under the root folder, which is the entire website)
User-agent: *
Disallow: /temp/ (Means no robot is allowed to crawl the folder named "temp", while the rest of the site can still be crawled)
How to set up a Robots.txt file?
Setting up a Robots.txt file can be tricky if you don't know the basic commands, so make sure you understand the basics above before you start.
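Before writing the file, it can help to see how a crawler reads the User-agent and Disallow lines described above. The following is a minimal sketch, assuming Python 3 and its standard-library urllib.robotparser module; the rules, the example.com site and the Bingbot test agent are illustrative assumptions, not part of this article's setup.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a group for Google's crawler and a catch-all group.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /temp/

User-agent: *
Disallow: /
"""

def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text held in memory instead of downloading it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

parser = build_parser(ROBOTS_TXT)

def check(agent: str, path: str) -> bool:
    """True if `agent` may crawl `path` on the made-up example.com site."""
    return parser.can_fetch(agent, "https://example.com" + path)

# Googlebot has its own group, so only /temp/ is closed to it.
print(check("Googlebot", "/temp/old.html"))  # False
print(check("Googlebot", "/about.html"))     # True
# Any other crawler falls back to "User-agent: *", where "Disallow: /" blocks everything.
print(check("Bingbot", "/about.html"))       # False
```

A crawler that finds a group naming it follows only that group and ignores the User-agent: * group, which is why Googlebot may crawl /about.html here even though every other robot is shut out of the whole site.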
Step 1: Open a new plain-text document on your computer.
Step 2: Type the following text into it exactly:
User-agent: *
Disallow:
(This means that all user agents are allowed to crawl your entire website.)
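Save it as "robots.txt", all in lowercase, because crawlers request the file by exactly that name.
Step 3: Connect to your server through your host's file manager or an FTP client, and open the root folder of your website (usually named public_html or httpdocs; ask your host if yours is different).
Step 4: Upload the "robots.txt" file to the root folder, so that it can be reached at the top level of your domain (for example, https://example.com/robots.txt).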
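Step 2 can also be scripted. The following is a minimal sketch assuming Python 3, not a required part of the process: it saves the Step 2 content under the lowercase name robots.txt (crawlers request exactly /robots.txt, and many web servers treat file names as case-sensitive), then re-reads the file with the standard urllib.robotparser module to confirm that every URL is allowed. The upload in Steps 3 and 4 still happens through your host's file manager or FTP client; write_robots and everything_allowed are hypothetical names used only here.

```python
import tempfile
from pathlib import Path
from urllib.robotparser import RobotFileParser

# The Step 2 content: every crawler may crawl the entire site.
CONTENT = "User-agent: *\nDisallow:\n"

def write_robots(directory: Path) -> Path:
    """Save CONTENT as robots.txt (all lowercase) inside `directory`."""
    path = directory / "robots.txt"
    path.write_text(CONTENT, encoding="utf-8")
    return path

def everything_allowed(path: Path) -> bool:
    """Re-read the saved file and check that a sample URL is crawlable."""
    parser = RobotFileParser()
    parser.parse(path.read_text(encoding="utf-8").splitlines())
    return parser.can_fetch("AnyBot", "https://example.com/any/page.html")

# Demo in a throwaway folder; for a real site, save into a local folder
# and then upload the file as described in Steps 3 and 4.
with tempfile.TemporaryDirectory() as tmp:
    saved = write_robots(Path(tmp))
    print(saved.name, everything_allowed(saved))  # robots.txt True
```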
Your Robots.txt file is now set up. Note that these commands allow all search engine robots to crawl the entire site without any restriction. If you would like to block certain files or folders from being crawled, use the commands below.
1. Exclude a file from an individual search engine
User-agent: Googlebot
Disallow: /thepathtoyourfile.html
Replace "Googlebot" with the user agent of the search engine you want to block, and replace "/thepathtoyourfile.html" with the actual path to your file. To block more than one file, repeat the Disallow line once for each file, for example:
Disallow: /file1.html
Disallow: /file2.html
2. Exclude a section of your site from all spiders and bots
User-agent: *
Disallow: /1/2/dir-to-be-blocked/
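Replace "/1/2/dir-to-be-blocked/" with the actual path of the directory you want to block. Because Disallow values are matched from the start of the path, everything inside that directory, including its subfolders, is blocked as well.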
3. Allow all spiders to crawl everything
User-agent: *
Disallow:
OR
Leave the Robots.txt file empty, without any commands.
4. Allow no spiders to crawl any part of your site
User-agent: *
Disallow: /
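This tells every well-behaved spider not to crawl anything on your site. Keep in mind that Robots.txt is a request, not a lock: badly behaved bots can ignore it, and a blocked page can still appear in search results (without its content) if other sites link to it.
Free Robots.txt file generators
There are quite a few free online Robots.txt file generators. Here are some of them.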
1. Mcanerin: This tool lets you choose which search engine robots you'd like to block and creates a Robots.txt file that you just need to copy and paste.

2. Global Promoter Robots.txt generator: An excellent tool that walks you through generating a Robots.txt file with a wizard.

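Whether you write the file by hand or with one of these generators, it is worth checking the result before uploading it. The sketch below assumes Python 3 and its standard urllib.robotparser module; generate_robots_txt and can_crawl are hypothetical helpers written for this example (not part of either tool above or of any library), example.com stands in for your own domain, and Bingbot and AnyBot are just sample agent names.

```python
from typing import Dict, List
from urllib.robotparser import RobotFileParser

def generate_robots_txt(rules: Dict[str, List[str]]) -> str:
    """Hypothetical helper: build robots.txt text from {user_agent: [disallowed paths]}.

    An empty path list becomes a bare "Disallow:" line, which blocks nothing.
    Groups are separated by a blank line.
    """
    groups = []
    for agent, paths in rules.items():
        lines = [f"User-agent: {agent}"]
        if paths:
            lines.extend(f"Disallow: {path}" for path in paths)
        else:
            lines.append("Disallow:")
        groups.append("\n".join(lines))
    return "\n\n".join(groups) + "\n"

def can_crawl(robots_txt: str, agent: str, path: str) -> bool:
    """Check one path against robots.txt text using Python's standard parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, "https://example.com" + path)

# The four patterns from the list of commands earlier in this article.
one_engine = generate_robots_txt({"Googlebot": ["/file1.html", "/file2.html"]})
one_section = generate_robots_txt({"*": ["/1/2/dir-to-be-blocked/"]})
allow_all = generate_robots_txt({"*": []})
block_all = generate_robots_txt({"*": ["/"]})

print(one_engine)
print(can_crawl(one_engine, "Googlebot", "/file1.html"))                  # False
print(can_crawl(one_engine, "Bingbot", "/file1.html"))                    # True
print(can_crawl(one_section, "AnyBot", "/1/2/dir-to-be-blocked/a.html"))  # False
print(can_crawl(one_section, "AnyBot", "/1/2/other-dir/a.html"))          # True
print(can_crawl(allow_all, "AnyBot", "/page.html"))                       # True
print(can_crawl("", "AnyBot", "/page.html"))  # True: an empty file allows everything
print(can_crawl(block_all, "AnyBot", "/page.html"))                       # False
```

Python's parser matches each Disallow value as a plain path prefix and applies the first matching rule, which is enough for the simple patterns in this article. Individual search engines may add their own extensions (Google, for instance, supports * and $ wildcards inside paths), so for anything more complex, check that engine's own documentation.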
Summary
Robots.txt is an effective tool for controlling how search engines crawl your website and which parts of it they collect information from. The more carefully you plan your site's structure, the easier it is to steer search engines toward the pages that matter, yet many websites simply ignore this and leave everything to the search engines to decide. Is that a good thing to do? I'd say it depends on what you want. If you think hiding a folder's content from Google will keep unnecessary information from being passed to it, and you know exactly how to do that, then why not use these options?