The robots.txt file is very important!
A robots.txt file tells any visiting search engine spider which parts of the site it may crawl. With no restrictions listed, spiders are free to crawl every page. Should specific areas of the site be kept out of crawling, the folder or file location should appear after the ‘Disallow:’ instruction. Keep in mind that this is a request which well-behaved spiders honor, not a way to lock content away.
For example:
1. To exclude all robots from a server:
User-agent: *
Disallow: /
2. To allow all robots complete access:
User-agent: *
Disallow:
3. To exclude all robots from parts of a server:
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /private/
4. To exclude a single robot:
User-agent: BadBot
Disallow: /
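Before uploading a rule set like the ones above, it can help to check how a parser reads it. The following is a minimal sketch using Python's standard urllib.robotparser module; the is_allowed helper, the "ExampleBot" name and the www.example.com URLs are made up for illustration, and real search engine spiders may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Rules from example 3: keep all robots out of three directories.
RULES = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /private/
"""


def is_allowed(rules: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the given rules.

    Hypothetical helper: it parses the robots.txt text directly instead of
    downloading it, so it can be run without a live site.
    """
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Placeholder paths on a placeholder domain.
    for path in ("/index.html", "/private/report.html"):
        url = "https://www.example.com" + path
        print(path, is_allowed(RULES, "ExampleBot", url))
```

Swapping in the rules from examples 1, 2 or 4 shows how each one changes the answer for a given robot and page.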
A robots.txt file also has the added benefit of specifying the location of an XML sitemap, which is a must if you want to give Google and the other search engines every opportunity to crawl and index more pages of your website.
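The sitemap location is given on a ‘Sitemap:’ line with the full URL of the sitemap file. As a rough sketch, the snippet below embeds such a file and reads the sitemap back with Python's standard urllib.robotparser module (site_maps() needs Python 3.8 or later). The sitemap URL, the /private/ rule and the sitemap_urls helper are assumptions for illustration only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with a Sitemap line; the URL is a placeholder.
RULES_WITH_SITEMAP = """\
User-agent: *
Disallow: /private/

Sitemap: https://www.example.com/sitemap.xml
"""


def sitemap_urls(rules: str) -> list:
    """Return the sitemap URLs listed in the robots.txt text, if any."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


if __name__ == "__main__":
    print(sitemap_urls(RULES_WITH_SITEMAP))
```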