Thread: robots.txt file
08-08-2007, 05:50 AM
WebBoy
Junior Member
 
Join Date: Aug 2007
Location: Sydney, Australia
Posts: 4
Robots.txt file is very important!!

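A robots.txt file tells any visiting search engine spider which parts of the site it may crawl. With no restrictions listed, spiders are free to crawl every page on the site. Should specific areas of the site be protected from crawling, the folder or file location should appear after the ‘Disallow:’ instruction. Keep in mind that well-behaved spiders follow these rules voluntarily, so robots.txt is not a way to secure private content.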
For example:

1. To exclude all robots from a server:
User-agent: *
Disallow: /

2. To allow all robots complete access:
User-agent: *
Disallow:

3. To exclude all robots from parts of a server:
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /private/

4. To exclude a single robot:
User-agent: BadBot
Disallow: /
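
If you want to double-check what rules like the four above actually do before uploading them, here is a rough sketch using Python's standard urllib.robotparser module. The labels in RULES, the allowed() helper and the bot name SomeOtherBot are just made up for this illustration, and a real spider may interpret edge cases a little differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The four rule sets from the examples above, keyed by made-up labels.
RULES = {
    "exclude_all": "User-agent: *\nDisallow: /\n",
    "allow_all": "User-agent: *\nDisallow:\n",
    "exclude_parts": (
        "User-agent: *\n"
        "Disallow: /cgi-bin/\n"
        "Disallow: /tmp/\n"
        "Disallow: /private/\n"
    ),
    "exclude_badbot": "User-agent: BadBot\nDisallow: /\n",
}


def allowed(robots_txt, agent, path):
    """Return True if `agent` may crawl `path` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # Print the verdict for two bots and two paths under each rule set.
    for label, text in RULES.items():
        for agent in ("BadBot", "SomeOtherBot"):
            for path in ("/index.html", "/tmp/cache.txt"):
                print(label, agent, path, allowed(text, agent, path))
```

Running it prints one line per rule set, bot and path, so you can see at a glance which combinations are blocked.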

It also has the added benefit of letting you specify the location of an XML sitemap, which is a must if you want to give Google and the other search engines every opportunity to crawl and index more pages of your website.
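
As a small sketch of the sitemap part: the location goes on its own 'Sitemap:' line with the full URL. The snippet below embeds such a file and reads the sitemap back with Python's urllib.robotparser (the site_maps() method needs Python 3.8 or later). The www.example.com address and the /private/ rule are placeholders I picked for the example, not anything from a real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one Disallow rule plus a Sitemap line.
# The example.com URL is a placeholder; use your own sitemap address.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns the list of Sitemap URLs found, or None if there are none.
sitemaps = parser.site_maps()
print(sitemaps)
```

The Sitemap line is independent of the User-agent groups, so it can sit anywhere in the file.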