Thursday, May 1, 2008

Use the robots.txt file to control access to your website

To control how and when your website is crawled, create a robots.txt file in the top-level (root) directory of your website. In the robots.txt file, you can specify which web crawlers to allow or block. Note that while MSNBot complies with the standards for robots.txt, not all web crawlers comply.

MSNBot conforms to the Robots Exclusion Standard, so it looks for a file named exactly robots.txt. If you give the file a different name, such as robot.txt, the crawler does not find it, and your crawling and indexing restrictions are not applied.

Each time MSNBot crawls your website, it looks in your web server's root directory for a robots.txt file. If the file exists, MSNBot checks to see if MSNBot is an allowed user agent, and if any crawling or indexing restrictions have been set.
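Because the file must sit in the root directory, its address depends only on the scheme and host of the site, not on the page being crawled. The following is a minimal Python sketch of that lookup, not MSNBot's actual code; the function name robots_url and the example.com address are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the root-level robots.txt address for the site hosting page_url."""
    parts = urlsplit(page_url)
    # Keep only scheme and host; the path is always /robots.txt at the root.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # Hypothetical page deep inside a site.
    print(robots_url("http://www.example.com/blog/2008/05/post.html"))
```

A robots.txt file placed in a subdirectory, such as /blog/robots.txt, would never be requested under this scheme.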

To set which web crawlers can access your website, use the syntax in the table below for your robots.txt file. MSN Search also includes image searching provided by Picsearch. If you do not want your images indexed, you can block the Picsearch crawler, Psbot, as described in the following table.

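Directive names and user-agent values in the robots.txt file are not case-sensitive. URL paths in Disallow lines are usually matched case-sensitively, so write them exactly as they appear on your website.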

To do this, use this syntax:

Allow all robots full access and prevent "file not found: robots.txt" errors:
    Create an empty robots.txt file

Allow all robots complete access:
    User-agent: *
    Disallow:

Allow only MSNBot access:
    User-agent: msnbot
    Disallow:

    User-agent: *
    Disallow: /

Exclude all robots from the entire server:
    User-agent: *
    Disallow: /

Exclude only MSNBot:
    User-agent: msnbot
    Disallow: /

Exclude only Psbot (Picsearch):
    User-agent: psbot
    Disallow: /
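
Before uploading a robots.txt file, you can check its rules locally. The sketch below uses Python's standard urllib.robotparser module as a stand-in for a real crawler. It is an assumption that MSNBot and Psbot interpret the rules the same way as this parser; the helper name is_allowed and the example.com URL are invented for illustration.

```python
from urllib.robotparser import RobotFileParser

# Rule sets copied from the table above.
ALLOW_ALL = "User-agent: *\nDisallow:\n"
ONLY_MSNBOT = "User-agent: msnbot\nDisallow:\n\nUser-agent: *\nDisallow: /\n"
EXCLUDE_ALL = "User-agent: *\nDisallow: /\n"
EXCLUDE_MSNBOT = "User-agent: msnbot\nDisallow: /\n"
EXCLUDE_PSBOT = "User-agent: psbot\nDisallow: /\n"


def is_allowed(robots_txt, user_agent, url="http://www.example.com/index.html"):
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines; nothing is downloaded.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rule_sets = [
        ("allow all", ALLOW_ALL),
        ("only msnbot", ONLY_MSNBOT),
        ("exclude all", EXCLUDE_ALL),
        ("exclude msnbot", EXCLUDE_MSNBOT),
        ("exclude psbot", EXCLUDE_PSBOT),
    ]
    for name, rules in rule_sets:
        print(name, is_allowed(rules, "msnbot"), is_allowed(rules, "psbot"))
```

Each printed line shows whether msnbot and psbot may fetch the sample page under that rule set, which makes it easy to spot a rule that blocks more than intended.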
