AmeriWebHosting
 

How to use robots.txt

What is the purpose of the robots file?

When a search engine crawls (visits) your website, the first thing it looks for is your robots.txt file. This file tells search engines which parts of your site they may and may not crawl, which in turn affects what they index (save and make available to the public as search results). It may also indicate the location of your XML sitemap. The search engine's "bot" (also called a "robot" or "spider") then crawls your site as directed in the robots.txt file, or stays away if you said it could not visit.
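As a rough illustration of the sitemap line mentioned above, the following Python sketch reads a small robots.txt the way a bot might, using the standard library's urllib.robotparser module. The sitemap URL is a made-up placeholder, and site_maps() assumes Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content; the sitemap URL is a hypothetical placeholder.
ROBOTS_WITH_SITEMAP = """\
User-agent: *
Disallow: /tmp/
Sitemap: http://www.yoursitesdomain.com/sitemap.xml
"""

sitemap_parser = RobotFileParser()
# parse() takes the file's lines, so no network request is made here.
sitemap_parser.parse(ROBOTS_WITH_SITEMAP.splitlines())

# List of sitemap URLs declared in the file (None if there are none).
sitemaps = sitemap_parser.site_maps()
print(sitemaps)
```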

Google's bot is called Googlebot, and Microsoft Bing's bot is called Bingbot. Many other search engines, like Excite, Lycos, Alexa and Ask Jeeves, also have their own bots. Most bots come from search engines, although other sites sometimes send out bots for various reasons. For example, some sites may ask you to put code on your website to verify that you own it, and then send a bot to check whether the code is there.

Where does robots.txt go?

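The robots.txt file belongs in your document root folder, so that it is served from the top level of your site (for example, http://www.yoursitesdomain.com/robots.txt). Bots do not look for it in subfolders.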
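To make the location concrete, here is a minimal Python sketch of how a bot might work out where to look for robots.txt, starting from any page address on a site. The function name robots_url is just an illustrative choice, not part of any library.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt address for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, and replace the path with /robots.txt.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://www.yoursitesdomain.com/junk/index.html"))
```

Whatever page the bot starts from, the result points at the document root of the same host.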

You can simply create a blank file and name it robots.txt. This avoids "file not found" errors when bots request the file, and it allows all search engines to crawl anything they want.

Blocking Robots and Search Engines from Crawling

If you want to stop bots from visiting your site and stop search engines from ranking you, use this code:

# Asks all robots to stay out of the entire site
User-agent: *
Disallow: /
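If you want to check how a well-behaved bot would read these rules, one option is Python's standard urllib.robotparser module. The sketch below compares a blank robots.txt with the "Disallow: /" version. The helper name can_crawl and the choice of Googlebot as the test bot are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_text: str, url: str, bot: str = "Googlebot") -> bool:
    """Report whether `bot` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())  # parse text directly, no download
    return parser.can_fetch(bot, url)


BLANK_FILE = ""
BLOCK_EVERYTHING = "User-agent: *\nDisallow: /\n"
HOME_PAGE = "http://www.yoursitesdomain.com/index.html"

print(can_crawl(BLANK_FILE, HOME_PAGE))        # blank file: nothing is off limits
print(can_crawl(BLOCK_EVERYTHING, HOME_PAGE))  # Disallow: / covers every path
```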

You can also ask robots not to crawl parts of your site, while allowing them to crawl other sections. The following example asks search engines and other robots not to crawl the cgi-bin, tmp, and junk folders, or anything inside those folders.

# Blocks robots from specific folders / directories
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/

In the above example, http://www.yoursitesdomain.com/junk/index.html would be one of the URLs blocked, but http://www.yoursitesdomain.com/index.html and http://www.yoursitesdomain.com/someotherfolder/ would be crawlable.
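The same three URLs can be checked against the folder rules with a short Python sketch based on the standard urllib.robotparser module. This is only one way to test a robots.txt file; the user agent name "Googlebot" is used here purely as an example.

```python
from urllib.robotparser import RobotFileParser

FOLDER_RULES = """\
# Blocks robots from specific folders / directories
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
"""

folder_parser = RobotFileParser()
folder_parser.parse(FOLDER_RULES.splitlines())

urls = [
    "http://www.yoursitesdomain.com/junk/index.html",
    "http://www.yoursitesdomain.com/index.html",
    "http://www.yoursitesdomain.com/someotherfolder/",
]

# Map each URL to True (crawlable) or False (blocked by a Disallow line).
results = {url: folder_parser.can_fetch("Googlebot", url) for url in urls}
for url, allowed in results.items():
    print(url, "crawlable" if allowed else "blocked")
```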

Keep in mind that robots.txt works like a "No Trespassing" sign. It tells robots whether you want them to crawl your site or not. It does not actually block access. Legitimate bots will honor your directives on whether they can visit. Rogue bots may simply ignore robots.txt.



© Ameriweb Hosting Chicago Illinois