What Is Robots.txt?
Author: AmeriWeb Hosting
Robots.txt is a simple text file that sits in the root directory of your site and tells search engine crawlers which parts of the site they may crawl.
Here is a quick reference to the key robots.txt directives:
User-agent - Specifies which crawler the rules apply to. Using * targets all crawlers.
Disallow - Prevents the specified URLs from being crawled.
Allow - Permits specific URLs to be crawled, even if a parent directory is disallowed.
Sitemap - Gives the location of your XML sitemap, which helps search engines discover it.
User-agent: *
Disallow: /downloads/
Allow: /downloads/free/
Disallow: /*.pdf$
Sitemap: https://www.example.com/sitemap
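In the rule Disallow: /*.pdf$, the * wildcard matches any sequence of characters, so /* matches any path on the website, and the $ anchors the match to the end of the URL. As a result, any URL ending with .pdf will be blocked from crawling.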
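To make the wildcard behavior concrete, here is a minimal Python sketch that translates a robots.txt path pattern into a regular expression. The helper names pattern_to_regex and rule_matches are made up for this illustration, and real crawlers may handle edge cases differently.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex (illustrative helper)."""
    # A trailing "$" means the pattern must match up to the end of the URL.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with ".*" where "*" appeared.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if the pattern matches the path, starting from its first character."""
    return pattern_to_regex(pattern).match(path) is not None


print(rule_matches("/*.pdf$", "/downloads/guide.pdf"))         # ends with .pdf
print(rule_matches("/*.pdf$", "/downloads/guide.pdf?page=2"))  # does not end with .pdf
```

The first call prints True and the second prints False, because the $ requires the URL to end in .pdf.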
Note that Disallow and Allow directives must always use paths relative to the site root (starting with /), never absolute URLs like "https://www.example.com/form/".
The one exception is the Sitemap directive, which requires a full, absolute URL to indicate the location of the sitemap.
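If you want to check your own rules for this mistake, something like the following Python sketch could help. The function check_directive and its messages are hypothetical, written only for this example, and it covers just the path-versus-URL rule described above.

```python
from urllib.parse import urlparse


def check_directive(name: str, value: str) -> str:
    """Return "ok" or a short description of the problem (illustrative helper)."""
    name = name.strip().lower()
    value = value.strip()
    parsed = urlparse(value)
    has_host = bool(parsed.scheme or parsed.netloc)

    if name in ("allow", "disallow"):
        # These directives take a path on your own site, not a full URL.
        if has_host:
            return "use a path such as /form/, not a full URL"
        if value and not value.startswith(("/", "*")):
            return "path should start with /"
        return "ok"

    if name == "sitemap":
        # The Sitemap directive is the exception: it needs a full URL.
        if parsed.scheme in ("http", "https") and parsed.netloc:
            return "ok"
        return "use a full URL such as https://www.example.com/sitemap"

    return "ok"


print(check_directive("Disallow", "https://www.example.com/form/"))
print(check_directive("Sitemap", "https://www.example.com/sitemap"))
```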
Be careful to avoid rules that match more than you intend. A rule matches every URL that begins with the given path, so Disallow: /form (without a trailing slash) will also match a page such as /form-design-examples/, which may be a blog post that you want indexed.
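The trailing-slash pitfall is easy to see with a short Python sketch. It assumes a rule without wildcards blocks any path that starts with the rule's path; the function blocked_by and the sample paths are invented for illustration.

```python
def blocked_by(rule_path: str, url_path: str) -> bool:
    """A rule without wildcards blocks every path that begins with it."""
    return url_path.startswith(rule_path)


# Sample paths for the demonstration (hypothetical).
paths = ["/form/", "/form/contact", "/form-design-examples/"]

for rule in ("/form", "/form/"):
    blocked = [p for p in paths if blocked_by(rule, p)]
    print(f"Disallow: {rule} blocks {blocked}")
```

With /form, all three paths are blocked, including the blog page. With /form/, only the first two are.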
Here is what each line of the example file means:
User-agent: *
    Specifies which search engine crawler these rules apply to. * means "all".
Disallow: /downloads/
    Do not crawl the downloads directory.
Allow: /downloads/free/
    EXCEPTION: you can crawl the downloads/free/ directory, in spite of the above rule.
Disallow: /*.pdf$
    Do not crawl any PDF file.
Sitemap: https://www.example.com/sitemap
    Gives the location of your sitemap. This must be a full URL. You can list multiple sitemaps, but each must be on its own line.
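For readers who want to see how the Allow exception can override the Disallow rule, here is a rough Python sketch of one way a crawler might evaluate these rules. It assumes the "longest matching rule wins" approach described in the Robots Exclusion Protocol standard (RFC 9309). The names RULES, matches, and can_crawl are made up for this example, and individual crawlers may behave differently.

```python
import re

# The Allow and Disallow rules from the example file above.
RULES = [
    ("disallow", "/downloads/"),
    ("allow", "/downloads/free/"),
    ("disallow", "/*.pdf$"),
]


def matches(pattern: str, path: str) -> bool:
    """Check a path against a pattern that may contain * and a trailing $."""
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(piece) for piece in core.split("*"))
    return re.match(regex + ("$" if anchored else ""), path) is not None


def can_crawl(path: str, rules=RULES) -> bool:
    """Assumption: the longest matching pattern wins, and Allow wins a tie."""
    best_kind, best_pattern = "allow", ""  # no matching rule means crawling is allowed
    for kind, pattern in rules:
        if not matches(pattern, path):
            continue
        longer = len(pattern) > len(best_pattern)
        tie_for_allow = len(pattern) == len(best_pattern) and kind == "allow"
        if longer or tie_for_allow:
            best_kind, best_pattern = kind, pattern
    return best_kind == "allow"


for path in ("/downloads/tool.zip", "/downloads/free/tool.zip", "/blog/guide.pdf", "/blog/"):
    print(path, "->", "crawl" if can_crawl(path) else "skip")
```

One thing to note: under this assumption, a file like /downloads/free/guide.pdf would come out as crawlable, because the Allow pattern is longer than /*.pdf$. If that matters for your site, test the URL with your search engine's own robots.txt testing tool.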
We hope this helps! As always, if you have any questions, feel free to contact us.
Contact Us: AmeriWeb Hosting 13930 S BRISTLECONE LN D, PLAINFIELD, IL 60544 (773) 735-5144 https://ameriwebhosting.com/