Creating and optimizing a robots.txt file is important for making your Magento store more secure and improving its SEO.
The robots.txt ("robots dot text") file is a text file that helps search engine robots (such as Googlebot and Bingbot) determine which information to index. By default there is no robots.txt in the Magento Community or Enterprise distribution, so you have to create it yourself.
How will robots.txt improve your Magento store?
Here are just a few robots.txt use cases that should give you a better idea of why it is so important:
- robots.txt helps you prevent duplicate content issues (very important for SEO).
- It hides technical information such as error logs, reports, core files and .svn directories from unwanted indexing, so hackers cannot use search engines to detect your platform and other sensitive details.
Robots.txt installation
Note: The robots.txt file covers one domain. For Magento websites with multiple domains or sub-domains, each domain/sub-domain (e.g. store.example.com and example.com) must have its own robots.txt file.
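If several domains or sub-domains share a single Magento installation, one common approach is to serve a different file per host name with a rewrite rule. Below is a minimal sketch for Apache with mod_rewrite; the host names and the file name robots_store.txt are illustrative assumptions, not part of a standard Magento setup:

# .htaccess: serve a per-domain robots.txt (assumes Apache + mod_rewrite)
RewriteEngine On
# store.example.com gets its own file...
RewriteCond %{HTTP_HOST} ^store\.example\.com$ [NC]
RewriteRule ^robots\.txt$ robots_store.txt [L]
# ...while example.com keeps serving the default robots.txt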
Magento Community and Magento Enterprise
Installing robots.txt is easy. Create a robots.txt file and copy the robots.txt code from our blog into it. Then upload robots.txt to the web root of your server so that it is reachable at, for example, example.com/robots.txt.
If you upload robots.txt to a sub-folder, e.g. example.com/store/robots.txt, it will be ignored by all search engines.
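You can quickly verify that the file is served from the right location (the domain is a placeholder):

# Should return HTTP 200 and the file contents;
# a 404 means the file is not in the web root
curl -i http://example.com/robots.txt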
Magento Go
Installation of robots.txt for Magento Go is described in this Knowledge Base article.
Robots.txt for Magento
Here is our recommended robots.txt code. Please read the comments marked with # before publishing your robots.txt:

## robots.txt for Magento Community and Enterprise

## GENERAL SETTINGS

## Enable robots.txt rules for all crawlers
User-agent: *

## Crawl-delay parameter: number of seconds to wait between successive requests to the same server.
## Set a custom crawl rate if you're experiencing traffic problems with your server.
# Crawl-delay: 30

## Magento sitemap: uncomment and replace the URL to your Magento sitemap file
# Sitemap: http://www.example.com/sitemap/sitemap.xml

## DEVELOPMENT RELATED SETTINGS
## Do not crawl development files and folders: CVS, svn directories and dump files
Disallow: /CVS
Disallow: /*.svn$
Disallow: /*.idea$
Disallow: /*.sql$
Disallow: /*.tgz$

## GENERAL MAGENTO SETTINGS
## Do not crawl Magento admin page
Disallow: /admin/

## Do not crawl common Magento technical folders
Disallow: /app/
Disallow: /downloader/
Disallow: /errors/
Disallow: /includes/
Disallow: /lib/
Disallow: /pkginfo/
Disallow: /shell/
Disallow: /var/

## Do not crawl common Magento files
Disallow: /api.php
Disallow: /cron.php
Disallow: /cron.sh
Disallow: /error_log
Disallow: /get.php
Disallow: /install.php
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /README.txt
Disallow: /RELEASE_NOTES.txt

## MAGENTO SEO IMPROVEMENTS
## Do not crawl sub-category pages that are sorted or filtered
Disallow: /*?dir*
Disallow: /*?dir=desc
Disallow: /*?dir=asc
Disallow: /*?limit=all
Disallow: /*?mode*

## Do not crawl the second copy of the home page (example.com/index.php/).
## Uncomment it only if you have activated Magento SEO URLs.
## Disallow: /index.php/

## Do not crawl links with session IDs
Disallow: /*?SID=

## Do not crawl checkout and user account pages
Disallow: /checkout/
Disallow: /onestepcheckout/
Disallow: /customer/
Disallow: /customer/account/
Disallow: /customer/account/login/

## Do not crawl search pages and non-SEO-optimized catalog links
Disallow: /catalogsearch/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/

## SERVER SETTINGS
## Do not crawl common server technical folders and files
Disallow: /cgi-bin/
Disallow: /cleanup.php
Disallow: /apc.php
Disallow: /memcache.php
Disallow: /phpinfo.php

## IMAGE CRAWLERS SETTINGS
## Extra: Uncomment if you do not wish Google and Bing to index your images
# User-agent: Googlebot-Image
# Disallow: /
# User-agent: msnbot-media
# Disallow: /
Test your robots.txt
After publishing your robots.txt, you can check its syntax using on-line tools such as Yandex's validator: http://webmaster.yandex.com/robots.xml
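You can also test which URLs are blocked programmatically. Here is a minimal sketch using Python's standard urllib.robotparser module; the domain is a placeholder:

from urllib.robotparser import RobotFileParser

# Download and parse the live robots.txt (placeholder domain)
rp = RobotFileParser()
rp.set_url("http://www.example.com/robots.txt")
rp.read()

# With the rules above, checkout pages should be blocked for all crawlers
print(rp.can_fetch("*", "http://www.example.com/checkout/cart/"))      # False
print(rp.can_fetch("*", "http://www.example.com/some-category.html"))  # True

Note that the standard library parser implements the original robots.txt specification and may not understand the * wildcard patterns, so use it for prefix rules such as Disallow: /checkout/ and rely on the search engines' own testing tools for wildcard rules.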
Comments
Search engines are indexing a lot of sorted and filtered catalog URLs on my site, such as:
www.webiste.com/category/product.html?dir=desc&limit=10&manufacturer=29&order=position or
www.webiste.com/category/product.html?limit=10&manufacturer=29&mode=grid&p=5
Is there a way to block them in robots.txt, and how? Is it useful to prevent indexing of the many pages ordered by manufacturer and other attributes? Thanks for your time.
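One possible way to block such URLs, in the same wildcard style as the rules above (Google and Bing support * in Disallow patterns; the parameter names are simply taken from the URLs in the comment):

## Block catalog pages filtered or sorted by query parameters,
## even when the parameter is not first in the query string
Disallow: /*?*manufacturer=
Disallow: /*?*order=
Disallow: /*?*mode=
Disallow: /*?*limit=

Whether to also block pagination (?p=) is debatable, since it can stop crawlers from discovering deep products; a canonical tag is usually the safer fix there.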
Should I add these additional lines to resolve the errors?
Disallow: /ajax/
Disallow: /ajaxsuite/
Disallow: /ajaxwishlist/
Disallow: /priceslider/
> I got a bug on my site which made all the canonical URL…
Is it related to some third-party extension?
It is not necessary to specify the user agent before each rule; a single User-agent line applies to all the Disallow rules that follow it.
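For example, this group applies both rules to every crawler:

User-agent: *
Disallow: /checkout/
Disallow: /customer/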
It looks like you are using a rather old validation tool. Which one exactly do you use?
You can validate your robots.txt using this nice tool: http://webmaster.yandex.com/robots.xml