Too Many URLs Spoil the SEO: Fixing a Common Ecommerce Duplicate Content Problem

A common SEO problem for ecommerce sites is content management systems (CMSs) that create a different URL for a product that lives under multiple categories. The main reason this is bad for SEO is that search engines allocate only so much bandwidth to crawling your site. If most or all of your product pages have duplicates, you’re less likely to get your site fully crawled and indexed, which means lost organic search opportunity.

The screenshot above shows that six copies of the product page for Abercrombie’s Clarissa skirt are currently indexed in Google. Half of the links lead to a 404 page; the rest look like this:

http://www.abercrombie.ca/webapp/wcs/stores/servlet/ProductDisplay?storeId=11306&catalogId=10901&productId=482314&langId=-1&categoryId=12280&parentCategoryId=12203&colorSequence=01

http://www.abercrombie.ca/webapp/wcs/stores/servlet/ProductDisplay?langId=-1&storeId=11306&productId=482314&parentCategoryId=12203&catalogId=10901&categoryId=12280

http://www.abercrombie.ca/webapp/wcs/stores/servlet/product_11306_10901_482314_-1_12280_12203

The best practice is to give each product page a single, global alias URL that the UI looks up and renders instead of a category-specific URL. If you wish to maintain breadcrumb trails, use a session ID or cookie to track which categories the customer clicked to locate the product. If the visitor lands on the page without browsing through a category menu (search engine referral, affiliate link, PPC ad, email, site search, etc.), default to a parent category.
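
To make the alias-plus-breadcrumb approach concrete, here is a minimal sketch of the logic in Python. It is only an illustration of what’s described above; the Product fields and the "last_category" session key are hypothetical, not taken from any particular ecommerce platform:

from dataclasses import dataclass

@dataclass
class Product:
    slug: str              # the global alias, e.g. "clarissa-skirt"
    categories: list       # every category the product lives under
    default_category: str  # parent category used as the fallback

def breadcrumb_category(session: dict, product: Product) -> str:
    # If the shopper clicked through a category menu, the last category
    # visited was stored in the session (or a cookie); use it.
    clicked = session.get("last_category")
    if clicked in product.categories:
        return clicked
    # Direct entry (search referral, affiliate link, PPC ad, email,
    # site search): default to the parent category.
    return product.default_category

skirt = Product("clarissa-skirt", ["clearance", "skirts"], "skirts")
print(breadcrumb_category({"last_category": "clearance"}, skirt))  # clearance
print(breadcrumb_category({}, skirt))                              # skirts

The point is that the URL never changes with the path the shopper took; only the breadcrumb does.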

Keep in mind that session IDs can get crawled and indexed by search engines, creating even worse duplicate content issues (and security issues with some ecommerce platforms). To block search engines from crawling URLs with session IDs, use this syntax in your robots.txt file:

User-agent: *
Disallow: /*?
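
Note that this pattern blocks crawling of every URL that contains a query string, which can catch legitimate pages too (a caution echoed in the comments below). If your platform exposes the session ID under a known parameter name, a narrower rule is safer; jsessionid below is only an assumed example, so substitute your platform’s actual parameter:

User-agent: *
Disallow: /*jsessionid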

Other benefits of reducing duplicate content are that it prevents PageRank dilution and may simplify your web analytics (product page views and conversions aren’t spread over multiple URLs).


21 Responses to “Too Many URLs Spoil the SEO: Fixing a Common Ecommerce Duplicate Content Problem”

  1. Alex says:

    Having several URLs linking to the same content is not a huge problem for PageRank:

    Quote from Google Webmaster Central:

    “having multiple URLs on the same domain that point to the same content. Like http://www.example.com/skates.asp?color=black&brand=riedell and http://www.example.com/skates.asp?brand=riedell&color=black. Having this type of duplicate content on your site can potentially affect your site’s performance, but it doesn’t cause penalties.”

    See the article: http://googlewebmastercentral.blogspot.com/2008/09/demystifying-duplicate-content-penalty.html

  2. Does the new canonical tag not solve this problem to a degree?
    http://www.webpronews.com/topnews/2009/02/13/canonical-tag-announced-googles-matt-cutts-interviewed

    The multiple pages would still exist, but SEO problems are solved. It’s not the perfect solution, but probably the easiest to implement.
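
    For reference, the tag is a single line in the <head> of each duplicate page pointing at the preferred URL, e.g. <link rel="canonical" href="http://www.example.com/product" /> (example.com being a placeholder).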

  3. AJ Kohn says:

    Andrew beat me to it. The canonical tag serves as a mini 301-redirect and would be useful here. The robots.txt change you mention is still a best practice and is useful for screening out both session IDs and sorting URLs.

  4. Sun says:

    Yes, I think the canonical tag will help to some degree. But I’m so glad the major companies still have no clue when it comes to SEO. It gives the micro businesses who understand how to set up and organize their sites a competitive advantage. I actually advocated a tag called “rmpar,” much like in the pavuk crawler, but the canonical tag achieves similar results.

  5. Yes, I covered the canonical tag last month, but again, it’s not 100%. It’s a “hint,” not a “directive”:
    http://www.getelastic.com/canonical-url-tag/

    So this is a better solution, and it also creates fewer pages for crawlers to spider, which means your site can get indexed more deeply.

  6. Be careful with the robots.txt suggestion, as it may block all content that has a ? in the URL. You are better served being specific about your session ID parameter in the robots.txt file.

  7. Steve Bolley says:

    That’s a good marketing strategy: blog about other people’s sites, and then hopefully they will want you to do the marketing for them. I noticed your trend here… ;)

  8. @Jason, thanks, that’s a good point.

  9. Google and Yahoo are now planning to support duplicate URLs if there is a canonical tag … so you can indicate which is the correct URL without any penalty.

    Also, if the site is already indexed (which is often the case), using robots.txt is not suggested, as it will in effect kill all the value you earned … wherever possible, it is better to use 301 redirects.
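
    To illustrate the redirect approach, here is a rough sketch using Python and Flask; the legacy route, the productId-to-alias mapping, and the destination URL are all hypothetical:

    from flask import Flask, redirect, request

    app = Flask(__name__)

    # Hypothetical mapping from legacy productId values to global aliases.
    ALIASES = {"482314": "clarissa-skirt"}

    @app.route("/webapp/wcs/stores/servlet/ProductDisplay")
    def legacy_product():
        slug = ALIASES.get(request.args.get("productId"))
        if slug:
            # 301 (permanent) consolidates link value onto the one canonical URL.
            return redirect("/product/" + slug, code=301)
        return ("Not found", 404)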

  10. Idris says:

    That’s a great catch, Linda.

    I too have seen this problem on most ecommerce sites. Thanks for sharing a few promising solutions.

  11. You could also use a canonical tag, and that will eliminate duplicate content. With the canonical tag you could change the URL to appear as a vanity URL in Google.

    Most analytics packages will allow you to change the dynamic parameter to a vanity URL for reporting purposes. This will allow you to know which pages visitors are going to.

  12. Matt says:

    Sometimes it’s worse: a completely wrong title is displayed in the Yahoo listing!

    Sometimes three different titles are shown for the same website’s home page!

  13. Brendon says:

    Can someone describe the global alias concept and how it’s better than canonical tags?

  14. As I mentioned before in the comments, the advantage over the canonical tag is that the canonical tag, though supported, is treated as a “hint,” not a “directive,” by search engines:

    http://www.getelastic.com/canonical-url-tag/

    Rather than creating duplicate content, you redirect your navigational clicks to the product page with no category parameters. This also creates fewer pages for crawlers to spider, which means your site can get indexed more deeply. Search engines can still crawl and index your duplicate pages, but the canonical tag tells them which one you prefer to display in results. Search engines have the option to override your “hint,” which means the search engine must keep all the other copies of the page in case it deems one more relevant than your canonical.

  15. Good information. The canonical tag can be used to tell the search engines about your preferred page. But be careful when using it, as it may create an infinite loop: for example, on page1 you say canonical=page2, and on page2 you say canonical=page1.

  16. Alex, duplicate content on your site is not a penalty issue; it is about inconsistent information architecture.

    Andrew, in my opinion the canonical tag is the last solution to implement, so let’s get back to the origin:

    Choose the categorization that brings you the most relevant traffic, then build out the rest of the secondary categories (navigational options), but when linking to a product, use the URL produced by the main category structure.

    Information architecture will improve usability, but it will also help the search engines’ indexing systems understand your content the way you want, not as a bunch of unorganised words.

  17. Genewize says:

    Very good post. This common mistake can be easily overlooked and very costly for some businesses. Thanks for the info, and great-looking blog, btw.

    -Justin

  18. Serious problem; this deserves a second thought.

  19. [...] optimization: Google now lets you strip parameters from URLs. You now have a better weapon against duplicate content issues than the canonical URL [...]

  20. [...] can use your robots.txt file to block Google from crawling URLs with session IDs. Or, you can use canonical tags. You also can use rewrites and redirects. For example, if you have [...]
