About Get Elastic

Get Elastic is lovingly brought to you by Linda Bustos of Elastic Path Software, a flexible ecommerce framework for enterprises.

We also have a technical blog for Elastic Path users and partners.


Too Many URLs Spoil the SEO: Fixing a Common Ecommerce Duplicate Content Problem

A common SEO problem for ecommerce sites is a CMS (content management system) that creates a different URL for each category a product lives under. The main reason this is bad for SEO is that search engines only allocate so much bandwidth to crawling your site. If most or all of your product pages have duplicates, you’re less likely to get your site fully crawled and indexed, which means lost organic search opportunity.

Google currently has 6 copies of the product page for Abercrombie’s Clarissa skirt in its index. Half of the links lead to a 404 page; the rest look like this:

http://www.abercrombie.ca/webapp/wcs/stores/servlet/ProductDisplay?storeId=11306&catalogId=10901&productId=482314&langId=-1&categoryId=12280&parentCategoryId=12203&colorSequence=01

http://www.abercrombie.ca/webapp/wcs/stores/servlet/ProductDisplay?langId=-1&storeId=11306&productId=482314&parentCategoryId=12203&catalogId=10901&categoryId=12280

http://www.abercrombie.ca/webapp/wcs/stores/servlet/product_11306_10901_482314_-1_12280_12203
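A quick way to confirm that URLs like these are duplicates is to reduce each one to the identifiers that actually pick the product. The following is a minimal sketch, not the site's own logic. The function name `product_key` is hypothetical, and the field order assumed for the underscore-separated URL is only inferred by lining its values up against the query-string URLs.

```python
from urllib.parse import urlsplit, parse_qs

# ASSUMPTION: order of the underscore-separated fields in the path-style URL,
# inferred by matching its values against the query-string versions.
PATH_FIELDS = ("storeId", "catalogId", "productId", "langId",
               "categoryId", "parentCategoryId")


def product_key(url):
    """Return (storeId, productId) for either URL style, or None if unknown."""
    parts = urlsplit(url)
    if parts.query:
        # Query-string style: parameter order and extra parameters are ignored.
        params = {name: values[0] for name, values in parse_qs(parts.query).items()}
    else:
        # Path style: .../product_<storeId>_<catalogId>_<productId>_...
        last_segment = parts.path.rsplit("/", 1)[-1]
        if not last_segment.startswith("product_"):
            return None
        values = last_segment[len("product_"):].split("_")
        if len(values) != len(PATH_FIELDS):
            return None
        params = dict(zip(PATH_FIELDS, values))
    if "storeId" not in params or "productId" not in params:
        return None
    return (params["storeId"], params["productId"])


urls = [
    "http://www.abercrombie.ca/webapp/wcs/stores/servlet/ProductDisplay?storeId=11306&catalogId=10901&productId=482314&langId=-1&categoryId=12280&parentCategoryId=12203&colorSequence=01",
    "http://www.abercrombie.ca/webapp/wcs/stores/servlet/ProductDisplay?langId=-1&storeId=11306&productId=482314&parentCategoryId=12203&catalogId=10901&categoryId=12280",
    "http://www.abercrombie.ca/webapp/wcs/stores/servlet/product_11306_10901_482314_-1_12280_12203",
]

distinct_products = {product_key(u) for u in urls}
print(len(urls), "URLs,", len(distinct_products), "distinct product(s)")
```

Under these assumptions all three URLs reduce to the same key, which is the sense in which they are copies of one page: category, color and parameter order change the URL but not the product.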

The best practice is to give each product page a single global alias URL that the UI looks up and renders, instead of a category-specific URL. If you wish to maintain breadcrumb trails, use a session ID or cookie to track which categories the customer clicked to locate the product. If the visitor lands on the page without browsing through a category menu (search engine referral, affiliate link, PPC ad, email, site search, etc.), default to a parent category.
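The approach described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the catalog data, the `/product/` URL scheme, the alias `clarissa-skirt`, the category names, and the `last_category` session key are all hypothetical, and the session is modeled as a plain dictionary rather than any particular platform's session object.

```python
# ASSUMPTION: hypothetical catalog data. Each product has one global alias,
# the set of categories it lives under, and a default parent category.
PRODUCTS = {
    "482314": {
        "alias": "clarissa-skirt",
        "categories": {"skirts", "new-arrivals"},
        "default_category": "skirts",
    },
}


def product_url(product_id):
    """One URL per product, no matter which category page links to it."""
    return "/product/" + PRODUCTS[product_id]["alias"]


def record_category_click(session, category):
    """Remember the category the visitor browsed, in the session or a cookie."""
    session["last_category"] = category


def breadcrumb_category(product_id, session):
    """Pick the breadcrumb category without putting it in the URL."""
    product = PRODUCTS[product_id]
    last = session.get("last_category")
    if last in product["categories"]:
        return last
    # Search referral, ad, email, etc.: no usable browsing history.
    return product["default_category"]


session = {}
print(product_url("482314"), breadcrumb_category("482314", session))
record_category_click(session, "new-arrivals")
print(product_url("482314"), breadcrumb_category("482314", session))
```

The point of the design is that the category only changes what is rendered in the breadcrumb, while the URL returned by `product_url` stays the same for every path to the product.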

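Keep in mind that session IDs can get crawled and indexed by search engines, creating even worse duplicate content issues (and security issues with some ecommerce platforms). To block search engines from crawling URLs with session IDs, you can use the syntax below in your robots.txt file. Be aware that this pattern blocks every URL containing a "?", not just session IDs, so a narrower pattern that names your session ID parameter is safer if other query-string URLs need to be crawled: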

User-agent: *
Disallow: /*?
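To get a feel for what a wildcard rule like this matches, here is a minimal sketch of the wildcard matching that the major search engines document for robots.txt ("*" matches any run of characters, a trailing "$" anchors the end of the URL). The helper `rule_matches`, the sample paths, and the `sessionid` parameter name are hypothetical, and not every crawler is guaranteed to support wildcards.

```python
import re


def rule_matches(pattern, path_and_query):
    """Return True if a robots.txt path pattern matches the given URL path.

    '*' matches any sequence of characters; a trailing '$' anchors the
    pattern to the end of the URL. Matching starts at the beginning.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path_and_query) is not None


# Hypothetical paths on a store.
paths = [
    "/product/clarissa-skirt",
    "/product/clarissa-skirt?sessionid=abc123",
    "/search?sort=price",
]

for path in paths:
    broad = rule_matches("/*?", path)              # the rule shown above
    narrow = rule_matches("/*sessionid=", path)    # hypothetical narrower rule
    print(path, "| /*? blocks:", broad, "| /*sessionid= blocks:", narrow)
```

Under this reading, the broad rule matches any URL with a query string, while a rule naming the session parameter leaves other query-string URLs crawlable.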

Other benefits of reducing duplicate content are that it prevents PageRank dilution and may simplify your web analytics (product page views and conversions aren’t spread over multiple URLs).

Comments

  1. March 23rd, 2009

    Having several URLs linking to the same content is not a huge problem for PR:

    Quote from Google Webmaster Central:

    “having multiple URLs on the same domain that point to the same content. Like http://www.example.com/skates.asp?color=black&brand=riedell and http://www.example.com/skates.asp?brand=riedell&color=black. Having this type of duplicate content on your site can potentially affect your site’s performance, but it doesn’t cause penalties.”

    See the article: http://googlewebmastercentral.blogspot.com/2008/09/demystifying-duplicate-content-penalty.html

  2. March 23rd, 2009

    Does the new canonical tag not solve this problem to a degree?
    http://www.webpronews.com/topnews/2009/02/13/canonical-tag-announced-googles-matt-cutts-interviewed

    The multiple pages would still exist, but SEO problems are solved. It’s not the perfect solution, but probably the easiest to implement.

  3. March 23rd, 2009

    Andrew beat me to it. The canonical tag serves as a mini 301-redirect and would be useful here. The robots.txt change you mention is still a best practice and is useful for screening out both session IDs and sorting URLs.

  4. Sun
    March 23rd, 2009

    Yes, I think the canonical tag will help to some degree. But I’m so glad the major companies still have no clue when it comes to SEO. It gives the micro businesses who understand how to set up and organize their sites a competitive advantage. I actually advocated a tag called “rmpar”, much like in the pavuk crawler, but the canonical tag achieves similar results.

  5. March 23rd, 2009

    Yes, I covered the canonical tag last month, but again it’s not 100%. It’s a “hint”, not a “directive”.
    http://www.getelastic.com/canonical-url-tag/

    So this is a better solution for that, and it also creates fewer pages for crawlers to spider, which means your site can get indexed more deeply.

  6. March 23rd, 2009

    Be careful with the robots.txt suggestion, as it may block all content that has a ? in the URL. You are better served by being specific about your session ID parameter in the robots.txt file.

  7. Steve Bolley
    March 23rd, 2009

    That’s a good marketing strategy: blog about other people’s sites and then hopefully they will want you to do the marketing for them. I noticed your trend here… ;)

  8. March 23rd, 2009

    We don’t offer marketing services. I use others’ sites as examples to illustrate my posts’ concepts.

  9. March 23rd, 2009

    @Jason thanks that’s a good point.

  10. March 24th, 2009

    Google and Yahoo are now planning to support duplicate URLs if there is a canonical tag, so you can indicate which is the correct URL without any penalty.

    Also, if the site is already indexed (which is often the case), using robots.txt is not suggested, as it will in effect kill all the value you earned. Wherever possible, it is better to use 301 redirects.

  11. March 27th, 2009

    That’s a great catch, Linda.

    I too have seen this problem on most ecommerce sites. Thanks for sharing a few promising solutions.

  12. April 1st, 2009

    You could also use a canonical tag, and that will eliminate duplicate content. With the canonical tag you could change the URL to appear as a vanity URL in Google.

    Most Analytics packages will allow you to change the dynamic parameter to a vanity url for reporting purposes. This will allow you to know which pages visitors are going to.

  13. April 1st, 2009

    Sometimes it’s worse; a completely wrong title is displayed in the Yahoo listing!

    Sometimes, for the same website home page, 3 different titles are shown!

  14. April 1st, 2009

    Can someone describe the global alias concept and how it’s better than canonical tags?

  15. April 2nd, 2009

    I mentioned it before in the comments: the advantage over the canonical tag is that the canonical tag, though supported, is taken as a “hint”, not a “directive”, by search engines.

    http://www.getelastic.com/canonical-url-tag/

    With a global alias, rather than creating duplicate content, you redirect your navigational clicks to the product page, with no category parameters. This also creates fewer pages for crawlers to spider, which means your site can get indexed more deeply. With the canonical tag, search engines can still crawl and index your duplicate pages, and the tag only tells them which one you prefer to display in results. Search engines have the option to override your “hint”, which means the search engine must keep all the other copies of the page in case it deems one more relevant than your canonical.

  16. April 2nd, 2009

    Good information. The canonical tag can be used to tell the search engines about your preferred page. But be careful when using it, as it may create an infinite loop, for example if page1 says canonical=page2 and page2 says canonical=page1.

  17. April 8th, 2009

    Alex, duplicate content on your site is not a penalty issue; it is about inconsistent Information Architecture.

    Andrew, in my opinion the canonical tag is the last solution to implement, so let’s get back to the origin:

    Choose the categorization that brings you the most relevant traffic, then articulate the rest of the secondary categories (navigational options), but when linking to a product use the URL produced by the main category structure.

    Information Architecture will improve usability, but it will also help the search engines’ indexing systems understand your content the way you want, not as a bunch of unorganised words.

  18. April 24th, 2009

    Very good post. This common mistake can be easily overlooked and very costly for some businesses. Thanks for the info, and great looking blog btw.

    -Justin

  19. July 16th, 2009

    Serious problem; it deserves a second thought.

