The “Big 3” search engines (Google, Yahoo and MSN) broke big news last week with the announcement that they will support a Canonical URL tag to help webmasters (and search engines) better manage duplicate content issues. Duplicate content refers to identical or near-identical blocks of text that appear on more than one page in a search engine’s index.
Examples of duplicate content on ecommerce sites:
- A product is listed under multiple categories, each with its own URL
- The search engine crawls a site and is issued a session ID. It indexes links with the session ID
- A blogger copies a product link with a session ID or navigation tracking parameter, like http://www.site.com/B00DK/ref=acc_glance_sw_ai_549_1_img, and unwittingly pastes the link as-is in a blog post
- An affiliate link like http://www.amazon.com/Logitech-Cordless-Laser-Mouse/?affid=1234 gets crawled and indexed
- Content is duplicated across sub-domains or sub-folders like canada.yoursite.com or yoursite.com/uk/
- The search engine crawls your print friendly version
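Several of the examples above come down to the same page being reachable with extra tracking or session data in the URL. As a rough sketch of how a store might work out the "clean" URL for such a page, the Python below strips those extras. The parameter names in `TRACKING_PARAMS` and the `ref=` path-segment rule are assumptions made for illustration, based on the example URLs above; a real site would substitute its own.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names; a real store would list its own.
TRACKING_PARAMS = {"affid", "sessionid", "sid", "ref"}


def strip_tracking(url: str) -> str:
    """Return the URL without tracking/session parameters or a fragment."""
    parts = urlsplit(url)

    # Keep only query parameters that are not in the tracking list.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    ]

    # Drop path segments such as "ref=acc_glance_..." (assumed to be
    # navigation tracking, as in the example URL above).
    segments = [s for s in parts.path.split("/") if not s.startswith("ref=")]
    path = "/".join(segments)

    # Rebuild the URL with a lowercased host and no fragment.
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), path, urlencode(kept), "")
    )


if __name__ == "__main__":
    print(strip_tracking("http://www.site.com/B00DK/ref=acc_glance_sw_ai_549_1_img"))
    print(strip_tracking("http://www.amazon.com/Logitech-Cordless-Laser-Mouse/?affid=1234"))
```

This only covers duplicates caused by URL decoration. A product listed under several category URLs or a print-friendly page still needs someone to decide which address is the preferred one.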
Duplicate content problems include:
- When multiple copies exist, search engines want to choose one version to show in search results and filter out the rest. They may not show the version you want: it could be the print-friendly page or, worse, an affiliate URL, so you’re paying commissions on organic search conversions!
- Your SEO suffers because PageRank is diluted across several copies of a page
- Your site might not get fully crawled, because search engines will only give you so much attention in a given session and duplicate pages use some of it up
Duplicate content can also occur across domains, for example when a multi-store business uses country-specific domain extensions such as yoursite.co.uk, or when many retailers use the same stock manufacturer descriptions. The Canonical URL Tag does not remedy this situation.
Up until February 2009, webmasters dealt with duplicate content by “sculpting PageRank” with rel=”nofollow” attributes, noindex robots meta tags or 301 permanent redirects. Now you can specify which is the original version of your content with the <link> tag and rel=”canonical” attribute in a page’s head section, like:
<link rel="canonical" href="http://estore.com/womens/sweaters/esprit/B3H4H5"/>
Which *should* nudge search engines in the right direction when choosing which URL to display.
I say *should* because search engines consider this a hint rather than a directive. Much like with my hairdresser, you can make a suggestion, but the search engine is going to do whatever the heck it wants. Nevertheless, I believe this will help a lot of online retailers’ SEO efforts and reduce the headache of duplicate content.
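If your pages come from templates, the tag can be generated rather than typed by hand. Here is a minimal Python sketch, assuming you already know the preferred URL for each page; the function name `canonical_link_tag` is made up for this example and does not belong to any particular ecommerce platform.

```python
from html import escape


def canonical_link_tag(canonical_url: str) -> str:
    """Build a canonical <link> tag for a page's head section."""
    # Escape &, <, > and quotes so the URL is safe inside an HTML attribute.
    href = escape(canonical_url, quote=True)
    return f'<link rel="canonical" href="{href}"/>'


if __name__ == "__main__":
    print(canonical_link_tag("http://estore.com/womens/sweaters/esprit/B3H4H5"))
```

Every duplicate version of the page (category variants, print-friendly view, tracked links) would output the same tag, pointing at the one URL you want shown in search results.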
Bonus if you got the “canon” and “shot” pun in the title, yes I know it’s kinda lame.