Canonical Issues & Duplicate Content
November 12, 2008

Posted by Sarah Bernier in Things.

In reviewing/auditing websites for my current employer, I continue to run across a very common issue: duplicate content due to canonical issues. Now, I’m sure any SEO knows that – over time – search engines will naturally canonicalize URLs (pick what they think is the preferred URL). But in the meantime, if you’re acquiring any 3rd-party links, odds are they’re spread out amongst the various ‘versions’ of the URLs.

The issue here isn’t dupe content as such, since you don’t incur “penalties,” per se, for having duplicate content. Rather, as mentioned above, over time the search engines just pick what they see as the original source of the content and display only that version. The real cost is that your inbound links end up split across several URLs instead of all counting toward one.

For the sake of argument, let’s review some possible versions of a URL, such as http://mysite.com, http://www.mysite.com, and http://www.mysite.com/index.html. Keep in mind that in spite of rendering the same content, search engines still see these as separate URLs.
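To make this concrete, here is a minimal sketch in Python. The four variants are illustrative, built from the mysite.com placeholder used in this post, and `canonicalize` is a hypothetical helper. It assumes that www is the preferred host and that a trailing index.html serves the same page as its directory. It is not how any search engine actually decides.

```python
from urllib.parse import urlsplit, urlunsplit

# Illustrative variants of one home page (mysite.com is a placeholder domain).
VARIANTS = [
    "http://mysite.com",
    "http://www.mysite.com",
    "http://mysite.com/index.html",
    "http://www.mysite.com/index.html",
]

def canonicalize(url, preferred_host="www.mysite.com"):
    """Map a URL variant to the preferred form (hypothetical helper)."""
    parts = urlsplit(url)
    path = parts.path
    # Assumption: a trailing index.html is the same page as its directory.
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]
    if path == "":
        path = "/"
    # Collapse the bare and www hosts onto the preferred host.
    bare = preferred_host[4:] if preferred_host.startswith("www.") else preferred_host
    host = parts.netloc.lower()
    if host in (bare, "www." + bare):
        host = preferred_host
    return urlunsplit((parts.scheme, host, path, parts.query, ""))

distinct = set(VARIANTS)                         # four different strings
canonical = {canonicalize(u) for u in VARIANTS}  # one preferred URL
```

As plain strings the four variants are all different, which is roughly how a crawler first encounters them. After normalization they collapse to a single URL, and that is the one you want every link to point at.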

In instances where the site is set up incorrectly – and the preferred domain isn’t denoted – sites can have multiple versions (upwards of 4) of their content. The longer the site is online, the greater the chances the “link love” will be spread out between different URLs. This is especially true since in the majority of sites where I see this happening, the internal links to the home page of the site don’t point to the root level domain (www.mysite.com), but instead some other version (most often some variation of http://www.mysite.com/index.html).
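If you want to spot-check a page for this, something along these lines might help. It is only a rough sketch: `HomeLinkAudit` is a hypothetical class, the preferred URL and the sample HTML are made up for illustration, and it only looks at absolute links (relative links such as `index.html` would need to be resolved against the page URL first).

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

class HomeLinkAudit(HTMLParser):
    """Collect home-page links that differ from the preferred URL (hypothetical helper)."""

    def __init__(self, preferred="http://www.mysite.com/"):
        super().__init__()
        self.preferred = preferred
        self.offenders = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        parts = urlsplit(href)
        host = parts.netloc.lower()
        # A "home page" link here means our host plus an empty, root, or index.html path.
        is_home = host in ("mysite.com", "www.mysite.com") and parts.path in ("", "/", "/index.html")
        if is_home and href != self.preferred:
            self.offenders.append(href)

# Made-up page fragment for illustration.
SAMPLE = '''
<a href="http://www.mysite.com/">Home</a>
<a href="http://www.mysite.com/index.html">Home</a>
<a href="http://mysite.com">Home</a>
<a href="http://www.mysite.com/about.html">About</a>
'''

audit = HomeLinkAudit()
audit.feed(SAMPLE)
# audit.offenders now lists the home links that should be rewritten.
```

Anything that lands in `audit.offenders` is an internal link feeding a non-preferred version of the home page.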

Great, so we’ve diagnosed this…now what?

The first thing to do is decide which URL is the canonical version. Most webmasters use http://www.mysite.com, but there could be reasons why they’d choose mysite.com. Whichever version is chosen, you must be sure to remain consistent throughout the site, internal links included.

I’m not a technical person, so I’m deferring to the “code nerds” on this one, but I do know you need to 301 redirect the non-www version to the www version (or the other way around, if you chose mysite.com). Every content management system (CMS) should have a relatively easy way to do this. For those of you not using a CMS, read about avoiding duplicate content by using .htaccess files in more techy detail.

Keep in mind that these 301s need to be implemented at the page level: each non-www page should redirect to its own www counterpart. Recently I gave an audit to a company with an in-house tech team, and they didn’t quite understand how to go about fixing the problem. In fact, they made it worse by redirecting all non-www versions of URLs across the site back to the root level domain! Did I mention they did this with 302 (temporary) redirects instead of 301 (permanent) ones? Oops.
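For the code-minded, the page-level rule above can be sketched like this in Python. It is not a real server config, just the logic. `redirect_for` and `PREFERRED_HOST` are hypothetical names, and the sketch assumes www is the version you picked.

```python
from urllib.parse import urlsplit, urlunsplit

PREFERRED_HOST = "www.mysite.com"  # placeholder; use whichever version you chose

def redirect_for(url):
    """Return (status, location) for a requested URL, or None if no redirect is needed."""
    parts = urlsplit(url)
    if parts.netloc.lower() == PREFERRED_HOST:
        return None
    # Keep the path and query so each page maps to its own counterpart,
    # rather than dumping every request on the home page.
    target = urlunsplit((parts.scheme, PREFERRED_HOST, parts.path or "/", parts.query, ""))
    return (301, target)  # 301 = permanent; a 302 would signal a temporary move
```

So a request for `http://mysite.com/products/widgets.html` would be sent to `http://www.mysite.com/products/widgets.html`, not to the home page, and a request that is already on the preferred host is left alone.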

Long story short: Make sure you don’t have more than one “site” floating about online, and when you try to fix something that’s broken, make sure you completely understand the inner workings of a website first 🙂