The ugly problem of duplicate content part 1

Do you have a duplicate content problem?

Duplicate content is becoming a bigger and bigger problem as search engines tighten up on it. Google has been addressing it with the so-called Panda updates, and they have not finished yet: more iterations of the Panda update are still to come. Many websites have lost rankings over the last year because of these updates. The reasons vary, but duplicate content has been a big factor in many cases.

Until recently Google had stated that it relied mainly on filtering, not on ranking, to deal with duplicate content. That has changed: the action taken now ranges from nothing, through filtering, to loss of ranking.

The many types of duplicate content

There are many causes of duplicate content and many ways to avoid or fix them.

  • Intentional duplicate content
  • Canonical duplicate content
  • Plagiarism
  • Thin content
  • Affiliate duplicate content

Intentional duplicate content

There are occasions when you intentionally want duplicate content. You may have a policy that has to be displayed on several pages, and because of your linking structure it may not be practical to link to the original version each time. You may have several sites that share the same policy. You may also have several sites for different countries, where the whole site is a duplicate.

Fixes and penalties

Geo TLDs

If you have duplicate content on different TLDs, for example:

  • domain.com
  • domain.com.au
  • domain.co.uk

this is not a problem. Search engines will see this as OK most of the time. You should try to use the correct regional formats, such as dates and currencies, and each site should be hosted in the correct country. If you cannot host in the correct country you should at least use geo coding, but each site must have the correct TLD.

Other types of intentional duplicate content

There is no way around these types of duplicate content. Only one version of the content will rank in search engines, and the other versions will be filtered. The best you can do is tell the search engine which version you want to rank by using a canonical tag.

The correct use of a canonical tag is to put the tag in the versions of the content you do not want to rank and point the href attribute to the version you do want to rank.

Example: <link rel="canonical" href="http://thatsit.com.au/"/>
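If you want to check which version a duplicate page is pointing at, the tag can be read back out of the page's HTML. The following is a minimal Python sketch using only the standard library. The names CanonicalFinder and find_canonical are made up for this illustration, and it assumes you already have the page's HTML as a string.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <link ... />.
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr_map.get("href")


def find_canonical(html):
    """Return the canonical URL declared in the HTML, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


page = '<html><head><link rel="canonical" href="http://thatsit.com.au/"/></head></html>'
print(find_canonical(page))
```

Running this over each duplicate should give the one URL you want to rank. A result of None means the page has no canonical tag.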

Canonical duplicate content

You have canonical duplicate content when two or more URLs on your website point to the same page. All of the URLs listed below can point to the same page.

  • www.domain.com/index.html
  • domain.com/index.html
  • domain.com/Index.html
  • domain.com/index.html/
  • domain.com/index.html?id=1
  • domain.com/index.html?id=2

I could probably go on.
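To see how these variants collapse into one address, here is a small Python sketch that maps each of them onto a single preferred form. The policy it applies is only an assumption for this example: no "www.", a lowercase path, no trailing slash, and no query string. Lowercasing the path and dropping the query string are only safe when, as in the list above, they really do lead to the same page. The function name normalize_url is also made up for the illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url):
    """Map URL variants of one page onto a single preferred form.

    Assumed policy (hypothetical, adjust to your own site): host without
    "www.", lowercase path, no trailing slash, query string dropped.
    """
    # urlsplit only recognises the host when a scheme is present.
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    # An empty path means the site root, so fall back to "/".
    path = parts.path.lower().rstrip("/") or "/"
    return urlunsplit((parts.scheme, host, path, "", ""))


variants = [
    "www.domain.com/index.html",
    "domain.com/index.html",
    "domain.com/Index.html",
    "domain.com/index.html/",
    "domain.com/index.html?id=1",
    "domain.com/index.html?id=2",
]
for variant in variants:
    print(variant, "->", normalize_url(variant))
```

With this policy, all six variants come out as http://domain.com/index.html.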

Fixes and penalties

The best thing to do here is to make sure that only one version of the URL is used to access each page within your site. All internal links should use the one format. This will stop any problems on your own site, and since most external links are copied from your site or from the address bar, it will also limit external canonical issues.

The next thing you should do is set up 301 redirects so that every other version of a URL redirects to the correct version. See these SEO tutorials on how to do this.
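Redirects are normally set up in your web server's configuration, but the logic is easy to show in code. The sketch below is a tiny Python WSGI application, written only as an illustration: PREFERRED_HOST and the lowercase, no-trailing-slash rule are assumptions, and query strings are ignored to keep it short.

```python
PREFERRED_HOST = "domain.com"  # hypothetical preferred host name


def app(environ, start_response):
    """WSGI app that 301-redirects non-preferred URLs to the preferred one."""
    host = environ.get("HTTP_HOST", PREFERRED_HOST).lower()
    path = environ.get("PATH_INFO", "/") or "/"
    # Assumed preferred form: lowercase path without a trailing slash.
    preferred_path = path.lower().rstrip("/") or "/"

    if host != PREFERRED_HOST or path != preferred_path:
        location = "http://" + PREFERRED_HOST + preferred_path
        start_response("301 Moved Permanently", [("Location", location)])
        return [b""]

    # Already the preferred URL: serve the page as normal.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"page content"]
```

A request for www.domain.com/Index.html/ would be answered with a 301 pointing to http://domain.com/index.html, while a request for the preferred URL is served directly. To try it locally, the app can be passed to make_server from Python's wsgiref.simple_server module.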

It is also possible to put a canonical tag in the page so that all of the URLs will be filtered to the one version you select. This is not the preferred method: Bing suggests that a page should not have a canonical tag pointing to itself, and since canonical duplicate content involves only one page with multiple URLs, the page would be pointing at itself.

Failing to fix canonical URLs will have two results. First, you will be filtered and only one version will rank. Second, any links that use two or more versions of the URL will be seen as links to two or more pages, and your rank will be divided between them.

See The ugly problem of duplicate content part 2 for more on duplicate content.

References

Managing duplicate content and redirects
Partnering to help solve duplicate content issues
Deftly dealing with duplicate content
Demystifying the duplicate content penalty
