The ugly problem of duplicate content part 2
by Alan Mosley
Wednesday, November 16, 2011
A continuation from The ugly problem of duplicate content part 1.
More serious cases of duplicate content
I will now concentrate on the darker side of duplicate content. In part 1 I listed some causes of duplicate content and we covered the innocent ones…
In this post we will cover…
Here we have duplicate content on someone else's website: either you have copied their work, or they have copied yours, often in bulk using screen-scraping software. Only one version will get credit for this content, and it may not be yours, even if you created it.
Fixes and penalties
To protect against screen scrapers you can put a canonical tag in your content pointing to your version and hope that, if the page is scraped, the tag is left in, so that you get the credit. This is only a weak defence. Canonical tags are easy to find and take out, and if someone has the ability to build a screen scraper, it is safe to assume they also have the ability to remove tags. You can use the rel author attribute on your content to signal to search engines who the real owner is, but this is a bit iffy too. The best defence against screen scrapers is the scrapers themselves: a search engine can see that a site has many pages duplicated from many other sites, which leads it to believe that the site is the result of screen scraping, and it is unlikely to give that site credit.
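To check that your own pages actually carry the canonical and rel author signals described above, a small script can help. This is only a minimal sketch using Python's standard library; the example.com URLs, the sample page and the helper names are made up for illustration.

```python
from html.parser import HTMLParser


class RelLinkParser(HTMLParser):
    """Collects href values of <link> and <a> tags, grouped by rel value."""

    def __init__(self):
        super().__init__()
        self.rels = {}

    def handle_starttag(self, tag, attrs):
        if tag not in ("link", "a"):
            return
        attr = dict(attrs)
        href = attr.get("href")
        # rel may hold several space-separated values, e.g. rel="author me"
        for rel in (attr.get("rel") or "").lower().split():
            if href:
                self.rels.setdefault(rel, []).append(href)


def find_rel_links(html):
    """Return a dict such as {'canonical': [...], 'author': [...]}."""
    parser = RelLinkParser()
    parser.feed(html)
    return parser.rels


# Hypothetical page used only for this example.
SAMPLE_PAGE = """
<html><head>
<link rel="canonical" href="https://www.example.com/original-article">
<link rel="author" href="https://www.example.com/about-the-author">
</head><body><p>Article text.</p></body></html>
"""

rels = find_rel_links(SAMPLE_PAGE)
print(rels.get("canonical"))
print(rels.get("author"))
```

Running the same function over a scraped copy of your page would show whether the scraper left the tags in or stripped them out.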
If you are the one who has plagiarized or screen scraped, you need to think about how search engines see your site. With many cases of duplicate content, they may well conclude that it is you who is stealing work from others; you are unlikely to get any credit, and you risk even more penalties.
Search engines don't like screen scrapers, and when they find them they will take away any ranking they have. The Panda update is looking for just this sort of thing.
You can get done for duplicate content even when you are innocent. Most websites are built with a master page or template that holds the header, footer and side menus, and these areas are by nature duplicate content. Search engines are not worried about these areas; they are worried about the main content area. When you have very little content in the main content area, it is possible for search engines to decide that your page is duplicate content. Matt Cutts has stated many times that Google will take a page, take away all of the duplicate content and all of the advertising, and if the remaining content is not useful, does not add any value, or is thin, the page will be punished.
Fixes and penalties
For thin content the answer is simple: add more content to the page so that it is detectable as unique.
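One rough way to spot thin pages on your own site is to drop every block of text that also appears in the template and count the words that are left. The sketch below is my own illustration, not how any search engine measures it; the function names, the sample text and the 150-word threshold are all assumptions.

```python
MIN_WORDS = 150  # hypothetical threshold, not a figure published by any search engine


def remaining_words(page_blocks, template_blocks):
    """Return the words left after dropping blocks shared with the template."""
    shared = {block.strip().lower() for block in template_blocks}
    words = []
    for block in page_blocks:
        if block.strip().lower() in shared:
            continue  # header, footer or menu text: skip it
        words.extend(block.split())
    return words


def looks_thin(page_blocks, template_blocks, min_words=MIN_WORDS):
    """True when the main content falls below the chosen word count."""
    return len(remaining_words(page_blocks, template_blocks)) < min_words


# Hypothetical template and page used only for this example.
template = ["Home | Products | Contact", "Copyright Example Ltd"]
page = ["Home | Products | Contact", "Blue widget, in stock.", "Copyright Example Ltd"]

print(len(remaining_words(page, template)))  # 4
print(looks_thin(page, template))            # True
```

A page that comes back as thin is a candidate for more original content in the main content area.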
Affiliate duplicate content
By affiliate I mean those sites selling products on behalf of another vendor or organization under some sort of licence or agreement; an example is the Amazon affiliate program. The problem is that the content provided to you is the same content provided to every other affiliate, which causes a duplicate content problem.
Fixes and penalties
A site with nothing but duplicate content provided by a vendor will be seen as thin content, and the normal duplicate content penalties will apply.
The way around this is to write your own product descriptions and to add your own content to the page. Affiliate websites were some of the hardest hit by the Panda update.
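To see how much of a product description is still the vendor's wording, you can compare the two texts in overlapping runs of words (often called shingles). This is a minimal sketch under my own assumptions: the three-word shingle size, the function names and the sample descriptions are illustrative, and search engines do not publish how they measure this.

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping word runs of the given size."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def overlap_ratio(your_text, vendor_text, size=3):
    """Fraction of your shingles that also appear in the vendor text (0.0 to 1.0)."""
    yours = shingles(your_text, size)
    if not yours:
        return 0.0
    return len(yours & shingles(vendor_text, size)) / len(yours)


# Hypothetical descriptions used only for this example.
vendor = "Stainless steel kettle with a 1.7 litre capacity and auto shut-off."
copied = vendor
rewritten = "We boiled this kettle daily for a month; it holds enough for six mugs."

print(overlap_ratio(copied, vendor))     # 1.0
print(overlap_ratio(rewritten, vendor))  # 0.0
```

A ratio close to 1.0 means the description is essentially the vendor's copy; the lower the ratio, the more of the wording is your own.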
301 redirects, canonical tags, and blocking duplicate pages with a robots.txt file or noindex tags are less-than-perfect solutions; all have problems, such as link juice leaks. The best solution is not to have duplicate content in the first place, or to add enough original content to the page to eliminate the problem.
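If you do fall back on robots.txt blocking, it is worth checking that the rules block the duplicate pages and nothing else. The sketch below uses Python's standard urllib.robotparser module; the robots.txt rules and the example.com URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks printer-friendly duplicates under /print/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
"""

robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())

duplicate_url = "https://www.example.com/print/ugly-problem-part-2"
original_url = "https://www.example.com/ugly-problem-part-2"

# can_fetch() reports whether the named crawler may request the URL.
print(robots.can_fetch("*", duplicate_url))  # False: the duplicate is blocked
print(robots.can_fetch("*", original_url))   # True: the original stays crawlable
```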
Something to consider when publishing a page: take away all the advertising and all the duplicate content that can be found on another page, whether elsewhere on the internet or on your own website, and consider what is left. Does it add value, or is it useful to the user? If you answer no, then you have a problem. Did you create the vast majority of the content on the page? If you answer no, then you have a problem.