Duplicate content for dummies
Posted by Dave on 27 Jan 2010 | Tagged as: Common SEO Topics
I recently referred a colleague to an article about duplicate content and why it is not good practice to regurgitate old content onto your website. My colleague read the article but struggled to make sense of it, as it was too technical for someone who is just getting started in SEO. So I went through the article with him and broke it down into a more understandable format. This blog entry is a dummy version of what we discussed.
What is Duplicate Content?
In internet terms, duplicate content refers to blocks, pages or snippets of text that are identical or extremely similar to content on another page or website. Often this is an innocent mistake made by one of the parties involved. However, webmasters do sometimes deliberately copy content from other sites, either in the hope of getting the other site a duplicate content “penalty” or simply to save time and money on acquiring well-written copy. There is no exact formula or ratio that states what is and isn’t considered duplicate content, so keep every page of content that you write as original as possible.
Why is Duplicate Content “penalized” by search engines?
Every search engine aims to be the most efficient and user-friendly search engine available. If a user performs a search which returns 20 results that all contain the exact same content, the user is poorly served, as a wider variety of results on one page would be more useful. It is predominantly for this reason that Google and other search engines have ways of identifying and “penalizing” duplicate content. I’ve put “penalize” in quotation marks because, strictly speaking, your site’s ranking is not penalized; rather, search engines “penalize” you by simply not indexing the duplicated content.
How do search engines identify duplicate content?
When a search engine receives a search query, it looks through its index of previously crawled and cached pages to find results. It then runs those results through a duplicate content filter, which identifies content that is similar or identical. The search engine then needs to decide which of the results to display as the original copy. It takes the following factors into consideration:
- How trusted is the Website?
- Does a site have any links pointing back to the original copy of relevant content?
- Otherwise, where do most of the links point?
- Where was the first copy of the content published?
- Is any of the content deliberately “scraped” or regurgitated?
Usually, once a search engine has identified the original source of a piece of content, it tends to show just that copy and filter out the rest, which it treats as duplicates.
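To make the idea of a duplicate content filter a little more concrete, here is a minimal Python sketch. It is not how Google or any other search engine actually works, as their methods are not public. It simply assumes that pages are compared by their overlapping three-word phrases (often called "shingles") and that the earliest published copy in a group is treated as the original. The function names, the 0.8 threshold, the dictionary keys and the sample pages are all made up for illustration.

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping word sequences ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(a, b):
    """Jaccard similarity: shared shingles / all shingles (0.0 to 1.0)."""
    sa, sb = shingles(a), shingles(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def filter_duplicates(pages, threshold=0.8):
    """Keep one page per group of near-duplicates.

    pages: list of dicts with hypothetical keys 'url', 'published', 'text'.
    Pages are visited oldest first, so the earliest copy in each group wins.
    """
    kept = []
    for page in sorted(pages, key=lambda p: p["published"]):
        if all(similarity(page["text"], k["text"]) < threshold for k in kept):
            kept.append(page)
    return kept


# Hypothetical sample data: one original, one copy, one unrelated page.
pages = [
    {"url": "copycat.example/post", "published": "2010-02-03",
     "text": "Duplicate content is text that appears on more than one page."},
    {"url": "original.example/post", "published": "2010-01-27",
     "text": "Duplicate content is text that appears on more than one page."},
    {"url": "other.example/tips", "published": "2010-01-30",
     "text": "Write original copy for every page on your site."},
]

for page in filter_duplicates(pages):
    print(page["url"])
# prints:
# original.example/post
# other.example/tips
```

A real search engine would weigh trust and inbound links as well as publication date, so treat the "oldest copy wins" rule here as a simplification of the factors listed above.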
How to deal with Duplicate content:
How you deal with duplicate content depends on the reason for the duplication: whether it occurred accidentally, through a mistake made by you or someone else, or whether your copy has been maliciously duplicated for unethical reasons.
A few tips on how to avoid duplicate content internally, and to ensure that users find exactly what they need, can be found on Google Webmaster Central. Check it out (and notice how I linked to the article instead of copying and pasting duplicate content into my post).
If you find that someone is deliberately and maliciously copying content from your site, there are a few steps you can take:
- You can simply email the site and inform them that you are aware that they have duplicate content on their site and ask for it to be removed. You will be surprised how often this works.
- If the site owners refuse to remove the content, you can then contact Google by submitting a spam report notifying them of the duplicate content. Google then has the ability to penalize the guilty site by not indexing the relevant pages.
- You can also file an Infringement Notice with Google by visiting their DMCA page and following the instructions.
- Another way is to file a DMCA notice with the site’s hosting company. Hosts usually have processes in place for dealing with these situations, as copying content without permission is typically copyright infringement.
Obviously, with all of the above solutions, you need to be able to prove that your content is in fact the original version before any action can occur. You can usually use the Internet Archive to show where the content was first published.
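As a rough illustration of that last point, the Internet Archive offers a public "availability" endpoint for its Wayback Machine that returns the snapshot closest to a given date. The Python sketch below asks for the snapshot closest to a very early date, which should usually be the oldest capture on record. The helper names and the example address are hypothetical, and an early snapshot only shows when the archive first saved the page, not necessarily when it was first published.

```python
import json
import urllib.parse
import urllib.request

WAYBACK_API = "https://archive.org/wayback/available"


def build_query(page_url, timestamp="19960101"):
    """Build an availability-API URL asking for the snapshot nearest timestamp."""
    params = urllib.parse.urlencode({"url": page_url, "timestamp": timestamp})
    return f"{WAYBACK_API}?{params}"


def closest_snapshot(response):
    """Extract (timestamp, snapshot_url) from a decoded API response, or None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["timestamp"], closest["url"]


def earliest_capture(page_url):
    """Query the live API (needs network access) for the earliest capture."""
    with urllib.request.urlopen(build_query(page_url), timeout=10) as resp:
        return closest_snapshot(json.load(resp))
```

Calling earliest_capture("example.com/my-article") for both your page and the copied page, and comparing the two timestamps, gives you a starting point for your evidence. It is worth saving the snapshot addresses themselves to include in any report you file.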
These are the very basics of identifying and dealing with duplicate content. If you have any issues, you should be able to solve the problem by following the relevant steps mentioned above. If you want to avoid duplicating content, or having your content duplicated, there are duplicate content checker tools which can be used to check the uniqueness of any piece of content. Search for these in Google and choose the one that best suits your needs.
Keep it original!
Tags: duplicate content, Google, Google penalties, seo




