Duplicate content is one of the most common causes of concern among webmasters. We work hard to provide original and useful content, and all it takes is a malicious SERP (Search Engine Results Page) hijacker to copy our content and pass it off as their own. Not nice.
More troubling still is the way that Google handles the issue. In my previous post about CGI hijacking, it was clear that the main problem with hijacking and content scraping is that search engines do not reliably determine who is the owner of the content and, therefore, which page should stay in the index. When faced with multiple pages that have exactly or nearly the same content, Google's filters flag them as duplicates. Google's usual course of action is to let only one of the pages, the one with the highest PageRank, into the index. The rest are tossed out. Unless there is enough evidence to show that the owner or owners are trying to do something manipulative, there is no need to worry about penalties.
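To make that filtering behavior concrete, here is a minimal sketch. It is not Google's actual algorithm, which is not public. The word shingles, the Jaccard similarity measure, the 0.9 threshold, the example URLs, and the `score` field standing in for PageRank are all assumptions chosen for illustration.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles in text (lowercased)."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets, in the range [0, 1]."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def filter_duplicates(pages, threshold=0.9):
    """Keep one page per group of near-duplicates: the highest-scoring one.

    pages: list of dicts with hypothetical keys 'url', 'text', 'score'.
    'score' is an assumed stand-in for PageRank.
    """
    kept = []
    # Visit pages from highest to lowest score, so the first page seen
    # in any group of near-duplicates is the one that survives.
    for page in sorted(pages, key=lambda p: p["score"], reverse=True):
        sig = shingles(page["text"])
        if all(jaccard(sig, shingles(k["text"])) < threshold for k in kept):
            kept.append(page)
    return kept


# Hypothetical data: the scraper's copy happens to have the higher score.
pages = [
    {"url": "original.example/post",
     "text": "we work hard to provide original and useful content",
     "score": 3},
    {"url": "scraper.example/copy",
     "text": "we work hard to provide original and useful content",
     "score": 5},
    {"url": "other.example/page",
     "text": "a completely different article about something else entirely",
     "score": 1},
]

survivors = [p["url"] for p in filter_duplicates(pages)]
print(survivors)
```

In this made-up data the scraper's copy outranks the original, so the original is the page that gets dropped. Nothing in the filter knows who wrote the text first, which is exactly the ownership problem described above.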
Recently, regular reader Jez asked me a thought-provoking question. I'm paraphrasing here, but essentially he wanted to know: "Why doesn't Google consider the age of the content to determine the original author?" I responded that the task is not as trivial as it may seem at first, and I promised a more thorough explanation. Here it is.