<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress/2.3" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>
<channel>
	<title>Comments on: Content is King, but Duplicate Content is a Royal Pain.</title>
	<link>http://hamletbatista.com/2007/07/19/content-is-king-but-duplicate-content-is-a-royal-pain/</link>
	<description>Advanced Search Engine Marketing Tips to Succeed Online</description>
	<pubDate>Mon, 08 Sep 2008 11:35:42 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.3</generator>
		<item>
		<title>By: Richard Chmura</title>
		<link>http://hamletbatista.com/2007/07/19/content-is-king-but-duplicate-content-is-a-royal-pain/#comment-3417</link>
		<dc:creator>Richard Chmura</dc:creator>
		<pubDate>Mon, 15 Oct 2007 14:42:14 +0000</pubDate>
		<guid>http://hamletbatista.com/2007/07/19/content-is-king-but-duplicate-content-is-a-royal-pain/#comment-3417</guid>
		<description>But don't give up hope entirely.  With some human review, it could be possible to whitelist some sites for inclusion in this system.  It would only work if the whitelist was limited to these human reviews - or a similar method of content ownership verification.

Definitely not a dead idea.  But it must have clear boundaries and limits.</description>
		<content:encoded><![CDATA[<p>But don&#8217;t give up hope entirely.  With some human review, it could be possible to whitelist some sites for inclusion in this system.  It would only work if the whitelist was limited to these human reviews - or a similar method of content ownership verification.</p>
<p>Definitely not a dead idea.  But it must have clear boundaries and limits.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Hamlet Batista</title>
		<link>http://hamletbatista.com/2007/07/19/content-is-king-but-duplicate-content-is-a-royal-pain/#comment-3416</link>
		<dc:creator>Hamlet Batista</dc:creator>
		<pubDate>Mon, 15 Oct 2007 14:36:06 +0000</pubDate>
		<guid>http://hamletbatista.com/2007/07/19/content-is-king-but-duplicate-content-is-a-royal-pain/#comment-3416</guid>
		<description>I agree.</description>
		<content:encoded><![CDATA[<p>I agree.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Richard Chmura</title>
		<link>http://hamletbatista.com/2007/07/19/content-is-king-but-duplicate-content-is-a-royal-pain/#comment-3412</link>
		<dc:creator>Richard Chmura</dc:creator>
		<pubDate>Mon, 15 Oct 2007 10:50:54 +0000</pubDate>
		<guid>http://hamletbatista.com/2007/07/19/content-is-king-but-duplicate-content-is-a-royal-pain/#comment-3412</guid>
		<description>Think like a bad guy: you could scrape the content and register it, stealing from a site or blog that has not implemented this process.  This would create a false positive and put every site that does not subscribe to this technology at a disadvantage.  The chance of every single legitimate content owner adopting a uniform method is slim - even today we have sites that don't conform to W3C standards or don't even work in various browsers.  This might actually make it easier for illegitimate sites to steal content.</description>
		<content:encoded><![CDATA[<p>Think like a bad guy: you could scrape the content and register it, stealing from a site or blog that has not implemented this process.  This would create a false positive and put every site that does not subscribe to this technology at a disadvantage.  The chance of every single legitimate content owner adopting a uniform method is slim - even today we have sites that don&#8217;t conform to W3C standards or don&#8217;t even work in various browsers.  This might actually make it easier for illegitimate sites to steal content.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Hamlet Batista</title>
		<link>http://hamletbatista.com/2007/07/19/content-is-king-but-duplicate-content-is-a-royal-pain/#comment-3405</link>
		<dc:creator>Hamlet Batista</dc:creator>
		<pubDate>Mon, 15 Oct 2007 00:19:43 +0000</pubDate>
		<guid>http://hamletbatista.com/2007/07/19/content-is-king-but-duplicate-content-is-a-royal-pain/#comment-3405</guid>
		<description>&lt;blockquote&gt;The problem I see with this approach is the likeliness of false positives. (identifying scrapers as legitimate content)&lt;/blockquote&gt;

Thanks, Richard. Please note that scrapers access the content via RSS, etc. If the blogging software performs the registration before publishing the content or the RSS feed, the content author will have a head start. There would be no way for scrapers to access the content before it is registered.</description>
		<content:encoded><![CDATA[<blockquote><p>The problem I see with this approach is the likeliness of false positives. (identifying scrapers as legitimate content)</p></blockquote>
<p>Thanks, Richard. Please note that scrapers access the content via RSS, etc. If the blogging software performs the registration before publishing the content or the RSS feed, the content author will have a head start. There would be no way for scrapers to access the content before it is registered.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Richard Chmura</title>
		<link>http://hamletbatista.com/2007/07/19/content-is-king-but-duplicate-content-is-a-royal-pain/#comment-3399</link>
		<dc:creator>Richard Chmura</dc:creator>
		<pubDate>Sun, 14 Oct 2007 20:08:52 +0000</pubDate>
		<guid>http://hamletbatista.com/2007/07/19/content-is-king-but-duplicate-content-is-a-royal-pain/#comment-3399</guid>
		<description>My two cents (to be the devil's advocate here): It is an interesting proposition to allow registering of content.  The problem I see with this approach is the likeliness of false positives.  (identifying scrapers as legitimate content)  However, this is a step in establishing trust with a small set of publishers.  I would guess that a site would gain access to this mechanism only after some form of algorithmic or human review.  In that sense, any DMCA or copy-spam complaint would revoke current and future access to such a system.

Said system would not be effective in the "pure wild" ;)  Filtering and sorting duplicate content in the "wild" level will need another approach.
-I'll comment on http://hamletbatista.com/2007/10/11/like-flies-to-project-honeypot-revisiting-the-cgi-proxy-hijack-problem/ about my suggestions for dealing with the "wild" internet.</description>
		<content:encoded><![CDATA[<p>My two cents (to be the devil&#8217;s advocate here): It is an interesting proposition to allow registering of content.  The problem I see with this approach is the likeliness of false positives.  (identifying scrapers as legitimate content)  However, this is a step in establishing trust with a small set of publishers.  I would guess that a site would gain access to this mechanism only after some form of algorithmic or human review.  In that sense, any DMCA or copy-spam complaint would revoke current and future access to such a system.</p>
<p>Said system would not be effective in the &#8220;pure wild&#8221; <img src='http://hamletbatista.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' />  Filtering and sorting duplicate content in the &#8220;wild&#8221; level will need another approach.<br />
-I&#8217;ll comment on <a href="http://hamletbatista.com/2007/10/11/like-flies-to-project-honeypot-revisiting-the-cgi-proxy-hijack-problem/" rel="nofollow">http://hamletbatista.com/2007/10/11/like-flies-to-project-honeypot-revisiting-the-cgi-proxy-hijack-problem/</a> about my suggestions for dealing with the &#8220;wild&#8221; internet.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Florchakh</title>
		<link>http://hamletbatista.com/2007/07/19/content-is-king-but-duplicate-content-is-a-royal-pain/#comment-450</link>
		<dc:creator>Florchakh</dc:creator>
		<pubDate>Sat, 21 Jul 2007 11:33:29 +0000</pubDate>
		<guid>http://hamletbatista.com/2007/07/19/content-is-king-but-duplicate-content-is-a-royal-pain/#comment-450</guid>
		<description>I'm not sure that sharing these facts with the general public is a good idea.

Or not - c'mon Hamlet, spread it. Let's get more people interested in spamming and help to put sites of people not interested in SEO straight into the flames of spammy net business. I guess your blog will become more popular ]:-&#62;</description>
		<content:encoded><![CDATA[<p>I&#8217;m not sure that sharing these facts with the general public is a good idea.</p>
<p>Or not - c&#8217;mon Hamlet, spread it. Let&#8217;s get more people interested in spamming and help to put sites of people not interested in SEO straight into the flames of spammy net business. I guess your blog will become more popular ]:-&gt;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Hamlet Batista</title>
		<link>http://hamletbatista.com/2007/07/19/content-is-king-but-duplicate-content-is-a-royal-pain/#comment-442</link>
		<dc:creator>Hamlet Batista</dc:creator>
		<pubDate>Fri, 20 Jul 2007 23:21:58 +0000</pubDate>
		<guid>http://hamletbatista.com/2007/07/19/content-is-king-but-duplicate-content-is-a-royal-pain/#comment-442</guid>
		<description>Mutiny - please send me an email with some examples of 'geographic spam pages'. I will try to dig deeper to give a more accurate response.

There are a couple of bugs in the comments system that I want to solve. Let me see if I can find some time early next week.</description>
		<content:encoded><![CDATA[<p>Mutiny - please send me an email with some examples of &#8216;geographic spam pages&#8217;. I will try to dig deeper to give a more accurate response.</p>
<p>There are a couple of bugs in the comments system that I want to solve. Let me see if I can find some time early next week.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Hamlet Batista</title>
		<link>http://hamletbatista.com/2007/07/19/content-is-king-but-duplicate-content-is-a-royal-pain/#comment-441</link>
		<dc:creator>Hamlet Batista</dc:creator>
		<pubDate>Fri, 20 Jul 2007 23:18:37 +0000</pubDate>
		<guid>http://hamletbatista.com/2007/07/19/content-is-king-but-duplicate-content-is-a-royal-pain/#comment-441</guid>
		<description>Jez,

The crawl date is unreliable too. The crawler could visit the scraper site before yours.

&lt;blockquote&gt;
It would also be possible to time delay the rss feed, putting a greater time-distance between your ping and that of a scraper.&lt;/blockquote&gt;
I like your idea of introducing a delay or not releasing the feed until the content is registered. Well done!
&lt;blockquote&gt;
If this were introduced though, there could be a gold-rush as SE Spammers raced to re-publish all the content on the internet using this method&lt;/blockquote&gt;

I guess there would need to be some legal consequences for people registering content that is not theirs.</description>
		<content:encoded><![CDATA[<p>Jez,</p>
<p>The crawl date is unreliable too. The crawler could visit the scraper site before yours.</p>
<blockquote><p>
It would also be possible to time delay the rss feed, putting a greater time-distance between your ping and that of a scraper.</p></blockquote>
<p>I like your idea of introducing a delay or not releasing the feed until the content is registered. Well done!</p>
<blockquote><p>
If this were introduced though, there could be a gold-rush as SE Spammers raced to re-publish all the content on the internet using this method</p></blockquote>
<p>I guess there would need to be some legal consequences for people registering content that is not theirs.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mutiny Design</title>
		<link>http://hamletbatista.com/2007/07/19/content-is-king-but-duplicate-content-is-a-royal-pain/#comment-437</link>
		<dc:creator>Mutiny Design</dc:creator>
		<pubDate>Fri, 20 Jul 2007 18:12:49 +0000</pubDate>
		<guid>http://hamletbatista.com/2007/07/19/content-is-king-but-duplicate-content-is-a-royal-pain/#comment-437</guid>
		<description>Something I mentioned earlier was 'geographic spam pages' - pages with duplicate content that differ only in the place names they use (towns, counties etc). These are always a real pain and are extremely common practice among web design companies, but I have noticed them in various industries.


Looking at a few searches now, the same companies are still ranking for the same geographic terms using geographic spam pages. Do you have any input on this?


Other culprits of geographic SERP littering are business directories that just have lists of businesses on them.

P.S. Is it possible to use nl2br on the comments (if you can do that with WordPress), or is the br element enabled?</description>
		<content:encoded><![CDATA[<p>Something I mentioned earlier was &#8216;geographic spam pages&#8217; - pages with duplicate content that differ only in the place names they use (towns, counties etc). These are always a real pain and are extremely common practice among web design companies, but I have noticed them in various industries.</p>
<p>Looking at a few searches now, the same companies are still ranking for the same geographic terms using geographic spam pages. Do you have any input on this?</p>
<p>Other culprits of geographic SERP littering are business directories that just have lists of businesses on them.</p>
<p>P.S. Is it possible to use nl2br on the comments (if you can do that with WordPress), or is the br element enabled?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Hamlet Batista</title>
		<link>http://hamletbatista.com/2007/07/19/content-is-king-but-duplicate-content-is-a-royal-pain/#comment-431</link>
		<dc:creator>Hamlet Batista</dc:creator>
		<pubDate>Fri, 20 Jul 2007 14:17:36 +0000</pubDate>
		<guid>http://hamletbatista.com/2007/07/19/content-is-king-but-duplicate-content-is-a-royal-pain/#comment-431</guid>
		<description>Geoff - This is just a raw idea. I want you to find holes and hopefully together we can come up with a stronger plan.

For authorized syndication partners, I favor Google's proposal:

&lt;blockquote&gt;
If you syndicate content, we suggest that you ask the sites who are using your content to block their version with a robots.txt file as part of the syndication arrangement to help ensure your version is served in results.
&lt;/blockquote&gt;</description>
		<content:encoded><![CDATA[<p>Geoff - This is just a raw idea. I want you to find holes and hopefully together we can come up with a stronger plan.</p>
<p>For authorized syndication partners, I favor Google&#8217;s proposal:</p>
<blockquote><p>
If you syndicate content, we suggest that you ask the sites who are using your content to block their version with a robots.txt file as part of the syndication arrangement to help ensure your version is served in results.
</p></blockquote>
]]></content:encoded>
	</item>
	<item>
		<title>By: Geoff</title>
		<link>http://hamletbatista.com/2007/07/19/content-is-king-but-duplicate-content-is-a-royal-pain/#comment-429</link>
		<dc:creator>Geoff</dc:creator>
		<pubDate>Fri, 20 Jul 2007 12:31:35 +0000</pubDate>
		<guid>http://hamletbatista.com/2007/07/19/content-is-king-but-duplicate-content-is-a-royal-pain/#comment-429</guid>
		<description>I like where you are going with this, but what about sites like Wikipedia that license content to other sites like Answers.com? 

I'm not trying to punch any holes in your idea. I hope something like this comes about. To do so, lots of angles have to be considered. It sounds like you already have a great start.</description>
		<content:encoded><![CDATA[<p>I like where you are going with this, but what about sites like Wikipedia that license content to other sites like Answers.com? </p>
<p>I&#8217;m not trying to punch any holes in your idea. I hope something like this comes about. To do so, lots of angles have to be considered. It sounds like you already have a great start.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jez</title>
		<link>http://hamletbatista.com/2007/07/19/content-is-king-but-duplicate-content-is-a-royal-pain/#comment-425</link>
		<dc:creator>Jez</dc:creator>
		<pubDate>Fri, 20 Jul 2007 09:48:38 +0000</pubDate>
		<guid>http://hamletbatista.com/2007/07/19/content-is-king-but-duplicate-content-is-a-royal-pain/#comment-425</guid>
		<description>Hi Hamlet,

An interesting article. When I mentioned using dates, I was thinking of the date Google first found that content, as opposed to the creation date given by the site.
Your point about scrapers is a good one, though: a site scraping fresh content has a good chance of being indexed before the original source.
Also your copyright notice may work on some syndicating sites, but could be stripped out by a more determined SE spammer.
Your idea of "first ping" is a good one. 
It would also be possible to time delay the rss feed, putting a greater time-distance between your ping and that of a scraper.
If this were introduced though, there could be a gold-rush as SE Spammers raced to re-publish all the content on the internet using this method :-)</description>
		<content:encoded><![CDATA[<p>Hi Hamlet,</p>
<p>An interesting article. When I mentioned using dates, I was thinking of the date Google first found that content, as opposed to the creation date given by the site.<br />
Your point about scrapers is a good one, though: a site scraping fresh content has a good chance of being indexed before the original source.<br />
Also your copyright notice may work on some syndicating sites, but could be stripped out by a more determined SE spammer.<br />
Your idea of &#8220;first ping&#8221; is a good one.<br />
It would also be possible to time delay the rss feed, putting a greater time-distance between your ping and that of a scraper.<br />
If this were introduced though, there could be a gold-rush as SE Spammers raced to re-publish all the content on the internet using this method <img src='http://hamletbatista.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /></p>
]]></content:encoded>
	</item>
</channel>
</rss>
