<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress/2.3" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>
<channel>
	<title>Comments on: Robots.txt 101</title>
	<link>http://hamletbatista.com/2007/06/04/robotstxt-101/</link>
	<description>Advanced Search Engine Marketing Tips to Succeed Online</description>
	<pubDate>Thu, 24 Jul 2008 04:16:25 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.3</generator>
		<item>
		<title>By: Jez</title>
		<link>http://hamletbatista.com/2007/06/04/robotstxt-101/#comment-66</link>
		<dc:creator>Jez</dc:creator>
		<pubDate>Sun, 10 Jun 2007 09:47:36 +0000</pubDate>
		<guid>http://hamletbatista.com/2007/06/04/robotstxt-101/#comment-66</guid>
		<description>If I get time ;-)

I notice some of it uses Python. It's been a long time since I've used Python, but I may have a play with it.

Jez</description>
		<content:encoded><![CDATA[<p>If I get time <img src='http://hamletbatista.com/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /><br />
I notice some of it uses Python. It&#8217;s been a long time since I&#8217;ve used Python, but I may have a play with it.</p>
<p>Jez</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Hamlet Batista</title>
		<link>http://hamletbatista.com/2007/06/04/robotstxt-101/#comment-65</link>
		<dc:creator>Hamlet Batista</dc:creator>
		<pubDate>Fri, 08 Jun 2007 20:42:44 +0000</pubDate>
		<guid>http://hamletbatista.com/2007/06/04/robotstxt-101/#comment-65</guid>
		<description>Jez,

I'm glad to have other developers visiting my blog. Hopefully you can put some of the code in my posts to work. I appreciate any feedback.

It's amazing how we can find open source code for pretty much everything. Moodle.org looks very interesting.</description>
		<content:encoded><![CDATA[<p>Jez,</p>
<p>I&#8217;m glad to have other developers visiting my blog. Hopefully you can put some of the code in my posts to work. I appreciate any feedback.</p>
<p>It&#8217;s amazing how we can find open source code for pretty much everything. Moodle.org looks very interesting.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jez</title>
		<link>http://hamletbatista.com/2007/06/04/robotstxt-101/#comment-64</link>
		<dc:creator>Jez</dc:creator>
		<pubDate>Fri, 08 Jun 2007 19:45:54 +0000</pubDate>
		<guid>http://hamletbatista.com/2007/06/04/robotstxt-101/#comment-64</guid>
		<description>Hi Hamlet,

I am no stranger to code; I work in that field, but most of my experience has been on intranets... I currently manage a large installation (9 instances) of moodle.org for a university... but there is no SEO requirement for this work... SEO is something I am interested in learning more about...</description>
		<content:encoded><![CDATA[<p>Hi Hamlet,</p>
<p>I am no stranger to code; I work in that field, but most of my experience has been on intranets&#8230; I currently manage a large installation (9 instances) of moodle.org for a university&#8230; but there is no SEO requirement for this work&#8230; SEO is something I am interested in learning more about&#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Hamlet Batista</title>
		<link>http://hamletbatista.com/2007/06/04/robotstxt-101/#comment-63</link>
		<dc:creator>Hamlet Batista</dc:creator>
		<pubDate>Fri, 08 Jun 2007 16:11:33 +0000</pubDate>
		<guid>http://hamletbatista.com/2007/06/04/robotstxt-101/#comment-63</guid>
		<description>&lt;blockquote&gt;If it had been me, I would have asked users to collect the link text from a dynamic page that rotated a number of different permutations….&lt;/blockquote&gt;

That doesn't sound like newbie stuff to me :-)

I am working on a post where I am dissecting Google's original paper. Hopefully we all can learn something valuable from it.</description>
		<content:encoded><![CDATA[<blockquote><p>If it had been me, I would have asked users to collect the link text from a dynamic page that rotated a number of different permutations….</p></blockquote>
<p>That doesn&#8217;t sound like newbie stuff to me <img src='http://hamletbatista.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /><br />
I am working on a post where I am dissecting Google&#8217;s original paper. Hopefully we all can learn something valuable from it.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jez</title>
		<link>http://hamletbatista.com/2007/06/04/robotstxt-101/#comment-62</link>
		<dc:creator>Jez</dc:creator>
		<pubDate>Fri, 08 Jun 2007 09:45:31 +0000</pubDate>
		<guid>http://hamletbatista.com/2007/06/04/robotstxt-101/#comment-62</guid>
		<description>Oh yes, the point I was trying to make about the cache was that although the old robots file was still cached it is possible that re-allowed pages had already been re-indexed etc. 

I did not explain myself well....</description>
		<content:encoded><![CDATA[<p>Oh yes, the point I was trying to make about the cache was that although the old robots file was still cached it is possible that re-allowed pages had already been re-indexed etc. </p>
<p>I did not explain myself well&#8230;.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jez</title>
		<link>http://hamletbatista.com/2007/06/04/robotstxt-101/#comment-61</link>
		<dc:creator>Jez</dc:creator>
		<pubDate>Fri, 08 Jun 2007 09:43:06 +0000</pubDate>
		<guid>http://hamletbatista.com/2007/06/04/robotstxt-101/#comment-61</guid>
		<description>Hi Hamlet,

Sorry if I did not read your post thoroughly enough... your points are interesting; it could well have been the anchor text... or perhaps a mix of anchor text and the robots.txt changes.

I thought for some time that JC should have asked users to use a mix of different links. 

If it had been me, I would have asked users to collect the link text from a dynamic page that rotated a number of different permutations....

As for being an expert, far from it, I am here to learn ;-)</description>
		<content:encoded><![CDATA[<p>Hi Hamlet,</p>
<p>Sorry if I did not read your post thoroughly enough&#8230; your points are interesting; it could well have been the anchor text&#8230; or perhaps a mix of anchor text and the robots.txt changes.</p>
<p>I thought for some time that JC should have asked users to use a mix of different links. </p>
<p>If it had been me, I would have asked users to collect the link text from a dynamic page that rotated a number of different permutations&#8230;.</p>
<p>As for being an expert, far from it, I am here to learn <img src='http://hamletbatista.com/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Hamlet Batista</title>
		<link>http://hamletbatista.com/2007/06/04/robotstxt-101/#comment-60</link>
		<dc:creator>Hamlet Batista</dc:creator>
		<pubDate>Fri, 08 Jun 2007 00:40:15 +0000</pubDate>
		<guid>http://hamletbatista.com/2007/06/04/robotstxt-101/#comment-60</guid>
		<description>Jez,

Thanks for your comment. I am really glad to have experts visiting my blog.

Please note that I did not rule out the robots.txt changes as the solution to the problem.

&lt;blockquote&gt;If this was the change that fixed the problem, it might have been because removing those internal pages from the spider's view weakened his internal link structure. His claim is not without merit.&lt;/blockquote&gt;

I am not sure I follow part of your conclusions. 

&lt;blockquote&gt;… as you know the cache is always a few days old, but the robots.txt will be analysed on the day of the crawl, in “real time”.

If Google never let go of the cached file, how would it ever crawl the site again???&lt;/blockquote&gt;

The robots.txt is "analyzed" (parsed) in real time, but the results only take effect when the index is updated. Search engines first crawl pages and then index them.

Dropping pages implies a modification to the index (as a result of a crawl).

To me, the pages in the cache are the pages that are affecting the current index. I might be wrong, but I would need to see research papers that tell me otherwise.

Again, I am not ruling out the robots.txt as the solution to his problem.

At one point, I thought that he had blocked access to his regular posts by misusing wildcards, such as Disallow: /2007/1*. My blog includes the date in the URLs of regular posts, but I checked his, and it doesn't.</description>
		<content:encoded><![CDATA[<p>Jez,</p>
<p>Thanks for your comment. I am really glad to have experts visiting my blog.</p>
<p>Please note that I did not rule out the robots.txt changes as the solution to the problem.</p>
<blockquote><p>If this was the change that fixed the problem, it might have been because removing those internal pages from the spider&#8217;s view weakened his internal link structure. His claim is not without merit.</p></blockquote>
<p>I am not sure I follow part of your conclusions. </p>
<blockquote><p>… as you know the cache is always a few days old, but the robots.txt will be analysed on the day of the crawl, in “real time”.</p>
<p>If Google never let go of the cached file, how would it ever crawl the site again???</p></blockquote>
<p>The robots.txt is &#8220;analyzed&#8221; (parsed) in real time, but the results only take effect when the index is updated. Search engines first crawl pages and then index them.</p>
<p>Dropping pages implies a modification to the index (as a result of a crawl).</p>
<p>To me, the pages in the cache are the pages that are affecting the current index. I might be wrong, but I would need to see research papers that tell me otherwise.</p>
<p>Again, I am not ruling out the robots.txt as the solution to his problem.</p>
<p>At one point, I thought that he had blocked access to his regular posts by misusing wildcards, such as Disallow: /2007/1*. My blog includes the date in the URLs of regular posts, but I checked his, and it doesn&#8217;t.</p>
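<p>To make the wildcard point concrete, here is a minimal Python sketch of how a crawler that supports Google-style pattern matching might evaluate a rule like Disallow: /2007/1*. It assumes the extended syntax Google supports (* matches any run of characters, a trailing $ anchors the end of the path), which was not part of the original robots.txt convention; Python&#8217;s standard-library urllib.robotparser does not interpret * inside paths. The function names and sample paths are hypothetical.</p>

```python
from __future__ import annotations

import re


def pattern_to_regex(pattern: str) -> re.Pattern[str]:
    """Translate a Google-style robots.txt path pattern into a regex.

    Assumption: '*' matches any run of characters and a trailing '$'
    anchors the end of the path. Without '$', a rule is a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape regex metacharacters, then turn each escaped '*' into '.*'.
    body = re.escape(pattern).replace(r"\*", ".*")
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path: str, disallow_rules: list[str]) -> bool:
    """Return True if any Disallow rule matches the start of the path."""
    # re.match anchors at the start of the string, giving prefix semantics.
    return any(pattern_to_regex(rule).match(path) for rule in disallow_rules)


# Hypothetical date-based permalinks checked against the rule in question.
rules = ["/2007/1*"]
for path in ["/2007/10/some-post/", "/2007/06/04/robotstxt-101/", "/about/"]:
    print(path, "blocked" if is_disallowed(path, rules) else "allowed")
```

<p>Because rules are prefix matches, the trailing * in /2007/1* is redundant: the rule blocks any path beginning with /2007/1 (October through December 2007, for date-based permalinks) and nothing else, which is why it would only matter for a blog whose post URLs include the date.</p>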
]]></content:encoded>
	</item>
	<item>
		<title>By: Jez</title>
		<link>http://hamletbatista.com/2007/06/04/robotstxt-101/#comment-59</link>
		<dc:creator>Jez</dc:creator>
		<pubDate>Fri, 08 Jun 2007 00:04:52 +0000</pubDate>
		<guid>http://hamletbatista.com/2007/06/04/robotstxt-101/#comment-59</guid>
		<description>Hi Hamlet,

I have been following this issue too and think you have made a bit of an error... as you know the cache is always a few days old, but the robots.txt will be analysed on the day of the crawl, in "real time".

If Google never let go of the cached file, how would it ever crawl the site again???

The actual crawl runs ahead of the cache, but you already know this...

One thing you may not have seen is this post on JC:

http://www.johnchow.com/getting-out-of-the-google-supplemental-index/

A few days earlier the robots.txt file was changed for the reasons outlined in the above post... give it a few days for the denied pages to be dropped and a couple of days for users to report the drop in SERPs, and the timing is about right for John's "Google ban".

Then the latest robots.txt file reverses what had been done, re-allows the supplemental pages, and things return to normal.

&lt;b&gt;What we should have checked was whether the supplemental pages were back in the cache.&lt;/b&gt;

I think JC made a blunder in blocking his supplemental pages, simple as that.

Does anyone really believe Google would change their algorithm because of John Chow?!

I think you have to bear in mind that JC survives on hype, spin and reader manipulation; that's what his site exemplifies. I think he has created a lot of buzz and mystery out of his own %$£% up... that's what he is good at.</description>
		<content:encoded><![CDATA[<p>Hi Hamlet,</p>
<p>I have been following this issue too and think you have made a bit of an error&#8230; as you know the cache is always a few days old, but the robots.txt will be analysed on the day of the crawl, in &#8220;real time&#8221;.</p>
<p>If Google never let go of the cached file, how would it ever crawl the site again???</p>
<p>The actual crawl runs ahead of the cache, but you already know this&#8230;</p>
<p>One thing you may not have seen is this post on JC:</p>
<p><a href="http://www.johnchow.com/getting-out-of-the-google-supplemental-index/" rel="nofollow">http://www.johnchow.com/getting-out-of-the-google-supplemental-index/</a></p>
<p>A few days earlier the robots.txt file was changed for the reasons outlined in the above post&#8230; give it a few days for the denied pages to be dropped and a couple of days for users to report the drop in SERPs, and the timing is about right for John&#8217;s &#8220;Google ban&#8221;.</p>
<p>Then the latest robots.txt file reverses what had been done, re-allows the supplemental pages, and things return to normal.</p>
<p><b>What we should have checked was whether the supplemental pages were back in the cache.</b></p>
<p>I think JC made a blunder in blocking his supplemental pages, simple as that.</p>
<p>Does anyone really believe Google would change their algorithm because of John Chow?!</p>
<p>I think you have to bear in mind that JC survives on hype, spin and reader manipulation; that&#8217;s what his site exemplifies. I think he has created a lot of buzz and mystery out of his own %$£% up&#8230; that&#8217;s what he is good at.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
