I promised everybody that I’d be posting my presentation slides from my talk at the SMX Advanced Bot Herding panel, so here they are!
First, let me say that I was very excited to be speaking at a major search marketing conference, and I can say with confidence that all the traveling was definitely worth it. My only regret is that I did not get to finish my presentation. This was the first time I had spoken publicly, and as an inexperienced speaker I was not even looking at the timer. My apologies to all those in attendance. Frankly, I do think speakers should be allowed a little more time at SMX Advanced, as you really do need time to lay the groundwork before delving deeply into these sorts of topics.
For those who didn’t come, let me summarize the key takeaways from my speech and put them into context regarding Google’s recent post on Webmaster Central:
“Cloaking: Serving different content to users than to Googlebot. This is a violation of our webmaster guidelines. If the file that Googlebot sees is not identical to the file that a typical user sees, then you’re in a high-risk category. A program such as md5sum or diff can compute a hash to verify that two different files are identical.”
Basically, Google says that geolocation and IP delivery (when used for geolocation purposes) are fine as long as you present the same content to Googlebot as you would present to a user coming from the same region. Altering the content the robot sees puts you in “a high-risk category.” Google is so strict that it suggests using a checksum program to make sure you are delivering the same content. Evidently, it doesn’t matter whether or not your intention is to improve the crawling and indexing of your site.
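To make the checksum idea concrete, here is a minimal sketch of that kind of comparison. It assumes you have already fetched the same URL twice (once as Googlebot would see it, once as a regular visitor would) and hold both response bodies as bytes. The function names and sample pages are made up for illustration, and I use SHA-256 from Python's standard library rather than md5sum; any hash works for a simple equality check.

```python
import hashlib


def content_digest(body: bytes) -> str:
    """Return the hex SHA-256 digest of a response body."""
    return hashlib.sha256(body).hexdigest()


def same_content(body_for_bot: bytes, body_for_user: bytes) -> bool:
    """True only if the two bodies are byte-for-byte identical."""
    return content_digest(body_for_bot) == content_digest(body_for_user)


# Hypothetical response bodies, hard-coded so the sketch runs offline.
bot_page = b"<html><body>Welcome</body></html>"
user_page = b"<html><body>Welcome</body></html>"
cloaked_page = b"<html><body>Welcome, Googlebot</body></html>"

print(same_content(bot_page, user_page))
print(same_content(bot_page, cloaked_page))
```

Note how unforgiving this test is: a single differing byte (a timestamp, a session token, a rotating ad) changes the digest, which is exactly why a checksum-based definition of cloaking is so strict.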
Why would you want to cloak anyway?
Let’s talk about the key scenarios I discussed in my speech:
- Content accessibility
– Search-unfriendly Content Management Systems. According to Google, if you are using a proprietary CMS that does not give you the flexibility to make URLs search-engine friendly, relies on cookie-based session IDs, or does not support unique titles and descriptions, you need to replace your CMS with a newer one. Using a reverse proxy that cloaks to fix those issues is a “bad idea.” Again: easy for Google, hard for the customer.
– Content behind forms. Google is experimenting with a bot that will try to pull content from basic forms using HTTP GET and providing values listed in the HTML.
- Membership sites
– Free and paid content. Google recommends that we register our premium content using Google News’ First Click Free. The idea is that you give searchers the first page of your content for free, and they need to register for the rest. This is very practical for newspapers that have resorted to cloaking in the past. I do see a problem with this technique for sites like SEOmoz, where some of the premium pages are guides that cost money. If SEOmoz signed up for this service, I would be able to pull all the guides by guessing search terms that would bring them up in the results.
- Site structure improvements
– Alternative to PageRank sculpting via “no-follow.” I explained a clever technique where you cloak a different link path to robots than the one you present to regular users. The link path for users should be focused on ease of navigation, and the link path for robots should be focused on ease of crawling and deeper index penetration. This is very practical but not really mandatory.
- Geolocation/IP delivery
– According to the post we don’t need to worry about this. Some good news at last!
- Multivariate testing
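Going back to the “content behind forms” scenario above, here is a rough sketch of how a crawler might turn a basic form into crawlable HTTP GET URLs using only the values listed in the HTML. This is my own illustration, not Google's implementation: the class name, the helper function, the sample form and the example.com base URL are all hypothetical, and it only looks at select fields inside GET forms.

```python
from html.parser import HTMLParser
from itertools import product
from urllib.parse import urlencode


class FormOptionParser(HTMLParser):
    """Collect the action of the first GET form and its <select> option values."""

    def __init__(self):
        super().__init__()
        self.action = None        # action of the first GET form found
        self.fields = {}          # field name -> list of option values
        self._in_form = False     # currently inside that GET form?
        self._current = None      # name of the <select> being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            is_get = (attrs.get("method") or "get").lower() == "get"
            if is_get and self.action is None:
                self.action = attrs.get("action") or ""
                self._in_form = True
        elif tag == "select" and self._in_form:
            self._current = attrs.get("name")
            if self._current:
                self.fields[self._current] = []
        elif tag == "option" and self._current and attrs.get("value"):
            self.fields[self._current].append(attrs["value"])

    def handle_endtag(self, tag):
        if tag == "select":
            self._current = None
        elif tag == "form":
            self._in_form = False


def candidate_urls(html: str, base: str) -> list:
    """Build one GET URL per combination of listed option values."""
    parser = FormOptionParser()
    parser.feed(html)
    names = list(parser.fields)
    if parser.action is None or not names:
        return []
    combos = product(*(parser.fields[name] for name in names))
    return [base + parser.action + "?" + urlencode(dict(zip(names, combo)))
            for combo in combos]


# Hypothetical form markup, hard-coded so the sketch runs offline.
SAMPLE = """
<form method="get" action="/search">
  <select name="state">
    <option value="NY">New York</option>
    <option value="CA">California</option>
  </select>
  <select name="type"><option value="rent">Rent</option></select>
</form>
"""

urls = candidate_urls(SAMPLE, "http://example.com")
for url in urls:
    print(url)
```

The number of URLs grows multiplicatively with the number of fields and options, which is probably one reason such a bot would stick to basic forms.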
Google = Romulans
Just like the Romulans from Star Trek, Google doesn’t want cloaking technology in the hands of everyone. I didn’t get to talk about this in my presentation, but let me speculate as to why Google is drawing such a hard line on cloaking: Simply put, it is the easiest, cheapest and most scalable solution for them.
1. As a developer I can tell you that running checksums against the content presented to Googlebot vs. the content presented to the cloaking detection bots is the easiest and most scalable way for them to do it.
2. Similar to the problem with paid links, it is easier to let us do all the work of labeling our sites so they can detect the bad guys without having to dedicate a huge amount of resources to solve such problems.
Enjoy the slides and feel free to ask any questions. If you were there at SMX Advanced and watched me present, please let me know your honest comments. Criticism can only help me improve. Let me know what you think of the slides, too. Originally, I had planned to use more graphics than text, but ultimately I thought that the advanced audience would appreciate the added information.