Posts

Making the world (and your site) flat—via a Reverse Proxy

In order to protect some of the inventions in our software, I’ve been working with a law firm that specializes in IP protection. I’ve learned a lot from them, but I’ve learned far more from reviewing the patent applications they sent back to me as possible ‘prior art.’ Let me share one of the most interesting ones I’ve seen so far, Patent Application 20070143283. Here is the abstract:

A system and method for optimizing the rankings of web pages of a commercial website within search engine keyword search results. A proxy website is created based on the content on the commercial website. When a search engine spider reaches the commercial website, the commercial website directs the search engine spider to the proxy website. The proxy website includes a series of proxy web pages that correspond to web pages on the commercial website along with modifications that enhance the rankings of the pages by the search engines. However, hyperlinks containing complex, dynamic URLs are replaced with spider-friendly versions. When a human visitor selects a proxy web page listing on the search engine results page, that visitor is directed to the proxy web page. The proxy server delivers the same content to the human visitor as to the search engine spider, only with simplified URLs for the latter.
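To make the URL simplification in the abstract concrete, here is a minimal Python sketch of how a proxy might turn a complex, dynamic URL into a flat, spider-friendly one. The function name `flatten_url` and the `/name/value/.../.html` path scheme are my own assumptions for illustration; the abstract does not say which scheme the patent uses.

```python
import posixpath
from urllib.parse import urlsplit, parse_qsl, quote


def flatten_url(url):
    """Return a flat path for a dynamic URL (hypothetical scheme).

    Only the path is returned; scheme and host are dropped because a
    proxy would serve the flat path under its own host name.
    """
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    if not pairs:
        # Nothing dynamic to rewrite: keep the path as it is.
        return parts.path or "/"
    # Drop the script extension, e.g. "/product.php" becomes "/product".
    base, _ext = posixpath.splitext(parts.path)
    # Turn each name=value pair into two path segments, in original order.
    segments = [quote(piece, safe="") for pair in pairs for piece in pair]
    return base + "/" + "/".join(segments) + ".html"


print(flatten_url("/product.php?id=12&cat=shoes"))
```

A real reverse proxy would also need the inverse mapping, so that a request for the flat path can be translated back into the dynamic URL before it is forwarded to the commercial site.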


The Never Ending SERPs Hijacking Problem: Is there a definite solution?

In 2005 it was the infamous 302 (temporary redirect) page hijacking. That was supposedly fixed, according to Matt Cutts. Now there is a new and interesting twist. Hijackers have found another exploitable hole in Google: the use of CGI proxies to hijack search engine rankings.

The problem is basically the same. Two URLs pointing to the same content. Google's duplicate content filters kick in and drop one of the URLs. They normally drop the page with the lower PageRank. That is Google's core problem. They need to find a better way to identify the original author of the page.
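The failure mode described above can be sketched in a few lines of Python. This is a toy model and not Google's actual algorithm: the function name `filter_duplicates`, the exact-match fingerprint, and the sample URLs and rank values are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


def filter_duplicates(pages):
    """Toy duplicate filter: keep the highest-ranked URL per content group.

    pages is a list of (url, body, rank) tuples. Returns a dict mapping
    each kept URL to the list of URLs that were dropped in its favor.
    """
    groups = defaultdict(list)
    for url, body, rank in pages:
        # Normalize case and whitespace so trivially different copies match.
        normalized = " ".join(body.lower().split())
        key = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        groups[key].append((rank, url))

    kept = {}
    for entries in groups.values():
        entries.sort(reverse=True)  # highest rank first
        winner = entries[0][1]
        kept[winner] = [url for _, url in entries[1:]]
    return kept


pages = [
    ("http://original.example/post", "Hello  world", 3.0),
    ("http://proxy.example/cgi?u=original.example/post", "hello world", 5.0),
    ("http://other.example/", "something different", 1.0),
]
print(filter_duplicates(pages))
```

In this sample the proxy URL has the higher rank, so the original author's URL is the one that gets dropped. Nothing in the filter knows who published the content first.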

When someone blatantly copies your content and hosts it on their site, you can take the offending page down by sending a DMCA complaint to Google, et al. The problem with 302 redirects and CGI proxies is that there is no content being copied. They are simply tricking the search engine into believing there are multiple URLs hosting the same content.

What is a CGI proxy anyway? Glad you asked. I love explaining technical things :-)

Protecting your privacy from Google with Squid and FoxyProxy

There is no doubt about it; this has definitely been Google’s Privacy Week. Relevant news:

Privacy International’s infamous report (it basically says that Google sucks at privacy, far more than Microsoft)

Privacy International’s open letter to Google

Danny Sullivan defending Google

Matt Cutts defending his employer

Google’s official response (PDF letter)

Google Video flaw exposes user credentials

It’s only human nature to defend ourselves (and those close to us) when we are under public scrutiny. I am not surprised to see Matt or Danny stand behind Google on this matter. I do think it is far wiser and more beneficial to look into the criticism and determine for ourselves what we can do to remedy it. I am glad to see that Google took this approach in their official response:

After considering the Working Party’s concerns, we are announcing a new policy: to anonymize our search server logs after 18 months, rather than the previously-established period of 18 to 24 months. We believe that we can still address our legitimate interests in security, innovation and anti-fraud efforts with this shorter period … We are considering the Working Party’s concerns regarding cookie expiration periods, and we are exploring ways to redesign cookies and to reduce their expiration without artificially forcing users to re-enter basic preferences such as language preference. We plan to make an announcement about privacy improvements for our cookies in the coming months.

You can take any side you want. But I feel that none of the people covering this topic has addressed two critical issues:

1) How do you opt-out of data collection by Google or other search engines at will?

2) And do you want to wait 18 months for your data to be anonymized?