Posts

How to Think Like an SEO Expert

If you want to become an expert, you need to start thinking like one. People perceive you as an authority in your field not because you claim to be one, but because of what they hear you say or read in what you write. From my personal experience, the key seems to be the originality, usefulness, and depth of what you have to share. Recently I was very honored to contribute to a link-building project. I want to share my idea with you, but more than that, in this blog I like to take the extra time to explain the original thought process that helped me come up with the idea in the first place.

The Challenge

Toolbar PageRank was an important factor in measuring the quality of a link for a long while. But Google has played with it so much that it can hardly be considered reliable these days. I like to see problems like these as challenges and opportunities, so I decided to look hard for alternatives. I know there are several other methods (like using the Yahoo backlink count, the number of indexed pages, etc.), but I did not feel these directly reflected how important the link was to Google, or to any other specific search engine. Each search engine has its own evaluation criteria when it comes to links, so using metrics from one to measure another is not a reliable gauge in my opinion.

I knew the answer was out there, and I knew just where to look. Read more

Google's Architectural Overview (Illustrated) — Google's inner workings part 2

For this installment of my Google's inner workings series, I decided to revisit my previous explanation. However, this time I am including some nice illustrations so that both technical and non-technical readers can benefit from the information. At the end of the post, I will provide some practical tips to help you improve your site rankings based on the information given here.

To start the high-level overview, let's see how crawling (the downloading of pages) was described originally.

Google uses a distributed crawling system. That is, several computers/servers with clever software that download pages from the entire web Read more
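
To make the idea concrete, here is a minimal sketch of that pattern in Python: a pool of workers pulling URLs from a shared frontier queue and downloading pages in parallel. The seed URLs and worker count are placeholders of my own; this only illustrates the concept and is in no way Google's actual implementation.

    # A toy "distributed" crawler: several workers share one frontier queue.
    # Seed URLs and worker count are illustrative placeholders only.
    from concurrent.futures import ThreadPoolExecutor
    from queue import Queue, Empty
    import urllib.request

    frontier = Queue()
    for url in ["http://example.com/", "http://example.org/"]:
        frontier.put(url)

    def crawl_worker():
        while True:
            try:
                url = frontier.get_nowait()
            except Empty:
                return  # no more pages to download
            try:
                with urllib.request.urlopen(url, timeout=10) as response:
                    page = response.read()
                print(url, "->", len(page), "bytes")
                # a real crawler would extract links from the page here
                # and push the new URLs back onto the frontier
            except Exception as error:
                print(url, "failed:", error)

    # four workers download pages in parallel; in Google's case each
    # "worker" is a separate crawling machine rather than a thread
    with ThreadPoolExecutor(max_workers=4) as pool:
        for _ in range(4):
            pool.submit(crawl_worker)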

What is the practical benefit of learning Google's internals?

I forgot to start my Google's inner workings series with WIIFM (What's In It For Me). My plan is to write one post each week.

No matter how well I try to explain it, it is a complex subject. I should have started the first post by explaining why you would want to learn this; there are a lot of easier things to read. With some people questioning the usefulness of SEO, this is a good time to make my views clear. Please note that I believe in a solid marketing mix that includes SEO, PPC, SMO, affiliate marketing, viral marketing, etc. Do not put all your eggs in one basket.

If you have been blogging for a while, you have probably noticed that you get hits from the search engines for words that you did not try to optimize for. For example, the day after I started this blog, I received a comment from a reader who found my blog through a blog search! How was this possible?

Heather Paquinas May 26th, 2007 at 1:24 am

I found your blog in google blogsearch. Needless to say I subscribed right away after reading this. I always suspected what you said, especially after Mike Levin from hittail blogged about using hittail for ppc, but you really hit the nail on the head with this post.

This is possible because that is the job of the search engines! If every page that turns up in a search had to be optimized first, there wouldn't be billions of pages in Google's index. It would take a lot of people to do the SEO work :-).

Why do we need SEO then? Read more

Google's architectural overview — an introduction to Google's inner workings

Google keeps tweaking its search engine, and now it is more important than ever to better understand its inner workings.

Google lured Mr. Manber from Amazon last year. When he arrived and began to look inside the company's black boxes, he says, he was surprised that Google's methods were so far ahead of those of academic researchers and corporate rivals.

While Google closely guards its secret sauce, for obvious reasons, it is possible to build a pretty solid picture of Google's engine. To do this, we are going to start by carefully dissecting Google's original engine: how Google was conceived back in 1998. Although a newborn baby, it had all the basic elements it needed to survive in the web world.
Read more

Log-based link analysis for improved PageRank

While top website analytics packages offer pretty much anything you might need to find actionable data to improve your site, there are situations where we need to dig deeper to identify vital information.

One such situation came to light in a post by randfish of Seomoz.org. He writes about a problem with most enterprise-size websites: they have many pages with no or very few incoming links, and fewer pages that get a lot of incoming links. He later discusses some approaches to alleviate the problem, suggesting manually linking to link-poor pages from link-rich ones, or restructuring the website. I commented that this is a practical situation where one would want to use automation.

Log files are a goldmine of information about your website: links, clicks, search terms, errors, etc. In this case, they can be of great use in identifying the pages that are getting a lot of links and the ones that are getting very few. We can later use this information to link from the rich to the poor, by manual or automated means.

Here is a brief explanation of how this can be done.

Here is an actual log entry for my site tripscan.com in the extended log format: 64.246.161.30 - - [29/May/2007:13:12:26 -0400] "GET /favicon.ico HTTP/1.1" 206 1406 "http://www.whois.sc/tripscan.com" "SurveyBot/2.3 (Whois Source)" "-"

First we need to parse the entries with a regex to extract the internal pages (between GET and HTTP) and the linking page, which appears after the server status code and the page size. In this case, after 206 and 1406.
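
As an illustration, here is a minimal Python sketch of that parsing step against the sample entry above; the regex itself is my own example, not a canonical log parser:

    # Pull out the internal page (between GET and HTTP) and the referring
    # page (the quoted field right after the status code and page size).
    import re

    LOG_PATTERN = re.compile(
        r'"GET (?P<page>\S+) HTTP/[\d.]+" '  # internal page requested
        r'(?P<status>\d{3}) (?P<size>\S+) '  # status code and page size
        r'"(?P<referrer>[^"]*)"'             # external linking page
    )

    line = ('64.246.161.30 - - [29/May/2007:13:12:26 -0400] '
            '"GET /favicon.ico HTTP/1.1" 206 1406 '
            '"http://www.whois.sc/tripscan.com" '
            '"SurveyBot/2.3 (Whois Source)" "-"')

    match = LOG_PATTERN.search(line)
    if match:
        print(match.group("page"))      # /favicon.ico
        print(match.group("referrer"))  # http://www.whois.sc/tripscan.com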

We then create two maps: one for the internal pages (page and page id), and another for the external incoming link pages (page and page id as well). After that we can create a matrix where we identify the linking relationships between the pages. For example, matrix[23][15] = 1 means there is a link from external page id 15 to internal page id 23. This matrix is commonly known in information retrieval as the adjacency matrix or hyperlink matrix. We want an implementation that can preferably be operated from disk, in order to scale to millions of link relationships.
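
A minimal in-memory sketch of these two maps and the matrix might look as follows; a production version would swap the dictionaries for a disk-backed store to reach the scale mentioned above:

    # Two id maps plus a sparse adjacency (hyperlink) matrix.
    from collections import defaultdict

    internal_ids = {}  # internal page URL -> page id
    external_ids = {}  # external linking page URL -> page id

    def get_id(table, url):
        # assign ids in order of first appearance
        if url not in table:
            table[url] = len(table)
        return table[url]

    # matrix[internal_id][external_id] = 1 means the external page
    # links to the internal page
    matrix = defaultdict(dict)

    def record_link(internal_page, external_page):
        i = get_id(internal_ids, internal_page)
        j = get_id(external_ids, external_page)
        matrix[i][j] = 1

    # the sample log entry above becomes one link relationship:
    record_link("/favicon.ico", "http://www.whois.sc/tripscan.com")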

Later we can walk the matrix and create reports identifying the link-rich pages (those with many link relationships) and the link-poor pages (those with few). We can define the threshold at some point (e.g., pages with more or fewer than 10 incoming links).
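
Continuing the sketch above, the report step could look something like this; the threshold of 10 is just the example figure from the text:

    THRESHOLD = 10  # example cutoff; tune this to your own site

    id_to_url = {i: url for url, i in internal_ids.items()}
    # count incoming links per internal page; pages that were requested
    # but never linked to get a count of zero
    link_counts = {i: len(matrix.get(i, {})) for i in id_to_url}

    link_rich = [id_to_url[i] for i, n in link_counts.items() if n >= THRESHOLD]
    link_poor = [id_to_url[i] for i, n in link_counts.items() if n < THRESHOLD]

    print("link-rich pages:", sorted(link_rich))
    print("link-poor pages:", sorted(link_poor))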