Out of the Supplemental Index and into the Fire

Do you have pages in Google's supplemental index? Get 'em out of there!

Matt Cutts of Google doesn't think SEOs and website owners should be overly concerned about having pages in the supplemental index. He has some pages in the supplemental index, too.

As a reminder, supplemental results aren’t something to be afraid of; I’ve got pages from my site in the supplemental results, for example. A complete software rewrite of the infrastructure for supplemental results launched in Summer o’ 2005, and the supplemental results continue to get fresher. Having urls in the supplemental results doesn’t mean that you have some sort of penalty at all; the main determinant of whether a url is in our main web index or in the supplemental index is PageRank. If you used to have pages in our main web index and now they’re in the supplemental results, a good hypothesis is that we might not be counting links to your pages with the same weight as we have in the past. The approach I’d recommend in that case is to use solid white-hat SEO to get high-quality links (e.g. editorially given by other sites on the basis of merit).

Google is even considering removing the supplemental result tag. If that happens, there won't be any way for us to tell whether a page is supplemental.

Over time, the supplemental results are less and less supplemental and more and more likely to show up for any given query. As I mentioned at SMX Seattle, my personal preference would be to drop the "Supplemental Result" tag altogether because those results are 1) getting fresher and fresher, and 2) starting to show up more and more often for regular web searches. Especially as the supplemental results get more fresh, I'd like to leave that tag behind because it still has some negative connotations for people who remember the previous implementation of supplemental results (which has now mostly been replaced with a newer/better implementation).

If they ever remove the supplemental tag, I agree with Rand that the information should be available via WebmasterCentral. Why? Supplemental pages are pages that do not have enough quality back links, and as such are not deserving enough to be listed in the main index.

As many webmasters have noted, web pages in the supplemental index do not send as much traffic as the pages in the main index. Let me give you the technical explanation for this. I found this out via a friendly exchange of comments with halfdek on searchengineland.com. He provided a direct quote from Matt Cutts's Q&A video at SMX Seattle Advanced.

We parse pages and we index pages differently when they're in the supplemental index. Think of it almost as if its sort of a compressed summary. So we'll index some of the words in different ways on pages from the supplemental index, but not necessarily every single word in every single phrase relationship…

Your pages in the supplemental index send less traffic than the ones in the regular index because Google indexes only a summary of those pages, instead of indexing the full content as it does for the regular index. In situations like these my series on Google's inner workings comes in handy. Take this example:

You've just finished up reading Seth Godin's popular book Unleashing the Ideavirus — a fascinating read. The book has a copious index at the end which is very useful for finding pages if you remember any of the funny words he came up with (hive, sneezer, etc.). Now, imagine your friend reads the book and has to write up a summary for a college report. Later he is asked to create an index of the book, but he only uses the words that are in his summary. Now he can find pages in the book by any of the words in his summary. Unfortunately, because there are fewer words in the summary, his index is paltry compared to the book's true index. You won't find "sneezer" in it for sure! Suddenly his ability to find pages is going to be more limited. The book's full index is simply more comprehensive. There are more words to use to find more pages.

In a similar fashion, having your pages in the supplemental index means that only a portion of the words are being indexed. There are many words and combinations that will not trigger your result in the search. I don't know about you, but if I write a 2000-word post, full of longtail keywords, I want the full page to be indexed. No summaries for me if you please.
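The book-index analogy can be sketched in a few lines of Python. This is only a toy illustration under my own assumptions, not a description of how Google builds either index: the sample pages, the `summary_words` set, and the function names `build_index` and `search` are all hypothetical. It builds one index from every word on each page and a second one from a restricted word list, then runs the same query against both.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(pages, keep_words=None):
    """Map each word to the set of page ids that contain it.

    If keep_words is given, only those words are indexed. This mimics the
    "compressed summary" idea (an assumption made for illustration only).
    """
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in tokenize(text):
            if keep_words is None or word in keep_words:
                index[word].add(page_id)
    return index


def search(index, query):
    """Return the ids of pages that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical sample pages and summary vocabulary.
pages = {
    1: "An ideavirus spreads when a sneezer tells the hive about it",
    2: "Marketing ideas spread faster inside a tightly knit hive",
}
summary_words = {"ideavirus", "marketing", "ideas", "spread"}

full_index = build_index(pages)
summary_index = build_index(pages, keep_words=summary_words)

full_hits = search(full_index, "sneezer")
summary_hits = search(summary_index, "sneezer")
print(full_hits, summary_hits)
```

With the full index, the query "sneezer" finds page 1. With the summary index it finds nothing, because that word never made it into the index, even though the page itself has not changed.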

Let's hope Google keeps the supplemental result tag. Knowing which of our pages are supplemental helps us promote them better and get more links to them.

8 replies
  1. Jez
    Jez says:

    Hi Hamlet,

    In the case of a WordPress blog, pages are supplemental as they duplicate the content, through the comments page, archives etc.
    In this instance, your content would still be indexed under the main post page, so, are you really losing anything when this happens?

    Jez

    • Hamlet Batista
      Hamlet Batista says:

      Jez,

      Pages end up in the supplemental index based on their PageRank. Being a duplicate is not the main reason to end up there. Please read Matt Cutts's explanation of this here.

      Having many duplicate pages might cause those pages to have low PageRank values.

      • Jez
        Jez says:

        Hi Hamlet,

        My question was prompted by your use of a WordPress SEO plugin that is specifically designed to remove duplicate content (such as comment pages) from the supplemental index….
        Are there any benefits to doing this other than general maintenance?

  2. Florchakh
    Florchakh says:

    The supplemental index has been one of my favorite Google playthings. It works as a nice reminder to keep your sites well optimized, and it has also been showing the "value" of your SEO competitors.

    Now it looks like the situation is getting worse: on some of my sites there are very different results on the respective data centers. Maybe the folks from Google have taken it off? Just take a look at the current results of the supplemental query for your blog :/
    http://oy-oy.eu/google/supplemental/?url=http://h

  3. Mutiny Design
    Mutiny Design says:

    Just as a note. A while back I worked on a site that was sandboxed. Despite having a few dozen good incoming links and fair title tags and page content it didn’t rank in the top 100 pages for a term which was largely uncompetitive. The site had what I call geographic spam pages – pages that have almost exactly the same content but are targeted to a geographic area, e.g. flowers Shrewsbury. Also, if you tried to Google the 'geographic spam pages' even if there was no competition the page would only come up on about page 25.

    At the insistence of the company, we launched a new site, which again had geographic spam pages, although considerably fewer. Without getting any more incoming links, they shot up to page five for their search term at the next index. After getting about 100 or so industry links into their site, they ranked at the top of page one for their search term. Additionally, a lot of the geographic spam pages were coming up on page one, even for places which I wouldn't have expected them to. At this point none of the geographic spam pages were in the supplemental index.

    Now, about three months later, most of the geographic spam pages are in the supplemental index and the main page of the site has dropped in the rankings.

    This leads me to believe that Google may use their duplicate content filter in deciding whether a page should be in the supplemental index.

    I will be making a recommendation that this site remove their geographic spam pages and will post the results if I can remember to.

    Adding more weight to this, I worked on another site – a business directory that has about 700,000 pages. The site has very little variation. It's mainly pages listing businesses and pages with a business's address. The site has about 250 decent incoming links. Initially, Google didn't take much interest in the site, but Yahoo was ploughing through it. Now, Yahoo has next to no index but Google has a full index. By the time Google had spidered the site it had about 500,000 pages not in the supplemental index. Over six months it has decreased and now only about 100,000 pages are in the supplemental index. To make it even worse they have put almost exactly the same site on another domain.

    • Hamlet Batista
      Hamlet Batista says:

      Mutiny,

      "geographic spam pages" is an interesting concept. First time I read about it.

      This leads me to believe that Google may use their duplicate content filter in deciding whether a page should be in the supplemental index.

      In order to explain why duplicate pages might end up in the supplemental index, let me quote Matt Cutts:
      http://www.seomoz.org/blog/whiteboard-friday-supp

      duplicate content doesn't make you more likely to have pages in the supplemental index in my experience. It could be a symptom but not a cause, e.g. lots of duplicate content implies lots of pages, and potentially less PageRank for each of those pages. So trying to surface an entire large catalog of pages would mean less PageRank for each page, which could lead to those pages being less likely to be included in our main web index.

      The problem with a lot of duplicate content is that the pages' PageRank gets affected. Duplicate content is a symptom, not a cause. According to Matt, pages end up in the supplemental index based on low PageRank values. You might not agree with this, but I personally prefer to trust the source.
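To make the "more pages, less PageRank for each page" point concrete, here is a rough sketch using the textbook PageRank power iteration on a tiny, closed site. Everything in it is an assumption for illustration: the `star_site` layout (a home page linking to every inner page, each inner page linking back), the damping factor of 0.85, and the function names are hypothetical, and the real calculation runs over the whole web with signals we cannot see.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Textbook PageRank by power iteration on a small link graph.

    links maps each page to the list of pages it links to.
    """
    nodes = list(links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in links.items():
            if targets:
                # Split this page's rank evenly among its outgoing links.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no links spreads its rank over all pages.
                for target in nodes:
                    new_rank[target] += damping * rank[node] / n
        rank = new_rank
    return rank


def star_site(num_pages):
    """Hypothetical site: home links to every page, each page links home."""
    links = {"home": [f"page{i}" for i in range(num_pages)]}
    for i in range(num_pages):
        links[f"page{i}"] = ["home"]
    return links


small_site = pagerank(star_site(5))
large_site = pagerank(star_site(50))
print(small_site["page0"], large_site["page0"])
```

In this toy model the inner pages of the 50-page site each end up with a smaller score than the inner pages of the 5-page site, which is the dilution effect Matt describes. It says nothing about where the cutoff for the main index actually sits.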

      I will be making a recommendation that this site remove their geographic spam pages and will post the results if I can remember to.

      I'd do the same. Please report your results.


