Advanced link cloaking techniques

The interesting discussion between Rand and Jeremy had me thinking about some of the things affiliates do to protect their links. I am talking about link cloaking — the art of hiding links.

We can hide links from our potential customer (in the case of affiliate links), and we can hide them from the search engines as well (as in the case of reciprocal links, paid links, etc.).

While I think cloaking affiliate links to prevent others from stealing your commissions is useful, I am not encouraging you to use the techniques I am about to explain. I certainly think it is very important to understand link cloaking in order to protect yourself when you are buying products, services or links.

When I am reading a product endorsement, I usually mouse over the link to see if it is an affiliate link. Why? I don’t mind the blogger making a commission’; but, If I see he or she is trying to hide it via redirects, Java-script, etc. I don’t perceive it is as an endorsement.  I feel it is a concealed ad. When I see <aff>, editor’s note, etc. I feel I can trust the endorsement.

Another interesting technique is the cloaking of links to the search engines. The reasoning behind this concept is so that your link partners think you endorse them, but you tell the search engines that you don’t. Again, I am not supporting this.

Cloaking links to the potential customers.

Several of the techniques, I’ve seen are:

Script redirects – the use of simple script that takes a key (i.e: merchant=eBay), pulls the affiliate link from a database or from an in-line dictionary (programming term), and sends the visitor’s browser an HTTP 302 or 301 (temporary or permanent redirect) to the merchant site.

Meta refreshes – the use of blank HTML pages with the meta refresh tag and the affiliate tracking code embedded. This is very popular.

In-line Java-script– the use of Java-script to capture the mouse over, and the right click event from the target link, in order to make the status bar display the link without the tracking code. I feel this one is very deceptive.

Encoding URLs – the use of HTML character entities or URL encoding to obfuscate the tracking links or tracking codes from your visitors. This works because browsers understand the encoding and humans are unable to understand them without some work.

Java-script + image links – This is really advanced. I haven’t seen this being used much. The idea is to use Java-script to capture the on_click event and have the code pull a transparent image before transferring control to the new page. The trick is that the URL of the transparent image, is in reality a tracking script, that receives the tracking code as a parameter or as part of the URL.

Cloaking links to the search engines.

These are some of the techniques I’ve seen:

Use of rel=”no-follow” anchor attribute. I would not say this is technically cloaking, but the results are the same. Search engines (Google, Yahoo and Live) will not ‘respect’ those links.

Use of no-follow and/or no-index meta tag. There is a slight difference between the use of no-follow in the anchor link tag vs the meta robots tag. When used on the robots meta tag it means: “do not follow the links on this page”. When used on the anchor tag it tells the search engine “do not consider my link for your scoring” ( this link is not a vote/endorsement).

Crawler user agent check. This consists in detecting the search engine crawler by user agent via the HTTP REFERRER header and hiding the link or presenting the search engine a link with rel-no-follow. Normal visitors will not see this.

Crawler IPs check. Black hat SEOs keep a list of search engine crawler IP addresses to make cloaking more effective. While search engine crawlers announce their presence via the user agent header, when using cloaking detection algorithms they don’t.  Keeping a record of crawler IPs help detect them.

Once the crawler is detected, the same technique I just mentioned is used to hide the target links.

Robot.txt disallow. Disallowing search engine crawlers access to specific sections of your website (ie: link partner pages)via robot.txt, is another way to effectively hide those links from the search engines.

The use of robots-nocontent class. This is a relatively new addition (only Yahoo supports this at the moment). With this CSS class, you can tell the Yahoo crawler that you don’t want it to index portions of a page. Hiding link sections is another way to cloak links.

Robot.txt disallow + crawler IPs check. I haven’t seen this being used, but it’s technically possible. The idea is to  present search engines a different version of your robot.txt file than you present to users. The version you present to the search engines prohibits sections of your site where the links you want o hide are. You detect the search engine robot either by the user agent or by a list of known robots’ IP addresses. Note that you can prevent the search crawler from caching the robot.txt file making detection virtually impossible.

Now, as I said before, I am not exposing these techniques to promote them. On the contrary, here’s why it’s important to detect them.

The best way to detect cloaked links is to look closely at the HTML source and robot.txt file, and specially at the cache versions of those files. If you are buying or trading links for organic purposes (assuming you don’t get reported as spam by your competitors), don’t buy or trade links that use any of these techniques or that prevent the search engines from caching the robots.txt file or page in question.

3 replies

Trackbacks & Pingbacks

  1. Hello world!…

    Welcome to Pedro.net.au. This is your first post. Edit or delete it, then start blogging!
    ……

  2. […] I found aren’t real recent. But when people like Everett Sizemore, Sebastian, Joost de Valk, Hamlet Batista, John Chow, Shoemoney, and Dave Taylor all think that the subject deserves a post, then maybe […]

Leave a Reply

Want to join the discussion?
Feel free to contribute!

Leave a Reply