* by: Bob Sakayama *
15 July 2009: The recently announced changes in nofollow require us to rethink the strategies for handling internal PR. This is a big topic for enterprise SEO, because once a site has significant PR, how that PR is conserved and passed internally can make a huge difference in both the ranks and the PR of its pages. Basically, you want PR to flow to your important pages even as you restrict which pages get indexed.
The Old Way
We used to use the nofollow attribute to block all links, especially from the homepage, to pages that did not contribute to rank, assuming that the PR was conserved for the followed links. Now that we know this only causes the PR to be discarded, we should no longer use nofollow on internal links, reserving it for protecting the site from outbound links being seen as paid.
We have been working on higher level strategies to replace what we thought we were doing with the nofollow attribute on links. The tools at our disposal that can address how PR is handled include the robots.txt file, on-page robots noindex tags, and the new link canonical tag. A quick review of these tools and their impact on PR is in order.
Robots.txt
Some relevant information about robots.txt in this regard:
- blocks the page from being indexed via internal links
- disallowed pages still accrue PR
- pages are not crawled and so do not pass PR
- but pages may appear in the search results if external links point to them
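As a sketch, disallow rules for a hypothetical /policies/ directory and page (these paths are placeholders, not real paths from any particular site) would look like:

```
User-agent: *
Disallow: /policies/
Disallow: /policies.html
```

Remember that this blocks crawling, not PR accrual - the disallowed page can still accumulate PR from links pointing at it; it just can't pass any on.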
The simplest strategies of blocking pages with robots.txt disallows still work in keeping the site from internally indexing pages and directories that create conflicts or redundancies. And while these simple blocking actions can be effective, the limitations of robots.txt become apparent once you begin to examine its impact on PR.
Generally this protocol enables us to globally disallow specific directories and pages from being indexed via internal links. Pages that are disallowed in robots.txt are usually permitted by Google to be manually deleted from its index. So this is one technique we can use to clean the index should pages appear there that we want removed. But the fact that the page is not crawled means that even though the page may accrue PR, that PR is not passed to other pages. So this is NOT a good replacement for nofollow strategies.
On-Page Robots noindex Meta Tag
Some relevant information about an on-page robots noindex tag:
- blocks indexing - period - page will not appear in the search results
- still accrues PR
- still passes PR (unless nofollow is part of the instruction)
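A minimal sketch of the tag, placed in the head of the page to be kept out of the index:

```html
<!-- noindex alone: the page stays out of the search results
     but still passes PR through its links -->
<meta name="robots" content="noindex">

<!-- do NOT use this variant for PR purposes: adding nofollow
     stops the page from passing PR on -->
<!-- <meta name="robots" content="noindex, nofollow"> -->
```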
The last 2 are the critical points. Because pages blocked by on-page robots noindex tags still pass PR (provided you don't also use the nofollow instruction), this looks like a potentially useful tool for controlling internal PR distribution.
Canonical Link Tag
Some relevant information about the link canonical tag:
- permits redundancies to exist without causing conflict
- aggregates PR from multiple URLs to one specified URL
- protects against improper indexing of tagged pages, paginated pages, etc.
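As a sketch, each duplicate or paginated URL carries this in its head, pointing at the preferred version (the URL below is a placeholder):

```html
<link rel="canonical" href="http://www.example.com/widgets/">
```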
While this tag enables the aggregation of PR (like a weaker 301 redirect without leaving the page), it is intended for pages that are the same or very similar. We have experiments running to test the association of unrelated pages - and they appear to show that the tag is respected even when the content is not similar. But we don't yet have enough information to recommend using it for anything other than dealing with redundancies.
At this time this does not appear to be part of a viable replacement strategy for nofollow.
The Specific Problem:
The Policies page has no rank value, we don't want it indexed, and the links to it used to be nofollowed. How do we handle it now?
The Easy Way
- Continue to nofollow the link. (PR discarded, page not indexed unless via external link)
- Do follow the link, disallow the page in robots.txt. (PR is trapped on the page, which will not be indexed unless via external link)
- Do follow the link, use on-page robots noindex tag. (PR is passed through the page, page will not be indexed)
Each of the easy fixes loses PR in some way: discarded by the nofollow, trapped on a disallowed page (because the page is not crawled, its links pass no PR on), or lost incrementally (the ~15% damping) in passing through the page.
Of the easy fixes, the last is our current recommendation, because it has the smallest PR loss. It is also easy to automate across all previously nofollowed links/pages. NB: the robots meta tag must carry only the noindex instruction, not nofollow, or the page will stop passing PR.
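As a rough illustration only, using the classic PageRank damping factor of 0.85 as a simple model (an assumption; real PR flow is far more complex), the three easy options compare like this:

```python
# Toy model: one unit of PR is sent toward the Policies page.
# D is the classic PageRank damping factor (an assumed simplification).
D = 0.85

def nofollowed_link(pr):
    # Option 1: PR sent down a nofollowed link is simply discarded.
    return 0.0

def disallowed_page(pr):
    # Option 2: PR reaches the page but is trapped - the page is never
    # crawled, so its outbound links pass nothing on.
    return 0.0

def noindex_page(pr):
    # Option 3: PR passes through the page, minus the ~15% damping loss.
    return pr * D

print(noindex_page(1.0))  # prints 0.85
```

The model makes the recommendation obvious: only the noindex option lets any PR survive the hop.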
The Optimized Solution
The optimized model will show no PR loss. That means that links that used to be nofollowed cannot be normal href links, because there is no way to prevent them from wasting or discarding PR.
One more time. In the highly optimized model, these can no longer be normal href links.
The concept is incredibly simple: use a link the bots can't spider, and put noindex on the page just in case someone links to it. You keep the primary navigation structure, and should the page receive some PR from a natural inbound link, that PR will flow back into the site, even though the page itself will never be indexed.
We're not advocating any particular kind of non-indexable link, just showing the principle we want to apply.
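One hypothetical illustration of the principle (not a recommendation, and the class name and path are placeholders): a script-driven "link" with no crawlable href, pointing at a page that carries the noindex safety net.

```html
<!-- No href for the bots to spider, so no PR flows out through it. -->
<span class="nav-link" onclick="window.location='/policies.html'">Policies</span>

<!-- And in the head of /policies.html, in case someone links to it: -->
<meta name="robots" content="noindex">
```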
What kind of links work? ... more needed here...
Just a side note: it's interesting that Google, by introducing us to the idea of PR sculpting via their deceptive nofollow dead end, made some very savvy folks aware of the flaws in the PR model - and of how those flaws make it obvious and necessary to hide things from Google in order to genuinely optimize for it.
Some of my peers are claiming that just by using the nofollow attribute, your site is flagged as one influenced by an SEO and will draw a higher level of scrutiny. Wow, that's profiling paranoia! But then again, ever since my experiments went up, my phone makes strange clicking sounds...