by Bob Sakayama
Updated 11 July 2010
Big Sites = Big Pain
Loss of rank due to a Google penalty is not an uncommon experience for a site of any size. And whether it's due to redundancies, bad neighborhoods, multiple sites, bad links, server issues, DNS mispointing, redirection schemes gone awry, other owned sites getting inappropriately exposed, subdomains with inappropriate navigation sharing, or whatever, pain is pain, regardless of business size.
But Google penalties often scale along with other factors, so if there's a problem with the natural search ranks of a large web entity, chances are it's not a small one. Many troublesome Google penalty issues arise because automation protocols did not take search compliance into consideration. Systems are designed for the convenience of the users. But if search compliance was not considered during development, the automation itself can be the culprit, and for certain penalties, it's the first place to look.
Some of the most common Google penalty related automation issues involve the inadvertent creation of redundancies at the content and domain level, the creation of multiple URLs displaying the same content, and filename redundancies across multiple directories.
One common automation-driven rank/penalty issue is the inadvertent creation of multiple URLs all displaying the same content. For example, if you automate URL tagging to identify affiliates, users, etc., and those various URLs end up getting indexed, you can create a massive redundancy issue. A very simple solution for this problem is to automate the use of the link canonical tag on every URL. The canonical tag also removes the liability from https bleed - where your site gets indexed as both https and http - and it prevents confusion from www and non-www versions of the site getting indexed in conflict.
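As a minimal sketch of what automating the canonical tag might look like (the host name and tag-building function here are hypothetical, not taken from any particular platform), the idea is to collapse every tagged, http/https, and www/non-www variant of a URL onto one canonical form:

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical canonical host - substitute your own domain.
CANONICAL_HOST = "www.example.com"

def canonical_link_tag(requested_url: str) -> str:
    """Build a <link rel="canonical"> tag for the requested URL.

    Drops tracking query strings (affiliate/user tags) and forces one
    scheme and one host, so http vs. https and www vs. non-www variants
    all declare the same canonical URL.
    """
    parts = urlsplit(requested_url)
    canonical = urlunsplit(("https", CANONICAL_HOST, parts.path or "/", "", ""))
    return '<link rel="canonical" href="%s" />' % canonical

# All of these indexed variants declare the same canonical URL:
print(canonical_link_tag("http://example.com/widgets?aff=123"))
print(canonical_link_tag("https://www.example.com/widgets?user=9&ref=x"))
```

Emitting this tag in the page template for every URL is what makes the protection automatic rather than piecemeal.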
The null set redundancy exists when a page gets created even when there is no data, like when a product runs out of inventory and the same message is displayed for every product on a unique URL. In this instance, you have pages that are redundant by virtue of having no data - perhaps only header, footer, and nav. Create enough of these pages (automation is good at this), and it can trigger a Google penalty because it can appear to be a deceptive strategy by virtue of the numbers. The solution to this kind of non-compliance is simple - establish conditions under which pages with no content either do not show or, better, render unique content. Instead of out-of-inventory product pages displaying the same message, create a variable-driven message that simply explains that the particular product is out of inventory. By adding that message to the regular product description, none of these URLs will be redundant.
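A sketch of that conditional rendering, assuming a hypothetical product record with name, description, and inventory fields - the point is that the unique description stays on the page and the out-of-inventory notice is product-specific, so no two URLs end up identical:

```python
def render_product_page(product: dict) -> str:
    """Render a product page whose body stays unique at zero inventory.

    Instead of replacing the whole page with one shared "out of stock"
    message (creating thousands of near-identical URLs), keep the unique
    product description and append a variable-driven notice.
    """
    body = "<h1>%s</h1>\n<p>%s</p>" % (product["name"], product["description"])
    if product["inventory"] == 0:
        # Product-specific wording, so the page is never a null set.
        body += "\n<p>Sorry, %s is currently out of inventory.</p>" % product["name"]
    return body

page = render_product_page(
    {"name": "Blue Widget", "description": "A fine blue widget.", "inventory": 0}
)
```

The same template serves in-stock and out-of-stock states; only the notice is conditional.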
The solution to a domain level Google penalty can range from isolation, DNS repointing, remediation of redirection schemes, and correcting automation of .htaccess, ISAPI, or other mod_rewrite configuration instructions, to remediating the server environment.
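For the .htaccess/mod_rewrite piece, a corrected configuration often amounts to consolidating every host and scheme variant with a single 301 per request. A hedged sketch, assuming Apache with mod_rewrite enabled and a canonical host of www.example.com (substitute your own):

```apache
RewriteEngine On

# Collapse the non-www host onto www with a permanent redirect.
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ https://www.example.com/$1 [R=301,L]

# Force https so the site is not indexed under both schemes.
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://www.example.com/$1 [R=301,L]
```

This is a minimal example, not a drop-in fix - existing redirection schemes need to be audited so rules like these don't chain or loop.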
But most domain level penalties are remediated by addressing what we call "ownership" issues. Think of it this way: If an owner of multiple sites inappropriately interlinks them, and/or uses them to push ranks or advantage other owned sites in the search, that owner's sites are at risk of being penalized. But they're only at risk if they can be identified as owned by the same party. This is a very easy line in the sand to cross, especially if you don't know it's there. So the simple solution is to keep every site independent of all other owned sites in every way possible - serve all assets locally, and don't share links, CSS files, email responders, 3rd party functionality, image servers, databases, etc. across your sites.
The link canonical tag mentioned above is one of the best inoculators against a whole host of inadvertent errors and omissions. We highly recommend automating it on every URL. We've seen sites return to the search based on that one change alone. Many sites have homepages that do not rank for unique snippets because so many different URL versions of the homepage are indexed that the page gets suppressed. Again, the solution is the canonical tag, at least on the homepage in this case.
Sometimes the implementation choices themselves lead to Google penalty issues. We know that Google can have problems with dynamic sites if the developers chose to mask the filenames without including the extensions. We have many examples of sites going intermittent with their ranks once they reach a threshold size. This is one case where critical mass works against big sites. Since 2010, we have not seen this issue, so we're hopeful that Google has either granted forgiveness or fixed their penalty hair triggers. But we keep it here because we know we only see a small sample of the penalties imposed on large sites, and it may still be out there intermittently harming the ranks of sites.
By now if you don't know that Google doesn't want us to buy links or sell links, then you live on Mars. And linking to (and appearing to support) a known bad actor can also trigger a suppression. So the reality is that both outbound and inbound linking can become a liability, something anathema to the principles of the ideal internet. But if you conduct business via the search, you have to comply with the rules set by the big G, at least for now, or you have to know how to manage that risk.
And then there's the reality set by both the competition and the past. Very large numbers of successful sites have scary skeletons in their closets in the form of legacy links. Some of these were paid, permanent links, perhaps acquired by a previous SEO agency. These may actually have been pushing your ranks while flying under the Google radar for years. And then there are those that have been discovered and reported by your competition.
And it's not just inbounds. Those (now deprecated) reciprocal links have been a source of problems. We've seen sites get penalized because they swapped links with a site that morphed (probably intentionally) into a malware station. Even links innocently swapped can turn into a penalty trigger.
All links need to be seen in the light of potential liability. Our recommendation is strict adherence to some very simple rules that not only provide protection, but also advance the SEO. The first rule is to discard your reciprocal link structure. The second is to always use the nofollow attribute on all untrusted outbound links. The third is to never point links home. This last one is how you protect your enterprise. If you point links home and get penalized, you have to remove them to recover, and if you can't remove them, your domain is dead. If you point them to a landing page instead, it's better SEO targeting; plus, if the links go bad, you can simply rename the file to get rid of them. And the penalty is most likely to be restricted to the contribution to rank made by the URL carrying the link, not the whole site, as it would be if the homepage were involved.
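The nofollow rule is easy to automate at the template level. A minimal sketch, with a hypothetical whitelist of trusted domains (the names here are illustrative, not a recommendation): any outbound anchor whose destination isn't explicitly trusted gets rel="nofollow".

```python
from urllib.parse import urlsplit

# Hypothetical whitelist - only these outbound domains pass link equity.
TRUSTED_DOMAINS = {"example.org", "partner.example.net"}

def outbound_anchor(href: str, text: str) -> str:
    """Emit an outbound link, adding rel="nofollow" unless the
    destination domain is on the trusted whitelist."""
    host = urlsplit(href).netloc.lower()
    if host in TRUSTED_DOMAINS:
        return '<a href="%s">%s</a>' % (href, text)
    return '<a href="%s" rel="nofollow">%s</a>' % (href, text)

print(outbound_anchor("https://example.org/page", "trusted partner"))
print(outbound_anchor("https://unknown-site.com/", "unvetted link"))
```

Defaulting to nofollow and whitelisting exceptions means a newly added, unvetted link can never leak endorsement by accident.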
Over the past 5 years, the penalties we've handled have morphed from simple structural problems to much more mysterious link related issues. Unfortunately for us, Google just made it harder to discover link issues on large sites by limiting access to the data to a small sample. And while the information in Webmaster Tools is improving overall, Google tends to withdraw information that proves rank harm. We once could see if a URL was stuck in the supplemental results - that information was right there in the search. Now we can't run discovery to identify harmful links. For Google penalties, we can't even get a confirmation - a simple yes or no - that a site is penalized. So the trend is definitely not toward transparency for sites penalized in Google. And the solutions to Google penalties have become more arcane. Not a good thing.