An Enterprise View Of robots.txt

Anyone reading this site is assumed to know the function of robots.txt. A simple search for it will give you the background. Here we discuss its enterprise application.

This should not be a startling fact: most enterprise sites are only search compliant if their robots.txt file is operational. When you manage a large document set with sophisticated navigation and directory structure, certain redundancies improve the visitor experience (or the collection of visitor data), and those redundancies create a need to address search compliance. It's virtually impossible to manage an authority site without a sophisticated robots.txt strategy.

Basically, this disallow instruction set keeps the enterprise compliant by preventing the indexing of redundancies (and other non-compliant areas); a failure in form or syntax would very likely result in a Google penalty.
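
As a rough illustration, here is a minimal sketch of the kind of disallow set this implies. The paths are hypothetical placeholders (print views, internal search results, and session-tracking URLs are common sources of redundancy, not a prescription), and the * wildcard in a path is a Google extension rather than part of the original robots.txt standard:

User-agent: *
# Printer-friendly duplicates of normal pages
Disallow: /print/
# Internal site-search result pages, a classic source of near-duplicate URLs
Disallow: /search/
# URLs that differ only by a session or tracking parameter (wildcard is a Google extension)
Disallow: /*?sessionid=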

Some persistent robots.txt myths

- You can only disallow directories. If you believe this, you need to bone up, big time: individual files and, for Googlebot, wildcard URL patterns can be disallowed just as easily (see the sketch after this list).

- Can protect you from bad inbound links. Bad inbounds are evaluated at the source. If a link points at your page and that page does not return a 404, the link is attributed to you, regardless of robots.txt.

- Can protect you from discovery of ownership issues. robots.txt only deals with indexing. If you look at your visitor logs, you are very likely to see a Mozilla Googlebot inside files you had disallowed. Those files are not getting indexed; you are getting looked at carefully for compliance. So you can't hide connections to other sites exposed in scripts, IPs, email addresses, or anything else with robots.txt (or nofollow).

- Never need more than one. The fact is, you probably need at least 2. Google says, "Each port must have its own robots.txt file. In particular, if you serve content via both http and https, you'll need a separate robots.txt file for each of these protocols."

- Only used for disallows. There are many other important reasons for using a more sophisticated robots.txt file on a large system. One is the 'Allow' directive, which specifies permission, or an exception to a previous general disallow. Another is keeping the bots out of directories that play no role in ranking the site. That cuts server resource use, so on a large system, disallowing everything that doesn't support rank can be a very smart move from an efficiency point of view, aside from the compliance or privacy concerns. Both are sketched below.
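
A minimal sketch of those last two points, plus the file-level disallows mentioned earlier. The paths are hypothetical, and both 'Allow' and the * / $ wildcards are extensions honored by Googlebot rather than parts of the original robots.txt standard:

User-agent: Googlebot
# A single file can be disallowed, not just a directory
Disallow: /private-report.html
# A wildcard pattern: block every PDF on the site ($ anchors the end of the URL)
Disallow: /*.pdf$
# A general disallow with a specific exception carved out by Allow
Disallow: /archive/
Allow: /archive/whitepapers/
# Directories that play no role in ranking: disallowing them saves crawl resources
Disallow: /cgi-bin/
Disallow: /tmp/

For Googlebot, the most specific (longest) matching rule wins, which is why the Allow line above carves /archive/whitepapers/ out of the general /archive/ disallow.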

Here's the quick reference on robots.txt blocking functionality:

- blocks the page from being indexed on the strength of internal links
- disallowed pages still accrue PR
- disallowed pages are not crawled, and so do not pass PR
- but pages may still appear in the search results if external links point to them (though they will not show a cached link)

And then there's the fact that the protocol has been evolving. Recently robots.txt gained the ability to point robots directly to your sitemap, using a 'Sitemap: url' syntax:

Sitemap: http://www.domain.com/sitemap.xml
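
The Sitemap directive is not tied to any User-agent group, so on a large system it can simply sit alongside the disallow rules. A composite sketch, again with placeholder paths and the placeholder domain from above:

User-agent: *
Disallow: /print/
Disallow: /search/
# Sitemap references apply to all crawlers, independent of the group above
Sitemap: http://www.domain.com/sitemap.xml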

There's a lot more on the enterprise use of robots.txt coming on this site. Tell us your war stories.
