One of the common problems faced by enterprise SEOs is how to achieve situational awareness from the outside. That is, how do you discover the structure of a site's implementation if you don't have server access?
This is especially important when you work with very large sites. Often no single person knows all the details about every subdomain, and the subdomains may be hosted in different server environments in different locations around the world.
If you're trying to understand how the enterprise is managing it all, a few very simple tricks can help you assemble the pieces. Here's a really smart technique using the site: and inurl: search operators. The cool thing is that it uses Google's own search results to reveal information about a site that is otherwise hidden. (Running these searches may trigger the Google warning shown on the right.)
Here's a simple iterative search routine that will uncover all the indexed subdomains of any given domain.
Start with this search:
site:domain.com -inurl:www
This shows you all the indexed URLs that don't contain www, which usually surfaces https pages and other subdomains (www is itself just another subdomain of the domain). You can find every subdomain by iterating through them: pick a subdomain revealed by this search, then search like this:
site:domain.com -inurl:www -inurl:subdomain1
This filters out both www and subdomain1, revealing other subdomains. Keep adding each new subdomain you find as another exclusion until the search returns no results. At that point, the exclusion list in your final search names every indexed subdomain:
site:domain.com -inurl:www -inurl:subdomain1 -inurl:subdomain2 -inurl:subdomain3 ...
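The routine above can be sketched as a loop. This is a minimal illustration under stated assumptions, not a tool for querying Google: the search function is a hypothetical stand-in you would supply yourself, and make_fake_search, subdomain_label, build_query, discover_subdomains and the example.com URLs are all made up for the demo. The stub also assumes -inurl:X simply means "the URL contains X", which approximates but may not match how Google actually matches inurl:.

```python
from typing import Callable, List, Optional
from urllib.parse import urlparse


def subdomain_label(url: str, domain: str) -> Optional[str]:
    """Return the part of the URL's host in front of `domain`, or None."""
    host = (urlparse(url).hostname or "").lower()
    suffix = "." + domain.lower()
    return host[: -len(suffix)] if host.endswith(suffix) else None


def build_query(domain: str, excluded: List[str]) -> str:
    """site:domain -inurl:a -inurl:b ... with duplicates dropped, order kept."""
    unique = dict.fromkeys(excluded)  # a repeated exclusion would waste a query word
    return " ".join([f"site:{domain}"] + [f"-inurl:{s}" for s in unique])


def discover_subdomains(domain: str, search: Callable[[str], List[str]],
                        max_rounds: int = 50) -> List[str]:
    """Exclude www, then keep excluding the first new subdomain seen in the
    results until the search comes back empty (or nothing new turns up)."""
    found = ["www"]
    for _ in range(max_rounds):
        results = search(build_query(domain, found))
        if not results:
            break  # empty result set: `found` now names every subdomain
        labels = (subdomain_label(u, domain) for u in results)
        new = next((s for s in labels if s and s not in found), None)
        if new is None:
            break  # remaining results reveal no new subdomain (e.g. bare-domain URLs)
        found.append(new)
    return found


# HYPOTHETICAL stand-in for Google, searching a fixed list of URLs.
def make_fake_search(index: List[str]) -> Callable[[str], List[str]]:
    def search(query: str) -> List[str]:
        site, *ops = query.split()
        domain = site[len("site:"):]
        excluded = [op[len("-inurl:"):] for op in ops if op.startswith("-inurl:")]
        hosts = {u: (urlparse(u).hostname or "") for u in index}
        on_site = [u for u in index
                   if hosts[u] == domain or hosts[u].endswith("." + domain)]
        # Assumption: -inurl:X drops any URL containing X as a substring.
        return [u for u in on_site if not any(x in u for x in excluded)]
    return search


index = [
    "https://www.example.com/",
    "https://mail.example.com/inbox",
    "https://maps.example.com/",
    "https://docs.example.com/help",
]
print(discover_subdomains("example.com", make_fake_search(index)))
# ['www', 'mail', 'maps', 'docs']
```

Picking one new subdomain per round mirrors the manual routine; you could exclude every new label a result page shows at once, which saves searches but spends the word budget just as fast.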
The limitation is Google's current 32-word limit on search queries.
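To see how quickly that limit bites, here's a small budget check. It assumes each whitespace-separated operator, including the site: term, counts as one word; Google's exact counting rule may differ. QUERY_WORD_LIMIT, query_words, exclusion_budget and fits_limit are hypothetical names, and the subdomainN labels are placeholders.

```python
QUERY_WORD_LIMIT = 32  # the limit stated above at the time of writing


def query_words(query: str) -> int:
    """ASSUMPTION: every whitespace-separated operator counts as one word."""
    return len(query.split())


def exclusion_budget(limit: int = QUERY_WORD_LIMIT) -> int:
    """How many -inurl: terms fit once site:domain.com has used one word."""
    return limit - 1


def fits_limit(query: str, limit: int = QUERY_WORD_LIMIT) -> bool:
    return query_words(query) <= limit


subs = [f"subdomain{i}" for i in range(1, exclusion_budget() + 1)]  # 31 labels
q = "site:domain.com " + " ".join(f"-inurl:{s}" for s in subs)
print(query_words(q), fits_limit(q))       # 32 True
print(fits_limit(q + " -inurl:one-more"))  # False
```

Under this counting, www uses one of the 31 exclusion slots, so a single query can name at most 30 other subdomains.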
Try It With Google.com
When we tried to discover all the subdomains of Google, we had to stop at 32 (plus we got the warning and couldn't run any more searches for a while):
site:google.com -inurl:www -inurl:adwords -inurl:knol -inurl:ditu -inurl:maps -inurl:local -inurl:translate -inurl:books -inurl:picasa -inurl:video -inurl:code -inurl:picasaweb -inurl:mail -inurl:chrome -inurl:ejabat -inurl:investor -inurl:wifi -inurl:labs -inurl:checkout -inurl:images -inurl:docs -inurl:photos -inurl:gears -inurl:pack -inurl:sites -inurl:documents -inurl:wave -inurl:afp -inurl:canadianpress -inurl:blogsearch -inurl:earth -inurl:answers
(The limit was reached, but we could still see these:)
-inurl:research -inurl:trends -inurl:sitescontent -inurl:scholar -inurl:toolbar -inurl:services -inurl:sketchup
Remember the "supplemental results" and all the issues surrounding their revelation and then their disappearance from Google's search results? They're still around, and you can still find them with this secret handshake.
Here's a hack we uncovered long ago, when the supplemental results markers were removed from the search results. In a sense, it matters even more now, because it tells you whether your URLs are in the main index or in the now-hidden supplemental results.
We discovered long ago that simply adding /* or /# to a site: search drastically changed the result set. Back when supplemental results were flagged, we found that this search:
site:domain.com/*
revealed all the URLs in the main (primary) index. If you subtracted those from the results of this search:
site:domain.com
the remainder were the URLs in the supplemental results.
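The subtraction itself is a set difference. This sketch assumes you've already collected the URLs returned by each of the two searches into lists (collecting them is out of scope here); normalize and supplemental_urls are hypothetical helpers, and the example.com URLs are made up. Light normalization keeps trivial differences such as letter case or a trailing slash from showing up as false "supplemental" URLs.

```python
from typing import Iterable, List
from urllib.parse import urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Lower-case scheme and host, drop the fragment and any trailing slash
    so the two result lists compare like-for-like."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def supplemental_urls(all_site_urls: Iterable[str],
                      main_index_urls: Iterable[str]) -> List[str]:
    """URLs returned by the plain site: search but not by the /* variant."""
    main = {normalize(u) for u in main_index_urls}
    return sorted({normalize(u) for u in all_site_urls} - main)


all_results = ["https://example.com/", "https://example.com/a",
               "https://example.com/b/", "https://example.com/c#top"]
main_results = ["https://EXAMPLE.com/a", "https://example.com"]
print(supplemental_urls(all_results, main_results))
# ['https://example.com/b', 'https://example.com/c']
```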
The image on this page shows a warning we see a lot when we run these kinds of searches; check our blog post on it. We strongly suspect we're being profiled, because we see this warning so often. Do you?