The null set redundancy is an issue created by an automation scheme that has not been properly vetted for search compliance. The system creates pages that are redundant because they lack content.
Typically, it's a product page that keeps getting generated long after the product sold out or was discontinued. But there are countless ways a null set redundancy could be automated into an implementation, especially if the principals have no knowledge of search compliance.
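To make the mechanism concrete, here is a minimal sketch in Python. It assumes a hypothetical export that maps each URL the automation generates to the list of in-stock items that page would render; the URLs and the `find_null_set_pages` helper are illustrative and not taken from any particular platform. A page whose item list is empty renders nothing but the shared template, so every such page is a copy of every other one.

```python
from typing import Dict, List


def find_null_set_pages(pages: Dict[str, List[str]]) -> List[str]:
    """Return the URLs whose item list is empty, sorted for stable output.

    `pages` (hypothetical) maps each generated URL to the items, e.g. in-stock
    product IDs, that the page would render. An empty list means the page
    contains only the shared template: a null set redundancy.
    """
    return sorted(url for url, items in pages.items() if not items)


if __name__ == "__main__":
    # Hypothetical generator output for a small shoe catalog.
    generated = {
        "/shoes/": ["sku-101", "sku-204"],
        "/shoes/red-runner": ["sku-101"],
        "/shoes/blue-runner": [],        # sold out, but the page is still generated
        "/shoes/green-runner-2019": [],  # discontinued, but the page is still generated
    }
    for url in find_null_set_pages(generated):
        print("null set:", url)
```

In a real system the item lists would come from whatever feeds the page generator. The point of a check like this is to catch the empty set before the page is published, rather than after Google finds it.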
And here's the thing about automation: Any issue that's created by automation is probably a disaster lying in wait. It's just a matter of how big a disaster. On a complex system, just trying to locate the cause can be daunting.
It turns out that the typical null set redundancy is so common that the fixes are pretty well known. Many businesses, both large and small, have this problem right now. For the time being, they're flying under Google's radar, so no one even realizes there's such a thing as a null set redundancy problem.
But there is a problem. And it's a problem that can shut the enterprise down if it depends on natural search for traffic and sales.
One reason Google appears so arbitrary when it penalizes sites is that it takes Google a while to find the issues. There are also clearly some tolerances built in: as long as the number of redundancies Google can see stays below some threshold, both in absolute count and relative to the size of the site, you're likely not to have the trigger pulled on you. Small sites, even those with null set redundancy issues, will often dodge this bullet and may continue to do so indefinitely. But any enterprise that does not preemptively address this issue is negligent.
One of our experiments was designed to find, in a general way, where the red line lies for content redundancy. We took a one-page website and duplicated the homepage content into a file called 1.html. Three months later, another duplicate of the homepage, 2.html, was created. Every three months another duplicate was added, and Google did not penalize the site until there were 10 exact copies of the homepage. Another site, with 10 pages of unique content, was given a new page duplicating its homepage every three months. That site was NEVER penalized. The pages were indexed and given supplemental status (which is a kind of penalty in our view), but the site continued to rank for its main terms.
While we were able to fly under the radar, you really don't want to test these waters with a live commerce site: there should be absolutely NO duplicate pages, and it's critical to know for a fact that this is the case.
But when automation is behind the issue, it is very easy to cross that forgiveness threshold and not realize it is happening until your rankings are gone, especially at scale. And a null set redundancy is one of those issues that is very easy to overlook if you are not aware of how it occurs.
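One way to know for a fact that a site has no duplicate pages is to audit the rendered output. The Python sketch below assumes you already have each page's HTML keyed by URL, for example from a crawl of your own site; `normalize` and `find_duplicate_groups` are hypothetical helpers. It only catches exact duplicates after crude normalization (tags stripped, whitespace collapsed, text lowercased), and it makes no claim about how Google itself detects duplication. Near-duplicates would need a fuzzier comparison.

```python
import hashlib
import re
from collections import defaultdict
from typing import Dict, List


def normalize(html: str) -> str:
    """Crude normalization: strip tags, collapse whitespace, lowercase."""
    text = re.sub(r"<[^>]+>", " ", html)
    return re.sub(r"\s+", " ", text).strip().lower()


def find_duplicate_groups(pages: Dict[str, str]) -> List[List[str]]:
    """Group URLs whose normalized content is identical.

    Each page is reduced to a SHA-256 digest of its normalized text; any
    digest shared by two or more URLs marks a set of exact duplicates.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for url, html in pages.items():
        digest = hashlib.sha256(normalize(html).encode("utf-8")).hexdigest()
        groups[digest].append(url)
    # Keep only groups with more than one URL, sorted for stable output.
    return sorted(sorted(urls) for urls in groups.values() if len(urls) > 1)


if __name__ == "__main__":
    # Hypothetical crawl results mirroring the homepage-duplication experiment.
    crawl = {
        "/index.html": "<h1>Acme Widgets</h1><p>Quality widgets since 1999.</p>",
        "/1.html": "<h1>Acme Widgets</h1>\n<p>Quality widgets since 1999.</p>",
        "/2.html": "<h1>Acme  Widgets</h1><p>Quality widgets since 1999.</p>",
        "/about.html": "<h1>About Acme</h1><p>Our story.</p>",
    }
    for group in find_duplicate_groups(crawl):
        print("duplicates:", ", ".join(group))
```

Run against that sample crawl, the sketch reports /1.html, /2.html and /index.html as one duplicate group and leaves /about.html alone. Hashing keeps the comparison linear in the number of pages, which matters when automation has generated them by the thousands.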