SEO (search engine optimization) is the process of improving the quality and quantity of traffic to a website or web page from search engines.
SEO targets unpaid traffic rather than direct or paid traffic.
If you’ve ever shopped for anything online, you’ve seen faceted navigation.
This is the list of clickable options, usually in the left-hand panel, that can be used to filter results by brand, price, color, etc.
Faceted navigation makes it possible to mix & match options in any combination the user wishes. It’s popular in large online stores because it lets users drill down to only the items they’re interested in.
But this can cause huge problems for search engines, because it can generate billions of useless near-duplicate pages.
This wastes crawl budget, lowers the chances that all of the important content will get indexed, and tells the search engines that the site is mostly low-quality junk pages (because, at this point, it is).
Many articles cover faceted navigation and how to mitigate the SEO problems that it causes.
Those are reactive strategies: how to prevent the search engines from crawling and indexing the billions of pages your faceted navigation created.
This article takes a different angle: it’s about the design decisions that create massive duplication and how to avoid them from the beginning.
It’s about the seemingly innocuous UX choices and their unintended consequences.
My goal is to give you a deeper understanding of how each decision affects crawlability and final page counts.
I’m hoping this will give you knowledge you can use, both to avoid problems before they start and to mitigate problems that can’t be avoided.
Faceted navigation is usually divided into groups, with a list of clickable options in each group.
There could be one group for brand names, another for sizes, another for colors, etc. The options in a group can be combined in any of several different ways:
“AND” matching: With this match type, the store only shows an item if it matches all of the selected options.
“AND” matching is most often used for product features, where it’s assumed the shopper is looking for a specific combination of features and is only interested in a product if it has all of them (e.g., headphones that are both wireless and noise-canceling).
“OR” matching: With this match type, the store shows items that match any of the selected options.
This can be used for lists of brand names, sizes, colors, price ranges, and many other things. The idea here is that the user is interested in a few different things and wants to see a combined list that includes all of them (e.g., all ski hats available in red, pink, or yellow).
“Radio button” matching: With this match type, only one option may be selected at a time.
Selecting one option deselects all others. The idea here is that the options are 100% mutually exclusive, and nobody would be interested in seeing more than one of them at a time.
Radio buttons are often used to set sort order. They are also sometimes used to choose between mutually exclusive categories (e.g., specifying the smartphone brand/model when shopping for phone cases).
Some radio button implementations require at least one selected option (e.g., for sort order), and others don’t (e.g., for categories).
The options within a given group can be combined using any one of these match types, but the groups themselves are almost always combined with each other using “AND” matching. For instance, if you select red and green from the “colors” group, and you select XL and XXL from the “sizes” group, then you’ll get a list of every item that is both one of those two colors and one of those two sizes.
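As a rough illustration of the three match types, here is a minimal Python sketch. The data layout (each item as a mapping from group name to a set of values) and the function names are assumptions made for this example; they are not taken from any particular store platform.

```python
from typing import Dict, Set

# Hypothetical data layout: group name -> set of option values.
Item = Dict[str, Set[str]]


def group_matches(item_values: Set[str], selected: Set[str], match_type: str) -> bool:
    """Check one facet group against one item."""
    if not selected:
        return True  # nothing selected in this group, so it does not filter
    if match_type == "and":
        return selected <= item_values  # item must have every selected option
    if match_type == "or":
        return bool(selected & item_values)  # item must have at least one
    if match_type == "radio":
        if len(selected) > 1:
            raise ValueError("a radio group allows only one selected option")
        return selected <= item_values
    raise ValueError(f"unknown match type: {match_type}")


def item_matches(item: Item, selections: Dict[str, Set[str]], match_types: Dict[str, str]) -> bool:
    """Groups are combined with AND: the item must pass every group."""
    return all(
        group_matches(item.get(group, set()), selected, match_types[group])
        for group, selected in selections.items()
    )


# The colors/sizes example from the text: "OR" inside each group, "AND" between groups.
match_types = {"colors": "or", "sizes": "or"}
selections = {"colors": {"red", "green"}, "sizes": {"XL", "XXL"}}
red_xl = {"colors": {"red"}, "sizes": {"XL"}}
red_small = {"colors": {"red"}, "sizes": {"S"}}
print(item_matches(red_xl, selections, match_types))     # True
print(item_matches(red_small, selections, match_types))  # False
```

The red XL item passes both groups, while the red small item passes “colors” but fails “sizes”, so the group-level “AND” rejects it.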
A typical real-world website will have several groups using different match types, with many options between them.
The total number of combinations can get quite large:
The above example has just over 17 billion possible combinations.
Note that the total number of actual pages will be much larger than this, because the results from some combinations will be split across many pages.
For faceted navigation, page counts are ultimately determined by three main things:
The total number of possible combinations of options: in the simplest case (with only “AND” & “OR” matching, and no blocking), the number of combinations will be 2^n, where n is the number of options. For example, if you have 12 options, then there will be 2^12, or 4,096, possible combinations.
This gets a bit more complicated when some of the groups are radio buttons, and it gets a lot more complicated once you start blocking things.
The number of matching items found for a given combination: this is determined by many factors, including match type, the total number of products, the fraction of products matched by each filter option, and the amount of overlap between options.
The maximum number of items displayed per page: this is an arbitrary choice made by the site designer. You can set it to any number you like. A bigger number means fewer pages but more clutter on each of them.
The choice of match type affects the page count by influencing both the number of combinations of options and the number of matching items per combination.
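The first and third factors are simple arithmetic, which the following Python sketch spells out. The tuple layout for a group and both function names are assumptions for illustration, and the sketch ignores blocking entirely. It also assumes that an empty result still occupies one page (the “Zero items found” page).

```python
import math


def count_combinations(groups):
    """groups: list of (match_type, option_count, selection_required) tuples."""
    total = 1
    for match_type, n_options, required in groups:
        if match_type in ("and", "or"):
            total *= 2 ** n_options  # each option is independently on or off
        elif match_type == "radio":
            # one state per option, plus "nothing selected" unless a choice is required
            total *= n_options if required else n_options + 1
        else:
            raise ValueError(f"unknown match type: {match_type}")
    return total


def pages_for(matching_items, per_page):
    """Pages needed for one combination; an empty result still takes one page."""
    return max(1, math.ceil(matching_items / per_page))


print(count_combinations([("or", 12, False)]))     # 4096
print(count_combinations([("radio", 32, False)]))  # 33
print(pages_for(25, 10))                           # 3
```

Because groups are combined with each other, their combination counts multiply, which is why adding even one small checkbox group doubles or quadruples the total.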
All of the numeric results in this article were generated by a simulation script written for this purpose.
The script works by modeling the site as a multi-dimensional histogram, which is then repeatedly scaled and re-combined with itself every time a new faceted option is added to the simulated site.
The script simulates gigantic sites with many groups of various option types relatively quickly. (For previous articles, I have always generated crawl data using an actual crawler, running on a test website made up of real HTML pages. That works fine when there are a few tens of thousands of pages, but some of the tests for this article have trillions of pages. That would take my crawler longer than all of recorded human history to crawl. Civilizations rise and fall over centuries. I decided not to wait that long.)
For the first test (Test #1), suppose we have a site with the following properties:
The faceted navigation consists of one big group, with 32 filtering options that can be selected in any combination.
There are 10,000 products.
On average, each filtering option matches 20% of the products.
The site displays (up to) 10 products per page.
Options are combined using “AND” matching.
This gives us:
4,294,967,296 different combinations of options.
4,294,724,471 empty results.
The obvious takeaway: the number of pages is gigantic, and the overwhelming majority of them are empty results. For every 12,625 pages on this site, only one shows actual products. The rest show the annoying “Zero items found” message. This is a terrible user experience and a huge waste of crawl budget. But it’s also an opportunity.
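To see why empty results dominate, a back-of-the-envelope estimate helps. The Python sketch below is not the simulation script described earlier. It rests on two assumptions of its own: the options are statistically independent (so k selected options match 10,000 × 0.2^k items), and a combination with fewer than one expected item counts as an empty result that still occupies one page. `estimate_and_site` is a hypothetical name.

```python
import math
from fractions import Fraction


def estimate_and_site(n_products, n_options, match_rate, per_page):
    """Estimate combinations, empty results, and pages for one "AND" group.

    Assumes independent options and treats fewer than one expected item
    as an empty result (both are assumptions of this sketch).
    """
    combos = empty = pages = 0
    for k in range(n_options + 1):
        n_combos = math.comb(n_options, k)       # ways to pick k of the options
        expected = n_products * match_rate ** k  # exact when match_rate is a Fraction
        combos += n_combos
        if expected < 1:
            empty += n_combos
            pages += n_combos                    # one "Zero items found" page each
        else:
            pages += n_combos * math.ceil(expected / per_page)
    return combos, empty, pages


# Test #1 parameters; Fraction(1, 5) avoids floating-point rounding at page boundaries.
combos, empty, pages = estimate_and_site(10_000, 32, Fraction(1, 5), 10)
print(combos, empty, pages)
```

Under these assumptions, every combination of six or more options comes up empty, and the sketch reports 4,294,724,471 empty results, the same figure quoted above. It also yields roughly one page with products for every 12,625 pages.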
So what can we do about all those empty results?
If you control the server-side code, you can remove them. Any option that would lead to a page that says “Zero items found” should either be grayed out (and not coded as a link) or, better yet, removed entirely.
This must be evaluated on the server side every time a new page is requested.
If this is done correctly, then every time the user clicks on an option, all of the remaining options that would have led to an empty result will disappear.
This reduces the number of pages, and it also dramatically improves the user experience. The user no longer has to stumble through a maze of mostly dead ends to find the rare combinations that show products.
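The server-side check can be quite small. Here is a minimal Python sketch for a single “AND” group, assuming each product is represented as a set of the options it matches. The function name and data layout are hypothetical, and a real store would typically run this as a database or search-index query rather than a loop over products.

```python
def available_options(products, selected, all_options):
    """Return the unselected options that still lead to at least one product."""
    # Items that match the current selection under "AND" matching.
    current = [p for p in products if selected <= p]
    return {
        opt
        for opt in all_options - selected
        if any(opt in p for p in current)  # adding opt keeps at least one item
    }


products = [
    {"wireless", "noise-canceling"},
    {"wireless"},
    {"wired", "foldable"},
]
all_options = {"wireless", "noise-canceling", "wired", "foldable"}

# After the user picks "wireless", only "noise-canceling" is still worth linking to.
print(available_options(products, {"wireless"}, all_options))
```

Only the options returned here would be rendered as links. “wired” and “foldable” are dropped because no wireless product has them.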
So let’s do this.
This test is just like Test #1, except now all links that lead to empty results are silently removed.
This time, we get:
1,149,017 (reachable) combinations of options.
0 empty results. (obviously, because we’ve removed them)
This may still seem like a lot, but it’s a big improvement over the previous test.
The page count has gone from billions down to just over a million. This is also a much better experience for the users, as they will no longer see any useless options that return zero results. Any site that has faceted navigation should be doing this by default.
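What “reachable” means can be shown with a toy crawl. The Python sketch below builds a small random site (200 products and 12 options, far smaller than the test above) and follows only links that lead to non-empty results, starting from the unfiltered page. The site generator, the seed, and the function name are assumptions for illustration, so its counts are not comparable to the figures above.

```python
import random
from collections import deque


def crawl_reachable(products, n_options):
    """Breadth-first walk that follows only links to non-empty results."""
    start = frozenset()  # the unfiltered page
    seen = {start}
    queue = deque([start])
    while queue:
        selected = queue.popleft()
        current = [p for p in products if selected <= p]  # "AND" matching
        for opt in range(n_options):
            if opt in selected:
                continue  # option is already active
            if not any(opt in p for p in current):
                continue  # link would lead to an empty result, so it is hidden
            nxt = selected | {opt}
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Toy site: each product gets each option with probability 0.2 (an assumption).
rng = random.Random(0)
products = [{o for o in range(12) if rng.random() < 0.2} for _ in range(200)]
reachable = crawl_reachable(products, 12)
print(len(reachable), "reachable combinations out of", 2 ** 12)
```

Every page the walk visits shows at least one product, because a link is only followed when the resulting selection still matches something.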
This test uses the same parameters as Test #1, except it uses “OR” matching:
The faceted navigation still has 32 filtering options.
There are still 10,000 products.
Each filtering option still matches 20% of the products.
The site still displays 10 products per page.
Options are now combined using “OR” matching rather than “AND” matching.
This gives us:
4,294,967,296 different combinations of options.
4,148,637,734,396 pages (!)
0 empty results.
The number of combinations is exactly the same, but the number of pages is far higher now (966 times higher), and there are no longer any empty results.
Why is the page count so high?
Because, with “OR” matching, every time you click on a new option the number of matching items increases. This is the opposite of “AND” matching, where the number decreases. In this test, most combinations now include most of the products on the site.
In Test #1, most combinations produced empty results.
There are no empty results at all in this new site. The only way there could be an empty result would be if you chose to include a filtering option that never matches anything (which would be quite pointless).
The strategy of blocking empty results doesn’t affect this match type.
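A rough estimate shows where the trillions come from. The Python sketch below assumes the options are statistically independent, so k selected options match 10,000 × (1 − 0.8^k) items under “OR” matching. That assumption and the function name are mine; this is not the simulation script described earlier, and it may round differently.

```python
import math
from fractions import Fraction


def estimate_or_pages(n_products, n_options, match_rate, per_page):
    """Estimate total pages and empty results for one "OR" group."""
    miss_rate = 1 - match_rate
    pages = empty = 0
    for k in range(n_options + 1):
        n_combos = math.comb(n_options, k)  # ways to pick k of the options
        if k == 0:
            expected = Fraction(n_products)  # no filter: the full catalog
        else:
            # An item matches unless it misses all k selected options.
            expected = n_products * (1 - miss_rate ** k)
        if expected < 1:
            empty += n_combos
            pages += n_combos  # an empty result still takes one page
        else:
            pages += n_combos * math.ceil(expected / per_page)
    return pages, empty


pages, empty = estimate_or_pages(10_000, 32, Fraction(1, 5), 10)
print(f"{pages:,} pages, {empty} empty results")
```

The estimate comes out at roughly 4.15 trillion pages with no empty results, in line with the figures above. It need not match them digit for digit.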
This test uses radio button matching.
If we repeat Test #1, but with radio button matching, we get:
33 different combinations of options.
0 empty results.
This is outrageously more efficient than any of the others.
The takeaway: always consider using radio button matching when you can get away with it (any time the options are mutually exclusive). It will have a dramatic effect on page counts.
Faceted navigation is one of the thorniest SEO challenges large sites face. Don’t wait to deal with issues after you’ve built your site. Plan ahead. Use robots.txt, look at selection options, and “think” like a search engine.
A little planning can improve the use of crawl budget, boost SEO, and improve the user experience.