URL Confusion? What’s that?
I’m glad you asked!
Well, Search Engines strive to rank the best possible search result using the cleanest URL. This means that if you give them any URL wiggle room (several duplicate versions of the same page, each on its own URL), they could choose one version over the others, and it may not be the one you expect. That will throw off your website’s ability to rank for important keywords, which in turn will adversely affect year-end traffic numbers, resulting in your receiving a huge lump of coal in your Christmas stocking.
Obviously, SEO in general is not that simple, but the challenges of URL Confusion sort of are. In a nutshell, any URL-related SEO issue (more on this real soon) that is visible to Search Engines during their normal crawling activity increases URL Confusion.
In the TSA SEO Architecture group, our philosophy is to never leave anything up to the Search Engines to decide FOR YOU. Specifically, we’re talking about the many aspects of website design that do not rely on third-party influence. This includes how cleanly you generate URLs, how you run your analytics, how you structure your content silos… and so on.
Does my website suffer from URL Confusion?
Search Engines may experience URL Confusion during their crawl of your website if you would answer YES to any of the following statements:
- My site doesn’t canonicalize www and non-www URL requests
- My site produces URLs with CaPs sometimes, but not always…
- My website uses appended parameters for affiliate tracking
- My CMS uses unfiltered product names to create our URLs, for example:
  - www.example.com/dvd/The Dark Knight (Widescreen Single-Disc Edition) (2008)
  - www.example.com/tvshow/Two and a Half Men: The Complete Sixth Season (2009)
- My website allows users to print and e-mail our pages
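Several of the items above (www vs. non-www, mixed case, affiliate parameters, unfiltered product names) come down to one page being reachable at more than one URL. As a rough illustration only, here is a minimal Python sketch of collapsing those variants into a single form. The parameter names in `TRACKING_PARAMS`, the choice of the www host as the preferred version, and the helper names `canonical_url` and `slugify` are all assumptions made for this example. Lowercasing the path is only safe if your server treats paths case-insensitively.

```python
import re
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical tracking parameters; substitute whatever your affiliate
# program actually appends to your URLs.
TRACKING_PARAMS = {"affid", "utm_source", "utm_medium", "utm_campaign"}


def canonical_url(url: str) -> str:
    """Collapse common duplicate-URL variants into one preferred form."""
    parts = urlsplit(url)

    # Assumption: the www host is the preferred version of the site.
    host = parts.netloc.lower()
    if not host.startswith("www."):
        host = "www." + host

    # Assumption: the server treats paths case-insensitively.
    path = parts.path.lower()

    # Drop tracking parameters, keep everything else in its original order.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    ]

    return urlunsplit((parts.scheme.lower(), host, path, urlencode(kept), ""))


def slugify(product_name: str) -> str:
    """Turn a raw product name into a lowercase, hyphen-separated URL slug."""
    slug = re.sub(r"[^a-z0-9]+", "-", product_name.lower())
    return slug.strip("-")


print(canonical_url("http://Example.com/DVD/Dark-Knight?affid=123"))
# http://www.example.com/dvd/dark-knight
print(slugify("The Dark Knight (Widescreen Single-Disc Edition) (2008)"))
# the-dark-knight-widescreen-single-disc-edition-2008
```

A function like this is most useful at the point where your templates or CMS generate links, so the duplicate versions never get published in the first place.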
MARK! I ANSWERED YES TO ALL OF THEM! What Are My Options?
First off, let me just mention that the degree to which your website is experiencing URL Confusion depends on how many pages we’re talking about. If you have a relatively small website (fewer than 1,000 pages), the chances that Google in particular is experiencing serious problems finding and indexing your content solely due to URL Confusion are probably low. When it comes to issue severity, the volume of pages affected is often key.
Now, if your website runs to thousands of pages, watch out. Seemingly small issues, baked into your templates, get replicated and multiply as your website generates more and more new pages every single day.
GOT IT! Give Me Solutions!
When it comes to URL Confusion, oftentimes the most effective SEO strategy requires a combination of solutions. Which ones are optimal for your particular issues depends greatly on your available development resources (some solutions are more complicated than others) as well as your major SEO goals.
Here are some common SEO Goals:
- Maintaining Search Engine Experience
- Improving Crawl Rates
- Decreasing Duplicate Content
- Maintaining SEO Value
- Consolidating Link Value
- Maintaining Link Value
By popular demand, the TSA Architecture team has come up with a simple grid that highlights many of the most effective ways to prevent URL Confusion issues, whether each solution is currently effective, and how each one impacts these common SEO Goals.
URL Confusion Solution Grid
* Usually, the best-case scenario for eliminating URL Confusion is to prevent duplicate URLs from existing altogether, and this is typically best accomplished with 301 redirects.
** MSN announced that its support for the Canonical Tag would begin at the end of 2009.
*** Hash Marks are currently effective for preventing some forms of URL Confusion, but there are indications this may change without warning, and search engines have not specifically endorsed their use for solving these challenges. TSA does not officially support this option.
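To make the first two footnotes a little more concrete, here is a small Python sketch of how a page handler might combine them: a 301 redirect when a duplicate URL is requested, and a Canonical Tag on the page served at the preferred URL. The function name `respond` and its return shape are hypothetical, invented for this illustration. It assumes you already know the one canonical URL for the page.

```python
from html import escape


def respond(requested_url: str, canonical: str):
    """Return (status, headers, head_html) for a request.

    `canonical` is the single preferred URL for the page being requested.
    """
    if requested_url != canonical:
        # Duplicate URL: redirect permanently so visitors and crawlers
        # both end up on the preferred version.
        return "301 Moved Permanently", [("Location", canonical)], ""

    # Preferred URL: serve the page and declare it canonical in <head>.
    tag = f'<link rel="canonical" href="{escape(canonical, quote=True)}">'
    return "200 OK", [("Content-Type", "text/html; charset=utf-8")], tag


status, headers, head_html = respond(
    "http://example.com/Deals", "http://www.example.com/deals"
)
print(status)   # 301 Moved Permanently
print(headers)  # [('Location', 'http://www.example.com/deals')]
```

The redirect handles duplicates you can detect on the server, while the tag is a fallback hint for variants that still get served.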
Can I see an example of what you’re talking about?
- Robots.txt Blocking
- Directory /carnivalcruises/user/ blocked to Search Engines
So, let’s take a look at what we’ve done.
+ Conserved Crawl Rate
(Prevents Search Engines from Crawling Duplicate URLs)
(Print and E-mail features still available for Human Visitors)
– Existing link value lost / does not prevent future link value loss
Overall, these are very acceptable tradeoffs, which should result in the Search Engines having more bandwidth to crawl the other pages we actually want them to find/index/rank, rather than these very low-value duplicate pages.
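For anyone who wants to check a rule like this before deploying it, here is a minimal sketch using Python’s standard `urllib.robotparser`. The two-line robots.txt is an assumption about what the block for the directory above might look like, and the URLs being tested (including the `print` path) are hypothetical examples, not the real site’s URLs.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content blocking the directory for all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /carnivalcruises/user/
"""


def is_crawlable(url: str, user_agent: str = "Googlebot") -> bool:
    """Report whether the rules in ROBOTS_TXT let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical print-version URL inside the blocked directory.
print(is_crawlable("http://www.example.com/carnivalcruises/user/print"))  # False
# A page outside the blocked directory stays crawlable.
print(is_crawlable("http://www.example.com/carnivalcruises/"))  # True
```

Human visitors are unaffected by the rule, since robots.txt only speaks to crawlers.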
Of course, there is no silver bullet for all URL Confusion issues. In fact, as the grid notes, some solutions have “No Consensus” (as far as we’re concerned) on some SEO Goals. We placed “No Consensus” wherever there was no unanimous agreement, both across the SEO community and within TSA, on a particular solution and its impact.
In closing, I hope most people view this URL Confusion Grid not as THE silver bullet to fixing all URL issues… WAIT! I do want that!!!
NO, NO, focus Mark.
In closing, this grid should illustrate that there are many solutions to URL Confusion issues, that the path to fixing them requires you to think through each solution’s impact down the road (positive and negative), and that many of these solutions can effectively complement each other.