We recently made a massive URL change at Geni where millions of profile URLs were changed from this format:

http://www.geni.com/genealogy/people/NAME-SLUG/PROFILE-ID

to this format:

http://www.geni.com/people/NAME-SLUG/PROFILE-ID

We had myriad issues to work through, including:
- Antiquated code that wasn’t properly 301ing URLs if a cookie wasn’t set (which meant we were accidentally cloaking the /genealogy URLs as 200 OK to most bots, but showing a 301 to anything that could set a cookie)
- Caching issues (memcache requires a lot of resources when scaling over tens of millions of URLs, and yes, we live and die by the belief that speed is important)
- Resources for reporting and testing (I’m in my fifth week, we’re working on it!)
- Indexed pages were decreasing significantly in GWT (Google Webmaster Tools)
- We hadn’t updated our directory or our XML Sitemaps to reflect the URL change yet (we wanted Google to see as many of the redirects as possible to speed up the migration process)
- Recent changes to robots.txt had significantly increased crawling on our family tree pages, which aren’t really optimized (and kinda slow for users but not bots, hi Flash)
- We made significant changes to how we tag URLs in Google Analytics (this made it very difficult to see what was happening because we didn’t have an easy way to compare data – I know, slow down George, right?)
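
To make the first issue concrete, here is a minimal Python sketch of the URL mapping and of a check that an old URL 301s to the same place whether or not a cookie is sent. This is not Geni's code: the function names are made up for illustration, and `fetch` is a hypothetical callable you would supply yourself (for example, a thin wrapper around your HTTP client that does not follow redirects).

```python
from urllib.parse import urlsplit, urlunsplit

# Path prefixes taken from the two URL formats described above.
OLD_PREFIX = "/genealogy/people/"
NEW_PREFIX = "/people/"


def new_profile_url(old_url):
    """Return the new-format URL for an old-format profile URL.

    Returns None when the path does not start with the old prefix.
    """
    parts = urlsplit(old_url)
    if not parts.path.startswith(OLD_PREFIX):
        return None
    new_path = NEW_PREFIX + parts.path[len(OLD_PREFIX):]
    return urlunsplit(
        (parts.scheme, parts.netloc, new_path, parts.query, parts.fragment)
    )


def redirect_is_consistent(old_url, fetch):
    """Check that old_url 301s to its new URL with and without a cookie.

    `fetch` is a hypothetical callable: fetch(url, send_cookie) must return
    a (status_code, location_header) tuple without following redirects.
    """
    expected = new_profile_url(old_url)
    if expected is None:
        return False
    for send_cookie in (False, True):
        status, location = fetch(old_url, send_cookie)
        if status != 301 or location != expected:
            return False
    return True
```

Running `redirect_is_consistent` over a sample of old URLs, with `fetch` wired to a real HTTP client, is one way to spot the accidental cloaking described in the first bullet: a response of 200 without a cookie and 301 with one makes the check return `False`.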