Colons and Organic Sitelink Title Text: What is Google Doing?

Posted By David Waterman On August 31, 2011

In researching Google’s sitelink Title text methodology [1], I discovered something a bit odd specific to how Google uses colons for sitelink Title text creation.

NOTE: the term “colon” will be used quite often in this post and is in reference to the punctuation mark. So get your mind out of the gutter!


At first glance, regardless of whether Google pulls the sitelink Title text from the Title tag, on-page content or anchor text, it appeared Google might not pull any content after a colon when creating sitelink Title text. However, there ARE instances where Google actually adds a colon to the sitelink Title text even though the site itself doesn’t use one in that text, and then includes the content after it.

Confused yet?

As always, the best way to show this is through a few examples. Come with me as we examine Google sitelink Title text creation and colons (stop laughing!).

Example 1a: No Content Pulled after Colon (PDF)

In the example below, we see a sitelink to a PDF (Google Instant). Since this is a PDF, it isn’t possible to have a fixed Title tag, but Google is obviously pulling something to create a Title for it.


The on-page Title of this PDF is “Google Instant: Potential Impact on SEM and SEO”.


Anchor Text: This doc is linked to internally and externally using the following text:

Internally: “Google Instant: Potential Impact on SEM and SEO” (http://www.thesearchagency.com/whitepapers/ [4] )

Externally: “Google Instant: Potential Impact on SEM and SEO” (http://www.thesearchagents.com/2010/09/new-white-paper-analyzes-impact-of-google-instant-on-sem-and-seo/_ [5])

“Google Instant: Potential Impact on SEM and SEO” is both the primary anchor text used to link to the PDF and its on-page title. So the assumption is that Google is using the anchor text for the sitelink Title text, but only pulling the content up to the colon and ignoring the rest. An argument could be made that they are referencing the content within the PDF instead, but in either case, they’re not pulling the text after the colon.

Example 1b: No Content Pulled after Colon (Web Page)

Here’s another example where we see the colon being used as a sort of stop word.

In the Fandango example below, we see a sitelink for the X-Men: First Class movie:


The link in the sitelink goes to the X-Men: First Class page BUT only shows the term X-Men. There are several X-Men movies, so why is Google not including First Class?

Let’s take a look at the page components:

URL: http://www.fandango.com/xmen:firstclass_133869/movieoverview

Title Tag:




Anchor Text:

This page is linked to internally AND externally using the following text:

Internally: “X-Men: First Class”


Externally: “X-Men: First Class”


There definitely appears to be consistency in colon usage for on-page content and internal/external linking for this page. It’s unclear if Google is using the anchor text, Title tag, or on-page content to create the sitelink Title text, but regardless, it seems they’re stopping at the colon and not pulling anything else (ok seriously, stop the laughing!).

HOWEVER, this isn’t consistent. I discovered an instance where they’re adding a colon where a colon wasn’t originally used and then including content after it.

Example 2: No Colon Used BUT Google Added it AND Included Post-colon Content

In the example below, we see a sitelink result for IMDb. You’ll notice that their X-Men sitelink Title text also doesn’t have First Class in the Title text; however, their Battle: Los Angeles sitelink Title text has a colon AND the text after it.


When I took a closer look at IMDb, I noticed they do use a colon on their X-Men: First Class page but they do NOT use a colon on their Battle: Los Angeles page.

Let’s take a look at the page components of their X-Men: First Class page:

URL: http://www.imdb.com/title/tt1270798/

Title Tag:




Anchor Text:

This page is linked to internally AND externally using the following text:

Internally: “X-Men: First Class”


Externally: “X-Men: First Class”


Externally: “X-Men First Class”


On-page, they use X-Men: First Class. In the backlink anchor text, the majority of links use X-Men: First Class, but there were a few instances where the colon was not used. Regardless of where Google is pulling the sitelink Title text from, they’re acknowledging the colon and ignoring the content after it.

Now let’s take a look at the Battle: Los Angeles page components:

URL: http://www.imdb.com/title/tt1217613/

Title Tag:




Anchor Text:

This page is linked to internally and externally using the following text:

Internally: “Battle Los Angeles”


Externally: “Battle: Los Angeles”


Externally: “Battle Los Angeles”


IMDb does not use the colon anywhere within their site in reference to Battle: Los Angeles. Backlink anchor text usage is mixed: some links use the colon and some do not.

So why does Google ignore the content after the colon in X-Men: First Class, yet add a colon to Battle: Los Angeles (which IMDb does NOT use anywhere within their site) AND include the content after it? Note that Google isn’t adding the colon to the Title of the page’s organic listing, only to the sitelink Title text.


Random Guess: Google looks to Wikipedia (or other official websites) for official titles to use in sitelink Title text

Both the IMDb X-Men: First Class and Battle: Los Angeles pages are linked to from Wikipedia. The difference is that Wikipedia uses a colon when referring to the movie Battle: Los Angeles and when linking to its IMDb page, even though IMDb itself does not.



Also, the official website for the movie (http://www.battlela.com/site/ [17] ) and the Sony Pictures website (http://www.sonypictures.com/homevideo/battlela/ [18]) refer to the movie as “Battle: Los Angeles” (with the colon).  So it’s safe to say the official title structure of the movie is Battle: Los Angeles.

So what the hell is Google doing????



When Google sees a colon in text it wants to use for a sitelink Title, does the algo assume the colon separates two distinct statements and drop everything after it, so the sitelink Title stays clean, reasonably clear and within 30 characters?
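
To make that first guess concrete, here is a minimal Python sketch of what a "colon as stop word" rule might look like. To be clear, this is purely my hypothetical model and not anything Google has published: the function name `guess_sitelink_title` and the `max_len` parameter are made up for illustration, and the 30-character limit is simply the figure mentioned above.

```python
def guess_sitelink_title(source_text: str, max_len: int = 30) -> str:
    """Hypothetical model of the 'colon as stop word' guess.

    This is an illustration of the theory only; it is NOT Google's
    actual algorithm, which is not public.
    """
    # Keep only the text before the first colon, if there is one.
    head = source_text.split(":", 1)[0].strip()
    if len(head) <= max_len:
        return head
    # Too long: drop the trailing word that crosses the limit.
    clipped = head[:max_len + 1].rsplit(" ", 1)[0]
    # Hard cut as a fallback when there is no space to break on.
    return clipped[:max_len].rstrip()


if __name__ == "__main__":
    for text in (
        "Google Instant: Potential Impact on SEM and SEO",
        "X-Men: First Class",
        "Battle Los Angeles",
    ):
        print(repr(text), "->", repr(guess_sitelink_title(text)))
```

This rule reproduces the Google Instant and X-Men sitelinks from the examples above, but it cannot produce “Battle: Los Angeles” from IMDb’s own colon-free text, which is exactly why that case looks like a different mechanism.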

Does Google crosscheck official titles of movies with Wikipedia and/or the movie studios to create sitelink Titles? If so, why don’t they do it for related organic listings?

When Google forces the usage of a colon in sitelink Title text, is there a glitch in the algo that inadvertently includes the text after it when it should still ignore it? OR, is the glitch that they are assuming too much with colons by not pulling the text after a colon when actually they should (as is the case with the X-Men examples)?

What if IMDb did NOT use a colon in X-Men: First Class on their site? Would Google force the usage of the colon and then include First Class?

Regardless of the answers, these examples further support the theory that Google doesn’t use one specific page component to create sitelink Title text AND creates the text in a couple of different ways. I think the best thing you can do at this point is stay consistent with on-page content and internal/external linking, and maybe use your colons lightly (OK, go ahead and laugh at that one).
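
If you want to check your own pages for that kind of consistency, a quick script can help. The sketch below is only an illustration: the `colon_report` function and the "internal"/"external" grouping are my own hypothetical names, and you would need to collect the anchor text yourself (from a crawl or a backlink tool). The sample data is the IMDb Battle: Los Angeles anchor text listed earlier.

```python
def colon_report(variants: dict[str, list[str]]) -> dict[str, str]:
    """Summarize colon usage per source as 'always', 'never' or 'mixed'."""
    report = {}
    for source, texts in variants.items():
        if not texts:
            report[source] = "no data"
            continue
        # One flag per text: does it contain a colon?
        flags = {":" in text for text in texts}
        if flags == {True}:
            report[source] = "always"
        elif flags == {False}:
            report[source] = "never"
        else:
            report[source] = "mixed"
    return report


if __name__ == "__main__":
    # Anchor text observed for the IMDb Battle: Los Angeles page.
    battle_la = {
        "internal": ["Battle Los Angeles"],
        "external": ["Battle: Los Angeles", "Battle Los Angeles"],
    }
    print(colon_report(battle_la))
```

A "mixed" result for any source is a sign that the same title is being written more than one way, which is the situation where Google seems most likely to make its own decision.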

