The Search Agents - http://www.thesearchagents.com

The Reality Check of Semantic Search

From the Web of Documents to a Web of “Things”

Besides almost having a heart attack after the recent WSJ post [1], as did other prominent figures in the search world (yes, I’m looking at you, Danny Sullivan [2]), I was reminded of an incident from a vacation with some friends way back in high school. We had left one friend asleep in a beach-side trailer because he could not be roused for our quest to find a decent breakfast. When we returned, we discovered that, out of boredom, he had found a Sharpie and a package of Post-it notes and proceeded to label every single object in the trailer. Even the toilet paper was labeled: the actual sheets, not just the roll!

How does this nostalgic story relate to the WSJ article and Danny Sullivan’s reaction, “Reality Check Time!”? While most of you have been asleep at the wheel, many savvy webmasters have been slowly doing the same thing as my strangely amused high school friend. These web stars have been using a particular kind of markup to label every “thing” they can on their websites, right under your nose. Now, this isn’t the product of some kind of boredom only webmasters and tech geeks can understand. This labeling methodology arose in response to the growing need for more relevancy and trust signals on pages. We’re talking about pages that not only people can read, but that machines can decipher as well.

We’ll have to discuss the bigger picture and where this is heading in another post. Just know that this has major ramifications: we are moving from a beloved web of documents towards a web of “things”. The implications are large, but the actual shift towards this new web, or “graph” as it’s now being called, happened a while ago, and very, very quietly.

So let’s get to the bottom of what this “markup” talk is all about.

Spoiler alert: Search Engines and Web Pages are very dumb!


First, I’ll let you in on a secret: the technology that runs search engines and web pages is very dumb. When a page is marked up, we humans can see a title, header, author, sidebar, and body text; no surprise there. A machine, however, has very little way of knowing that two words on a page are actually a person’s name, which is why we have things like “keywords” in the first place. Likewise, we might recognize certain images as licensed, but machines lack that capability. HTML as a language isn’t very good at saying what the content is “about”: at the moment we have very simple elements, such as <p> for a paragraph and <div> for a division of the page. The markup we use is essentially just syntax, a rudimentary way of structuring information on the page. So this whole big hoopla about “semantic markup” is really about delivering more signals that tell machines what these elements are “about”. Machines are generally considered somewhat obtuse, but they will not be dumb for much longer as semantic markup becomes more widely adopted. And this entire Chicken Little scenario from the WSJ started a long, long time ago, in a galaxy far, far away, in a corner of your basement. (Smacks head) Ahem, it started, like, TEN years ago.
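To make the “labels” idea concrete, here is a minimal sketch in Python using only the standard library’s html.parser. It reads two hypothetical snippets carrying the same facts about a made-up person (“Jane Doe” and “Example Bakery” are placeholders): one in plain HTML, one labeled with Microdata itemprop attributes from the Schema.org Person type. It is a toy that only sees text directly inside a labeled element; real search-engine extractors are far more thorough.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class LabelCollector(HTMLParser):
    """Toy extractor: collects (itemprop, text) pairs from Microdata-labeled elements."""

    def __init__(self):
        super().__init__()
        self.labels = []  # (label, text) pairs found so far
        self._open = []   # stack of open elements: their itemprop value, or None

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self._open.append(dict(attrs).get("itemprop"))

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._open:
            self._open.pop()

    def handle_data(self, data):
        text = data.strip()
        # Only text sitting directly inside a labeled element counts.
        if text and self._open and self._open[-1]:
            self.labels.append((self._open[-1], text))


def collect_labels(html):
    parser = LabelCollector()
    parser.feed(html)
    parser.close()
    return parser.labels


# Hypothetical page fragments describing the same (made-up) person.
PLAIN = "<div><p>Jane Doe</p><p>Example Bakery</p></div>"
LABELED = ('<div itemscope itemtype="http://schema.org/Person">'
           '<p itemprop="name">Jane Doe</p>'
           '<p itemprop="worksFor">Example Bakery</p></div>')

print(collect_labels(PLAIN))    # []
print(collect_labels(LABELED))  # [('name', 'Jane Doe'), ('worksFor', 'Example Bakery')]
```

With the plain version, the parser gets tags and text but nothing tells it that “Jane Doe” is a name; the labeled version hands it that fact directly.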

Brief History Lesson – from Microformats to Semantic Markup

This all began with nice little HTML “recipes” (reusable markup patterns) from the folks who brought us Microformats [4], such as patterns for marking up an event or a contact. Unfortunately, progress lagged as more and more patterns beyond the standard hCard and hCalendar formats were proposed to the group and never approved. There was an obvious need for something more powerful.

Over on the sidelines, tech businesses were paying attention to things like XML, which let you define your own, very specific tags. Instead of just having things like <div>, we could now have <recipe> and <ingredient> and <directions>. So what was the problem? Now there were 20 million different ways various people could mark up a recipe. The essential takeaway is that someone figured out how to take RDF, the Resource Description Framework (a W3C standard for describing data that was, at the time, usually written as XML), and tie it into the dumb markup we were already using: XHTML.
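Here is a minimal Python sketch of that problem, using the standard library’s xml.etree.ElementTree. Two hypothetical sites describe the same recipe with their own home-made XML tags (<recipe>/<ingredient> versus <dish>/<uses>, both invented for illustration). Each format needs its own extraction code, and code written for one finds nothing in the other.

```python
import xml.etree.ElementTree as ET

# Two hypothetical sites publishing the same recipe with their own custom tags.
SITE_A = """<recipe>
  <title>Pancakes</title>
  <ingredient>flour</ingredient>
  <ingredient>milk</ingredient>
  <ingredient>egg</ingredient>
</recipe>"""

SITE_B = """<dish name="Pancakes">
  <uses item="flour"/>
  <uses item="milk"/>
  <uses item="egg"/>
</dish>"""


def ingredients_site_a(xml_text):
    # Site A puts each ingredient in the text of an <ingredient> element.
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall("ingredient")]


def ingredients_site_b(xml_text):
    # Site B stores each ingredient in the "item" attribute of a <uses> element.
    root = ET.fromstring(xml_text)
    return [el.get("item") for el in root.findall("uses")]


print(ingredients_site_a(SITE_A))  # ['flour', 'milk', 'egg']
print(ingredients_site_b(SITE_B))  # ['flour', 'milk', 'egg']
print(ingredients_site_a(SITE_B))  # [] : site A's reader understands nothing on site B
```

Multiply that by every publisher on the web and you get the “20 million ways” problem; shared vocabularies exist to collapse those formats into one.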

Thus RDFa [5], “Resource Description Framework in Attributes”, arrived on the scene, offering a way to achieve the goal of making machines see what humans see. RDFa allowed “things” on a page to be tethered to formal vocabularies called “ontologies”, which are essentially structured descriptions of what a “thing” is “about”. For instance, we could mark up one person as a “mother” and another as her “daughter”, and the “meaning” would imply all sorts of unstated structural and relational rules, such as a rule that a “daughter” cannot be the “mother” of herself.
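Real ontologies state rules like these in formal languages such as OWL (which has constructs like owl:inverseOf and owl:IrreflexiveProperty). The minimal Python sketch below simply hard-codes two such rules for a hypothetical “motherOf” relation, to show what a machine can do once the labels carry meaning. Facts are stored as (subject, predicate, object) triples, the same shape RDF uses; the people and predicate names are invented for illustration.

```python
# Hypothetical rules for a made-up "motherOf" relation:
#   1. "hasMother" is its inverse, so it can be inferred instead of stated.
#   2. "motherOf" is irreflexive (nobody is their own mother) and
#      asymmetric (two people cannot be each other's mother).

def infer_inverses(triples):
    """Return the facts plus every hasMother fact implied by a motherOf fact."""
    inferred = set(triples)
    for subject, predicate, obj in triples:
        if predicate == "motherOf":
            inferred.add((obj, "hasMother", subject))
    return inferred


def violations(triples):
    """List the facts that break the irreflexive or asymmetric rule."""
    problems = []
    for subject, predicate, obj in sorted(triples):
        if predicate != "motherOf":
            continue
        if subject == obj:
            problems.append(f"{subject} cannot be their own mother")
        elif (obj, "motherOf", subject) in triples:
            problems.append(f"{subject} and {obj} cannot be each other's mother")
    return problems


facts = {("Ann", "motherOf", "Beth")}
print(("Beth", "hasMother", "Ann") in infer_inverses(facts))  # True
print(violations(facts))                                      # []
print(violations({("Ann", "motherOf", "Ann")}))               # ['Ann cannot be their own mother']
```

Nobody had to write “Beth hasMother Ann” on the page; the rule supplied it, and the same rules flag nonsense before it reaches a search result.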

Long story short, while you were sleeping, a bunch of really smart techy people created ontologies for all kinds of things. And yes, this happened about 10 years ago. There has been a boom over the last few years, especially in the life sciences and medicine. Even Wikipedia is getting in on the fun.

The Crux of Semantic Search

The crux of semantic search is that search engines started using these ontologies to enhance search results, and that, too, happened quite some time ago. If you’ve heard of Rich Snippets, you have at least some understanding of how Google is putting them to work. Recently, Google, Bing, and Yahoo also jointly launched Schema.org [6], a nice little distillation of these ontologies into a basic set almost everyone could use: People, Organizations, Places, Media, and so on.

I’m not going to bore you with the controversies over whether Schema.org is even a “well-formed” ontology, or whether it properly “de-references” its URIs, etc. The real controversy is that it champions yet another form of markup, Microdata [7], as an alternative to RDFa, so there is a new kid on the block. The thing to keep in mind is that there are currently two main ways of marking up data so that search engines can digest it, RDFa and Microdata, and Schema.org accepts BOTH.
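To see what “accepts both” means in practice, here is a minimal Python sketch (standard library only) that reads the same hypothetical Schema.org Person written once in Microdata (itemscope, itemtype, itemprop) and once in RDFa Lite (vocab, typeof, property). It is a toy: it assumes a single item per snippet, no void elements such as <br>, and that vocab sits on the same element as typeof, none of which real pages are obliged to follow.

```python
from html.parser import HTMLParser

# The same hypothetical person, marked up in each syntax.
MICRODATA = ('<div itemscope itemtype="http://schema.org/Person">'
             '<span itemprop="name">Jane Doe</span>'
             '<span itemprop="jobTitle">Pastry Chef</span></div>')
RDFA = ('<div vocab="http://schema.org/" typeof="Person">'
        '<span property="name">Jane Doe</span>'
        '<span property="jobTitle">Pastry Chef</span></div>')


class SchemaReader(HTMLParser):
    """Toy reader that understands both Microdata and RDFa Lite attributes."""

    def __init__(self):
        super().__init__()
        self.item_type = None
        self.props = []   # (property, text) pairs
        self._stack = []  # property name of each open element, or None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "itemtype" in a:    # Microdata: full type URL in one attribute
            self.item_type = a["itemtype"]
        elif "typeof" in a:    # RDFa Lite: vocabulary URL + short type name
            self.item_type = a.get("vocab", "") + a["typeof"]
        self._stack.append(a.get("itemprop") or a.get("property"))

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self._stack and self._stack[-1]:
            self.props.append((self._stack[-1], text))


def read_item(html):
    reader = SchemaReader()
    reader.feed(html)
    reader.close()
    return reader.item_type, reader.props


print(read_item(MICRODATA) == read_item(RDFA))  # True
```

Either syntax lands on the same type and the same property/value pairs, which is why the choice between them comes down mostly to your CMS and HTML flavor.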

Which one you choose will depend on your particular CMS and on which flavor of HTML you use; ask your tech gurus if you don’t already know. So what is the big deal? Nothing has really changed drastically in how engines are using this, YET.

What you need to know:

  1. The choices you need to make when developing a website now will have implications down the road. XHTML vs. HTML5 is a decision that should not be taken lightly.
  2. Whether you choose RDFa or Microdata should also not be taken lightly. There’s controversy from a few different angles over whether, and when, these will become official W3C Recommendations.
  3. Which ontologies you tie into will also have implications, because they can change over time. Even Google’s original vocabulary, released about 2 years ago, has been reworked and rebranded as Schema.org with the backing of other search engines. (Not to mention that Yahoo was the first proponent of this kind of markup – but who really talks about them anymore?)

Some expectations – Vertical Search and Semantic Search


Last year Google introduced a new faceted search vertical for “Recipes”, which lets you not only search for and find recipes, but also see user ratings and even choose from the sidebar which ingredients to include or exclude. All of this stems from semantic markup being available for machines to consume, and people like it too!
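Google hasn’t published how its recipe search works internally, so treat this as a hypothetical illustration only: once ingredients arrive as structured lists rather than free text, an include/exclude sidebar boils down to a couple of set operations. The recipe records below are invented, standing in for what a crawler might extract from marked-up pages.

```python
# Hypothetical recipe records, as a crawler might extract them from marked-up pages.
RECIPES = [
    {"name": "Pancakes", "ingredients": {"flour", "milk", "egg"}, "rating": 4.5},
    {"name": "Crepes", "ingredients": {"flour", "milk", "egg", "butter"}, "rating": 4.2},
    {"name": "Vegan Pancakes", "ingredients": {"flour", "oat milk", "banana"}, "rating": 3.9},
]


def facet_filter(recipes, include=(), exclude=()):
    """Keep recipes containing every included and no excluded ingredient, best-rated first."""
    include, exclude = set(include), set(exclude)
    hits = [r for r in recipes
            if include <= r["ingredients"]          # all required ingredients present
            and not (exclude & r["ingredients"])]   # none of the excluded ones
    hits.sort(key=lambda r: r["rating"], reverse=True)
    return [r["name"] for r in hits]


print(facet_filter(RECIPES, include=["milk"]))                    # ['Pancakes', 'Crepes']
print(facet_filter(RECIPES, include=["flour"], exclude=["egg"]))  # ['Vegan Pancakes']
```

Exact string matching is a deliberate simplification here: “oat milk” does not count as “milk”, which is exactly the kind of ambiguity richer vocabularies try to resolve.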


Other interesting things are happening on e-commerce sites, the major example being Best Buy, which marked up its products with RDFa using an ontology called GoodRelations and got similarly sexy search results. Now you can see a product and its reviews, sometimes the price or a special offer price, and so on, right in the result. That can mean a higher click-through rate, thus more traffic, and (gasp) more sales. But again, that’s last year’s news.

Sorry, Danny – the real reality check we should be looking at?

The most important point is number one:

1.  It is no longer acceptable for marketers to NOT know about basic markup.  If you do not understand HTML at this point, you won’t even comprehend where this next level of markup is headed. And frankly, we’re tired of looking like Gallagher with a big mallet pointed at your brain.


2.  This new markup will only help you.
You should already be working out which parts of your site need markup. ROI is not being left behind in the brave new web: we have been working out how to deploy many of these early formats for our own clients. However, the basic SEO content and architecture recommendations should be at least partially complete before you even begin; otherwise this effort could be wasted.

3.  Expect more small changes to keep rolling out slowly over the next couple of years. In other words, not much has changed yet, which means you have some time to actually learn a few things and implement them. Quick tip – there’s more to semantic markup than just Rich Snippets!

So you can take off your panic-face now. See, it wasn’t so bad. What are you waiting for? Start marking up your pages!

About Brandon Schakola