- The Search Agents - http://www.thesearchagents.com -

Image Search Secrets, Part 2

PageRank for Images

Ranking images in image search would be much easier if computers could interpret exactly what an image contains. Currently, a “semantic gap” exists between the descriptions generated by computer vision algorithms and the actual content of the image.

Instead of trying to detect what an image contains, another approach to ranking is to compare pairs of images and check whether they share similar “features”. This section discusses some approaches presented in two Google papers: “PageRank for Product Image Search”, by Y. Jing and S. Baluja, and “Canonical Image Selection from the Web”, by Y. Jing, S. Baluja, and H. Rowley.
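To make the idea of comparing two images by their features concrete, here is a minimal sketch in Python. It assumes each image has already been reduced to a list of numeric feature descriptors (how those are extracted is outside this sketch), and it scores a pair of images as the number of features they share divided by their average feature count. The function names, the nearest-neighbour ratio test, and the 0.8 threshold are illustrative assumptions, not details taken from the papers.

```python
import math

def match_count(features_a, features_b, ratio=0.8):
    """Count descriptors in features_a that have a clear match in features_b.

    A descriptor counts as matched when its nearest neighbour in features_b
    is noticeably closer than the second nearest (a ratio test; the 0.8
    threshold is an assumed value).
    """
    matches = 0
    for fa in features_a:
        dists = sorted(math.dist(fa, fb) for fb in features_b)
        if len(dists) >= 2 and dists[0] < ratio * dists[1]:
            matches += 1
    return matches

def image_similarity(features_a, features_b):
    """Shared features divided by the average number of features per image."""
    if not features_a or not features_b:
        return 0.0
    # Take the smaller of the two directions so the score is symmetric.
    shared = min(match_count(features_a, features_b),
                 match_count(features_b, features_a))
    return shared / ((len(features_a) + len(features_b)) / 2)

# Toy descriptors (2-D points stand in for real feature vectors).
image_a = [(0.0, 0.0), (10.0, 10.0)]
image_b = [(0.1, 0.0), (10.0, 10.1)]    # nearly the same features as image_a
image_c = [(50.0, 50.0), (51.0, 50.0)]  # unrelated features

print(image_similarity(image_a, image_b))
print(image_similarity(image_a, image_c))
```

Real descriptors have many more dimensions, but the scoring logic stays the same.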

The idea behind the “PageRank for images” or “Image Rank” approach is that many images in the result set for a query are expected to have similar features. We could “link” these images together based on their similar features, and compute a rank from this visual link structure (an image similarity graph). The approach mirrors PageRank: where PageRank has web pages connected by hyperlinks into a link graph, Image Rank has images connected by shared features into an image similarity graph. An image's importance comes from the images that share similar features with it.
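The ranking idea can be sketched as a PageRank-style power iteration over a similarity matrix. This is an illustrative sketch, not the papers' implementation: the function name `image_rank`, the damping value of 0.85, the iteration limit, and the toy similarity scores are all assumptions.

```python
def image_rank(sim_matrix, damping=0.85, iterations=100, tol=1e-9):
    """Rank images from a square matrix of pairwise similarity scores.

    sim_matrix[i][j] is the similarity between image i and image j.
    Each column is normalised to sum to 1, so an image splits its rank
    among the images it resembles, in proportion to similarity.
    """
    n = len(sim_matrix)
    col_sums = [sum(sim_matrix[i][j] for i in range(n)) for j in range(n)]
    rank = [1.0 / n] * n  # start from a uniform rank

    for _ in range(iterations):
        new_rank = []
        for i in range(n):
            incoming = 0.0
            for j in range(n):
                if col_sums[j] > 0:
                    incoming += sim_matrix[i][j] / col_sums[j] * rank[j]
                else:
                    # An image similar to nothing spreads its rank evenly.
                    incoming += rank[j] / n
            new_rank.append(damping * incoming + (1.0 - damping) / n)
        delta = sum(abs(a - b) for a, b in zip(new_rank, rank))
        rank = new_rank
        if delta < tol:
            break
    return rank

# Toy graph: image 0 resembles images 1 and 2, which do not resemble each other.
toy_similarity = [
    [0.0, 0.9, 0.8],
    [0.9, 0.0, 0.0],
    [0.8, 0.0, 0.0],
]
ranks = image_rank(toy_similarity)
print(ranks)
```

In this toy graph, image 0 ends up with the highest rank because both other images "link" to it through their similarity.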

The Image Rank computation algorithm based on the Google paper is described in the following steps.


Similarity Graph Visualization for “Mona Lisa” from the Google Paper

We can express the computation of “Image Rank” as follows:
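One way to write it is in the damped PageRank form used in the Jing and Baluja paper. The symbol names here are assumptions of this sketch: S* is the image similarity matrix with each column normalised to sum to 1, n is the number of images, d is a damping factor between 0 and 1, and p is a uniform vector.

```latex
IR = d \, S^{*} \cdot IR + (1 - d) \, p,
\qquad
p = \left[ \tfrac{1}{n} \right]_{n \times 1}
```

IR is found by applying this update repeatedly, starting from any initial vector, until the values stop changing.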




This PageRank-like approach for images works well for popular queries that return a large number of similar images. The downside is that it requires a large amount of computation, and the image collection must hold a large volume of images for the similarity links to be meaningful.
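A rough sense of the computational cost, assuming every pair of images in a result set of n images is compared once (an assumption for illustration; a real system may prune comparisons):

```latex
\binom{n}{2} = \frac{n(n-1)}{2},
\qquad
n = 1000 \;\Rightarrow\; \frac{1000 \cdot 999}{2} = 499{,}500 \text{ comparisons}
```

The number of comparisons grows quadratically with the size of the result set, before any rank iteration is run.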

An interesting by-product of this approach is the option to search for “similar images”, since the similarity data needed for that search is already gathered in order to calculate the Image Rank.
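As a sketch of how a “similar images” lookup could reuse the same data, the function below reads one row of a pairwise similarity matrix and returns the closest images. The function name, its parameters, and the toy scores are hypothetical, chosen only for illustration.

```python
def similar_images(sim_matrix, query_index, top_k=3, min_similarity=0.0):
    """Return indices of the images most similar to the query image.

    sim_matrix[i][j] is the similarity between image i and image j.
    Images scoring at or below min_similarity are left out.
    """
    scored = [
        (score, idx)
        for idx, score in enumerate(sim_matrix[query_index])
        if idx != query_index and score > min_similarity
    ]
    # Highest similarity first; ties broken by lower index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored[:top_k]]

# Toy scores: image 0 resembles images 1 and 2, which do not resemble each other.
toy_similarity = [
    [0.0, 0.9, 0.8],
    [0.9, 0.0, 0.0],
    [0.8, 0.0, 0.0],
]
print(similar_images(toy_similarity, 0))
print(similar_images(toy_similarity, 1))
```

No extra image comparison is needed at query time; the lookup only sorts scores that were already computed for ranking.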