How do you rate the accuracy of a search engine? Information retrieval metrics called “precision” and “recall” are used – so what the heck are precision and recall?
In their simplest forms, precision is a measure of exactness and recall is a measure of completeness. The metrics of precision and recall can actually be quantified with formulas.
Precision equals the number of relevant documents retrieved divided by the total number of documents retrieved.
For precision, a perfect score of 1.0 means that every result the search retrieved was relevant. Unfortunately, it says nothing about whether all of the relevant documents were found. For example, if you have 1000 objects, 200 of which are relevant, but your search engine returns only 1 hit, you would still get a perfect score for precision if that 1 hit happens to be relevant. But is that useful? You missed the other 199 relevant objects!
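To make that arithmetic concrete, here is a minimal Python sketch of the precision formula (the numbers and function names are purely illustrative, not taken from any particular system):

def precision(relevant_retrieved, total_retrieved):
    # Precision = relevant documents retrieved / total documents retrieved
    return relevant_retrieved / total_retrieved

# The scenario above: the search returns 1 hit, and it happens to be relevant.
print(precision(relevant_retrieved=1, total_retrieved=1))   # 1.0 -- a "perfect" precision score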
If your head is starting to hurt, hold on – it gets a little more complicated with recall.
Recall equals the number of relevant documents retrieved divided by the total number of relevant documents in the entire dataset.
But how do you know the total number of relevant documents? In our example, you could either go through all 1000 documents one at a time, or you could take a random sample, count how many of those are relevant, and scale that up to estimate how many documents in the whole batch are relevant. That is called “sampling” (for more information, check out the data sampling blog by Rande Simpson, posted on August 24, 2010).
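As a rough sketch of that sampling idea (the sample size and the is_relevant judgment are hypothetical placeholders; in practice a person reviews the sample and decides what counts as relevant):

import random

def estimate_total_relevant(documents, sample_size, is_relevant):
    # Judge a random sample, then scale the relevant fraction up to the whole collection.
    sample = random.sample(documents, sample_size)
    relevant_in_sample = sum(1 for doc in sample if is_relevant(doc))
    return round(relevant_in_sample / sample_size * len(documents))

For example, judging 100 of the 1000 documents and finding 20 of them relevant would give an estimate of roughly 200 relevant documents overall.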
To continue explaining recall in its basic form, a perfect score of 1.0 means that all of the relevant documents were retrieved by the search. The drawback here is that it says nothing about how many irrelevant documents were also retrieved. In our example, if a search returned 500 objects, we would still get a perfect score for recall as long as all 200 relevant objects were in there, even if the other 300 had NOTHING to do with our search. So “acing” recall by itself is not a great thing either.
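Continuing the little sketch from above with the same made-up numbers, recall for that 500-result search comes out perfect even though precision does not:

def recall(relevant_retrieved, total_relevant):
    # Recall = relevant documents retrieved / relevant documents in the entire dataset
    return relevant_retrieved / total_relevant

# 500 results come back; all 200 relevant objects are in there, plus 300 junk hits.
print(recall(relevant_retrieved=200, total_relevant=200))       # 1.0 -- perfect recall
print(precision(relevant_retrieved=200, total_retrieved=500))   # 0.4 -- mediocre precision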
In doing some reading on the subject, I stumbled upon the book “Information Architecture for the World Wide Web” by Louis Rosenfeld and Peter Morville. From their book came a very good analogy. High recall could be equated to fishermen using the technique of “drift-netting” – the results will be pretty inclusive, but there will be a lot of stuff in the net that’s not important to you. On the other hand, high precision equates more to “lobster trapping” – a whole lot less will be collected at the end of the day, and you may not get ALL the lobsters out there, but you can be sure that what you have is likely to be a lobster.
With these search metrics, there is often an inverse relationship between precision and recall: it is usually possible to increase one at the cost of reducing the other. A good digital asset management system needs to deliver results that strike the right balance between precision and recall. In a future blog post we will discuss the trade-offs between the two.
Posted by James Burke