CEO Blog Series: Chapter 3: Visual Similarity

Let’s say we have a DAM with 100,000 pictures in it, all kinds of pictures. We are doing a marketing piece and we need a beach scene. Unfortunately, most of our pictures have no caption information, and we can only find one beach scene by using a text search, and it just is not quite right. We are sure we have something better in our collection, maybe even a beach at sunset, but how are we going to find it?

What if we had an AI engine that was pretty good at recognizing the contents of a photo? We could run all our 100,000 photos through it, and for each one it would give us a “descriptor”, the set of numbers it generates to characterize what is in the photo. We then build the backend, loading each of the 100,000 descriptors into our space, ending up with a bunch of dots floating around in there.

Then, when we need to see if we have any other beach pictures, it is easy: we look up the descriptor we got for the one photo we could find, locate its dot in our block of space, and then see what other dots are really, really near it (within a small-radius sphere around our beach dot). We then retrieve the images corresponding to those dots and sort them in order of distance, so “nearest neighbors” come first.
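
For readers who like to see this in code, here is a minimal Python sketch of that lookup. Everything in it is made up for illustration: the file names, the three-number descriptors (a real engine would produce far more numbers per photo), the `find_similar` function, and the radius value. It also simply checks every dot in turn, which is the easiest way to show the idea and not necessarily how a production backend would do it.

```python
import math

# Hypothetical descriptors: photo name -> the numbers the AI engine produced.
# Three numbers per photo keeps the example readable; these values are invented.
descriptors = {
    "beach_noon.jpg": (0.90, 0.80, 0.10),
    "beach_sunset.jpg": (0.85, 0.75, 0.20),
    "beach_crowd.jpg": (0.80, 0.90, 0.15),
    "city_night.jpg": (0.10, 0.20, 0.90),
    "forest_trail.jpg": (0.20, 0.10, 0.30),
}


def find_similar(query_name, descriptors, radius):
    """Return (distance, name) pairs within `radius` of the query photo, nearest first."""
    query = descriptors[query_name]
    hits = []
    for name, vec in descriptors.items():
        if name == query_name:
            continue  # skip the photo we started from
        distance = math.dist(query, vec)  # straight-line distance between two dots
        if distance <= radius:
            hits.append((distance, name))
    return sorted(hits)  # smallest distance first, so nearest neighbors lead


# Start from the one beach photo the text search found.
for distance, name in find_similar("beach_noon.jpg", descriptors, radius=0.25):
    print(f"{name}: {distance:.3f}")
```

With these invented numbers, the two other beach photos fall inside the sphere and the city and forest photos do not. The radius is the knob: make it larger and more loosely related images come back, make it smaller and only near-duplicates do.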

Pretty simple, right? If we did a good job making the front-end engine, then odds are really good that the top images returned are beach scenes, and we find the perfect one, with a sunset.

The important thing to note here is this: we NEVER looked at any text to find the better beach scene; this search was based entirely on image content. For the first time since DAM became a thing, we can search on the actual content of an image, without depending on whether there is textual metadata, whether the caption used just the right word, or any of the other shortcomings of searching a DAM up till now. We were able to say to our DAM: “Hey, show me images that look like this!” and it came through.

Let’s next move on to something similar: facial recognition!