Problem: Your hospital had a fundraiser last night, and your photo team took 1000 photos. This morning your CEO calls and says they need the best photos of your top surgeon with each of three big donors.
None of the photos have captions, the deadline is 30 minutes away, and there is no way you can look through 1000 photos in that time.
If your DAM could do facial recognition, you could get this done in seconds. How do you build facial recognition into DAM?
Like many things in life, when you find yourself looking at a big challenge, it helps to break it into smaller pieces. In this case we can identify three things we need to do in order to make facial recognition work in a DAM:
- First, we need to detect whether there are any faces in each photo: a way to look at a photo, locate each face, and put a “bounding box” around it.
- Next, once we have found a face, we need a way to identify other photos containing the same face.
- Finally, we need to tweak these two steps to make them really useful.
Let’s start with the first problem: here is a photo. Are there any faces in it?
It turns out there has been a lot of research on finding faces in photographs, and approaches range from the inelegantly named HOG (Histogram of Oriented Gradients) to more advanced algorithms that detect facial landmarks.
Depending on the approach, you can get really good results spotting faces even in really difficult scenes.
More on this later.
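To make that first step concrete, here is a minimal sketch using the open-source face_recognition library (built on dlib), which offers both the classic HOG detector and a slower, more accurate CNN model. The filename is a hypothetical placeholder.

```python
# Minimal face-detection sketch using the face_recognition library
# (pip install face_recognition). The filename is a placeholder.
import face_recognition

# Load the photo into a numpy array (RGB).
image = face_recognition.load_image_file("fundraiser_photo_001.jpg")

# Detect faces. Each result is a bounding box as (top, right, bottom, left)
# in pixel coordinates. Swap model="hog" for model="cnn" on harder scenes.
face_boxes = face_recognition.face_locations(image, model="hog")

print(f"Found {len(face_boxes)} face(s)")
for top, right, bottom, left in face_boxes:
    print(f"  bounding box: top={top}, right={right}, bottom={bottom}, left={left}")
```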
Once we find a face, we crop it out of the base image, generate its descriptor, and place its dot in our multi-dimensional space. Sound familiar?
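As a rough sketch of what “generate its descriptor” can look like in practice, the same library turns each detected face into a 128-dimensional vector, and faces of the same person tend to land close together in that space. The filename is again a placeholder.

```python
# Turn each detected face into a 128-dimensional descriptor: the "dot"
# we place in our multi-dimensional space.
import face_recognition

image = face_recognition.load_image_file("fundraiser_photo_001.jpg")
face_boxes = face_recognition.face_locations(image, model="hog")

# One descriptor (a numpy array of 128 numbers) per detected face.
descriptors = face_recognition.face_encodings(image, known_face_locations=face_boxes)

for box, descriptor in zip(face_boxes, descriptors):
    print(f"face at {box} -> descriptor with {len(descriptor)} dimensions")
```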
OK, on to the second problem: we have found these faces; how do we tell whether they appear in other photos?
It turns out we already solved that problem with Visual Similarity! In this case we extract the face of a person in one photo, identify their name, and ask the system to show us the nearest neighbors of the dot that face represents. Odds are good those nearest neighbors (if we pick an appropriate radius around our “known” face’s dot) are the same person, just as with Visual Similarity. The main difference is that instead of looking for similarity across the entire photo, we look for similarity of just the cropped face!
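Here is one way that nearest-neighbor lookup might be sketched. The names all_descriptors and photo_ids, and the 0.6 radius, are illustrative assumptions (0.6 is a commonly cited starting point for dlib’s 128-dimensional descriptors), not fixed choices.

```python
# Sketch of the nearest-neighbor lookup: given the descriptor of a
# "known" face, find candidate matches across the whole collection.
import face_recognition

def find_candidates(known_descriptor, all_descriptors, photo_ids, radius=0.6):
    """Return ids of photos whose face descriptor lies within `radius`
    of the known face. all_descriptors holds one descriptor per face
    found across the collection; photo_ids maps each descriptor back
    to its source photo (both are hypothetical names)."""
    distances = face_recognition.face_distance(all_descriptors, known_descriptor)
    return [pid for pid, dist in zip(photo_ids, distances) if dist <= radius]
```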
A well-designed UI would then present our “known” image alongside all the candidates it thinks are matches, and we can rapidly decide whether the system is right (two people can look very similar yet be different people).
Now our third problem: do we need to tweak the system? We sure do.
As one example, is it possible these results spotting faces are TOO good?
It sure is. At some point a face in a photo is either too small or too fuzzy to identify reliably, even if it can still be detected as a face, like in the examples above. So before we move on to identifying the person, we need to set some thresholds that decide whether a face is too small or too fuzzy, and if so, simply ignore it! If we cannot accurately identify a face, we are just adding bad data to the DAM, and we definitely do not want to do that!
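One possible quality gate, sketched with OpenCV: the minimum-size and sharpness cutoffs below are illustrative assumptions to be tuned against your own collection, and the fuzziness test uses the classic variance-of-Laplacian measure.

```python
# Quality-gate sketch: skip faces too small or too fuzzy to identify.
# Assumes the image was loaded with cv2.imread (BGR) and `box` is the
# (top, right, bottom, left) tuple our detector returned.
import cv2

MIN_FACE_PIXELS = 80    # illustrative: ignore faces smaller than 80x80 px
MIN_SHARPNESS = 100.0   # illustrative: variance-of-Laplacian cutoff

def face_is_usable(image_bgr, box):
    top, right, bottom, left = box
    if (right - left) < MIN_FACE_PIXELS or (bottom - top) < MIN_FACE_PIXELS:
        return False  # too small to identify reliably

    face = image_bgr[top:bottom, left:right]
    gray = cv2.cvtColor(face, cv2.COLOR_BGR2GRAY)
    sharpness = cv2.Laplacian(gray, cv2.CV_64F).var()
    return sharpness >= MIN_SHARPNESS  # low variance means fuzzy
```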
A second example, which we will deal with later, is also really important: can some of these AI models have bias built in? If so, why, and how do we correct for it?
And back to our scenario: can we get great results in seconds instead of minutes or hours? We sure can. Sit down at your DAM with the 1000 photos shot last night, find one photo of your star surgeon, click on their face, and type in their name; the system shows you the other faces it believes are the same person. Review and accept them (this takes seconds). Then find a single image of each of the top three donors, click on their faces, type in their names, and review the candidates. Finally, do a text search for your surgeon together with Donor 1 and pick the best photo. Repeat for Donor 2 and Donor 3 with the surgeon, and your job is done. In seconds.
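Under the hood, once the confirmed matches are stored as name tags, that final text search reduces to a simple set intersection over photo ids; the names and ids below are hypothetical.

```python
# Toy sketch: "surgeon AND donor" as a set intersection over photo ids,
# using a hypothetical tag index built during the review step.
tags_by_name = {
    "Dr. Chen":  {"img_0042", "img_0107", "img_0311"},
    "Donor One": {"img_0107", "img_0520"},
}

matches = tags_by_name["Dr. Chen"] & tags_by_name["Donor One"]
print(matches)  # -> {'img_0107'}: candidate photos for the CEO's request
```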
And that is the real-world value of a facial recognition system in DAM.
Fun fact: across very large picture collections, we found an average of three useful faces per image!