We live in a privileged time, where capabilities undreamed of years ago are now routine. Those capabilities did not just happen overnight; they took constant innovation and hard work over many years, always searching for the next great advance to help people do their jobs. At MerlinOne that mission of constant invention and innovation is a core part of the culture.
Let’s have some fun with the roadmap!
The Early Days
Putting images to work in the mid-1980s meant editing stacks of prints or trays full of slides. If you thought the print was too dark, you sent the negative back to the chemical darkroom. Don’t like the distracting background for a picture? Call the artist with the touch-up airbrush. In either case, go have some coffee and come back in an hour (maybe).
When images could finally be digitized (we invented a system called Phoenix that was the first PC-based negative or slide scanning and toning application) files were large (8 megapixel) and transmission lines were slow (15,000 times slower than my cellphone today). The computers used as “picture desks” were initially the size of a small room, and just moving photos around took minutes apiece. Even functions we consider basic today like rotating an image took more than eight minutes.
One of the first steps forward was JPEG, which became a standard way to compress images to roughly 1/15th of their full size (and indeed, back then there were about ten different flavors of JPEG). At about the same time, the industry leading the way in digital photography, the news industry, came up with a standard for embedding text information like captions (what we call “metadata” today) inside a JPEG file.
Another needed step was the creation of fast processors (this trend will repeat later), in particular, the Intel 486. Before then it just was not worth working with images digitally: while things were possible, they took forever (and even with a 486 you needed a separate board to handle JPEG). But the combination of new software (JPEG) and new processors showed us digital imaging was actually possible.
Completing the puzzle were early databases. In our case, it was a very high-performance one, called FoxPro, that was also used by the US military. Back then a database was not something that lived in a back room on a server someplace: if your desktop wanted to do a search, it first had to drag the entire database over a network (1/1000 the speed of today’s networks) to your desktop. But at least it was possible for you to query a database from your desktop, a revolutionary capability!
Last but not least, back then databases could not search free text at all. We found a developer who had just created one of the first “free-text search engines,” and that let users search captions for the first time. And so finally the pieces came together, and in 1993 the very first Merlins went into the field, helping people in multiple countries put their newspapers out with digital imagery!
The mid-to-late 90s saw rapid advances in computer science, and as always we were on the lookout for every advance that could better serve our users. The first big change was to stop pulling an entire database to each user’s desktop (if one desktop crashed, it could corrupt the database for everyone). The new architecture was called “client-server.” It let the database stay safely in a datacenter on a server, which was both more secure and far faster for everyone. That mattered because at some news organizations in the 90s collections were exceeding 1,000,000 records, and even today some technologies do not scale that far. So scalability was achievable, if you picked the right architecture and technologies.
Speed has always been a focus at MerlinOne: we don’t like to wait for searches, and we figured you would not either. We also thought that people would want to browse and use images almost immediately after they hit their Merlin system, and so we achieved record input times of well under one second per image.
Most of our systems at the time were at major news organizations, and as the saying went, “There is a deadline every second for someone around the world,” so our systems could never go down. That meant we needed to invent a fault-tolerant architecture, and we used to invite prospective customers into our datacenter and literally let them pull power plugs from the back of servers to show them how our systems still worked and users were not interrupted. We have been the system of choice for mission-critical applications, and have had a system in use at the White House across six presidential terms and five Presidents since 2001. It has been decades since our customers have seen any data corrupted or lost.
In the same time period “picture desks” stopped being standalone silos: the first third-party integrations happened, initially with text layout and printing systems (the precursors to today’s CMS systems). We studied the problem and wrote the first bi-directional interface for these external publishing systems.
Finally, in the late 90s it became possible (and economically attractive) for our customers to move their systems out of their expensive datacenters and let us host them in a centralized location. This lifted the burden of supporting the system from their IT staff (who had dozens of other systems to worry about) and let us take care of it: after all, we were the best people in the world to support our systems! As a result, many of our customers got better support for less cost, a win-win for them, and at the time, another innovation.
The Search for Video
Around 2006 video (albeit short and low resolution) started to appear on web sites everywhere, and we looked at the problem of “how do we help users search videos?” After all, there were no captions for each scene, so if you were looking for a clip from the State of the Union speech where the President mentioned “Afghanistan” you had to watch the whole hour-long speech to find it.
MerlinOne came up with the idea of stripping the audio track from a video, doing a speech-to-text conversion from it, and indexing each word against time code. As a result, you could search for “Afghanistan” and your Merlin could cue you up five seconds before every time the President said the word in the speech. This way it would take you seconds, not an hour, to find just the sentence you were looking for. Another innovation that let you search through a multi-hour college lecture for just the sentence you needed.
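The core of this idea can be sketched in a few lines: map each recognized word to the time codes where it was spoken, then cue playback a few seconds before each hit. The transcript data and function names below are invented for illustration; this is a minimal sketch of the indexing idea, not MerlinOne's actual pipeline.

```python
from collections import defaultdict

# Hypothetical speech-to-text output: (timecode in seconds, word) pairs.
transcript = [
    (12.0, "tonight"), (12.4, "we"), (12.6, "discuss"),
    (13.1, "afghanistan"), (47.9, "and"), (48.2, "afghanistan"),
    (48.8, "again"),
]

def build_word_index(words):
    """Map each spoken word to every time code at which it occurs."""
    index = defaultdict(list)
    for timecode, word in words:
        index[word.lower()].append(timecode)
    return index

def cue_points(index, query, lead_in=5.0):
    """Return playback positions cued a few seconds before each mention."""
    return [max(0.0, tc - lead_in) for tc in index.get(query.lower(), [])]

index = build_word_index(transcript)
print(cue_points(index, "Afghanistan"))  # one cue point per mention
```

With a real speech-to-text engine feeding the index, a search for “Afghanistan” jumps straight to each sentence instead of forcing you to scrub through the whole recording.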
Around the same time our product got a name change! In recognition of expanding beyond still images to include videos, Office files, PDFs, and many other digital file formats, our Merlin Picture Desk became the Merlin DAM (Digital Asset Management).
The AI Era
Remember how this story started: a combination of new software and new hardware processors enabled DAM in the first place. Well, five years ago we recognized the same thing was happening again. The very beginnings of AI applications for imaging (software), and the arrival of moderate-cost graphics processing units (GPU hardware) driven by the gaming industry, suggested to us that properly applied AI could deliver a big improvement in DAM for our users, with never-before-imagined capabilities.
From the beginning we understood that eventually the Googles and the Microsofts would make AI technology available to “rent,” but from our long experience in DAM we knew our users would neither be served by nor satisfied with off-the-shelf AI applications. So we studied AI approaches, attended conferences, and read hundreds of papers detailing the latest research.
Early on we learned that one of the most important parts of developing an excellent new AI application is a first-class training set: a huge collection of high-quality images with professionally written, unbiased descriptive captions. We recognized that, by virtue of our news and media customers, we had exactly that kind of diverse, terrific training set: hundreds of millions of news images gathered over 30 years, showing every significant aspect of life on Earth! That meant we could “fine-tune” any AI model using this huge dataset. Not only could we innovate freely for the benefit of our users, but our training set let us create models that would outperform anyone else’s.
We started with a well-bounded problem: could we build technology where you could click on someone’s face in one image, and we would find and label that person in every other image they appeared in (facial recognition)? What if a hospital had a fundraiser last night, 1,000 pictures were shot, and the next morning we had to find the best picture of the biggest donor and our star oncologist? How many hours could this capability save our users? No one else has this innovation.
Next, as so often happens, you might find an image of, for example, a picnic scene, and while you know there are better images, try as you might with text searches you cannot find them. What if you could click on the “almost good enough” image and tell your system, “Find me stuff that looks like this!”? Presto! You find the better picnic shot, which was previously undiscoverable by text search! Visual Similarity searching was created.
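Under the hood, “looks like this” searches typically reduce to nearest-neighbor comparisons between image feature vectors. The tiny vectors and file names below are invented stand-ins for real model output; this is a minimal sketch of the ranking step, not MerlinOne's implementation.

```python
import math

# Toy "embeddings" standing in for feature vectors a vision model
# would compute for each image in the library (values are invented).
library = {
    "picnic_blanket.jpg": [0.9, 0.8, 0.1, 0.0],
    "picnic_park.jpg":    [0.8, 0.9, 0.2, 0.1],
    "city_skyline.jpg":   [0.1, 0.0, 0.9, 0.8],
}

def cosine(a, b):
    """Cosine similarity between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def find_similar(query_name, library, top_k=2):
    """Rank every other image by visual similarity to the query image."""
    q = library[query_name]
    ranked = sorted(
        ((name, cosine(q, vec)) for name, vec in library.items()
         if name != query_name),
        key=lambda pair: pair[1], reverse=True,
    )
    return ranked[:top_k]

# Click the "almost good enough" picnic shot; the better one ranks first.
print(find_similar("picnic_blanket.jpg", library))
```

At production scale the linear scan would be replaced by an approximate nearest-neighbor index, but the ranking idea is the same.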
We heard so many stories about people being tasked with putting the best, most memorable photo of something on the company website, how they narrowed down their choices to a handful of images, but then hit a wall: they could not decide which was the BEST image. Since a few of us had been photo editors in earlier lives, we knew the agony, and we had years of training to help us. What if, we wondered, we got some world-class photo editors to grade a large quantity of images from low to high impact, creating a great training set for an AI engine to sort images from “most impact” to “least impact”? What a help that would be for all the fledgling photo editors out there! So, we created the IMPACT AI engine.
Back in the news industry there was a culture of fully filling out caption information (the “metadata” for a visual object), but over time that simply stopped happening, and especially outside the news industry complete metadata is increasingly rare, and it is costly to apply after the fact. Yet for the past 30 years, the only way to search for a visual object was to exactly match your search terms with the words someone else used to describe a scene. Take away the metadata and there is nothing to search against, and large quantities of your images simply become undiscoverable! How do you innovate and solve that problem?
MerlinOne has had a terrific time developing NOMAD™, the ability to search an image even if it has NO MetAData! We trained one AI engine to understand not only the individual words of your search, but also the CONCEPT you are looking for. We trained another AI engine to recognize all the objects in each of your images. Then we created a third process to link the two: to reliably and accurately find just the right images for your search EVEN IF THEY HAVE ZERO METADATA, a purely visual search!
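The general shape of such metadata-free search is a pair of encoders, one for text and one for images, whose outputs land in a shared vector space where a simple dot product scores how well an image matches a query. The NOMAD models themselves are not public, so the vectors and file names below are invented stand-ins; this is a sketch of the general dual-encoder idea, not MerlinOne's system.

```python
# Stand-in outputs of a hypothetical text encoder and image encoder
# that share one embedding space (all numbers are invented).
text_embeddings = {"dog on a beach": [0.6, 0.8, 0.0]}

image_embeddings = {
    "IMG_0001.jpg": [0.7, 0.7, 0.1],   # a dog playing in the surf
    "IMG_0002.jpg": [0.1, 0.0, 0.99],  # an office interior
}

def dot(a, b):
    """Dot product; on unit-length vectors this is cosine similarity."""
    return sum(x * y for x, y in zip(a, b))

def visual_search(query, images, text_index):
    """Rank images against a text query with no metadata involved."""
    q = text_index[query]
    return sorted(images, key=lambda name: dot(q, images[name]),
                  reverse=True)

# The uncaptioned dog photo is found purely from its pixels' embedding.
print(visual_search("dog on a beach", image_embeddings, text_embeddings))
```

Because the match happens in embedding space, an image with an empty caption is exactly as findable as a fully captioned one.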
Being able to innovate like this, to bring never-before-possible and obviously useful capabilities to our Merlin DAM users, makes us smile and feel happy to get up each morning. This has, so far, been a 30-year exploration of the limits of technology to bring meaningful advances that help people do their jobs more efficiently with less stress, and we find nothing more rewarding than serving others in their jobs!
Stay tuned! We have a continuing roadmap of further innovations and inventions, and we are doing our best to delight you with ways to make your jobs easier and more fulfilling! Next stop, NOMAD™ for video!
This piece was written by David Tenenbaum, CEO of MerlinOne, and an advocate for constant DAM innovation. Connect with him on LinkedIn or email the author directly – email@example.com