DAM Software Searchability, Principal Data, Metadata, and Enrichment Technologies
All DAM systems provide a combination of search and taxonomy so that you can organize assets and find them when you need them. At the heart of DAM are two kinds of data: principal data, the visible, usable component of the asset that appears in your publication, and metadata, which describes the asset and makes it searchable. These days the dividing line between the two is becoming blurred.
A few years ago, while working on a project outside the conventional DAM industry, I had a surprising conversation with a colleague: we had very different ideas of what the word “metadata” actually meant. For me, as a technical person dealing mainly with large media objects, it was anything that resided in a database outside the principal content of the original asset’s file. That included the full text of a PDF page, reconstructed into readable order; the text extracted by speech recognition from the audio track of a video file; and EXIF data embedded by a camera. For her, all of those things were “principal data”; metadata was only what had been added later as annotation: file names, keywords, commentary, usage history, and key dates. Nothing that originated from inside the file counted.
Aside from giving experts something to debate, this subject is getting more interesting, because software is becoming better at inferring searchable information from the raw data of assets. Consider, for example:
- Scene identification software can recognize portraits, architectural photos, landscapes, and numerous other categories, as well as picking out simple properties like dominant colors and image orientation.
- When an asset of any kind contains text, it’s often possible to enrich the data by analyzing the text and “normalizing” it, so that all assets with the same kind of subject matter carry comparable metadata and can be found together in search results. For example, software can enhance a story about an athlete by adding her nickname, team, league, sport, and other useful pieces of metadata and taxonomy. It can annotate a story that references a company by adding the stock ticker symbol and other corporate information. It might “normalize” the spelling of a subject’s name, or embed common alternate spellings, to make the object more findable. It might add keywords based on an analysis of the likely subject matter, or translate a caption into another language. These enhancements don’t just make individual items easier to find; having found one asset, they make it possible to find related ones.
- Rights management metadata is also amenable to automatic enhancement. Starting from source data such as the creator and copyright fields of an XMP or IPTC package, or from a steganographic watermark embedded in the image, a DAM can reach out to external services to find ownership and rights information, usage restrictions, and costs.
- Facial recognition software, even when it can’t identify the individuals in a photograph, can provide metadata that is useful in some contexts. For example, it can count the number of faces to determine whether an image is a group photo or a crowd shot; it can estimate age; and it can guess at gender, ethnicity, the presence of beards, eyeglasses, and smiles, whether mouths are open, and even the emotion expressed by each face it detects. That kind of information can be useful editorially, especially when combined with metadata entered when the photo was created. It makes searches like “show me the President with smiling children” a practical possibility.
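The simplest of the inferences in the first bullet, dominant color and image orientation, can be sketched in a few lines of Python. This is an illustrative sketch, not any vendor’s implementation: the image is represented as a flat list of RGB tuples, and the coarse eight-bucket color quantization is an assumption made for brevity.

```python
from collections import Counter

def infer_basic_metadata(pixels, width, height):
    """Infer simple searchable metadata from raw RGB pixel data.

    pixels: flat list of (r, g, b) tuples; width/height: image dimensions.
    """
    # Orientation falls out of the dimensions alone.
    if width > height:
        orientation = "landscape"
    elif width < height:
        orientation = "portrait"
    else:
        orientation = "square"

    # Quantize each channel to two levels (eight coarse buckets) and
    # take the most common bucket as the "dominant color".
    def bucket(channel):
        return "high" if channel >= 128 else "low"

    counts = Counter((bucket(r), bucket(g), bucket(b)) for r, g, b in pixels)
    top_bucket, _ = counts.most_common(1)[0]
    names = {
        ("high", "low", "low"): "red",
        ("low", "high", "low"): "green",
        ("low", "low", "high"): "blue",
        ("high", "high", "low"): "yellow",
        ("high", "low", "high"): "magenta",
        ("low", "high", "high"): "cyan",
        ("high", "high", "high"): "light",
        ("low", "low", "low"): "dark",
    }
    return {"orientation": orientation, "dominant_color": names[top_bucket]}

# A 4x2 image that is mostly red.
meta = infer_basic_metadata(
    [(220, 30, 40)] * 6 + [(10, 10, 10)] * 2, width=4, height=2)
print(meta)  # {'orientation': 'landscape', 'dominant_color': 'red'}
```

A production system would of course use trained models for scene categories; the point is only that even trivial analysis of principal data yields metadata a search engine can use.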
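The text-normalization idea in the second bullet can also be made concrete. In this sketch the lookup tables (company tickers, alternate name spellings) are hypothetical stand-ins for the authority files or external services a real DAM would consult, and all names are invented.

```python
# Hypothetical authority tables; a production DAM would consult
# external services or curated taxonomy files instead.
COMPANY_TICKERS = {"examplecorp": "XMPL"}
NAME_VARIANTS = {"anna ivanova": ["ana ivanova", "anna ivanovna"]}

def enrich_caption(record):
    """Add normalized, searchable metadata to an asset record."""
    text = record["caption"].lower()
    enriched = dict(record, keywords=list(record.get("keywords", [])))

    # Attach a stock ticker when a known company name appears.
    for company, ticker in COMPANY_TICKERS.items():
        if company in text:
            enriched["keywords"].append(ticker)

    # Embed common alternate spellings of known subjects so that a
    # search for any variant finds the same asset.
    for name, variants in NAME_VARIANTS.items():
        if name in text:
            enriched["keywords"].extend(variants)

    return enriched

asset = {"caption": "Anna Ivanova rings the bell at ExampleCorp HQ",
         "keywords": ["business"]}
print(enrich_caption(asset)["keywords"])
# ['business', 'XMPL', 'ana ivanova', 'anna ivanovna']
```

The enriched keywords live alongside the original caption, which is exactly the blurring of principal data and metadata the article describes: searchable fields derived from, but not present in, the asset itself.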
No single DAM vendor can provide all of these possibilities and keep them all at the state of the art. As a user, though, you may want a diversity of features that no one vendor offers, so the key to implementing any of these capabilities is the flexibility to interface with external service providers. For example, Merlin connects to web services using the forthcoming Merlin-X Workflow Engine.
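The flexibility described above is essentially a plug-in architecture: enrichment providers that can be registered and swapped without changing the core system. Here is a minimal sketch of one, with a local stub standing in for a real web service; the names (`ENRICHERS`, `register`, `enrich`) are illustrative inventions, not Merlin-X APIs.

```python
from typing import Callable, Dict

# Registry of enrichment services, keyed by name. Each service takes an
# asset's metadata dict and returns new fields to merge in. In production,
# a service would wrap a call to an external web API.
ENRICHERS: Dict[str, Callable[[dict], dict]] = {}

def register(name):
    """Decorator that adds an enrichment function to the registry."""
    def wrap(fn):
        ENRICHERS[name] = fn
        return fn
    return wrap

@register("scene")
def scene_stub(meta):
    # Stand-in for a scene-identification web service.
    return {"scene": "portrait"} if meta.get("faces", 0) == 1 else {"scene": "other"}

def enrich(meta, services):
    """Run the named services in order, merging whatever each returns."""
    enriched = dict(meta)
    for name in services:
        enriched.update(ENRICHERS[name](enriched))
    return enriched

print(enrich({"faces": 1}, ["scene"]))  # {'faces': 1, 'scene': 'portrait'}
```

Because each provider only sees and returns plain metadata dictionaries, a new external service can be added, upgraded, or replaced without touching the rest of the pipeline.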
As artificial intelligence and sophisticated algorithms improve, and as technology industries continue to wrestle with the “Big Data problem,” the universe of metadata enhancement services will continue to expand, and the distinction between metadata and principal data will continue to blur. There will always be a need for human-entered metadata, but that metadata will increasingly be enriched by automation.
Chief Technology Officer
If you would like to learn more about MerlinOne Digital Asset Management solutions, click here.