Digital Archive: Avoiding A Digital Dark Age

The first color image from the surface of Mars. This picture was taken by the lander's number 2 camera on July 21, 1976, the day following Viking I's successful landing on Mars. Photo by NASA Jet Propulsion Laboratory.

A prominent computer scientist once quipped, “digital media lasts forever or five years – whichever comes sooner.”

This was the late 1990s, and a handful of stories about lost or nearly irretrievable data had raised alarm in some circles. The twin issues of preserving data and being able to make sense of it five, ten, or 100 years later were starting to be understood.

The worry, a justifiable one, was that we were in the midst of a “digital dark age” in which much of our record as a civilization was at risk of being lost in an unprecedentedly short time. And there was plenty of anecdotal evidence to stoke these fears.

As technology advanced at the furious pace of the 1980s, '90s and the new millennium, storage devices and file formats became obsolete within a couple of years or, in some cases, months. How many SyQuest disks or 5.25” floppy disks containing WordStar or MultiPlan documents sit in closets, kept in the optimistic hope that, should the need arise, the data could still be retrieved and interpreted in a meaningful way? How is it that the Dead Sea Scrolls, recorded on parchment and papyrus and stored in clay pots in a cave for 2,000 years, are still partially decipherable, but those masterpiece term papers I stored on a floppy disk when I was in college are completely irretrievable?

A couple of high-profile incidents brought this problem home clearly. Perhaps the most frequently cited came to light in the late 1980s, when about 3,000 previously unprocessed images from the Viking mission to Mars in the late 1970s were discovered. The images were stored on magnetic tapes written by unique military-issue drives in cryptic or undocumented formats. Worse yet, the engineers who designed the systems had died by the time the tapes were discovered. Furthermore, the tapes were so fragile that when the proper reading devices were found (after much searching), the act of reading the data caused the oxides to flake off, making further reading impossible. With great effort, some of the images were retrieved. Others were lost forever.

Fast forward to 2010. Digital workflows have long been standard practice and bring many great benefits, but are we still at risk of living in a digital dark age? The answer is clearly yes, but the strategies and tools for addressing the problem have improved steadily. Everyone, especially those who manage long-term data archives, should be aware of these tools and strategies.

Following the pack is a wise strategy. The adoption of open and well-documented standards has improved the viability of a long-term digital archive in recent years. For photos, JPEG has been around for a long time. Current versions of Photoshop and most other programs can still read JPEGs written 20 years ago. PDF is also ubiquitous and has been formalized as a long-term archive format by the International Organization for Standardization (ISO) in the form of PDF/A (ISO 19005-1). Properly encoded text files and a handful of other formats have also stood up over time. Avoid undocumented proprietary document formats for long-term archiving and, if that is unavoidable, consider archiving the documentation and programs required to interpret the data along with the data.
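
One practical payoff of a documented format is that a file can often be identified from its first few bytes, even when the file name or extension has been lost. The Python sketch below illustrates the idea under some assumptions: the function names `sniff_format` and `sniff_file` and the two-entry signature table are hypothetical, and the check only looks at the leading bytes rather than validating the whole file.

```python
# Leading-byte signatures for two well-documented formats.
# This table is illustrative only and far from exhaustive.
SIGNATURES = {
    b"\xff\xd8\xff": "JPEG",  # JPEG files begin with these marker bytes
    b"%PDF-": "PDF",          # PDF files begin with this ASCII header
}


def sniff_format(header: bytes) -> str:
    """Return a format name based on the leading bytes, or 'unknown'."""
    for magic, name in SIGNATURES.items():
        if header.startswith(magic):
            return name
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first few bytes of a file and guess its format."""
    with open(path, "rb") as handle:
        return sniff_format(handle.read(8))
```

A file that comes back as "unknown" is not necessarily unreadable, but it is a reasonable candidate for a closer look.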

Another consideration is being able to find objects as collections grow over time. At some point, a digital asset management system becomes indispensable. Just as adopting common file formats and storage systems is important for longevity, choosing a digital asset management system based on ubiquitous, standard database platforms and open standards is critical.

A final consideration is to steer clear of complacency. It is probably worthwhile to periodically evaluate whether any content in your digital archive is in danger of obsolescence and to consider ways of migrating it to current, well-understood storage systems and formats.
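
A periodic review like this can be partly automated. The following Python sketch walks an archive folder and lists the files whose extensions appear on a watch list of formats you consider at risk. It rests on assumptions: the function name `find_at_risk_files` is hypothetical, the watch list is something you would supply for your own collection, and a file extension is only a rough proxy for the actual format.

```python
import os
from collections import defaultdict


def find_at_risk_files(root, at_risk_extensions):
    """Walk an archive directory and group files by at-risk extension.

    `at_risk_extensions` is a caller-supplied collection such as {".ws"};
    matching is case-insensitive.
    """
    wanted = {ext.lower() for ext in at_risk_extensions}
    flagged = defaultdict(list)
    for folder, _subdirs, filenames in os.walk(root):
        for filename in filenames:
            ext = os.path.splitext(filename)[1].lower()
            if ext in wanted:
                flagged[ext].append(os.path.join(folder, filename))
    return dict(flagged)
```

Running it once or twice a year against the archive and reviewing the resulting list gives a simple starting point for deciding what to migrate.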

Suggested links:
http://en.wikipedia.org/wiki/Digital_Dark_Age
http://en.wikipedia.org/wiki/PDF/A
http://www.americanscientist.org/issues/pub/avoiding-a-digital-dark-age/1
http://www.nytimes.com/1990/03/20/science/lost-on-earth-wealth-of-data-found-in-space.html?sec=&spon=&pagewanted=all

Posted by Chris Carr
